Difference between revisions of "Programming/Kdb/Q-sql"

From Thalesians Wiki
 
(7 intermediate revisions by the same user not shown)
Line 97: Line 97:
</pre>
</pre>


We have just used '''dot''' (<tt>.</tt>) for dyadic functions. Quoting <span class="plainlinks">[https://code.kx.com/q4m3/ Q for Mortals]</span>, the general form of functional <tt>.</tt> for dyadic functions is
We have just used '''dot''' (<tt>.</tt>) for dyadic functions. Quoting [https://code.kx.com/q4m3/ Q for Mortals], the general form of functional <tt>.</tt> for dyadic functions is
<pre>
<pre>
.[L;I;f;y]
.[L;I;f;y]
Line 170: Line 170:
Just as we used to extend a table.
Just as we used to extend a table.


\begin{frame}[fragile]
=Deleting rows from a table=
\frametitle{Deleting rows from a table}
 
\begin{itemize}
To delete all rows from the <tt>quotes</tt> table we can use the following code:
\item To delete all rows from the \code{quotes} table we can use the following code:
<pre>
\begin{snippet}{q}
q)delete from `quotes
q)delete from `quotes
\end{snippet}
</pre>
\item Notice that we referred to the quotes table by its symbol name, \code{`quotes}, instead of providing a variable, \code{quotes}. This is generally the case when we want to modify a table in place.
Notice that we referred to the quotes table by its symbol name, <pre>`quotes</pre>, instead of providing a variable, <tt>quotes</tt>. This is generally the case when we want to modify a table in place.
\item We can check that the \code{quotes} table is now indeed empty:
 
\begin{snippet}{q}
We can check that the <tt>quotes</tt> table is now indeed empty:
<pre>
q)count quotes
q)count quotes
0
0
Line 185: Line 185:
code date time bidprice bidsize askprice asksize mktflag
code date time bidprice bidsize askprice asksize mktflag
--------------------------------------------------------
--------------------------------------------------------
\end{snippet}
</pre>
\end{itemize}
\end{frame}


\begin{frame}[fragile]
=Repopulating <tt>quotes</tt>=
\frametitle{Repopulating \textbf{quotes}}
Let us repopulate <tt>quotes</tt> again:
Let us repopulate \code{quotes} again:
<pre>
\begin{snippet}{q}
quotes:([]code:`symbol$();date:`date$();time:`time$();bidprice:`float$();bidsize:`short$();askprice:`float$();asksize:`short$();mktflag:`symbol$());
quotes:([]code:`symbol$();date:`date$();time:`time$();bidprice:`float$();bidsize:`short$();askprice:`float$();asksize:`short$();mktflag:`symbol$());
.[`quotes;();,;(
.[`quotes;();,;(
Line 206: Line 203:
     (`CLM16; 2016.04.08; 15:20:02.000; 40.77 ; 26h; 40.79 ; 44h; `E);
     (`CLM16; 2016.04.08; 15:20:02.000; 40.77 ; 26h; 40.79 ; 44h; `E);
     (`CLM16; 2016.04.08; 15:21:43.786; 40.76 ; 3h ; 40.77 ; 28h; `E))]
     (`CLM16; 2016.04.08; 15:21:43.786; 40.76 ; 3h ; 40.77 ; 28h; `E))]
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
=The metadata of the trades table before we populate it=
\frametitle{The metadata of the trades table before we populate it}
 
\begin{itemize}
Let us examine the metadata of the trades table before we populate it:
\item Let us examine the metadata of the trades table before we populate it:
<pre>
\begin{snippet}{q}
meta trades
meta trades
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
c      | t f a
c      | t f a
--------| -----
--------| -----
Line 226: Line 221:
mktflag | s
mktflag | s
comments|
comments|
\end{snippet}
</pre>
\item Note that
Note that
\begin{snippet}{q}
<pre>
type trades[`comments]
type trades[`comments]
\end{snippet}
</pre>
is \code{0h}, so this column is a \nb{mixed} list (remember that table columns are all lists). It appeared in the schema without a type cast:
is <tt>0h</tt>, so this column is a ''mixed'' list (remember that table columns are all lists). It appeared in the schema without a type cast:
\begin{snippet}{q}
<pre>
trades : ([]
trades : ([]
     ...
     ...
Line 238: Line 233:
     mktflag:`symbol$();
     mktflag:`symbol$();
     comments:())
     comments:())
\end{snippet}
</pre>
\end{itemize}
\end{frame}


\begin{frame}[fragile]
====Exercise====
\frametitle{Exercise}
 
Use dot (\code{.}) to populate the \code{trades} table with the following rows:
Use dot (<tt>.</tt>) to populate the <tt>trades</tt> table with the following rows:
\begin{snippet}{q}
<pre>
(`CLM16; 2016.04.07; 10:20:00.329; 38.3  ; 4h; `E; "NYMEX/CME Group trade");
(`CLM16; 2016.04.07; 10:20:00.329; 38.3  ; 4h; `E; "NYMEX/CME Group trade");
(`GCM16; 2016.04.07; 12:00:00.055; 1239.7; 6h; `E; "");
(`GCM16; 2016.04.07; 12:00:00.055; 1239.7; 6h; `E; "");
Line 263: Line 256:
(`GCM16; 2016.04.11; 16:00:00.044; 1257.9; 1h; `E; "");
(`GCM16; 2016.04.11; 16:00:00.044; 1257.9; 1h; `E; "");
(`GCM16; 2016.04.11; 16:28:34.311; 1258.7; 1h; `E; "")
(`GCM16; 2016.04.11; 16:28:34.311; 1258.7; 1h; `E; "")
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
====Solution====
\frametitle{Solution~(i)}
 
\begin{snippet}{q}
<pre>
.[`trades;();,;(
.[`trades;();,;(
     (`CLM16; 2016.04.07; 10:20:00.329; 38.3  ; 4h; `E; "NYMEX/CME Group trade");
     (`CLM16; 2016.04.07; 10:20:00.329; 38.3  ; 4h; `E; "NYMEX/CME Group trade");
Line 287: Line 279:
     (`GCM16; 2016.04.11; 16:00:00.044; 1257.9; 1h; `E; "");
     (`GCM16; 2016.04.11; 16:00:00.044; 1257.9; 1h; `E; "");
     (`GCM16; 2016.04.11; 16:28:34.311; 1258.7; 1h; `E; ""))]
     (`GCM16; 2016.04.11; 16:28:34.311; 1258.7; 1h; `E; ""))]
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
<pre>
\frametitle{Solution~(ii)}
\begin{snippet}{q}
trades
trades
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
code  date      time        price  volume mktflag comments                ..
code  date      time        price  volume mktflag comments                ..
-----------------------------------------------------------------------------..
-----------------------------------------------------------------------------..
Line 315: Line 304:
GCM16 2016.04.11 16:00:00.044 1257.9 1      E      ""                      ..
GCM16 2016.04.11 16:00:00.044 1257.9 1      E      ""                      ..
GCM16 2016.04.11 16:28:34.311 1258.7 1      E      ""                      ..
GCM16 2016.04.11 16:28:34.311 1258.7 1      E      ""                      ..
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
=The metadata of the trades table after we have populated it=
\frametitle{The metadata of the trades table after we have populated it}
 
\begin{itemize}
Let us examine the metadata of the trades table before we populate it:
\item Let us examine the metadata of the trades table before we populate it:
<pre>
\begin{snippet}{q}
meta trades
meta trades
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
c      | t f a
c      | t f a
--------| -----
--------| -----
Line 335: Line 322:
mktflag | s
mktflag | s
comments| C
comments| C
\end{snippet}
</pre>
\item Note that
 
\begin{snippet}{q}
Note that
<pre>
type trades[`comments]
type trades[`comments]
\end{snippet}
</pre>
    is still \code{0h}, but the metadata above tells us that \code{comments} is a column of strings. Strings are themselves lists
is still <tt>0h</tt>, but the metadata above tells us that <tt>comments</tt> is a column of strings. Strings are themselves lists
\begin{snippet}{q}
<pre>
trades[`comments][0][0]
trades[`comments][0][0]
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
"N"
"N"
\end{snippet}
</pre>
    and the only way to declare a list of lists is through a mixed list, there is no other way we could have declared this column in our schema.
and the only way to declare a list of lists is through a mixed list, there is no other way we could have declared this column in our schema.
\end{itemize}
\end{frame}


\begin{frame}[fragile]
=Q-sql select=
\frametitle{q-sql select}
 
\begin{itemize}
Consider the simplest possible form of q-sql <tt>select</tt>:
\item Consider the simplest possible form of q-sql \defn{select}:
<pre>
\begin{snippet}{q}
select from quotes
select from quotes
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
code  date      time        bidprice bidsize askprice asksize mktflag
code  date      time        bidprice bidsize askprice asksize mktflag
-----------------------------------------------------------------------
-----------------------------------------------------------------------
Line 372: Line 357:
CLM16 2016.04.08 15:20:02.000 40.77    26      40.79    44      E
CLM16 2016.04.08 15:20:02.000 40.77    26      40.79    44      E
CLM16 2016.04.08 15:21:43.786 40.76    3      40.77    28      E
CLM16 2016.04.08 15:21:43.786 40.76    3      40.77    28      E
\end{snippet}
</pre>
\item We have selected all rows from the table.
\end{itemize}
\end{frame}


\begin{frame}[fragile]
We have selected all rows from the table.
\frametitle{Exercise}
Select all rows from the \code{trades} table.
\end{frame}


\begin{frame}[fragile]
====Exercise====
\frametitle{Solution}
 
\begin{snippet}{q}
Select all rows from the <tt>trades</tt> table.
 
====Solution====
 
<pre>
select from trades
select from trades
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
code  date      time        price  volume mktflag comments                ..
code  date      time        price  volume mktflag comments                ..
-----------------------------------------------------------------------------..
-----------------------------------------------------------------------------..
Line 407: Line 390:
GCM16 2016.04.11 16:00:00.044 1257.9 1      E      ""                      ..
GCM16 2016.04.11 16:00:00.044 1257.9 1      E      ""                      ..
GCM16 2016.04.11 16:28:34.311 1258.7 1      E      ""                      ..
GCM16 2016.04.11 16:28:34.311 1258.7 1      E      ""                      ..
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
=Q-sql select: selecting columns=
\frametitle{q-sql select: selecting columns}
 
\begin{itemize}
We may list the columns that we want to select, in the order in which we want them to appear in the resulting table:
\item We may list the columns that we want to select, in the order in which we want them to appear in the resulting table:
<pre>
\begin{snippet}{q}
select date,time,code,asksize,askprice from quotes
select date,time,code,asksize,askprice from quotes
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
date      time        code  asksize askprice
date      time        code  asksize askprice
----------------------------------------------
----------------------------------------------
Line 431: Line 412:
2016.04.08 15:20:02.000 CLM16 44      40.79
2016.04.08 15:20:02.000 CLM16 44      40.79
2016.04.08 15:21:43.786 CLM16 28      40.77
2016.04.08 15:21:43.786 CLM16 28      40.77
\end{snippet}
</pre>
\item By default, the output is not saved, but we can set it to a variable:
 
\begin{snippet}{q}
By default, the output is not saved, but we can set it to a variable:
<pre>
new_quotes_table:select date,time,code,asksize,askprice from quotes
new_quotes_table:select date,time,code,asksize,askprice from quotes
\end{snippet}
</pre>
\end{itemize}
\end{frame}


\begin{frame}[fragile]
====Exercise====
\frametitle{Exercise}
Select the \code{date}, \code{time}, \code{volume}, and \code{price} (in that order) from the \code{trades} table.
\end{frame}


\begin{frame}[fragile]
Select the <tt>date</tt>, <tt>time</tt>, <tt>volume</tt>, and <tt>price</tt> (in that order) from the <tt>trades</tt> table.
\frametitle{Solution}
 
\begin{snippet}{q}
====Solution====
 
<pre>
select date,time,volume,price from trades
select date,time,volume,price from trades
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
date      time        volume price
date      time        volume price
-------------------------------------
-------------------------------------
Line 469: Line 448:
2016.04.11 16:00:00.044 1      1257.9
2016.04.11 16:00:00.044 1      1257.9
2016.04.11 16:28:34.311 1      1258.7
2016.04.11 16:28:34.311 1      1258.7
\end{snippet}
</pre>
\end{frame}
 
=Q-sql select: selecting columns under new names=


\begin{frame}[fragile]
\frametitle{q-sql select: selecting columns under new names}
We may give the selected columns new names in the result:
We may give the selected columns new names in the result:
\begin{snippet}{q}
<pre>
select d:date,t:time,code,size:asksize,price:askprice from quotes
select d:date,t:time,code,size:asksize,price:askprice from quotes
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
d          t            code  size price
d          t            code  size price
-----------------------------------------
-----------------------------------------
Line 492: Line 470:
2016.04.08 15:20:02.000 CLM16 44  40.79
2016.04.08 15:20:02.000 CLM16 44  40.79
2016.04.08 15:21:43.786 CLM16 28  40.77
2016.04.08 15:21:43.786 CLM16 28  40.77
\end{snippet}
</pre>
\end{frame}


\begin{frame}
====Exercise====
\frametitle{Exercise}
Select the \code{date}, \code{time}, \code{volume}, and \code{price} (in that order) from the \code{trades} table but under the names \code{d}, \code{t}, \code{v}, and \code{p}.
\end{frame}


\begin{frame}[fragile]
Select the <tt>date</tt>, <tt>time</tt>, <tt>volume</tt>, and <tt>price</tt> (in that order) from the <tt>trades</tt> table but under the names <tt>d</tt>, <tt>t</tt>, <tt>v</tt>, and <tt>p</tt>.
\frametitle{Solution}
 
\begin{snippet}{q}
====Solution====
 
<pre>
select d:date,t:time,v:volume,p:price from trades
select d:date,t:time,v:volume,p:price from trades
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
d          t            v p
d          t            v p
--------------------------------
--------------------------------
Line 525: Line 501:
2016.04.11 16:00:00.044 1 1257.9
2016.04.11 16:00:00.044 1 1257.9
2016.04.11 16:28:34.311 1 1258.7
2016.04.11 16:28:34.311 1 1258.7
\end{snippet}
</pre>
\end{frame}
 
=Q-sql select: operations on columns=


\begin{frame}[fragile]
\frametitle{q-sql select: operations on columns~(i)}
Perhaps more interestingly, selected columns may be functions of / results of operations on the existing ones:
Perhaps more interestingly, selected columns may be functions of / results of operations on the existing ones:
\begin{snippet}{q}
<pre>
select dt:date+time,code,size:asksize,price:askprice from quotes
select dt:date+time,code,size:asksize,price:askprice from quotes
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
dt                            code  size price
dt                            code  size price
-----------------------------------------------
-----------------------------------------------
Line 548: Line 523:
2016.04.08D15:20:02.000000000 CLM16 44  40.79
2016.04.08D15:20:02.000000000 CLM16 44  40.79
2016.04.08D15:21:43.786000000 CLM16 28  40.77
2016.04.08D15:21:43.786000000 CLM16 28  40.77
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
\frametitle{q-sql select: operations on columns~(ii)}
Remember that columns are lists, so the above + works on two lists:
Remember that columns are lists, so the above + works on two lists:
\begin{snippet}{q}
<pre>
q)quotes[`date]
q)quotes[`date]
2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 ..
2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 ..
Line 561: Line 533:
q)quotes[`date]+quotes[`time]
q)quotes[`date]+quotes[`time]
2016.04.07D10:15:00.010000000 2016.04.07D11:02:16.663000000 2016.04.07D12:05:..
2016.04.07D10:15:00.010000000 2016.04.07D11:02:16.663000000 2016.04.07D12:05:..
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
====Exercise====
\frametitle{Exercise}
\begin{itemize}
\item The best bid and best ask (offer) prices are denoted by $P^b_t$ and $P^a_t$, respectively.
\item The \defn{midprice} is the average of the best bid and best ask prices:
    \begin{equation*}
    \text{Midprice}_t = \frac{P^a_t + P^b_t}{2}.
    \end{equation*}
\item Write a q-sql \code{select} query selecting the timestamp (\code{date + time}) and the midprice.
\end{itemize}
\end{frame}


\begin{frame}[fragile]
The best bid and best ask (offer) prices are denoted by <math>P^b_t</math> and <math>P^a_t</math>, respectively.
\frametitle{Solution}
 
\begin{snippet}{q}
The '''midprice''' is the average of the best bid and best ask prices:
<center><math>
\text{Midprice}_t = \frac{P^a_t + P^b_t}{2}.
</math></center>
 
Write a q-sql <tt>select</tt> query selecting the timestamp (<tt>date + time</tt>) and the midprice.
 
====Solution====
 
<pre>
select dt:date+time,mid:.5*bidprice+askprice from quotes
select dt:date+time,mid:.5*bidprice+askprice from quotes
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
dt                            mid
dt                            mid
-------------------------------------
-------------------------------------
Line 595: Line 565:
2016.04.08D15:20:02.000000000 40.78
2016.04.08D15:20:02.000000000 40.78
2016.04.08D15:21:43.786000000 40.765
2016.04.08D15:21:43.786000000 40.765
\end{snippet}
</pre>
\end{frame}


\begin{frame}
====Exercise====
\frametitle{Exercise}
\begin{itemize}
\item The \defn{microprice} is often more useful than the midprice, since it weights the best bid and best ask prices with the volumes posted at the best bid and best ask prices:
    \begin{equation*}
    \text{Microprice}_t = \frac{V^b_t P^a_t + V^a_t P^b_t}{V^b_t + V^a_t},
    \end{equation*}
    where $V^b_t$ and $V^a_t$ represent the volumes posted at the best bid and best ask prices, respectively.
\item Modify your query to add the microprice to it.
\end{itemize}
\end{frame}


\begin{frame}[fragile]
The '''microprice''' is often more useful than the midprice, since it weights the best bid and best ask prices with the volumes posted at the best bid and best ask prices:
\frametitle{Solution}
<center><math>
\begin{snippet}{q}
\text{Microprice}_t = \frac{V^b_t P^a_t + V^a_t P^b_t}{V^b_t + V^a_t},
</math></center>
where <math>V^b_t</math> and <math>V^a_t</math> represent the volumes posted at the best bid and best ask prices, respectively.
 
Modify your query to add the microprice to it.
 
====Solution====
 
<pre>
select dt:date+time,mid:.5*bidprice+askprice,microprice:((bidsize*askprice)+asksize*bidprice)%bidsize+asksize from quotes
select dt:date+time,mid:.5*bidprice+askprice,microprice:((bidsize*askprice)+asksize*bidprice)%bidsize+asksize from quotes
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
dt                            mid    microprice
dt                            mid    microprice
------------------------------------------------
------------------------------------------------
Line 629: Line 596:
2016.04.08D15:20:02.000000000 40.78  40.77743
2016.04.08D15:20:02.000000000 40.78  40.77743
2016.04.08D15:21:43.786000000 40.765  40.76097
2016.04.08D15:21:43.786000000 40.765  40.76097
\end{snippet}
</pre>
\end{frame}
 
=A challenge=


\begin{frame}[fragile]
So what if we want to return only the first character from each code?
\frametitle{A challenge~(i)}
<pre>
\begin{itemize}
\item So what if we want to return only the first character from each code?
\begin{snippet}{q}
select dt:date+time,shortcode:code[0],size:asksize,price:askprice from quotes
select dt:date+time,shortcode:code[0],size:asksize,price:askprice from quotes
\end{snippet}
</pre>
does not give the desired result:
does not give the desired result:
\begin{snippet}{output}
<pre>
dt                            shortcode size price
dt                            shortcode size price
---------------------------------------------------
---------------------------------------------------
Line 654: Line 619:
2016.04.08D15:20:02.000000000 CLM16    44  40.79
2016.04.08D15:20:02.000000000 CLM16    44  40.79
2016.04.08D15:21:43.786000000 CLM16    28  40.77
2016.04.08D15:21:43.786000000 CLM16    28  40.77
\end{snippet}
</pre>
\item This is because the indexing, \code{...[0]}, is applied to the entire list \code{quotes[`code]}, and we get a symbol scalar, \code{`CLM16}. This scalar is returned for each row in the result, as you can see above.
 
\end{itemize}
This is because the indexing, <tt>...[0]</tt>, is applied to the entire list <tt>quotes[`code]</tt>, and we get a symbol scalar, <tt>`CLM16</tt>. This scalar is returned for each row in the result, as you can see above.
\end{frame}


\begin{frame}[fragile]
\frametitle{A challenge~(ii)}
Indeed, this is what happens when we include scalars among selected columns:
Indeed, this is what happens when we include scalars among selected columns:
\begin{snippet}{q}
<pre>
select dt:date+time,shortcode:123,size:asksize,price:askprice from quotes
select dt:date+time,shortcode:123,size:asksize,price:askprice from quotes
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
dt                            shortcode size price
dt                            shortcode size price
---------------------------------------------------
---------------------------------------------------
Line 679: Line 641:
2016.04.08D15:20:02.000000000 123      44  40.79
2016.04.08D15:20:02.000000000 123      44  40.79
2016.04.08D15:21:43.786000000 123      28  40.77
2016.04.08D15:21:43.786000000 123      28  40.77
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
Still, what if we want to return the first character of each code?
\frametitle{A challenge~(iii)}
 
\begin{itemize}
One way to do this is by applying a function to <tt>each</tt> element of the list which represents the resulting column:
\item Still, what if we want to return the first character of each code?
<pre>
\item One way to do this is by applying a function to \code{each} element of the list which represents the resulting column:
\begin{snippet}{q}
select dt:date+time,shortcode:{`$1#string x}each code,size:asksize,price:askprice from quotes
select dt:date+time,shortcode:{`$1#string x}each code,size:asksize,price:askprice from quotes
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
dt                            shortcode size price
dt                            shortcode size price
---------------------------------------------------
---------------------------------------------------
Line 704: Line 663:
2016.04.08D15:20:02.000000000 C        44  40.79
2016.04.08D15:20:02.000000000 C        44  40.79
2016.04.08D15:21:43.786000000 C        28  40.77
2016.04.08D15:21:43.786000000 C        28  40.77
\end{snippet}
</pre>
\end{itemize}
\end{frame}


\begin{frame}[fragile]
Instead of the keyword <tt>each</tt>, we could have used the <tt>'</tt> adverb (don't forget the parentheses around the verb&ndash;adverb pair):
\frametitle{A challenge~(iv)}
<pre>
Instead of the keyword \code{each}, we could have used the \code{'} adverb (don't forget the parentheses around the verb--adverb pair):
\begin{snippet}{q}
select dt:date+time,shortcode:({`$1#string x}')code,size:asksize,price:askprice from quotes
select dt:date+time,shortcode:({`$1#string x}')code,size:asksize,price:askprice from quotes
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
dt                            shortcode size price
dt                            shortcode size price
---------------------------------------------------
---------------------------------------------------
Line 728: Line 683:
2016.04.08D15:20:02.000000000 C        44  40.79
2016.04.08D15:20:02.000000000 C        44  40.79
2016.04.08D15:21:43.786000000 C        28  40.77
2016.04.08D15:21:43.786000000 C        28  40.77
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
\frametitle{A challenge~(v)}
This works since
This works since
\begin{snippet}{q}
<pre>
{`$1#string x}each quotes[`code]
{`$1#string x}each quotes[`code]
\end{snippet}
</pre>
and
and
\begin{snippet}{q}
<pre>
({`$1#string x}')quotes[`code]
({`$1#string x}')quotes[`code]
\end{snippet}
</pre>
give the desired list
give the desired list
\begin{snippet}{output}
<pre>
`C`G`C`C`G`C`G`C`C`C`C
`C`G`C`C`G`C`G`C`C`C`C
\end{snippet}
</pre>
which is what the above \code{select} returns for that column (remember: columns are lists).
which is what the above <tt>select</tt> returns for that column (remember: columns are lists).
\end{frame}
 
====Exercise====
 
Select from the <tt>quotes</tt> table <tt>cf</tt>, which is a symbol consisting of two characters: the first character of the <tt>code</tt> and the first character of the <tt>mktflag</tt>.


\begin{frame}
====Solution====
\frametitle{Exercise}
Select from the \code{quotes} table \code{cf}, which is a symbol consisting of two characters: the first character of the \code{code} and the first character of the \code{mktflag}.
\end{frame}


\begin{frame}[fragile]
We can form <tt>(code,mktflag)</tt> pairs with <tt>code,'mktflag</tt> and then apply a function to each (<tt>'</tt>) such pair:
\frametitle{Solution~(i)}
<pre>
\begin{itemize}
\item We can form \code{(code,mktflag)} pairs with \code{code,'mktflag} and then apply a function to each (\code{'}) such pair:
\begin{snippet}{q}
select cf:({`$(1#string x[0]),1#string x[1]}')(code,'mktflag) from quotes
select cf:({`$(1#string x[0]),1#string x[1]}')(code,'mktflag) from quotes
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
cf
cf
--
--
Line 774: Line 723:
CE
CE
CE
CE
\end{snippet}
</pre>
\item However, we can do without the `pairing' code \code{code,'mktflag}. While the keyword \code{each} does not work on functions of multiple arguments and the following throws \code{'type},
 
\begin{snippet}{q}
However, we can do without the "pairing" code <tt>code,'mktflag</tt>. While the keyword <tt>each</tt> does not work on functions of multiple arguments and the following throws <tt>'type</tt>,
<pre>
select cf:{`$(1#string x),1#string y}each[code;mktflag] from quotes
select cf:{`$(1#string x),1#string y}each[code;mktflag] from quotes
\end{snippet}
</pre>
\item Each (\code{'}) does work on functions of multiple arguments:
Each (<tt>'</tt>) does work on functions of multiple arguments:
\begin{snippet}{q}
<pre>
select cf:{`$(1#string x),1#string y}'[code;mktflag] from quotes
select cf:{`$(1#string x),1#string y}'[code;mktflag] from quotes
\end{snippet}
</pre>
\end{itemize}
\end{frame}


\begin{frame}[fragile]
\frametitle{Solution~(ii)}
The following, somewhat less terse, solution works too. But here we iterate twice, whereas we could iterate only once, as in the previous solution:
The following, somewhat less terse, solution works too. But here we iterate twice, whereas we could iterate only once, as in the previous solution:
\begin{snippet}{q}
<pre>
select cf:`$(({1#string x}each code),'{1#string x}each mktflag) from quotes
select cf:`$(({1#string x}each code),'{1#string x}each mktflag) from quotes
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
=Q-sql select: <tt>where</tt> specifications (constraints)=
\frametitle{q-sql select: \textbf{where} specifications (constraints)~(i)}
 
\begin{itemize}
As in classical SQL, in q-sql we can add to our <tt>select</tt> the <tt>where</tt> specifications (constraints). Thus, if we want to return only those columns where price is greater than <tt>1000f</tt>...
\item As in classical SQL, in q-sql we can add to our \code{select} the \code{where} specifications (constraints). Thus, if we want to return only those columns where price is greater than \code{1000f}...
<pre>
\begin{snippet}{q}
select date,time,price:askprice from quotes where price>1000f
select date,time,price:askprice from quotes where price>1000f
\end{snippet}
</pre>
\item Oh, wait, this returns an error:
Oh, wait, this returns an error:
\begin{snippet}{output}
<pre>
'price
'price
[0] select date,time,price:askprice from quotes where price>1000f
[0] select date,time,price:askprice from quotes where price>1000f
                                                       ^
                                                       ^
\end{snippet}
</pre>
\item That's because the constraints are applied first, before one of the resulting columns is given the name \code{price}. We must apply the constraint to \code{askprice} in the original table instead:
That's because the constraints are applied first, before one of the resulting columns is given the name <tt>price</tt>. We must apply the constraint to <tt>askprice</tt> in the original table instead:
\begin{snippet}{q}
<pre>
select date,time,price:askprice from quotes where askprice>1000f
select date,time,price:askprice from quotes where askprice>1000f
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
date      time        price
date      time        price
------------------------------
------------------------------
Line 817: Line 761:
2016.04.07 13:00:00.205 1238.7
2016.04.07 13:00:00.205 1238.7
2016.04.07 15:20:02.224 1241
2016.04.07 15:20:02.224 1241
\end{snippet}
</pre>
\end{itemize}
\end{frame}


\begin{frame}[fragile]
We can add multiple constraints separated by commas:
\frametitle{q-sql select: \textbf{where} specifications (constraints)~(ii)}
<pre>
\begin{itemize}
\item We can add multiple constraints separated by commas:
\begin{snippet}{q}
select date,time,price:askprice from quotes where askprice>1000f,time>=12:00:00.000
select date,time,price:askprice from quotes where askprice>1000f,time>=12:00:00.000
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
date      time        price
date      time        price
------------------------------
------------------------------
2016.04.07 13:00:00.205 1238.7
2016.04.07 13:00:00.205 1238.7
2016.04.07 15:20:02.224 1241
2016.04.07 15:20:02.224 1241
\end{snippet}
</pre>
\item \nb{NB!} Unlike classical SQL, \textbf{the order of the constraints matters!} The constraints are applied in order. From the point of view of performance, you should apply the most `expensive' constraints, which shrink the result the most, first.
\item In particular, when dealing with partitioned tables, include the constraint on the partitioning column (usually \textbf{date}) first. Since partitioned tables are stored on disc in directories by date, this constraint allows kdb+/q to zoom in on the right directory. If it is not the first constraint, an expensive scan of the entire database on disc may result.
\end{itemize}
\end{frame}


\begin{frame}[fragile]
'''NB!''' Unlike classical SQL, '''the order of the constraints matters!''' The constraints are applied in order. From the point of view of performance, you should apply the most "expensive" constraints, which shrink the result the most, first.
\frametitle{q-sql select: indices}
 
Tables are stored in deterministic order. If we want to return the indices of the rows returned by the query, we can add \code{i} to the select:
In particular, when dealing with partitioned tables, include the constraint on the partitioning column (usually <tt>date</tt>) first. Since partitioned tables are stored on disc in directories by date, this constraint allows kdb+/q to zoom in on the right directory. If it is not the first constraint, an expensive scan of the entire database on disc may result.
\begin{snippet}{q}
 
=Q-sql select: indices=
 
Tables are stored in deterministic order. If we want to return the indices of the rows returned by the query, we can add <tt>i</tt> to the select:
<pre>
select i,date,time,price:askprice from quotes where askprice>1000f,time>=12:00:00.000
select i,date,time,price:askprice from quotes where askprice>1000f,time>=12:00:00.000
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
x date      time        price
x date      time        price
--------------------------------
--------------------------------
4 2016.04.07 13:00:00.205 1238.7
4 2016.04.07 13:00:00.205 1238.7
6 2016.04.07 15:20:02.224 1241
6 2016.04.07 15:20:02.224 1241
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
====Exercise====
\frametitle{Exercise}
Select from \code{trades} those rows where \code{price} is closer to the smallest integer greater than it than the greatest integer that is less than or equal to the \code{price}.
\end{frame}


\begin{frame}[fragile]
Select from <tt>trades</tt> those rows where <tt>price</tt> is closer to the smallest integer greater than it than the greatest integer that is less than or equal to the <tt>price</tt>.
\frametitle{Solution}
 
\begin{snippet}{q}
====Solution====
 
<pre>
select from trades where ((ceiling price)-price)<price-floor price
select from trades where ((ceiling price)-price)<price-floor price
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
code  date      time        price  volume mktflag comments                ..
code  date      time        price  volume mktflag comments                ..
-----------------------------------------------------------------------------..
-----------------------------------------------------------------------------..
Line 873: Line 810:
GCM16 2016.04.11 16:00:00.044 1257.9 1      E      ""                      ..
GCM16 2016.04.11 16:00:00.044 1257.9 1      E      ""                      ..
GCM16 2016.04.11 16:28:34.311 1258.7 1      E      ""                      ..
GCM16 2016.04.11 16:28:34.311 1258.7 1      E      ""                      ..
\end{snippet}
</pre>
The following solution is terser (and involves only a call to \code{floor}, no call to \code{ceiling}):
The following solution is terser (and involves only a call to <tt>floor</tt>, no call to <tt>ceiling</tt>):
\begin{snippet}{q}
<pre>
select from trades where (price-floor price)>0.5
select from trades where (price-floor price)>0.5
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
=Q-sql select: more complex constraints=
\frametitle{q-sql select: more complex constraints}
 
\begin{itemize}
More complex constraints can be formed using boolean operations:
\item More complex constraints can be formed using boolean operations:
<pre>
\begin{snippet}{q}
select i,date,time,price:askprice from quotes where (((askprice>1000f)or(time>=12:00:00.000))and(time <= 15:30:00.000))
select i,date,time,price:askprice from quotes where (((askprice>1000f)or(time>=12:00:00.000))and(time <= 15:30:00.000))
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
x  date      time        price
x  date      time        price
---------------------------------
---------------------------------
Line 899: Line 834:
9  2016.04.08 15:20:02.000 40.79
9  2016.04.08 15:20:02.000 40.79
10 2016.04.08 15:21:43.786 40.77
10 2016.04.08 15:21:43.786 40.77
\end{snippet}
</pre>
\item Remember that constraints also operate on lists (columns) and result in boolean lists:
 
\begin{snippet}{q}
Remember that constraints also operate on lists (columns) and result in boolean lists:
<pre>
((quotes[`askprice]>1000f)or(quotes[`time]>=12:00:00.000))and(quotes[`time]<=15:30:00.000)
((quotes[`askprice]>1000f)or(quotes[`time]>=12:00:00.000))and(quotes[`time]<=15:30:00.000)
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
01111110111b
01111110111b
\end{snippet}
</pre>
\end{itemize}
 
\end{frame}
====Exercise====
 
Select from the <tt>quotes</tt> table those rows where the time is either less than 11 a.m. or greater than or equal to 3 p.m.


\begin{frame}[fragile]
====Solution====
\frametitle{Exercise}
Select from the \code{quotes} table those rows where the time is either less than 11 a.m. or greater than or equal to 3 p.m.
\end{frame}


\begin{frame}[fragile]
<pre>
\frametitle{Solution}
\begin{snippet}{q}
select from quotes where (time<11:00:00.000)or time>=15:00:00.000
select from quotes where (time<11:00:00.000)or time>=15:00:00.000
\end{snippet}
<pre>
\begin{snippet}{output}
<pre>
code  date      time        bidprice bidsize askprice asksize mktflag
code  date      time        bidprice bidsize askprice asksize mktflag
-----------------------------------------------------------------------
-----------------------------------------------------------------------
Line 929: Line 862:
CLM16 2016.04.08 15:20:02.000 40.77    26      40.79    44      E
CLM16 2016.04.08 15:20:02.000 40.77    26      40.79    44      E
CLM16 2016.04.08 15:21:43.786 40.76    3      40.77    28      E
CLM16 2016.04.08 15:21:43.786 40.76    3      40.77    28      E
\end{snippet}
</pre>
Don't forget the parentheses. The query won't work correctly without them:
Don't forget the parentheses. The query won't work correctly without them:
\begin{snippet}{q}
<pre>
select from quotes where time<11:00:00.000 or time>=15:00:00.000
select from quotes where time<11:00:00.000 or time>=15:00:00.000
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
code  date      time        bidprice bidsize askprice asksize mktflag
code  date      time        bidprice bidsize askprice asksize mktflag
-----------------------------------------------------------------------
-----------------------------------------------------------------------
CLM16 2016.04.07 10:15:00.010 38.34    86      38.35    3      E
CLM16 2016.04.07 10:15:00.010 38.34    86      38.35    3      E
CLM16 2016.04.08 10:53:00.002 40.83    6      40.84    66      E
CLM16 2016.04.08 10:53:00.002 40.83    6      40.84    66      E
\end{snippet}
</pre>
 
(Which is not what we want.)
(Which is not what we want.)
\end{frame}


\begin{frame}[fragile]
=Q-sql select: group-by specifications (<tt>by</tt> phrase)=
\frametitle{q-sql select: group-by specifications (\textbf{by} phrase)~(i)}
 
\begin{itemize}
We can group the results by the value of a particular column:
\item We can group the results by the value of a particular column:
<pre>
\begin{snippet}{q}
select i,date,time,price:askprice by date from quotes where askprice>=40f
select i,date,time,price:askprice by date from quotes where askprice>=40f
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
date      | x        date                                        time        ..
date      | x        date                                        time        ..
----------| -----------------------------------------------------------------..
----------| -----------------------------------------------------------------..
2016.04.07| 1 4 6    2016.04.07 2016.04.07 2016.04.07            11:02:16.663..
2016.04.07| 1 4 6    2016.04.07 2016.04.07 2016.04.07            11:02:16.663..
2016.04.08| 7 8 9 10 2016.04.08 2016.04.08 2016.04.08 2016.04.08 10:53:00.002..
2016.04.08| 7 8 9 10 2016.04.08 2016.04.08 2016.04.08 2016.04.08 10:53:00.002..
\end{snippet}
</pre>
\item The result is a keyed table (in this case, keyed by date).
 
\item Apart from the key column, all the columns in the result are lists of lists, collecting values for each partition:
The result is a keyed table (in this case, keyed by date).
\begin{snippet}{q}
 
Apart from the key column, all the columns in the result are lists of lists, collecting values for each partition:
<pre>
(select i,date,time,price:askprice by date from quotes where askprice>=40f)[2016.04.07][`time]
(select i,date,time,price:askprice by date from quotes where askprice>=40f)[2016.04.07][`time]
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
11:02:16.663 13:00:00.205 15:20:02.224
11:02:16.663 13:00:00.205 15:20:02.224
\end{snippet}
</pre>
\end{itemize}
\end{frame}


\begin{frame}[fragile]
Notice that, if you omit the list of selected columns from a <tt>select</tt> with <tt>by</tt>, somewhat counterintuitively, we will, for every single column, get the last value in each bucket, instead of a list of all values in the bucket:
\frametitle{q-sql select: group-by specifications (\textbf{by} phrase)~(ii)}
<pre>
Notice that, if you omit the list of selected columns from a \code{select} with \code{by}, somewhat counterintuitively, we will, for every single column, get the last value in each bucket, instead of a list of all values in the bucket:
\begin{snippet}{q}
q)select by date from quotes where askprice>=40f
q)select by date from quotes where askprice>=40f
date      | code  time        bidprice bidsize askprice asksize mktflag
date      | code  time        bidprice bidsize askprice asksize mktflag
Line 976: Line 906:
2016.04.07| GCM16 15:20:02.224 1240.9  6      1241    1      E
2016.04.07| GCM16 15:20:02.224 1240.9  6      1241    1      E
2016.04.08| CLM16 15:21:43.786 40.76    3      40.77    28      E
2016.04.08| CLM16 15:21:43.786 40.76    3      40.77    28      E
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
\frametitle{q-sql select: group-by specifications (\textbf{by} phrase)~(iii)}
Group-by specifications make it easy to produce summary statistics:
Group-by specifications make it easy to produce summary statistics:
\begin{snippet}{q}
<pre>
select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by date,code from quotes where askprice<1250f
select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by date,code from quotes where askprice<1250f
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
date      code | first_bidprice avg_askprice max_bidsize number_of_quotes
date      code | first_bidprice avg_askprice max_bidsize number_of_quotes
----------------| --------------------------------------------------------
----------------| --------------------------------------------------------
Line 991: Line 918:
2016.04.07 GCM16| 1241.4        1240.4      22          3
2016.04.07 GCM16| 1241.4        1240.4      22          3
2016.04.08 CLM16| 40.83          40.74        38          4
2016.04.08 CLM16| 40.83          40.74        38          4
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
We can also group by the result of a function, e.g.:
\frametitle{q-sql select: group-by specifications (\textbf{by} phrase)~(iv)}
<pre>
\begin{itemize}
\item We can also group by the result of a function, e.g.:
\begin{snippet}{q}
select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by{[x] `hh$x}each time from quotes where askprice<1250f
select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by{[x] `hh$x}each time from quotes where askprice<1250f
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
time| first_bidprice avg_askprice max_bidsize number_of_quotes
time| first_bidprice avg_askprice max_bidsize number_of_quotes
----| --------------------------------------------------------
----| --------------------------------------------------------
Line 1,009: Line 932:
13  | 1238.6        639.63      38          2
13  | 1238.6        639.63      38          2
15  | 38.52          340.2725    26          4
15  | 38.52          340.2725    26          4
\end{snippet}
</pre>
\item Although in this case the use of \code{each} is superfluous: \code{`hh$...} works just as well on lists as on scalars:
 
\begin{snippet}{q}
Although in this case the use of <tt>each</tt> is superfluous: <tt>`hh$...</tt> works just as well on lists as on scalars:
<pre>
select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by`hh$time from quotes where askprice<1250f
select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by`hh$time from quotes where askprice<1250f
\end{snippet}
</pre>
\end{itemize}
 
\end{frame}
====Exercise====


\begin{frame}[fragile]
Group the data from <tt>trades</tt> by the number of digits in the <tt>price</tt>.
\frametitle{Exercise}
Group the data from \code{trades} by the number of digits in the \code{price}.
\end{frame}


\begin{frame}[fragile]
====Solution====
\frametitle{Solution~(i)}
 
\begin{itemize}
We could do
\item We could do
<pre>
\begin{snippet}{q}
select code,date,time,price,volume,mktflag,comments by count each string price from trades
select code,date,time,price,volume,mktflag,comments by count each string price from trades
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
price| code                                                        date    ..
price| code                                                        date    ..
-----| ----------------------------------------------------------------------..
-----| ----------------------------------------------------------------------..
Line 1,035: Line 955:
5    | `CLM16`VXK16`VXK16`CLM16`VXK16`VXK16`CLM16`CLM16`VXK16`VXK16 2016.04.0..
5    | `CLM16`VXK16`VXK16`CLM16`VXK16`VXK16`CLM16`CLM16`VXK16`VXK16 2016.04.0..
6    | `GCM16`GCM16`GCM16`GCM16`GCM16                              2016.04.0..
6    | `GCM16`GCM16`GCM16`GCM16`GCM16                              2016.04.0..
\end{snippet}
</pre>
    but this counts the decimal point as a digit (which it isn't).
but this counts the decimal point as a digit (which it isn't).
\item To remedy this, we could use
 
\begin{snippet}{q}
To remedy this, we could use
<pre>
select code,date,time,price,volume,mktflag,comments by {count sv[""]vs["."]x} each string price from trades
select code,date,time,price,volume,mktflag,comments by {count sv[""]vs["."]x} each string price from trades
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
price| code                                                              dat..
price| code                                                              dat..
-----| ----------------------------------------------------------------------..
-----| ----------------------------------------------------------------------..
Line 1,047: Line 968:
4    | `CLM16`VXK16`GCM16`VXK16`CLM16`VXK16`VXK16`CLM16`CLM16`VXK16`VXK16 201..
4    | `CLM16`VXK16`GCM16`VXK16`CLM16`VXK16`VXK16`CLM16`CLM16`VXK16`VXK16 201..
5    | `GCM16`GCM16`GCM16`GCM16`GCM16                                    201..
5    | `GCM16`GCM16`GCM16`GCM16`GCM16                                    201..
\end{snippet}
</pre>
\end{itemize}
\end{frame}


\begin{frame}[fragile]
\frametitle{Solution~(ii)}
If we care only about the digits before the decimal point, we could use
If we care only about the digits before the decimal point, we could use
\begin{snippet}{q}
<pre>
select code,date,time,price,volume,mktflag,comments by (1+floor log[price]%log 10) from trades
select code,date,time,price,volume,mktflag,comments by (1+floor log[price]%log 10) from trades
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
x| code                                                              date  ..
x| code                                                              date  ..
-| --------------------------------------------------------------------------..
-| --------------------------------------------------------------------------..
2| `CLM16`CLM16`VXK16`VXK16`CLM16`VXK16`VXK16`CLM16`CLM16`VXK16`VXK16 2016.04..
2| `CLM16`CLM16`VXK16`VXK16`CLM16`VXK16`VXK16`CLM16`CLM16`VXK16`VXK16 2016.04..
4| `GCM16`GCM16`GCM16`GCM16`GCM16`GCM16                              2016.04..
4| `GCM16`GCM16`GCM16`GCM16`GCM16`GCM16                              2016.04..
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
=<tt>xbar</tt>=
\frametitle{\textbf{xbar}}
 
\begin{itemize}
Consider the following example:
\item Consider the following example:
<pre>
\begin{snippet}{q}
q)5 xbar 23 21 56 39 10 9 81 20 45 101 15
q)5 xbar 23 21 56 39 10 9 81 20 45 101 15
20 20 55 35 10 5 80 20 45 100 15
20 20 55 35 10 5 80 20 45 100 15
\end{snippet}
</pre>
\item For each integer in the list on the right \code{xbar} outputs a multiple of 5 that is less than or equal to it.
For each integer in the list on the right <tt>xbar</tt> outputs a multiple of 5 that is less than or equal to it.
\item \code{xbar} works with floats, too:
<tt>xbar</tt> works with floats, too:
\begin{snippet}{q}
<pre>
q)5 xbar 23.7 21.9 56.3 39.2 10.1 9.8 81.3 20.4 45.4 101.0 15.3
q)5 xbar 23.7 21.9 56.3 39.2 10.1 9.8 81.3 20.4 45.4 101.0 15.3
20 20 55 35 10 5 80 20 45 100 15f
20 20 55 35 10 5 80 20 45 100 15f
\end{snippet}
</pre>
\item And, importantly, it works with temporal types:
And, importantly, it works with temporal types:
\begin{snippet}{q}
<pre>
q)(1000*60*5)xbar 00:03:09.022 00:05:58.921 00:21:00.000 00:29:00.000
q)(1000*60*5)xbar 00:03:09.022 00:05:58.921 00:21:00.000 00:29:00.000
00:00:00.000 00:05:00.000 00:20:00.000 00:25:00.000
00:00:00.000 00:05:00.000 00:20:00.000 00:25:00.000
\end{snippet}
</pre>
On the left we have the number of milliseconds\footnote{When working with timestamps we should use nanoseconds.}. Thus we have produced five-minute bars.
On the left we have the number of milliseconds. (When working with timestamps we should use nanoseconds.) Thus we have produced five-minute bars.
\item \code{xbar} is defined in k as follows:
 
\begin{snippet}{q}
<tt>xbar</tt> is defined in k as follows:
<pre>
q)xbar
q)xbar
k){x*y div x:$[16h=abs[@x];"j"$x;x]}
k){x*y div x:$[16h=abs[@x];"j"$x;x]}
\end{snippet}
</pre>
\end{itemize}
\end{frame}


\begin{frame}[fragile]
=Q-sql select: <tt>xbar</tt> in group-by specifications (<tt>by</tt> phrase)=
\frametitle{q-sql select: \textbf{xbar} in group-by specifications (\textbf{by} phrase)}
 
\begin{itemize}
<tt>xbar</tt> is particularly useful in group-by specifications.
\item \code{xbar} is particularly useful in group-by specifications.
 
\item Thus, if we want to group-by 15-minute intervals, we can apply it like so:
Thus, if we want to group-by 15-minute intervals, we can apply it like so:
\begin{snippet}{q}
<pre>
select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by(15 * 60 * 1000000000)xbar(quotes[`date]+quotes[`time]) from quotes where askprice<1250f
select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by(15 * 60 * 1000000000)xbar(quotes[`date]+quotes[`time]) from quotes where askprice<1250f
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
x                            | first_bidprice avg_askprice max_bidsize number_of_quotes
x                            | first_bidprice avg_askprice max_bidsize number_of_quotes
-----------------------------|--------------------------------------------------------
-----------------------------|--------------------------------------------------------
Line 1,114: Line 1,028:
2016.04.08D13:45:00.000000000| 40.54          40.56        38          1
2016.04.08D13:45:00.000000000| 40.54          40.56        38          1
2016.04.08D15:15:00.000000000| 40.77          40.78        26          2
2016.04.08D15:15:00.000000000| 40.77          40.78        26          2
\end{snippet}
</pre>
\end{itemize}
 
\end{frame}
====Exercise====


\begin{frame}
\frametitle{Exercise}
Produce a table of last trade prices, last trade volumes, average trade prices, and average trade volumes in each one hour-long bar.
Produce a table of last trade prices, last trade volumes, average trade prices, and average trade volumes in each one hour-long bar.
\end{frame}


\begin{frame}[fragile]
====Solution====
\frametitle{Solution~(i)}
 
The question is whether we want to lump different dates into the same bucket. If we do, then we could use
The question is whether we want to lump different dates into the same bucket. If we do, then we could use
\begin{snippet}{q}
<pre>
select lastp:last price,lastv:last volume,avgp:avg price,avgv:avg volume by (1000*60*60)xbar time from trades
select lastp:last price,lastv:last volume,avgp:avg price,avgv:avg volume by (1000*60*60)xbar time from trades
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
time        | lastp  lastv avgp    avgv
time        | lastp  lastv avgp    avgv
------------| -------------------------
------------| -------------------------
Line 1,139: Line 1,050:
15:00:00.000| 19.05  2    29.915  2.5
15:00:00.000| 19.05  2    29.915  2.5
16:00:00.000| 1258.7 1    1258.3  1
16:00:00.000| 1258.7 1    1258.3  1
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
More likely than not, we want to distinguish the different dates, so we apply <tt>xbar</tt> as follows:
\frametitle{Solution~(ii)}
<pre>
More likely than not, we want to distinguish the different dates, so we apply \code{xbar} as follows:
\begin{snippet}{q}
q)select lastp:last price,lastv:last volume,avgp:avg price,avgv:avg volume by (1000000000*60*60)xbar date+time from trades
q)select lastp:last price,lastv:last volume,avgp:avg price,avgv:avg volume by (1000000000*60*60)xbar date+time from trades
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
date                        | lastp  lastv avgp    avgv
date                        | lastp  lastv avgp    avgv
-----------------------------| --------------------------
-----------------------------| --------------------------
Line 1,163: Line 1,071:
2016.04.11D15:00:00.000000000| 19.05  2    19.05    2
2016.04.11D15:00:00.000000000| 19.05  2    19.05    2
2016.04.11D16:00:00.000000000| 1258.7 1    1258.3  1
2016.04.11D16:00:00.000000000| 1258.7 1    1258.3  1
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
=<tt>exec</tt>=
\frametitle{\textbf{exec}}
 
\begin{itemize}
<tt>exec</tt> behaves similarly to but differently from <tt>select</tt>. Unlike <tt>select</tt>, it does not return tables.
\item \code{exec} behaves similarly to but differently from \code{select}. Unlike \code{select}, it does not return tables.
 
\item The following
The following
\begin{snippet}{q}
<pre>
exec from quotes
exec from quotes
\end{snippet}
</pre>
will return the first row of the table as a dictionary:
will return the first row of the table as a dictionary:
\begin{snippet}{output}
<pre>
code    | `CLM16
code    | `CLM16
date    | 2016.04.08
date    | 2016.04.08
Line 1,184: Line 1,091:
asksize | 28h
asksize | 28h
mktflag | `E
mktflag | `E
\end{snippet}
</pre>
\item The following will return a dictionary of two lists:
 
\begin{snippet}{q}
The following will return a dictionary of two lists:
<pre>
exec date, time from quotes
exec date, time from quotes
\end{snippet}
</pre>
\begin{snippet}{output}
<pre>
date| 2016.04.07  2016.04.07  2016.04.07  2016.04.07  2016.04.07  2016.0..
date| 2016.04.07  2016.04.07  2016.04.07  2016.04.07  2016.04.07  2016.0..
time| 10:15:00.010 11:02:16.663 12:05:00.303 12:22:00.486 13:00:00.205 15:00:..
time| 10:15:00.010 11:02:16.663 12:05:00.303 12:22:00.486 13:00:00.205 15:00:..
\end{snippet}
</pre>
\item Whereas the following will simply return a list:
 
\begin{snippet}{q}
Whereas the following will simply return a list:
<pre>
exec date from quotes
exec date from quotes
\end{snippet}
</pre>
\end{itemize}
\end{frame}


\begin{frame}[fragile]
=<tt>update</tt>=
\frametitle{\textbf{update}}
 
\begin{itemize}
<tt>update</tt>, whose syntax is similar to that of <tt>select</tt>, can be used to update tables.
\item \code{update}, whose syntax is similar to that of \code{select}, can be used to update tables.
 
\item For example, to replace all \code{code}s with \code{`UNKNOWN} in the table \code{quotes}, we can use the following:
For example, to replace all <tt>code</tt>s with <tt>`UNKNOWN</tt> in the table <tt>quotes</tt>, we can use the following:
\begin{snippet}{q}
<pre>
update code:`UNKNOWN from quotes
update code:`UNKNOWN from quotes
\end{snippet}
</pre>
\item To replace the code \code{`CLM16} with \code{`crudeoil} we can use
 
\begin{snippet}{q}
To replace the code <tt>`CLM16</tt> with <tt>`crudeoil</tt> we can use
<pre>
update code:`crudeoil from trades where code=`CLM16
update code:`crudeoil from trades where code=`CLM16
\end{snippet}
</pre>
\item These q-sql statements will return new tables. To update the table in place, reference it by name (\code{`quotes}) instead of using the variable \code{quotes} in the query:
 
\begin{snippet}{q}
These q-sql statements will return new tables. To update the table in place, reference it by name (<tt>`quotes</tt>) instead of using the variable <tt>quotes</tt> in the query:
<pre>
q)update code:`crudeoil from `trades where code=`CLM16
q)update code:`crudeoil from `trades where code=`CLM16
`trades
`trades
\end{snippet}
</pre>
\end{itemize}
 
\end{frame}
====Exercise====


\begin{frame}[fragile]
Use <tt>update</tt> to set all instances of <tt>bidprice</tt> where <tt>bidprice</tt> is greater than <tt>1000f</tt> to <tt>0n</tt>.
\frametitle{Exercise}
Use \code{update} to set all instances of \code{bidprice} where \code{bidprice} is greater than \code{1000f} to \code{0n}.
\end{frame}


\begin{frame}[fragile]
====Solution====
\frametitle{Solution}
 
\begin{snippet}{q}
<pre>
q)update bidprice:0n from `quotes where bidprice>1000f
q)update bidprice:0n from `quotes where bidprice>1000f
`quotes
`quotes
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
====Exercise====
\frametitle{Exercise}
Use \code{update} to set all \code{bidprice}s in the table \code{quotes} to an increasing sequence of floats: \code{0f} \code{1f} \code{2f} \code{3f}...
\end{frame}


\begin{frame}[fragile]
Use <tt>update</tt> to set all <tt>bidprice</tt>s in the table <tt>quotes</tt> to an increasing sequence of floats: <tt>0f</tt> <tt>1f</tt> <tt>2f</tt> <tt>3f</tt>...
\frametitle{Solution}
 
\begin{snippet}{q}
====Solution====
 
<pre>
update bidprice:`float$til count quotes from `quotes
update bidprice:`float$til count quotes from `quotes
\end{snippet}
</pre>
or
or
\begin{snippet}{q}
<pre>
update bidprice:0f+til count quotes from `quotes
update bidprice:0f+til count quotes from `quotes
\end{snippet}
</pre>
or
or
\begin{snippet}{q}
<pre>
update bidprice:`float$i from `quotes
update bidprice:`float$i from `quotes
\end{snippet}
</pre>
or
or
\begin{snippet}{q}
<pre>
update bidprice:0f+i from `quotes
update bidprice:0f+i from `quotes
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
=Column attributes=
\frametitle{Recall the meaning of the column attributes}
\begin{itemize}
\item \code{`s} --- \defn{sorted}: \code{`s#2 2 3}. The elements of the list must be sorted. By marking the list as sorted, binary search can be performed instead of fully scanning the column, speeding up many operations.
\item \code{`u} --- \defn{unique}: \code{`u#2 4 5}. The elements of the list must be unique. This attribute creates a unique hash table in the background, allowing constant time lookup of elements.
\item \code{`p} --- \defn{parted}: \code{`p#2 2 1}. All elements of the same value must occur in one sequential block. By storing the same items continuously on-disc reading using standard hard drives is much faster.
\item \code{`g} --- \defn{grouped}: \code{`g#2 1 2}. This attribute is closest to indexing in standard databases. A lookup table from each distinct value in the table is created to map to the positions where that value occurs. This enables much quicker lookup of the entries.
\end{itemize}
\end{frame}


\begin{frame}[fragile]
Recall the meaning of the column attributes:
\frametitle{Setting and unsetting the column attributes}
 
We can use \code{update} to set and unset the column attributes:
* <tt>`s</tt> &mdash; '''sorted''': <tt>`s#2 2 3</tt>. The elements of the list must be sorted. By marking the list as sorted, binary search can be performed instead of fully scanning the column, speeding up many operations.
\begin{snippet}{q}
* <tt>`u</tt> &mdash; '''unique''': <tt>`u#2 4 5</tt>. The elements of the list must be unique. This attribute creates a unique hash table in the background, allowing constant time lookup of elements.
* <tt>`p</tt> &mdash; '''parted''': <tt>`p#2 2 1</tt>. All elements of the same value must occur in one sequential block. By storing the same items continuously on-disc reading using standard hard drives is much faster.
* <tt>`g</tt> &mdash; '''grouped''': <tt>`g#2 1 2</tt>. This attribute is closest to indexing in standard databases. A lookup table from each distinct value in the table is created to map to the positions where that value occurs. This enables much quicker lookup of the entries.
 
We can use <tt>update</tt> to set and unset the column attributes:
<pre>
q)meta quotes
q)meta quotes
c      | t f a
c      | t f a
Line 1,282: Line 1,182:
asksize | h
asksize | h
mktflag | s
mktflag | s
\end{snippet}
</pre>
\begin{snippet}{q}
<pre>
q)update `s#date from `quotes
q)update `s#date from `quotes
`quotes
`quotes
\end{snippet}
</pre>
\begin{snippet}{q}
<pre>
q)meta quotes
q)meta quotes
c      | t f a
c      | t f a
Line 1,299: Line 1,199:
asksize | h
asksize | h
mktflag | s
mktflag | s
\end{snippet}
</pre>
\end{frame}
 
====Exercise====
 
Set the grouped attribute on the column <tt>code</tt> and the sorted attribute on the column <tt>date</tt> of the <tt>trades</tt> table.


\begin{frame}[fragile]
====Solution====
\frametitle{Exercise}
Set the grouped attribute on the column \code{code} and the sorted attribute on the column \code{date} of the \code{trades} table.
\end{frame}


\begin{frame}[fragile]
<pre>
\frametitle{Solution}
\begin{snippet}{q}
q)meta trades
q)meta trades
c      | t f a
c      | t f a
Line 1,320: Line 1,218:
mktflag | s
mktflag | s
comments| C
comments| C
\end{snippet}
</pre>
\begin{snippet}{q}
<pre>
q)update `g#code,`s#date from `trades
q)update `g#code,`s#date from `trades
`trades
`trades
\end{snippet}
</pre>
\begin{snippet}{q}
<pre>
q)meta trades
q)meta trades
c      | t f a
c      | t f a
Line 1,336: Line 1,234:
mktflag | s
mktflag | s
comments| C
comments| C
\end{snippet}
</pre>
\end{frame}
 
====Exercise====
 
Remove the attributes from the <tt>code</tt> and <tt>date</tt> columns of the <tt>trades</tt> table.


\begin{frame}[fragile]
====Solution====
\frametitle{Exercise}
Remove the attributes from the \code{code} and \code{date} columns of the \code{trades} table.
\end{frame}


\begin{frame}[fragile]
<pre>
\frametitle{Solution}
\begin{snippet}{q}
q)meta trades
q)meta trades
c      | t f a
c      | t f a
Line 1,357: Line 1,253:
mktflag | s
mktflag | s
comments| C
comments| C
\end{snippet}
</pre>
\begin{snippet}{q}
<pre>
q)update `#code,`#date from `trades
q)update `#code,`#date from `trades
`trades
`trades
\end{snippet}
</pre>
\begin{snippet}{q}
<pre>
q)meta trades
q)meta trades
c      | t f a
c      | t f a
Line 1,373: Line 1,269:
mktflag | s
mktflag | s
comments| C
comments| C
\end{snippet}
</pre>
\end{frame}
 
=<tt>delete</tt>=
 
<tt>delete</tt> also has a similar syntax to that of <tt>update</tt>.


\begin{frame}[fragile]
We can use the following to delete all quotes for <tt>`GCM16</tt>:
\frametitle{\textbf{delete}}
<pre>
\begin{itemize}
\item \code{delete} also has a similar syntax to that of \code{update}.
\item We can use the following to delete all quotes for \code{`GCM16}:
\begin{snippet}{q}
delete from `quotes where code=`GCM16
delete from `quotes where code=`GCM16
\end{snippet}
</pre>
\item Notice that we have again referred to the table by its name, \code{`quotes}, in order to change it in place.
 
\end{itemize}
Notice that we have again referred to the table by its name, <tt>`quotes</tt>, in order to change it in place.
\end{frame}
 
====Exercise====


\begin{frame}[fragile]
Delete all rows from <tt>quotes</tt> where the <tt>askprice</tt> is greater than <tt>1000f</tt>.
\frametitle{Exercise}
Delete all rows from \code{quotes} where the \code{askprice} is greater than \code{1000f}.
\end{frame}


\begin{frame}[fragile]
====Solution====
\frametitle{Solution}
 
\begin{snippet}{q}
<pre>
q)delete from `quotes where askprice>1000f
q)delete from `quotes where askprice>1000f
`quotes
`quotes
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
=Deleting columns=
\frametitle{Deleting columns}
 
\code{delete} can be used to delete columns as well as rows:
<tt>delete</tt> can be used to delete columns as well as rows:
\begin{snippet}{q}
<pre>
delete bidprice from quotes
delete bidprice from quotes
\end{snippet}
</pre>
This will return a copy. To modify the table in place, use the symbol reference \code{`quotes} instead of \code{quotes}.
This will return a copy. To modify the table in place, use the symbol reference <tt>`quotes</tt> instead of <tt>quotes</tt>.
\end{frame}


\begin{frame}
====Exercise====
\frametitle{Exercise}
Delete the columns \code{mktflag} and \code{comments} from the table \code{trades}.
\end{frame}


\begin{frame}[fragile]
Delete the columns <tt>mktflag</tt> and <tt>comments</tt> from the table <tt>trades</tt>.
\frametitle{Solution}
 
\begin{snippet}{q}
====Solution====
 
<pre>
q)delete mktflag,comments from `trades
q)delete mktflag,comments from `trades
`trades
`trades
\end{snippet}
</pre>
\end{frame}
 
=The order of evaluation=


\begin{frame}[fragile]
We could use <tt>0N!</tt> to investigate the order of evaluation in q:
\frametitle{The order of evaluation~(i)}
<pre>
We could use \code{0N!} to investigate the order of evaluation in q:
\begin{snippet}{q}
q)0N!1;0N!2;0N!3
q)0N!1;0N!2;0N!3
1
1
Line 1,432: Line 1,321:
3
3
3
3
\end{snippet}
</pre>
\begin{snippet}{q}
<pre>
q)(0N!1;0N!2;0N!3)
q)(0N!1;0N!2;0N!3)
3
3
Line 1,439: Line 1,328:
1
1
1 2 3
1 2 3
\end{snippet}
</pre>
\begin{snippet}{q}
<pre>
q)add:{x+y+z};add[0N!1;0N!2;0N!3]
q)add:{x+y+z};add[0N!1;0N!2;0N!3]
3
3
Line 1,446: Line 1,335:
1
1
6
6
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
<pre>
\frametitle{The order of evaluation~(ii)}
\begin{snippet}{q}
q)select 0N!date,0N!time,0N!askprice from quotes
q)select 0N!date,0N!time,0N!askprice from quotes
2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 ..
2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 ..
Line 1,469: Line 1,355:
2016.04.08 15:20:02.000 40.79
2016.04.08 15:20:02.000 40.79
2016.04.08 15:21:43.786 40.77
2016.04.08 15:21:43.786 40.77
\end{snippet}
</pre>
\end{frame}


\begin{frame}[fragile]
<pre>
\frametitle{The order of evaluation~(iii)}
\begin{snippet}{q}
q)select 0N!date,0N!time,0N!askprice from quotes where 0N!date>2016.04.07,0N!askprice>40.7
q)select 0N!date,0N!time,0N!askprice from quotes where 0N!date>2016.04.07,0N!askprice>40.7
00000001111b
00000001111b
Line 1,486: Line 1,369:
2016.04.08 15:20:02.000 40.79
2016.04.08 15:20:02.000 40.79
2016.04.08 15:21:43.786 40.77
2016.04.08 15:21:43.786 40.77
\end{snippet}
</pre>
 
From this it is clear that the order of evaluation is as follows:
From this it is clear that the order of evaluation is as follows:
\begin{itemize}
* <tt>date>2016.04.07</tt>
\item \code{date>2016.04.07}
* <tt>askprice>40.7</tt>
\item \code{askprice>40.7}
* <tt>date</tt>
\item \code{date}
* <tt>time</tt>
\item \code{time}
* <tt>askprice</tt>
\item \code{askprice}
\end{itemize}
\end{frame}


\begin{frame}[fragile]
<pre>
\frametitle{The order of evaluation~(iv)}
\begin{snippet}{q}
q)select 0N!time,0N!askprice by 0N!date from quotes where 0N!date>2016.04.07,0N!askprice>40.7
q)select 0N!time,0N!askprice by 0N!date from quotes where 0N!date>2016.04.07,0N!askprice>40.7
00000001111b
00000001111b
Line 1,509: Line 1,388:
----------| --------------------------------------------------------
----------| --------------------------------------------------------
2016.04.08| 10:53:00.002 15:20:02.000 15:21:43.786 40.84 40.79 40.77
2016.04.08| 10:53:00.002 15:20:02.000 15:21:43.786 40.84 40.79 40.77
\end{snippet}
</pre>
 
From this it is clear that the order of evaluation is as follows:
From this it is clear that the order of evaluation is as follows:
\begin{itemize}
* <tt>date>2016.04.07</tt>
\item \code{date>2016.04.07}
* <tt>askprice>40.7</tt>
\item \code{askprice>40.7}
* <tt>by date</tt>
\item \code{by date}
* <tt>time</tt>
\item \code{time}
* <tt>askprice</tt>
\item \code{askprice}
\end{itemize}
\end{frame}


\begin{frame}[fragile]
<pre>
\frametitle{The order of evaluation~(v)}
\begin{snippet}{q}
q)select 0N!time by 0N!date,0N!askprice from quotes where 0N!date>2016.04.07,0N!askprice>40.7
q)select 0N!time by 0N!date,0N!askprice from quotes where 0N!date>2016.04.07,0N!askprice>40.7
00000001111b
00000001111b
Line 1,536: Line 1,411:
2016.04.08 40.79  | 15:20:02.000
2016.04.08 40.79  | 15:20:02.000
2016.04.08 40.84  | 10:53:00.002
2016.04.08 40.84  | 10:53:00.002
\end{snippet}
</pre>
 
From this it is clear that the order of evaluation is as follows:
From this it is clear that the order of evaluation is as follows:
\begin{itemize}
* <tt>date>2016.04.07</tt>
\item \code{date>2016.04.07}
* <tt>askprice>40.7</tt>
\item \code{askprice>40.7}
* <tt>by date</tt>
\item \code{by date}
* <tt>by askprice</tt>
\item \code{by askprice}
* <tt>time</tt>
\item \code{time}
\end{itemize}
\end{frame}

Latest revision as of 22:20, 21 December 2021

Example table schemas

Consider the following quotes table schema (empty table):

quotes:([]
    code:`symbol$();
    date:`date$();
    time:`time$();
    bidprice:`float$();
    bidsize:`short$();
    askprice:`float$();
    asksize:`short$();
    mktflag:`symbol$())

...and the following trades table schema (empty table):

trades:([]
    code:`symbol$();
    date:`date$();
    time:`time$();
    price:`float$();
    volume:`short$();
    mktflag:`symbol$();
    comments:())

Insert

Let us add the first row to the quotes table using an insert:

insert[`quotes;(`CLM16;2016.04.07;10:15:00.010;38.34;86h;38.35;3h;`E)]

Notice that when we want to modify a table, we usually provide a reference to it as a symbol, `quotes, instead of passing it as a variable, quotes.

The values of the new row are provided in order, as a list.

We can now examine the table quotes:

quotes
code  date       time         bidprice bidsize askprice asksize mktflag
-----------------------------------------------------------------------
CLM16 2016.04.07 10:15:00.010 38.34    86      38.35    3       E

We can add the second row in a similar manner:

insert[`quotes;(`GCM16;2016.04.07;11:02:16.663;1241.4;22h;1241.5;1h;`E)]

Let us again examine the table quotes:

quotes
code  date       time         bidprice bidsize askprice asksize mktflag
-----------------------------------------------------------------------
CLM16 2016.04.07 10:15:00.010 38.34    86      38.35    3       E
GCM16 2016.04.07 11:02:16.663 1241.4   22      1241.5   1       E

Inserting multiple rows at once

One can insert multiple rows to the quotes table using the followning syntax:

.[`quotes;();,;(
    (`CLM16; 2016.04.07; 12:05:00.303; 38.12 ; 7h ; 38.13 ; 13h; `E);
    (`CLM16; 2016.04.07; 12:22:00.486; 38.11 ; 16h; 38.12 ; 8h ; `E);
    (`GCM16; 2016.04.07; 13:00:00.205; 1238.6; 8h ; 1238.7; 7h ; `E);
    (`CLM16; 2016.04.07; 15:00:00.051; 38.52 ; 9h ; 38.53 ; 18h; `E);
    (`GCM16; 2016.04.07; 15:20:02.224; 1240.9; 6h ; 1241f ; 1h ; `E);
    (`CLM16; 2016.04.08; 10:53:00.002; 40.83 ; 6h ; 40.84 ; 66h; `E);
    (`CLM16; 2016.04.08; 13:56:30.070; 40.54 ; 38h; 40.56 ; 58h; `E);
    (`CLM16; 2016.04.08; 15:20:02.000; 40.77 ; 26h; 40.79 ; 44h; `E);
    (`CLM16; 2016.04.08; 15:21:43.786; 40.76 ; 3h ; 40.77 ; 28h; `E))]

Let us again examine the table quotes:

quotes
code  date       time         bidprice bidsize askprice asksize mktflag
-----------------------------------------------------------------------
CLM16 2016.04.07 10:15:00.010 38.34    86      38.35    3       E
GCM16 2016.04.07 11:02:16.663 1241.4   22      1241.5   1       E
CLM16 2016.04.07 12:05:00.303 38.12    7       38.13    13      E
CLM16 2016.04.07 12:22:00.486 38.11    16      38.12    8       E
GCM16 2016.04.07 13:00:00.205 1238.6   8       1238.7   7       E
CLM16 2016.04.07 15:00:00.051 38.52    9       38.53    18      E
GCM16 2016.04.07 15:20:02.224 1240.9   6       1241     1       E
CLM16 2016.04.08 10:53:00.002 40.83    6       40.84    66      E
CLM16 2016.04.08 13:56:30.070 40.54    38      40.56    58      E
CLM16 2016.04.08 15:20:02.000 40.77    26      40.79    44      E
CLM16 2016.04.08 15:21:43.786 40.76    3       40.77    28      E

We have just used dot (.) for dyadic functions. Quoting Q for Mortals, the general form of functional . for dyadic functions is

.[L;I;f;y]

L is a mapping with explicit domain, I is a list in the domain of L, f is a dyadic function and y is an atom or list of the proper shape. For a list, the result is the item-wise application to the items of L indexed at depth by I, of f and the parameter y. Over the subdomain I, the map output becomes

(L . I) f y    / binary operator
f[L . I; y]    / dyadic function

Thus, if

L:(100 200;300 400 500 600;600 700);

then

.[L;1 2;+;42]

amounts to copying L, adding 42 to the [1;2]nd element of the copy, and returning that copy:

(100j, 200j;300j, 400j, 542j, 600j;600j 700j)

Similarly,

.[L;(1;(0;2;3));+;42]

amounts to copying L, adding 42 to the [1;0]th, [1;2]nd, and [1;3]rd elements of the copy, and returning that copy:

(100j, 200j;342j, 400j, 542j, 642j;600j, 700j)

Whereas

.[L;(1;(0;2;3));+;(42;45;44)]

amounts to copying L, adding 42 to the [1; 0]th, 45 to the [1; 2]nd, and 44 to the [1; 3]rd element of the copy, and returning that copy:

(100j, 200j;342j, 400j, 545j, 644j;600j, 700j)

And

.[L;();+;42]

amounts to copying L, adding 42 to all elements of the copy, and returning that copy:

(142j, 242j;342j, 442j, 542j, 642j;642j, 742j)

In none of the cases was the original L modified. If we wanted to modify it, we would have passed a name reference, as a symbol, instead:

.[`L;();+;42]
`L
L
<pre>
<pre>
(142j, 242j;342j, 442j, 542j, 642j;642j, 742j)

Along the same lines, we can use . to extend a list (remember that , concatenates):

scientists:`newton`leibniz;
.[`scientists;();,;`wiener`khinchin`kolmogorov];
scientists
`newton`leibniz`wiener`khinchin`kolmogorov

Just as we used to extend a table.

Deleting rows from a table

To delete all rows from the quotes table we can use the following code:

q)delete from `quotes

Notice that we referred to the quotes table by its symbol name,

`quotes

, instead of providing a variable, quotes. This is generally the case when we want to modify a table in place.

We can check that the quotes table is now indeed empty:

q)count quotes
0
q)quotes
code date time bidprice bidsize askprice asksize mktflag
--------------------------------------------------------

Repopulating quotes

Let us repopulate quotes again:

quotes:([]code:`symbol$();date:`date$();time:`time$();bidprice:`float$();bidsize:`short$();askprice:`float$();asksize:`short$();mktflag:`symbol$());
.[`quotes;();,;(
    (`CLM16; 2016.04.07; 10:15:00.010; 38.34 ; 86h; 38.35 ; 3h ; `E);
    (`GCM16; 2016.04.07; 11:02:16.663; 1241.4; 22h; 1241.5; 1h ; `E);
    (`CLM16; 2016.04.07; 12:05:00.303; 38.12 ; 7h ; 38.13 ; 13h; `E);
    (`CLM16; 2016.04.07; 12:22:00.486; 38.11 ; 16h; 38.12 ; 8h ; `E);
    (`GCM16; 2016.04.07; 13:00:00.205; 1238.6; 8h ; 1238.7; 7h ; `E);
    (`CLM16; 2016.04.07; 15:00:00.051; 38.52 ; 9h ; 38.53 ; 18h; `E);
    (`GCM16; 2016.04.07; 15:20:02.224; 1240.9; 6h ; 1241f ; 1h ; `E);
    (`CLM16; 2016.04.08; 10:53:00.002; 40.83 ; 6h ; 40.84 ; 66h; `E);
    (`CLM16; 2016.04.08; 13:56:30.070; 40.54 ; 38h; 40.56 ; 58h; `E);
    (`CLM16; 2016.04.08; 15:20:02.000; 40.77 ; 26h; 40.79 ; 44h; `E);
    (`CLM16; 2016.04.08; 15:21:43.786; 40.76 ; 3h ; 40.77 ; 28h; `E))]

The metadata of the trades table before we populate it

Let us examine the metadata of the trades table before we populate it:

meta trades
c       | t f a
--------| -----
code    | s
date    | d
time    | t
price   | f
volume  | h
mktflag | s
comments|

Note that

type trades[`comments]

is 0h, so this column is a mixed list (remember that table columns are all lists). It appeared in the schema without a type cast:

trades : ([]
    ...
    volume:`short$();
    mktflag:`symbol$();
    comments:())

Exercise

Use dot (.) to populate the trades table with the following rows:

(`CLM16; 2016.04.07; 10:20:00.329; 38.3  ; 4h; `E; "NYMEX/CME Group trade");
(`GCM16; 2016.04.07; 12:00:00.055; 1239.7; 6h; `E; "");
(`GCM16; 2016.04.07; 12:37:02.106; 1240.5; 1h; `E; "");
(`CLM16; 2016.04.07; 13:00:00.128; 38.04 ; 3h; `E; "NYMEX/CME Group trade");
(`VXK16; 2016.04.07; 13:22:05.617; 18.85 ; 5h; `E; "");
(`GCM16; 2016.04.07; 14:35:01.241; 1241.2; 1h; `E; "");
(`GCM16; 2016.04.08; 10:13:01.048; 1240f ; 3h; `E; "");
(`VXK16; 2016.04.08; 11:34:53.417; 18.53 ; 1h; `E; "Transaction represents a trade in two contract months in the same class");
(`CLM16; 2016.04.08; 12:00:00.227; 40.61 ; 3h; `E; "NYMEX/CME Group trade");
(`VXK16; 2016.04.08; 12:44:00.684; 18.44 ; 2h; `E; "Transaction represents a trade in two contract months in the same class");
(`VXK16; 2016.04.08; 12:45:33.130; 18.49 ; 1h; `E; "Transaction represents a trade in two contract months in the same class");
(`CLM16; 2016.04.08; 15:20:02.000; 40.78 ; 3h; `E; "NYMEX/CME Group trade");
(`CLM16; 2016.04.11; 11:00:00.105; 41.43 ; 2h; `E; "NYMEX/CME Group trade");
(`VXK16; 2016.04.11; 14:00:00.829; 18.35 ; 1h; `E; "");
(`VXK16; 2016.04.11; 15:14:58.775; 19.05 ; 2h; `E; "");
(`GCM16; 2016.04.11; 16:00:00.044; 1257.9; 1h; `E; "");
(`GCM16; 2016.04.11; 16:28:34.311; 1258.7; 1h; `E; "")

Solution

.[`trades;();,;(
    (`CLM16; 2016.04.07; 10:20:00.329; 38.3  ; 4h; `E; "NYMEX/CME Group trade");
    (`GCM16; 2016.04.07; 12:00:00.055; 1239.7; 6h; `E; "");
    (`GCM16; 2016.04.07; 12:37:02.106; 1240.5; 1h; `E; "");
    (`CLM16; 2016.04.07; 13:00:00.128; 38.04 ; 3h; `E; "NYMEX/CME Group trade");
    (`VXK16; 2016.04.07; 13:22:05.617; 18.85 ; 5h; `E; "");
    (`GCM16; 2016.04.07; 14:35:01.241; 1241.2; 1h; `E; "");
    (`GCM16; 2016.04.08; 10:13:01.048; 1240f ; 3h; `E; "");
    (`VXK16; 2016.04.08; 11:34:53.417; 18.53 ; 1h; `E; "Transaction represents a trade in two contract months in the same class");
    (`CLM16; 2016.04.08; 12:00:00.227; 40.61 ; 3h; `E; "NYMEX/CME Group trade");
    (`VXK16; 2016.04.08; 12:44:00.684; 18.44 ; 2h; `E; "Transaction represents a trade in two contract months in the same class");
    (`VXK16; 2016.04.08; 12:45:33.130; 18.49 ; 1h; `E; "Transaction represents a trade in two contract months in the same class");
    (`CLM16; 2016.04.08; 15:20:02.000; 40.78 ; 3h; `E; "NYMEX/CME Group trade");
    (`CLM16; 2016.04.11; 11:00:00.105; 41.43 ; 2h; `E; "NYMEX/CME Group trade");
    (`VXK16; 2016.04.11; 14:00:00.829; 18.35 ; 1h; `E; "");
    (`VXK16; 2016.04.11; 15:14:58.775; 19.05 ; 2h; `E; "");
    (`GCM16; 2016.04.11; 16:00:00.044; 1257.9; 1h; `E; "");
    (`GCM16; 2016.04.11; 16:28:34.311; 1258.7; 1h; `E; ""))]
trades
code  date       time         price  volume mktflag comments                 ..
-----------------------------------------------------------------------------..
CLM16 2016.04.07 10:20:00.329 38.3   4      E       "NYMEX/CME Group trade"  ..
GCM16 2016.04.07 12:00:00.055 1239.7 6      E       ""                       ..
GCM16 2016.04.07 12:37:02.106 1240.5 1      E       ""                       ..
CLM16 2016.04.07 13:00:00.128 38.04  3      E       "NYMEX/CME Group trade"  ..
VXK16 2016.04.07 13:22:05.617 18.85  5      E       ""                       ..
GCM16 2016.04.07 14:35:01.241 1241.2 1      E       ""                       ..
GCM16 2016.04.08 10:13:01.048 1240   3      E       ""                       ..
VXK16 2016.04.08 11:34:53.417 18.53  1      E       "Transaction represents a..
CLM16 2016.04.08 12:00:00.227 40.61  3      E       "NYMEX/CME Group trade"  ..
VXK16 2016.04.08 12:44:00.684 18.44  2      E       "Transaction represents a..
VXK16 2016.04.08 12:45:33.130 18.49  1      E       "Transaction represents a..
CLM16 2016.04.08 15:20:02.000 40.78  3      E       "NYMEX/CME Group trade"  ..
CLM16 2016.04.11 11:00:00.105 41.43  2      E       "NYMEX/CME Group trade"  ..
VXK16 2016.04.11 14:00:00.829 18.35  1      E       ""                       ..
VXK16 2016.04.11 15:14:58.775 19.05  2      E       ""                       ..
GCM16 2016.04.11 16:00:00.044 1257.9 1      E       ""                       ..
GCM16 2016.04.11 16:28:34.311 1258.7 1      E       ""                       ..

The metadata of the trades table after we have populated it

Let us examine the metadata of the trades table before we populate it:

meta trades
c       | t f a
--------| -----
code    | s
date    | d
time    | t
price   | f
volume  | h
mktflag | s
comments| C

Note that

type trades[`comments]

is still 0h, but the metadata above tells us that comments is a column of strings. Strings are themselves lists

trades[`comments][0][0]
"N"

and the only way to declare a list of lists is through a mixed list, there is no other way we could have declared this column in our schema.

Q-sql select

Consider the simplest possible form of q-sql select:

select from quotes
code  date       time         bidprice bidsize askprice asksize mktflag
-----------------------------------------------------------------------
CLM16 2016.04.07 10:15:00.010 38.34    86      38.35    3       E
GCM16 2016.04.07 11:02:16.663 1241.4   22      1241.5   1       E
CLM16 2016.04.07 12:05:00.303 38.12    7       38.13    13      E
CLM16 2016.04.07 12:22:00.486 38.11    16      38.12    8       E
GCM16 2016.04.07 13:00:00.205 1238.6   8       1238.7   7       E
CLM16 2016.04.07 15:00:00.051 38.52    9       38.53    18      E
GCM16 2016.04.07 15:20:02.224 1240.9   6       1241     1       E
CLM16 2016.04.08 10:53:00.002 40.83    6       40.84    66      E
CLM16 2016.04.08 13:56:30.070 40.54    38      40.56    58      E
CLM16 2016.04.08 15:20:02.000 40.77    26      40.79    44      E
CLM16 2016.04.08 15:21:43.786 40.76    3       40.77    28      E

We have selected all rows from the table.

Exercise

Select all rows from the trades table.

Solution

select from trades
code  date       time         price  volume mktflag comments                 ..
-----------------------------------------------------------------------------..
CLM16 2016.04.07 10:20:00.329 38.3   4      E       "NYMEX/CME Group trade"  ..
GCM16 2016.04.07 12:00:00.055 1239.7 6      E       ""                       ..
GCM16 2016.04.07 12:37:02.106 1240.5 1      E       ""                       ..
CLM16 2016.04.07 13:00:00.128 38.04  3      E       "NYMEX/CME Group trade"  ..
VXK16 2016.04.07 13:22:05.617 18.85  5      E       ""                       ..
GCM16 2016.04.07 14:35:01.241 1241.2 1      E       ""                       ..
GCM16 2016.04.08 10:13:01.048 1240   3      E       ""                       ..
VXK16 2016.04.08 11:34:53.417 18.53  1      E       "Transaction represents a..
CLM16 2016.04.08 12:00:00.227 40.61  3      E       "NYMEX/CME Group trade"  ..
VXK16 2016.04.08 12:44:00.684 18.44  2      E       "Transaction represents a..
VXK16 2016.04.08 12:45:33.130 18.49  1      E       "Transaction represents a..
CLM16 2016.04.08 15:20:02.000 40.78  3      E       "NYMEX/CME Group trade"  ..
CLM16 2016.04.11 11:00:00.105 41.43  2      E       "NYMEX/CME Group trade"  ..
VXK16 2016.04.11 14:00:00.829 18.35  1      E       ""                       ..
VXK16 2016.04.11 15:14:58.775 19.05  2      E       ""                       ..
GCM16 2016.04.11 16:00:00.044 1257.9 1      E       ""                       ..
GCM16 2016.04.11 16:28:34.311 1258.7 1      E       ""                       ..

Q-sql select: selecting columns

We may list the columns that we want to select, in the order in which we want them to appear in the resulting table:

select date,time,code,asksize,askprice from quotes
date       time         code  asksize askprice
----------------------------------------------
2016.04.07 10:15:00.010 CLM16 3       38.35
2016.04.07 11:02:16.663 GCM16 1       1241.5
2016.04.07 12:05:00.303 CLM16 13      38.13
2016.04.07 12:22:00.486 CLM16 8       38.12
2016.04.07 13:00:00.205 GCM16 7       1238.7
2016.04.07 15:00:00.051 CLM16 18      38.53
2016.04.07 15:20:02.224 GCM16 1       1241
2016.04.08 10:53:00.002 CLM16 66      40.84
2016.04.08 13:56:30.070 CLM16 58      40.56
2016.04.08 15:20:02.000 CLM16 44      40.79
2016.04.08 15:21:43.786 CLM16 28      40.77

By default, the output is not saved, but we can set it to a variable:

new_quotes_table:select date,time,code,asksize,askprice from quotes

Exercise

Select the date, time, volume, and price (in that order) from the trades table.

Solution

select date,time,volume,price from trades
date       time         volume price
-------------------------------------
2016.04.07 10:20:00.329 4      38.3
2016.04.07 12:00:00.055 6      1239.7
2016.04.07 12:37:02.106 1      1240.5
2016.04.07 13:00:00.128 3      38.04
2016.04.07 13:22:05.617 5      18.85
2016.04.07 14:35:01.241 1      1241.2
2016.04.08 10:13:01.048 3      1240
2016.04.08 11:34:53.417 1      18.53
2016.04.08 12:00:00.227 3      40.61
2016.04.08 12:44:00.684 2      18.44
2016.04.08 12:45:33.130 1      18.49
2016.04.08 15:20:02.000 3      40.78
2016.04.11 11:00:00.105 2      41.43
2016.04.11 14:00:00.829 1      18.35
2016.04.11 15:14:58.775 2      19.05
2016.04.11 16:00:00.044 1      1257.9
2016.04.11 16:28:34.311 1      1258.7

Q-sql select: selecting columns under new names

We may give the selected columns new names in the result:

select d:date,t:time,code,size:asksize,price:askprice from quotes
d          t            code  size price
-----------------------------------------
2016.04.07 10:15:00.010 CLM16 3    38.35
2016.04.07 11:02:16.663 GCM16 1    1241.5
2016.04.07 12:05:00.303 CLM16 13   38.13
2016.04.07 12:22:00.486 CLM16 8    38.12
2016.04.07 13:00:00.205 GCM16 7    1238.7
2016.04.07 15:00:00.051 CLM16 18   38.53
2016.04.07 15:20:02.224 GCM16 1    1241
2016.04.08 10:53:00.002 CLM16 66   40.84
2016.04.08 13:56:30.070 CLM16 58   40.56
2016.04.08 15:20:02.000 CLM16 44   40.79
2016.04.08 15:21:43.786 CLM16 28   40.77

Exercise

Select the date, time, volume, and price (in that order) from the trades table but under the names d, t, v, and p.

Solution

select d:date,t:time,v:volume,p:price from trades
d          t            v p
--------------------------------
2016.04.07 10:20:00.329 4 38.3
2016.04.07 12:00:00.055 6 1239.7
2016.04.07 12:37:02.106 1 1240.5
2016.04.07 13:00:00.128 3 38.04
2016.04.07 13:22:05.617 5 18.85
2016.04.07 14:35:01.241 1 1241.2
2016.04.08 10:13:01.048 3 1240
2016.04.08 11:34:53.417 1 18.53
2016.04.08 12:00:00.227 3 40.61
2016.04.08 12:44:00.684 2 18.44
2016.04.08 12:45:33.130 1 18.49
2016.04.08 15:20:02.000 3 40.78
2016.04.11 11:00:00.105 2 41.43
2016.04.11 14:00:00.829 1 18.35
2016.04.11 15:14:58.775 2 19.05
2016.04.11 16:00:00.044 1 1257.9
2016.04.11 16:28:34.311 1 1258.7

Q-sql select: operations on columns

Perhaps more interestingly, selected columns may be functions of / results of operations on the existing ones:

select dt:date+time,code,size:asksize,price:askprice from quotes
dt                            code  size price
-----------------------------------------------
2016.04.07D10:15:00.010000000 CLM16 3    38.35
2016.04.07D11:02:16.663000000 GCM16 1    1241.5
2016.04.07D12:05:00.303000000 CLM16 13   38.13
2016.04.07D12:22:00.486000000 CLM16 8    38.12
2016.04.07D13:00:00.205000000 GCM16 7    1238.7
2016.04.07D15:00:00.051000000 CLM16 18   38.53
2016.04.07D15:20:02.224000000 GCM16 1    1241
2016.04.08D10:53:00.002000000 CLM16 66   40.84
2016.04.08D13:56:30.070000000 CLM16 58   40.56
2016.04.08D15:20:02.000000000 CLM16 44   40.79
2016.04.08D15:21:43.786000000 CLM16 28   40.77

Remember that columns are lists, so the above + works on two lists:

q)quotes[`date]
2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 ..
q)quotes[`time]
10:15:00.010 11:02:16.663 12:05:00.303 12:22:00.486 13:00:00.205 15:00:00.051..
q)quotes[`date]+quotes[`time]
2016.04.07D10:15:00.010000000 2016.04.07D11:02:16.663000000 2016.04.07D12:05:..

Exercise

The best bid and best ask (offer) prices are denoted by and , respectively.

The midprice is the average of the best bid and best ask prices:

Write a q-sql select query selecting the timestamp (date + time) and the midprice.

Solution

select dt:date+time,mid:.5*bidprice+askprice from quotes
dt                            mid
-------------------------------------
2016.04.07D10:15:00.010000000 38.345
2016.04.07D11:02:16.663000000 1241.45
2016.04.07D12:05:00.303000000 38.125
2016.04.07D12:22:00.486000000 38.115
2016.04.07D13:00:00.205000000 1238.65
2016.04.07D15:00:00.051000000 38.525
2016.04.07D15:20:02.224000000 1240.95
2016.04.08D10:53:00.002000000 40.835
2016.04.08D13:56:30.070000000 40.55
2016.04.08D15:20:02.000000000 40.78
2016.04.08D15:21:43.786000000 40.765

Exercise

The microprice is often more useful than the midprice, since it weights the best bid and best ask prices with the volumes posted at the best bid and best ask prices:

where and represent the volumes posted at the best bid and best ask prices, respectively.

Modify your query to add the microprice to it.

Solution

select dt:date+time,mid:.5*bidprice+askprice,microprice:((bidsize*askprice)+asksize*bidprice)%bidsize+asksize from quotes
dt                            mid     microprice
------------------------------------------------
2016.04.07D10:15:00.010000000 38.345  38.34966
2016.04.07D11:02:16.663000000 1241.45 1241.496
2016.04.07D12:05:00.303000000 38.125  38.1235
2016.04.07D12:22:00.486000000 38.115  38.11667
2016.04.07D13:00:00.205000000 1238.65 1238.653
2016.04.07D15:00:00.051000000 38.525  38.52333
2016.04.07D15:20:02.224000000 1240.95 1240.986
2016.04.08D10:53:00.002000000 40.835  40.83083
2016.04.08D13:56:30.070000000 40.55   40.54792
2016.04.08D15:20:02.000000000 40.78   40.77743
2016.04.08D15:21:43.786000000 40.765  40.76097

A challenge

So what if we want to return only the first character from each code?

select dt:date+time,shortcode:code[0],size:asksize,price:askprice from quotes

does not give the desired result:

dt                            shortcode size price
---------------------------------------------------
2016.04.07D10:15:00.010000000 CLM16     3    38.35
2016.04.07D11:02:16.663000000 CLM16     1    1241.5
2016.04.07D12:05:00.303000000 CLM16     13   38.13
2016.04.07D12:22:00.486000000 CLM16     8    38.12
2016.04.07D13:00:00.205000000 CLM16     7    1238.7
2016.04.07D15:00:00.051000000 CLM16     18   38.53
2016.04.07D15:20:02.224000000 CLM16     1    1241
2016.04.08D10:53:00.002000000 CLM16     66   40.84
2016.04.08D13:56:30.070000000 CLM16     58   40.56
2016.04.08D15:20:02.000000000 CLM16     44   40.79
2016.04.08D15:21:43.786000000 CLM16     28   40.77

This is because the indexing, ...[0], is applied to the entire list quotes[`code], and we get a symbol scalar, `CLM16. This scalar is returned for each row in the result, as you can see above.

Indeed, this is what happens when we include scalars among selected columns:

select dt:date+time,shortcode:123,size:asksize,price:askprice from quotes
dt                            shortcode size price
---------------------------------------------------
2016.04.07D10:15:00.010000000 123       3    38.35
2016.04.07D11:02:16.663000000 123       1    1241.5
2016.04.07D12:05:00.303000000 123       13   38.13
2016.04.07D12:22:00.486000000 123       8    38.12
2016.04.07D13:00:00.205000000 123       7    1238.7
2016.04.07D15:00:00.051000000 123       18   38.53
2016.04.07D15:20:02.224000000 123       1    1241
2016.04.08D10:53:00.002000000 123       66   40.84
2016.04.08D13:56:30.070000000 123       58   40.56
2016.04.08D15:20:02.000000000 123       44   40.79
2016.04.08D15:21:43.786000000 123       28   40.77

Still, what if we want to return the first character of each code?

One way to do this is by applying a function to each element of the list which represents the resulting column:

select dt:date+time,shortcode:{`$1#string x}each code,size:asksize,price:askprice from quotes
dt                            shortcode size price
---------------------------------------------------
2016.04.07D10:15:00.010000000 C         3    38.35
2016.04.07D11:02:16.663000000 G         1    1241.5
2016.04.07D12:05:00.303000000 C         13   38.13
2016.04.07D12:22:00.486000000 C         8    38.12
2016.04.07D13:00:00.205000000 G         7    1238.7
2016.04.07D15:00:00.051000000 C         18   38.53
2016.04.07D15:20:02.224000000 G         1    1241
2016.04.08D10:53:00.002000000 C         66   40.84
2016.04.08D13:56:30.070000000 C         58   40.56
2016.04.08D15:20:02.000000000 C         44   40.79
2016.04.08D15:21:43.786000000 C         28   40.77

Instead of the keyword each, we could have used the ' adverb (don't forget the parentheses around the verb–adverb pair):

select dt:date+time,shortcode:({`$1#string x}')code,size:asksize,price:askprice from quotes
dt                            shortcode size price
---------------------------------------------------
2016.04.07D10:15:00.010000000 C         3    38.35
2016.04.07D11:02:16.663000000 G         1    1241.5
2016.04.07D12:05:00.303000000 C         13   38.13
2016.04.07D12:22:00.486000000 C         8    38.12
2016.04.07D13:00:00.205000000 G         7    1238.7
2016.04.07D15:00:00.051000000 C         18   38.53
2016.04.07D15:20:02.224000000 G         1    1241
2016.04.08D10:53:00.002000000 C         66   40.84
2016.04.08D13:56:30.070000000 C         58   40.56
2016.04.08D15:20:02.000000000 C         44   40.79
2016.04.08D15:21:43.786000000 C         28   40.77

This works since

{`$1#string x}each quotes[`code]

and

({`$1#string x}')quotes[`code]

give the desired list

`C`G`C`C`G`C`G`C`C`C`C

which is what the above select returns for that column (remember: columns are lists).

Exercise

Select from the quotes table cf, which is a symbol consisting of two characters: the first character of the code and the first character of the mktflag.

Solution

We can form (code,mktflag) pairs with code,'mktflag and then apply a function to each (') such pair:

select cf:({`$(1#string x[0]),1#string x[1]}')(code,'mktflag) from quotes
cf
--
CE
GE
CE
CE
GE
CE
GE
CE
CE
CE
CE

However, we can do without the "pairing" code code,'mktflag. While the keyword each does not work on functions of multiple arguments and the following throws 'type,

select cf:{`$(1#string x),1#string y}each[code;mktflag] from quotes

Each (') does work on functions of multiple arguments:

select cf:{`$(1#string x),1#string y}'[code;mktflag] from quotes

The following, somewhat less terse, solution works too. But here we iterate twice, whereas we could iterate only once, as in the previous solution:

select cf:`$(({1#string x}each code),'{1#string x}each mktflag) from quotes

Q-sql select: where specifications (constraints)

As in classical SQL, in q-sql we can add to our select the where specifications (constraints). Thus, if we want to return only those columns where price is greater than 1000f...

select date,time,price:askprice from quotes where price>1000f

Oh, wait, this returns an error:

'price
[0] select date,time,price:askprice from quotes where price>1000f
                                                      ^

That's because the constraints are applied first, before one of the resulting columns is given the name price. We must apply the constraint to askprice in the original table instead:

select date,time,price:askprice from quotes where askprice>1000f
date       time         price
------------------------------
2016.04.07 11:02:16.663 1241.5
2016.04.07 13:00:00.205 1238.7
2016.04.07 15:20:02.224 1241

We can add multiple constraints separated by commas:

select date,time,price:askprice from quotes where askprice>1000f,time>=12:00:00.000
date       time         price
------------------------------
2016.04.07 13:00:00.205 1238.7
2016.04.07 15:20:02.224 1241

NB! Unlike classical SQL, the order of the constraints matters! The constraints are applied in order. From the point of view of performance, you should apply the most "expensive" constraints, which shrink the result the most, first.

In particular, when dealing with partitioned tables, include the constraint on the partitioning column (usually date) first. Since partitioned tables are stored on disc in directories by date, this constraint allows kdb+/q to zoom in on the right directory. If it is not the first constraint, an expensive scan of the entire database on disc may result.

Q-sql select: indices

Tables are stored in deterministic order. If we want to return the indices of the rows returned by the query, we can add i to the select:

select i,date,time,price:askprice from quotes where askprice>1000f,time>=12:00:00.000
x date       time         price
--------------------------------
4 2016.04.07 13:00:00.205 1238.7
6 2016.04.07 15:20:02.224 1241

Exercise

Select from trades those rows where price is closer to the smallest integer greater than it than the greatest integer that is less than or equal to the price.

Solution

select from trades where ((ceiling price)-price)<price-floor price
code  date       time         price  volume mktflag comments                 ..
-----------------------------------------------------------------------------..
GCM16 2016.04.07 12:00:00.055 1239.7 6      E       ""                       ..
VXK16 2016.04.07 13:22:05.617 18.85  5      E       ""                       ..
VXK16 2016.04.08 11:34:53.417 18.53  1      E       "Transaction represents a..
CLM16 2016.04.08 12:00:00.227 40.61  3      E       "NYMEX/CME Group trade"  ..
CLM16 2016.04.08 15:20:02.000 40.78  3      E       "NYMEX/CME Group trade"  ..
GCM16 2016.04.11 16:00:00.044 1257.9 1      E       ""                       ..
GCM16 2016.04.11 16:28:34.311 1258.7 1      E       ""                       ..

The following solution is terser (and involves only a call to floor, no call to ceiling):

select from trades where (price-floor price)>0.5

Q-sql select: more complex constraints

More complex constraints can be formed using boolean operations:

select i,date,time,price:askprice from quotes where (((askprice>1000f)or(time>=12:00:00.000))and(time <= 15:30:00.000))
x  date       time         price
---------------------------------
1  2016.04.07 11:02:16.663 1241.5
2  2016.04.07 12:05:00.303 38.13
3  2016.04.07 12:22:00.486 38.12
4  2016.04.07 13:00:00.205 1238.7
5  2016.04.07 15:00:00.051 38.53
6  2016.04.07 15:20:02.224 1241
8  2016.04.08 13:56:30.070 40.56
9  2016.04.08 15:20:02.000 40.79
10 2016.04.08 15:21:43.786 40.77

Remember that constraints also operate on lists (columns) and result in boolean lists:

((quotes[`askprice]>1000f)or(quotes[`time]>=12:00:00.000))and(quotes[`time]<=15:30:00.000)
01111110111b

Exercise

Select from the quotes table those rows where the time is either less than 11 a.m. or greater than or equal to 3 p.m.

Solution

select from quotes where (time<11:00:00.000)or time>=15:00:00.000
<pre>
<pre>
code  date       time         bidprice bidsize askprice asksize mktflag
-----------------------------------------------------------------------
CLM16 2016.04.07 10:15:00.010 38.34    86      38.35    3       E
CLM16 2016.04.07 15:00:00.051 38.52    9       38.53    18      E
GCM16 2016.04.07 15:20:02.224 1240.9   6       1241     1       E
CLM16 2016.04.08 10:53:00.002 40.83    6       40.84    66      E
CLM16 2016.04.08 15:20:02.000 40.77    26      40.79    44      E
CLM16 2016.04.08 15:21:43.786 40.76    3       40.77    28      E

Don't forget the parentheses. The query won't work correctly without them:

select from quotes where time<11:00:00.000 or time>=15:00:00.000
code  date       time         bidprice bidsize askprice asksize mktflag
-----------------------------------------------------------------------
CLM16 2016.04.07 10:15:00.010 38.34    86      38.35    3       E
CLM16 2016.04.08 10:53:00.002 40.83    6       40.84    66      E

(Which is not what we want.)

Q-sql select: group-by specifications (by phrase)

We can group the results by the value of a particular column:

select i,date,time,price:askprice by date from quotes where askprice>=40f
date      | x        date                                        time        ..
----------| -----------------------------------------------------------------..
2016.04.07| 1 4 6    2016.04.07 2016.04.07 2016.04.07            11:02:16.663..
2016.04.08| 7 8 9 10 2016.04.08 2016.04.08 2016.04.08 2016.04.08 10:53:00.002..

The result is a keyed table (in this case, keyed by date).

Apart from the key column, all the columns in the result are lists of lists, collecting values for each partition:

(select i,date,time,price:askprice by date from quotes where askprice>=40f)[2016.04.07][`time]
11:02:16.663 13:00:00.205 15:20:02.224

Notice that, if you omit the list of selected columns from a select with by, somewhat counterintuitively, we will, for every single column, get the last value in each bucket, instead of a list of all values in the bucket:

q)select by date from quotes where askprice>=40f
date      | code  time         bidprice bidsize askprice asksize mktflag
----------| ------------------------------------------------------------
2016.04.07| GCM16 15:20:02.224 1240.9   6       1241     1       E
2016.04.08| CLM16 15:21:43.786 40.76    3       40.77    28      E

Group-by specifications make it easy to produce summary statistics:

select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by date,code from quotes where askprice<1250f
date       code | first_bidprice avg_askprice max_bidsize number_of_quotes
----------------| --------------------------------------------------------
2016.04.07 CLM16| 38.34          38.2825      86          4
2016.04.07 GCM16| 1241.4         1240.4       22          3
2016.04.08 CLM16| 40.83          40.74        38          4

We can also group by the result of a function, e.g.:

select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by{[x] `hh$x}each time from quotes where askprice<1250f
time| first_bidprice avg_askprice max_bidsize number_of_quotes
----| --------------------------------------------------------
10  | 38.34          39.595       86          2
11  | 1241.4         1241.5       22          1
12  | 38.12          38.125       16          2
13  | 1238.6         639.63       38          2
15  | 38.52          340.2725     26          4

Although in this case the use of each is superfluous: `hh$... works just as well on lists as on scalars:

select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by`hh$time from quotes where askprice<1250f

Exercise

Group the data from trades by the number of digits in the price.

Solution

We could do

select code,date,time,price,volume,mktflag,comments by count each string price from trades
price| code                                                         date     ..
-----| ----------------------------------------------------------------------..
4    | `CLM16`GCM16                                                 2016.04.0..
5    | `CLM16`VXK16`VXK16`CLM16`VXK16`VXK16`CLM16`CLM16`VXK16`VXK16 2016.04.0..
6    | `GCM16`GCM16`GCM16`GCM16`GCM16                               2016.04.0..

but this counts the decimal point as a digit (which it isn't).

To remedy this, we could use

select code,date,time,price,volume,mktflag,comments by {count sv[""]vs["."]x} each string price from trades
price| code                                                               dat..
-----| ----------------------------------------------------------------------..
3    | ,`CLM16                                                            ,20..
4    | `CLM16`VXK16`GCM16`VXK16`CLM16`VXK16`VXK16`CLM16`CLM16`VXK16`VXK16 201..
5    | `GCM16`GCM16`GCM16`GCM16`GCM16                                     201..

If we care only about the digits before the decimal point, we could use

select code,date,time,price,volume,mktflag,comments by (1+floor log[price]%log 10) from trades
x| code                                                               date   ..
-| --------------------------------------------------------------------------..
2| `CLM16`CLM16`VXK16`VXK16`CLM16`VXK16`VXK16`CLM16`CLM16`VXK16`VXK16 2016.04..
4| `GCM16`GCM16`GCM16`GCM16`GCM16`GCM16                               2016.04..

xbar

Consider the following example:

q)5 xbar 23 21 56 39 10 9 81 20 45 101 15
20 20 55 35 10 5 80 20 45 100 15

For each integer in the list on the right xbar outputs a multiple of 5 that is less than or equal to it. xbar works with floats, too:

q)5 xbar 23.7 21.9 56.3 39.2 10.1 9.8 81.3 20.4 45.4 101.0 15.3
20 20 55 35 10 5 80 20 45 100 15f

And, importantly, it works with temporal types:

q)(1000*60*5)xbar 00:03:09.022 00:05:58.921 00:21:00.000 00:29:00.000
00:00:00.000 00:05:00.000 00:20:00.000 00:25:00.000

On the left we have the number of milliseconds. (When working with timestamps we should use nanoseconds.) Thus we have produced five-minute bars.

xbar is defined in k as follows:

q)xbar
k){x*y div x:$[16h=abs[@x];"j"$x;x]}

Q-sql select: xbar in group-by specifications (by phrase)

xbar is particularly useful in group-by specifications.

Thus, if we want to group-by 15-minute intervals, we can apply it like so:

select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by(15 * 60 * 1000000000)xbar(quotes[`date]+quotes[`time]) from quotes where askprice<1250f
x                            | first_bidprice avg_askprice max_bidsize number_of_quotes
-----------------------------|--------------------------------------------------------
2016.04.07D10:15:00.000000000| 38.34          38.35        86          1
2016.04.07D11:00:00.000000000| 1241.4         1241.5       22          1
2016.04.07D12:00:00.000000000| 38.12          38.13        7           1
2016.04.07D12:15:00.000000000| 38.11          38.12        16          1
2016.04.07D13:00:00.000000000| 1238.6         1238.7       8           1
2016.04.07D15:00:00.000000000| 38.52          38.53        9           1
2016.04.07D15:15:00.000000000| 1240.9         1241         6           1
2016.04.08D10:45:00.000000000| 40.83          40.84        6           1
2016.04.08D13:45:00.000000000| 40.54          40.56        38          1
2016.04.08D15:15:00.000000000| 40.77          40.78        26          2

Exercise

Produce a table of last trade prices, last trade volumes, average trade prices, and average trade volumes in each one hour-long bar.

Solution

The question is whether we want to lump different dates into the same bucket. If we do, then we could use

select lastp:last price,lastv:last volume,avgp:avg price,avgv:avg volume by (1000*60*60)xbar time from trades
time        | lastp  lastv avgp    avgv
------------| -------------------------
10:00:00.000| 1240   3     639.15  3.5
11:00:00.000| 41.43  2     29.98   1.5
12:00:00.000| 18.49  1     511.548 2.6
13:00:00.000| 18.85  5     28.445  4
14:00:00.000| 18.35  1     629.775 1
15:00:00.000| 19.05  2     29.915  2.5
16:00:00.000| 1258.7 1     1258.3  1

More likely than not, we want to distinguish the different dates, so we apply xbar as follows:

q)select lastp:last price,lastv:last volume,avgp:avg price,avgv:avg volume by (1000000000*60*60)xbar date+time from trades
date                         | lastp  lastv avgp     avgv
-----------------------------| --------------------------
2016.04.07D10:00:00.000000000| 38.3   4     38.3     4
2016.04.07D12:00:00.000000000| 1240.5 1     1240.1   3.5
2016.04.07D13:00:00.000000000| 18.85  5     28.445   4
2016.04.07D14:00:00.000000000| 1241.2 1     1241.2   1
2016.04.08D10:00:00.000000000| 1240   3     1240     3
2016.04.08D11:00:00.000000000| 18.53  1     18.53    1
2016.04.08D12:00:00.000000000| 18.49  1     25.84667 2
2016.04.08D15:00:00.000000000| 40.78  3     40.78    3
2016.04.11D11:00:00.000000000| 41.43  2     41.43    2
2016.04.11D14:00:00.000000000| 18.35  1     18.35    1
2016.04.11D15:00:00.000000000| 19.05  2     19.05    2
2016.04.11D16:00:00.000000000| 1258.7 1     1258.3   1

exec

exec behaves similarly to but differently from select. Unlike select, it does not return tables.

The following

exec from quotes

will return the first row of the table as a dictionary:

code    | `CLM16
date    | 2016.04.08
time    | 15:21:43.786
bidprice| 40.76
bidsize | 3h
askprice| 40.77
asksize | 28h
mktflag | `E

The following will return a dictionary of two lists:

exec date, time from quotes
date| 2016.04.07   2016.04.07   2016.04.07   2016.04.07   2016.04.07   2016.0..
time| 10:15:00.010 11:02:16.663 12:05:00.303 12:22:00.486 13:00:00.205 15:00:..

Whereas the following will simply return a list:

exec date from quotes

update

update, whose syntax is similar to that of select, can be used to update tables.

For example, to replace all codes with `UNKNOWN in the table quotes, we can use the following:

update code:`UNKNOWN from quotes

To replace the code `CLM16 with `crudeoil we can use

update code:`crudeoil from trades where code=`CLM16

These q-sql statements will return new tables. To update the table in place, reference it by name (`quotes) instead of using the variable quotes in the query:

q)update code:`crudeoil from `trades where code=`CLM16
`trades

Exercise

Use update to set all instances of bidprice where bidprice is greater than 1000f to 0n.

Solution

q)update bidprice:0n from `quotes where bidprice>1000f
`quotes

Exercise

Use update to set all bidprices in the table quotes to an increasing sequence of floats: 0f 1f 2f 3f...

Solution

update bidprice:`float$til count quotes from `quotes

or

update bidprice:0f+til count quotes from `quotes

or

update bidprice:`float$i from `quotes

or

update bidprice:0f+i from `quotes

Column attributes

Recall the meaning of the column attributes:

  • `ssorted: `s#2 2 3. The elements of the list must be sorted. By marking the list as sorted, binary search can be performed instead of fully scanning the column, speeding up many operations.
  • `uunique: `u#2 4 5. The elements of the list must be unique. This attribute creates a unique hash table in the background, allowing constant time lookup of elements.
  • `pparted: `p#2 2 1. All elements of the same value must occur in one sequential block. By storing the same items continuously on-disc reading using standard hard drives is much faster.
  • `ggrouped: `g#2 1 2. This attribute is closest to indexing in standard databases. A lookup table from each distinct value in the table is created to map to the positions where that value occurs. This enables much quicker lookup of the entries.

We can use update to set and unset the column attributes:

q)meta quotes
c       | t f a
--------| -----
code    | s
date    | d
time    | t
bidprice| f
bidsize | h
askprice| f
asksize | h
mktflag | s
q)update `s#date from `quotes
`quotes
q)meta quotes
c       | t f a
--------| -----
code    | s
date    | d   s
time    | t
bidprice| f
bidsize | h
askprice| f
asksize | h
mktflag | s

Exercise

Set the grouped attribute on the column code and the sorted attribute on the column date of the trades table.

Solution

q)meta trades
c       | t f a
--------| -----
code    | s
date    | d
time    | t
price   | f
volume  | h
mktflag | s
comments| C
q)update `g#code,`s#date from `trades
`trades
q)meta trades
c       | t f a
--------| -----
code    | s   g
date    | d   s
time    | t
price   | f
volume  | h
mktflag | s
comments| C

Exercise

Remove the attributes from the code and date columns of the trades table.

Solution

q)meta trades
c       | t f a
--------| -----
code    | s   g
date    | d   s
time    | t
price   | f
volume  | h
mktflag | s
comments| C
q)update `#code,`#date from `trades
`trades
q)meta trades
c       | t f a
--------| -----
code    | s
date    | d
time    | t
price   | f
volume  | h
mktflag | s
comments| C

delete

delete also has a similar syntax to that of update.

We can use the following to delete all quotes for `GCM16:

delete from `quotes where code=`GCM16

Notice that we have again referred to the table by its name, `quotes, in order to change it in place.

Exercise

Delete all rows from quotes where the askprice is greater than 1000f.

Solution

q)delete from `quotes where askprice>1000f
`quotes

Deleting columns

delete can be used to delete columns as well as rows:

delete bidprice from quotes

This will return a copy. To modify the table in place, use the symbol reference `quotes instead of quotes.

Exercise

Delete the columns mktflag and comments from the table trades.

Solution

q)delete mktflag,comments from `trades
`trades

The order of evaluation

We could use 0N! to investigate the order of evaluation in q:

q)0N!1;0N!2;0N!3
1
2
3
3
q)(0N!1;0N!2;0N!3)
3
2
1
1 2 3
q)add:{x+y+z};add[0N!1;0N!2;0N!3]
3
2
1
6
q)select 0N!date,0N!time,0N!askprice from quotes
2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 ..
10:15:00.010 11:02:16.663 12:05:00.303 12:22:00.486 13:00:00.205 15:00:00.051..
38.35 1241.5 38.13 38.12 1238.7 38.53 1241 40.84 40.56 40.79 40.77
date       time         askprice
--------------------------------
2016.04.07 10:15:00.010 38.35
2016.04.07 11:02:16.663 1241.5
2016.04.07 12:05:00.303 38.13
2016.04.07 12:22:00.486 38.12
2016.04.07 13:00:00.205 1238.7
2016.04.07 15:00:00.051 38.53
2016.04.07 15:20:02.224 1241
2016.04.08 10:53:00.002 40.84
2016.04.08 13:56:30.070 40.56
2016.04.08 15:20:02.000 40.79
2016.04.08 15:21:43.786 40.77
q)select 0N!date,0N!time,0N!askprice from quotes where 0N!date>2016.04.07,0N!askprice>40.7
00000001111b
1011b
2016.04.08 2016.04.08 2016.04.08
10:53:00.002 15:20:02.000 15:21:43.786
40.84 40.79 40.77
date       time         askprice
--------------------------------
2016.04.08 10:53:00.002 40.84
2016.04.08 15:20:02.000 40.79
2016.04.08 15:21:43.786 40.77

From this it is clear that the order of evaluation is as follows:

  • date>2016.04.07
  • askprice>40.7
  • date
  • time
  • askprice
q)select 0N!time,0N!askprice by 0N!date from quotes where 0N!date>2016.04.07,0N!askprice>40.7
00000001111b
1011b
2016.04.08 2016.04.08 2016.04.08
10:53:00.002 15:20:02.000 15:21:43.786
40.84 40.79 40.77
date      | time                                   askprice
----------| --------------------------------------------------------
2016.04.08| 10:53:00.002 15:20:02.000 15:21:43.786 40.84 40.79 40.77

From this it is clear that the order of evaluation is as follows:

  • date>2016.04.07
  • askprice>40.7
  • by date
  • time
  • askprice
q)select 0N!time by 0N!date,0N!askprice from quotes where 0N!date>2016.04.07,0N!askprice>40.7
00000001111b
1011b
2016.04.08 2016.04.08 2016.04.08
40.84 40.79 40.77
,10:53:00.002
,15:20:02.000
,15:21:43.786
date       askprice| time
-------------------| ------------
2016.04.08 40.77   | 15:21:43.786
2016.04.08 40.79   | 15:20:02.000
2016.04.08 40.84   | 10:53:00.002

From this it is clear that the order of evaluation is as follows:

  • date>2016.04.07
  • askprice>40.7
  • by date
  • by askprice
  • time