Programming/Kdb/Q-sql
Example table schemas
Consider the following quotes table schema (empty table):
quotes:([] code:`symbol$(); date:`date$(); time:`time$(); bidprice:`float$(); bidsize:`short$(); askprice:`float$(); asksize:`short$(); mktflag:`symbol$())
...and the following trades table schema (empty table):
trades:([] code:`symbol$(); date:`date$(); time:`time$(); price:`float$(); volume:`short$(); mktflag:`symbol$(); comments:())
Insert
Let us add the first row to the quotes table using an insert:
insert[`quotes;(`CLM16;2016.04.07;10:15:00.010;38.34;86h;38.35;3h;`E)]
Notice that when we want to modify a table, we usually provide a reference to it as a symbol, `quotes, instead of passing it as a variable, quotes.
The values of the new row are provided in order, as a list.
We can now examine the table \code{quotes}:
quotes
code date time bidprice bidsize askprice asksize mktflag ----------------------------------------------------------------------- CLM16 2016.04.07 10:15:00.010 38.34 86 38.35 3 E
We can add the second row in a similar manner:
insert[`quotes;(`GCM16;2016.04.07;11:02:16.663;1241.4;22h;1241.5;1h;`E)]
Let us again examine the table \code{quotes}:
quotes
code date time bidprice bidsize askprice asksize mktflag ----------------------------------------------------------------------- CLM16 2016.04.07 10:15:00.010 38.34 86 38.35 3 E GCM16 2016.04.07 11:02:16.663 1241.4 22 1241.5 1 E
\begin{frame}[fragile] \frametitle{Inserting multiple rows at once~(i)} One can insert multiple rows to the \code{quotes} table using the followning syntax: \begin{snippet}{q} .[`quotes;();,;(
(`CLM16; 2016.04.07; 12:05:00.303; 38.12 ; 7h ; 38.13 ; 13h; `E); (`CLM16; 2016.04.07; 12:22:00.486; 38.11 ; 16h; 38.12 ; 8h ; `E); (`GCM16; 2016.04.07; 13:00:00.205; 1238.6; 8h ; 1238.7; 7h ; `E); (`CLM16; 2016.04.07; 15:00:00.051; 38.52 ; 9h ; 38.53 ; 18h; `E); (`GCM16; 2016.04.07; 15:20:02.224; 1240.9; 6h ; 1241f ; 1h ; `E); (`CLM16; 2016.04.08; 10:53:00.002; 40.83 ; 6h ; 40.84 ; 66h; `E); (`CLM16; 2016.04.08; 13:56:30.070; 40.54 ; 38h; 40.56 ; 58h; `E); (`CLM16; 2016.04.08; 15:20:02.000; 40.77 ; 26h; 40.79 ; 44h; `E); (`CLM16; 2016.04.08; 15:21:43.786; 40.76 ; 3h ; 40.77 ; 28h; `E))]
\end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{Inserting multiple rows at once~(ii)} Let us again examine the table \code{quotes}: \begin{snippet}{q} quotes \end{snippet} \begin{snippet}{output} code date time bidprice bidsize askprice asksize mktflag
CLM16 2016.04.07 10:15:00.010 38.34 86 38.35 3 E GCM16 2016.04.07 11:02:16.663 1241.4 22 1241.5 1 E CLM16 2016.04.07 12:05:00.303 38.12 7 38.13 13 E CLM16 2016.04.07 12:22:00.486 38.11 16 38.12 8 E GCM16 2016.04.07 13:00:00.205 1238.6 8 1238.7 7 E CLM16 2016.04.07 15:00:00.051 38.52 9 38.53 18 E GCM16 2016.04.07 15:20:02.224 1240.9 6 1241 1 E CLM16 2016.04.08 10:53:00.002 40.83 6 40.84 66 E CLM16 2016.04.08 13:56:30.070 40.54 38 40.56 58 E CLM16 2016.04.08 15:20:02.000 40.77 26 40.79 44 E CLM16 2016.04.08 15:21:43.786 40.76 3 40.77 28 E \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{Dot (.) for dyadic functions~(i)} We have just used \defn{dot} (.) for dyadic functions. Quoting~\cite{borror-2015}, the general form of functional . for dyadic functions is \begin{snippet}{q} .[L;I;f;y] \end{snippet} $L$ is a mapping with explicit domain, $I$ is a list in the domain of $L$, $f$ is a dyadic function and $y$ is an atom or list of the proper shape. For a list, the result is the item-wise application to the items of $L$ indexed at depth by $I$, of $f$ and the parameter $y$. Over the subdomain $I$, the map output becomes \begin{snippet}{q} (L . I) f y / binary operator f[L . I; y] / dyadic function \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{Dot (.) for dyadic functions~(ii)} \begin{itemize} \item Thus, if \begin{snippet}{q} L:(100 200;300 400 500 600;600 700); \end{snippet} then \begin{snippet}{q} .[L;1 2;+;42] \end{snippet} amounts to copying $L$, adding 42 to the $[1;2]$nd element of the copy, and returning that copy: \begin{snippet}{output} (100j, 200j;300j, 400j, 542j, 600j;600j 700j) \end{snippet} \item Similarly, \begin{snippet}{q} .[L;(1;(0;2;3));+;42] \end{snippet} amounts to copying $L$, adding 42 to the $[1;0]$th, $[1;2]$nd, and $[1;3]$rd elements of the copy, and returning that copy: \begin{snippet}{output} (100j, 200j;342j, 400j, 542j, 642j;600j, 700j) \end{snippet} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Dot (.) for dyadic functions~(iii)} \begin{itemize} \item Whereas \begin{snippet}{q} .[L;(1;(0;2;3));+;(42;45;44)] \end{snippet} amounts to copying $L$, adding 42 to the $[1; 0]$th, 45 to the $[1; 2]$nd, and 44 to the $[1; 3]$rd element of the copy, and returning that copy: \begin{snippet}{output} (100j, 200j;342j, 400j, 545j, 644j;600j, 700j) \end{snippet} \item And \begin{snippet}{q} .[L;();+;42] \end{snippet} amounts to copying $L$, adding 42 to \nb{all} elements of the copy, and returning that copy: \begin{snippet}{output} (142j, 242j;342j, 442j, 542j, 642j;642j, 742j) \end{snippet} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Dot (.) for dyadic functions~(iv)} In none of the cases was the original $L$ modified. If we wanted to modify it, we would have passed a \nb{name reference}, as a symbol, instead: \begin{snippet}{q} .[`L;();+;42] \end{snippet} \begin{snippet}{output} `L \end{snippet} \begin{snippet}{q} L \end{snippet} \begin{snippet}{output} (142j, 242j;342j, 442j, 542j, 642j;642j, 742j) \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{Dot (.) for dyadic functions (v)} \begin{itemize} \item Along the same lines, we can use . to extend a list (remember that , concatenates): \begin{snippet}{q} scientists:`newton`leibniz; .[`scientists;();,;`wiener`khinchin`kolmogorov]; scientists \end{snippet} \begin{snippet}{output} `newton`leibniz`wiener`khinchin`kolmogorov \end{snippet} \item Just as we used to extend a table. \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Deleting rows from a table} \begin{itemize} \item To delete all rows from the \code{quotes} table we can use the following code: \begin{snippet}{q} q)delete from `quotes \end{snippet} \item Notice that we referred to the quotes table by its symbol name, \code{`quotes}, instead of providing a variable, \code{quotes}. This is generally the case when we want to modify a table in place. \item We can check that the \code{quotes} table is now indeed empty: \begin{snippet}{q} q)count quotes 0 q)quotes code date time bidprice bidsize askprice asksize mktflag
\end{snippet} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Repopulating \textbf{quotes}} Let us repopulate \code{quotes} again: \begin{snippet}{q} quotes:([]code:`symbol$();date:`date$();time:`time$();bidprice:`float$();bidsize:`short$();askprice:`float$();asksize:`short$();mktflag:`symbol$()); .[`quotes;();,;(
(`CLM16; 2016.04.07; 10:15:00.010; 38.34 ; 86h; 38.35 ; 3h ; `E); (`GCM16; 2016.04.07; 11:02:16.663; 1241.4; 22h; 1241.5; 1h ; `E); (`CLM16; 2016.04.07; 12:05:00.303; 38.12 ; 7h ; 38.13 ; 13h; `E); (`CLM16; 2016.04.07; 12:22:00.486; 38.11 ; 16h; 38.12 ; 8h ; `E); (`GCM16; 2016.04.07; 13:00:00.205; 1238.6; 8h ; 1238.7; 7h ; `E); (`CLM16; 2016.04.07; 15:00:00.051; 38.52 ; 9h ; 38.53 ; 18h; `E); (`GCM16; 2016.04.07; 15:20:02.224; 1240.9; 6h ; 1241f ; 1h ; `E); (`CLM16; 2016.04.08; 10:53:00.002; 40.83 ; 6h ; 40.84 ; 66h; `E); (`CLM16; 2016.04.08; 13:56:30.070; 40.54 ; 38h; 40.56 ; 58h; `E); (`CLM16; 2016.04.08; 15:20:02.000; 40.77 ; 26h; 40.79 ; 44h; `E); (`CLM16; 2016.04.08; 15:21:43.786; 40.76 ; 3h ; 40.77 ; 28h; `E))]
\end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{The metadata of the trades table before we populate it} \begin{itemize} \item Let us examine the metadata of the trades table before we populate it: \begin{snippet}{q} meta trades \end{snippet} \begin{snippet}{output} c | t f a
| -----
code | s date | d time | t price | f volume | h mktflag | s comments| \end{snippet} \item Note that \begin{snippet}{q} type trades[`comments] \end{snippet} is \code{0h}, so this column is a \nb{mixed} list (remember that table columns are all lists). It appeared in the schema without a type cast: \begin{snippet}{q} trades : ([]
... volume:`short$(); mktflag:`symbol$(); comments:())
\end{snippet} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Exercise} Use dot (\code{.}) to populate the \code{trades} table with the following rows: \begin{snippet}{q} (`CLM16; 2016.04.07; 10:20:00.329; 38.3 ; 4h; `E; "NYMEX/CME Group trade"); (`GCM16; 2016.04.07; 12:00:00.055; 1239.7; 6h; `E; ""); (`GCM16; 2016.04.07; 12:37:02.106; 1240.5; 1h; `E; ""); (`CLM16; 2016.04.07; 13:00:00.128; 38.04 ; 3h; `E; "NYMEX/CME Group trade"); (`VXK16; 2016.04.07; 13:22:05.617; 18.85 ; 5h; `E; ""); (`GCM16; 2016.04.07; 14:35:01.241; 1241.2; 1h; `E; ""); (`GCM16; 2016.04.08; 10:13:01.048; 1240f ; 3h; `E; ""); (`VXK16; 2016.04.08; 11:34:53.417; 18.53 ; 1h; `E; "Transaction represents a trade in two contract months in the same class"); (`CLM16; 2016.04.08; 12:00:00.227; 40.61 ; 3h; `E; "NYMEX/CME Group trade"); (`VXK16; 2016.04.08; 12:44:00.684; 18.44 ; 2h; `E; "Transaction represents a trade in two contract months in the same class"); (`VXK16; 2016.04.08; 12:45:33.130; 18.49 ; 1h; `E; "Transaction represents a trade in two contract months in the same class"); (`CLM16; 2016.04.08; 15:20:02.000; 40.78 ; 3h; `E; "NYMEX/CME Group trade"); (`CLM16; 2016.04.11; 11:00:00.105; 41.43 ; 2h; `E; "NYMEX/CME Group trade"); (`VXK16; 2016.04.11; 14:00:00.829; 18.35 ; 1h; `E; ""); (`VXK16; 2016.04.11; 15:14:58.775; 19.05 ; 2h; `E; ""); (`GCM16; 2016.04.11; 16:00:00.044; 1257.9; 1h; `E; ""); (`GCM16; 2016.04.11; 16:28:34.311; 1258.7; 1h; `E; "") \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{Solution~(i)} \begin{snippet}{q} .[`trades;();,;(
(`CLM16; 2016.04.07; 10:20:00.329; 38.3 ; 4h; `E; "NYMEX/CME Group trade"); (`GCM16; 2016.04.07; 12:00:00.055; 1239.7; 6h; `E; ""); (`GCM16; 2016.04.07; 12:37:02.106; 1240.5; 1h; `E; ""); (`CLM16; 2016.04.07; 13:00:00.128; 38.04 ; 3h; `E; "NYMEX/CME Group trade"); (`VXK16; 2016.04.07; 13:22:05.617; 18.85 ; 5h; `E; ""); (`GCM16; 2016.04.07; 14:35:01.241; 1241.2; 1h; `E; ""); (`GCM16; 2016.04.08; 10:13:01.048; 1240f ; 3h; `E; ""); (`VXK16; 2016.04.08; 11:34:53.417; 18.53 ; 1h; `E; "Transaction represents a trade in two contract months in the same class"); (`CLM16; 2016.04.08; 12:00:00.227; 40.61 ; 3h; `E; "NYMEX/CME Group trade"); (`VXK16; 2016.04.08; 12:44:00.684; 18.44 ; 2h; `E; "Transaction represents a trade in two contract months in the same class"); (`VXK16; 2016.04.08; 12:45:33.130; 18.49 ; 1h; `E; "Transaction represents a trade in two contract months in the same class"); (`CLM16; 2016.04.08; 15:20:02.000; 40.78 ; 3h; `E; "NYMEX/CME Group trade"); (`CLM16; 2016.04.11; 11:00:00.105; 41.43 ; 2h; `E; "NYMEX/CME Group trade"); (`VXK16; 2016.04.11; 14:00:00.829; 18.35 ; 1h; `E; ""); (`VXK16; 2016.04.11; 15:14:58.775; 19.05 ; 2h; `E; ""); (`GCM16; 2016.04.11; 16:00:00.044; 1257.9; 1h; `E; ""); (`GCM16; 2016.04.11; 16:28:34.311; 1258.7; 1h; `E; ""))]
\end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{Solution~(ii)} \begin{snippet}{q} trades \end{snippet} \begin{snippet}{output} code date time price volume mktflag comments ..
..
CLM16 2016.04.07 10:20:00.329 38.3 4 E "NYMEX/CME Group trade" .. GCM16 2016.04.07 12:00:00.055 1239.7 6 E "" .. GCM16 2016.04.07 12:37:02.106 1240.5 1 E "" .. CLM16 2016.04.07 13:00:00.128 38.04 3 E "NYMEX/CME Group trade" .. VXK16 2016.04.07 13:22:05.617 18.85 5 E "" .. GCM16 2016.04.07 14:35:01.241 1241.2 1 E "" .. GCM16 2016.04.08 10:13:01.048 1240 3 E "" .. VXK16 2016.04.08 11:34:53.417 18.53 1 E "Transaction represents a.. CLM16 2016.04.08 12:00:00.227 40.61 3 E "NYMEX/CME Group trade" .. VXK16 2016.04.08 12:44:00.684 18.44 2 E "Transaction represents a.. VXK16 2016.04.08 12:45:33.130 18.49 1 E "Transaction represents a.. CLM16 2016.04.08 15:20:02.000 40.78 3 E "NYMEX/CME Group trade" .. CLM16 2016.04.11 11:00:00.105 41.43 2 E "NYMEX/CME Group trade" .. VXK16 2016.04.11 14:00:00.829 18.35 1 E "" .. VXK16 2016.04.11 15:14:58.775 19.05 2 E "" .. GCM16 2016.04.11 16:00:00.044 1257.9 1 E "" .. GCM16 2016.04.11 16:28:34.311 1258.7 1 E "" .. \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{The metadata of the trades table after we have populated it} \begin{itemize} \item Let us examine the metadata of the trades table before we populate it: \begin{snippet}{q} meta trades \end{snippet} \begin{snippet}{output} c | t f a
| -----
code | s date | d time | t price | f volume | h mktflag | s comments| C \end{snippet} \item Note that \begin{snippet}{q} type trades[`comments] \end{snippet}
is still \code{0h}, but the metadata above tells us that \code{comments} is a column of strings. Strings are themselves lists
\begin{snippet}{q} trades[`comments][0][0] \end{snippet} \begin{snippet}{output} "N" \end{snippet}
and the only way to declare a list of lists is through a mixed list, there is no other way we could have declared this column in our schema.
\end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{q-sql select} \begin{itemize} \item Consider the simplest possible form of q-sql \defn{select}: \begin{snippet}{q} select from quotes \end{snippet} \begin{snippet}{output} code date time bidprice bidsize askprice asksize mktflag
CLM16 2016.04.07 10:15:00.010 38.34 86 38.35 3 E GCM16 2016.04.07 11:02:16.663 1241.4 22 1241.5 1 E CLM16 2016.04.07 12:05:00.303 38.12 7 38.13 13 E CLM16 2016.04.07 12:22:00.486 38.11 16 38.12 8 E GCM16 2016.04.07 13:00:00.205 1238.6 8 1238.7 7 E CLM16 2016.04.07 15:00:00.051 38.52 9 38.53 18 E GCM16 2016.04.07 15:20:02.224 1240.9 6 1241 1 E CLM16 2016.04.08 10:53:00.002 40.83 6 40.84 66 E CLM16 2016.04.08 13:56:30.070 40.54 38 40.56 58 E CLM16 2016.04.08 15:20:02.000 40.77 26 40.79 44 E CLM16 2016.04.08 15:21:43.786 40.76 3 40.77 28 E \end{snippet} \item We have selected all rows from the table. \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Exercise} Select all rows from the \code{trades} table. \end{frame}
\begin{frame}[fragile] \frametitle{Solution} \begin{snippet}{q} select from trades \end{snippet} \begin{snippet}{output} code date time price volume mktflag comments ..
..
CLM16 2016.04.07 10:20:00.329 38.3 4 E "NYMEX/CME Group trade" .. GCM16 2016.04.07 12:00:00.055 1239.7 6 E "" .. GCM16 2016.04.07 12:37:02.106 1240.5 1 E "" .. CLM16 2016.04.07 13:00:00.128 38.04 3 E "NYMEX/CME Group trade" .. VXK16 2016.04.07 13:22:05.617 18.85 5 E "" .. GCM16 2016.04.07 14:35:01.241 1241.2 1 E "" .. GCM16 2016.04.08 10:13:01.048 1240 3 E "" .. VXK16 2016.04.08 11:34:53.417 18.53 1 E "Transaction represents a.. CLM16 2016.04.08 12:00:00.227 40.61 3 E "NYMEX/CME Group trade" .. VXK16 2016.04.08 12:44:00.684 18.44 2 E "Transaction represents a.. VXK16 2016.04.08 12:45:33.130 18.49 1 E "Transaction represents a.. CLM16 2016.04.08 15:20:02.000 40.78 3 E "NYMEX/CME Group trade" .. CLM16 2016.04.11 11:00:00.105 41.43 2 E "NYMEX/CME Group trade" .. VXK16 2016.04.11 14:00:00.829 18.35 1 E "" .. VXK16 2016.04.11 15:14:58.775 19.05 2 E "" .. GCM16 2016.04.11 16:00:00.044 1257.9 1 E "" .. GCM16 2016.04.11 16:28:34.311 1258.7 1 E "" .. \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{q-sql select: selecting columns} \begin{itemize} \item We may list the columns that we want to select, in the order in which we want them to appear in the resulting table: \begin{snippet}{q} select date,time,code,asksize,askprice from quotes \end{snippet} \begin{snippet}{output} date time code asksize askprice
2016.04.07 10:15:00.010 CLM16 3 38.35 2016.04.07 11:02:16.663 GCM16 1 1241.5 2016.04.07 12:05:00.303 CLM16 13 38.13 2016.04.07 12:22:00.486 CLM16 8 38.12 2016.04.07 13:00:00.205 GCM16 7 1238.7 2016.04.07 15:00:00.051 CLM16 18 38.53 2016.04.07 15:20:02.224 GCM16 1 1241 2016.04.08 10:53:00.002 CLM16 66 40.84 2016.04.08 13:56:30.070 CLM16 58 40.56 2016.04.08 15:20:02.000 CLM16 44 40.79 2016.04.08 15:21:43.786 CLM16 28 40.77 \end{snippet} \item By default, the output is not saved, but we can set it to a variable: \begin{snippet}{q} new_quotes_table:select date,time,code,asksize,askprice from quotes \end{snippet} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Exercise} Select the \code{date}, \code{time}, \code{volume}, and \code{price} (in that order) from the \code{trades} table. \end{frame}
\begin{frame}[fragile] \frametitle{Solution} \begin{snippet}{q} select date,time,volume,price from trades \end{snippet} \begin{snippet}{output} date time volume price
2016.04.07 10:20:00.329 4 38.3 2016.04.07 12:00:00.055 6 1239.7 2016.04.07 12:37:02.106 1 1240.5 2016.04.07 13:00:00.128 3 38.04 2016.04.07 13:22:05.617 5 18.85 2016.04.07 14:35:01.241 1 1241.2 2016.04.08 10:13:01.048 3 1240 2016.04.08 11:34:53.417 1 18.53 2016.04.08 12:00:00.227 3 40.61 2016.04.08 12:44:00.684 2 18.44 2016.04.08 12:45:33.130 1 18.49 2016.04.08 15:20:02.000 3 40.78 2016.04.11 11:00:00.105 2 41.43 2016.04.11 14:00:00.829 1 18.35 2016.04.11 15:14:58.775 2 19.05 2016.04.11 16:00:00.044 1 1257.9 2016.04.11 16:28:34.311 1 1258.7 \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{q-sql select: selecting columns under new names} We may give the selected columns new names in the result: \begin{snippet}{q} select d:date,t:time,code,size:asksize,price:askprice from quotes \end{snippet} \begin{snippet}{output} d t code size price
2016.04.07 10:15:00.010 CLM16 3 38.35 2016.04.07 11:02:16.663 GCM16 1 1241.5 2016.04.07 12:05:00.303 CLM16 13 38.13 2016.04.07 12:22:00.486 CLM16 8 38.12 2016.04.07 13:00:00.205 GCM16 7 1238.7 2016.04.07 15:00:00.051 CLM16 18 38.53 2016.04.07 15:20:02.224 GCM16 1 1241 2016.04.08 10:53:00.002 CLM16 66 40.84 2016.04.08 13:56:30.070 CLM16 58 40.56 2016.04.08 15:20:02.000 CLM16 44 40.79 2016.04.08 15:21:43.786 CLM16 28 40.77 \end{snippet} \end{frame}
\begin{frame} \frametitle{Exercise} Select the \code{date}, \code{time}, \code{volume}, and \code{price} (in that order) from the \code{trades} table but under the names \code{d}, \code{t}, \code{v}, and \code{p}. \end{frame}
\begin{frame}[fragile] \frametitle{Solution} \begin{snippet}{q} select d:date,t:time,v:volume,p:price from trades \end{snippet} \begin{snippet}{output} d t v p
2016.04.07 10:20:00.329 4 38.3 2016.04.07 12:00:00.055 6 1239.7 2016.04.07 12:37:02.106 1 1240.5 2016.04.07 13:00:00.128 3 38.04 2016.04.07 13:22:05.617 5 18.85 2016.04.07 14:35:01.241 1 1241.2 2016.04.08 10:13:01.048 3 1240 2016.04.08 11:34:53.417 1 18.53 2016.04.08 12:00:00.227 3 40.61 2016.04.08 12:44:00.684 2 18.44 2016.04.08 12:45:33.130 1 18.49 2016.04.08 15:20:02.000 3 40.78 2016.04.11 11:00:00.105 2 41.43 2016.04.11 14:00:00.829 1 18.35 2016.04.11 15:14:58.775 2 19.05 2016.04.11 16:00:00.044 1 1257.9 2016.04.11 16:28:34.311 1 1258.7 \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{q-sql select: operations on columns~(i)} Perhaps more interestingly, selected columns may be functions of / results of operations on the existing ones: \begin{snippet}{q} select dt:date+time,code,size:asksize,price:askprice from quotes \end{snippet} \begin{snippet}{output} dt code size price
2016.04.07D10:15:00.010000000 CLM16 3 38.35 2016.04.07D11:02:16.663000000 GCM16 1 1241.5 2016.04.07D12:05:00.303000000 CLM16 13 38.13 2016.04.07D12:22:00.486000000 CLM16 8 38.12 2016.04.07D13:00:00.205000000 GCM16 7 1238.7 2016.04.07D15:00:00.051000000 CLM16 18 38.53 2016.04.07D15:20:02.224000000 GCM16 1 1241 2016.04.08D10:53:00.002000000 CLM16 66 40.84 2016.04.08D13:56:30.070000000 CLM16 58 40.56 2016.04.08D15:20:02.000000000 CLM16 44 40.79 2016.04.08D15:21:43.786000000 CLM16 28 40.77 \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{q-sql select: operations on columns~(ii)} Remember that columns are lists, so the above + works on two lists: \begin{snippet}{q} q)quotes[`date] 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 .. q)quotes[`time] 10:15:00.010 11:02:16.663 12:05:00.303 12:22:00.486 13:00:00.205 15:00:00.051.. q)quotes[`date]+quotes[`time] 2016.04.07D10:15:00.010000000 2016.04.07D11:02:16.663000000 2016.04.07D12:05:.. \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{Exercise} \begin{itemize} \item The best bid and best ask (offer) prices are denoted by $P^b_t$ and $P^a_t$, respectively. \item The \defn{midprice} is the average of the best bid and best ask prices:
\begin{equation*} \text{Midprice}_t = \frac{P^a_t + P^b_t}{2}. \end{equation*}
\item Write a q-sql \code{select} query selecting the timestamp (\code{date + time}) and the midprice. \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Solution} \begin{snippet}{q} select dt:date+time,mid:.5*bidprice+askprice from quotes \end{snippet} \begin{snippet}{output} dt mid
2016.04.07D10:15:00.010000000 38.345 2016.04.07D11:02:16.663000000 1241.45 2016.04.07D12:05:00.303000000 38.125 2016.04.07D12:22:00.486000000 38.115 2016.04.07D13:00:00.205000000 1238.65 2016.04.07D15:00:00.051000000 38.525 2016.04.07D15:20:02.224000000 1240.95 2016.04.08D10:53:00.002000000 40.835 2016.04.08D13:56:30.070000000 40.55 2016.04.08D15:20:02.000000000 40.78 2016.04.08D15:21:43.786000000 40.765 \end{snippet} \end{frame}
\begin{frame} \frametitle{Exercise} \begin{itemize} \item The \defn{microprice} is often more useful than the midprice, since it weights the best bid and best ask prices with the volumes posted at the best bid and best ask prices:
\begin{equation*} \text{Microprice}_t = \frac{V^b_t P^a_t + V^a_t P^b_t}{V^b_t + V^a_t}, \end{equation*} where $V^b_t$ and $V^a_t$ represent the volumes posted at the best bid and best ask prices, respectively.
\item Modify your query to add the microprice to it. \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Solution} \begin{snippet}{q} select dt:date+time,mid:.5*bidprice+askprice,microprice:((bidsize*askprice)+asksize*bidprice)%bidsize+asksize from quotes \end{snippet} \begin{snippet}{output} dt mid microprice
2016.04.07D10:15:00.010000000 38.345 38.34966 2016.04.07D11:02:16.663000000 1241.45 1241.496 2016.04.07D12:05:00.303000000 38.125 38.1235 2016.04.07D12:22:00.486000000 38.115 38.11667 2016.04.07D13:00:00.205000000 1238.65 1238.653 2016.04.07D15:00:00.051000000 38.525 38.52333 2016.04.07D15:20:02.224000000 1240.95 1240.986 2016.04.08D10:53:00.002000000 40.835 40.83083 2016.04.08D13:56:30.070000000 40.55 40.54792 2016.04.08D15:20:02.000000000 40.78 40.77743 2016.04.08D15:21:43.786000000 40.765 40.76097 \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{A challenge~(i)} \begin{itemize} \item So what if we want to return only the first character from each code? \begin{snippet}{q} select dt:date+time,shortcode:code[0],size:asksize,price:askprice from quotes \end{snippet} does not give the desired result: \begin{snippet}{output} dt shortcode size price
2016.04.07D10:15:00.010000000 CLM16 3 38.35 2016.04.07D11:02:16.663000000 CLM16 1 1241.5 2016.04.07D12:05:00.303000000 CLM16 13 38.13 2016.04.07D12:22:00.486000000 CLM16 8 38.12 2016.04.07D13:00:00.205000000 CLM16 7 1238.7 2016.04.07D15:00:00.051000000 CLM16 18 38.53 2016.04.07D15:20:02.224000000 CLM16 1 1241 2016.04.08D10:53:00.002000000 CLM16 66 40.84 2016.04.08D13:56:30.070000000 CLM16 58 40.56 2016.04.08D15:20:02.000000000 CLM16 44 40.79 2016.04.08D15:21:43.786000000 CLM16 28 40.77 \end{snippet} \item This is because the indexing, \code{...[0]}, is applied to the entire list \code{quotes[`code]}, and we get a symbol scalar, \code{`CLM16}. This scalar is returned for each row in the result, as you can see above. \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{A challenge~(ii)} Indeed, this is what happens when we include scalars among selected columns: \begin{snippet}{q} select dt:date+time,shortcode:123,size:asksize,price:askprice from quotes \end{snippet} \begin{snippet}{output} dt shortcode size price
2016.04.07D10:15:00.010000000 123 3 38.35 2016.04.07D11:02:16.663000000 123 1 1241.5 2016.04.07D12:05:00.303000000 123 13 38.13 2016.04.07D12:22:00.486000000 123 8 38.12 2016.04.07D13:00:00.205000000 123 7 1238.7 2016.04.07D15:00:00.051000000 123 18 38.53 2016.04.07D15:20:02.224000000 123 1 1241 2016.04.08D10:53:00.002000000 123 66 40.84 2016.04.08D13:56:30.070000000 123 58 40.56 2016.04.08D15:20:02.000000000 123 44 40.79 2016.04.08D15:21:43.786000000 123 28 40.77 \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{A challenge~(iii)} \begin{itemize} \item Still, what if we want to return the first character of each code? \item One way to do this is by applying a function to \code{each} element of the list which represents the resulting column: \begin{snippet}{q} select dt:date+time,shortcode:{`$1#string x}each code,size:asksize,price:askprice from quotes \end{snippet} \begin{snippet}{output} dt shortcode size price
2016.04.07D10:15:00.010000000 C 3 38.35 2016.04.07D11:02:16.663000000 G 1 1241.5 2016.04.07D12:05:00.303000000 C 13 38.13 2016.04.07D12:22:00.486000000 C 8 38.12 2016.04.07D13:00:00.205000000 G 7 1238.7 2016.04.07D15:00:00.051000000 C 18 38.53 2016.04.07D15:20:02.224000000 G 1 1241 2016.04.08D10:53:00.002000000 C 66 40.84 2016.04.08D13:56:30.070000000 C 58 40.56 2016.04.08D15:20:02.000000000 C 44 40.79 2016.04.08D15:21:43.786000000 C 28 40.77 \end{snippet} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{A challenge~(iv)} Instead of the keyword \code{each}, we could have used the \code{'} adverb (don't forget the parentheses around the verb--adverb pair): \begin{snippet}{q} select dt:date+time,shortcode:({`$1#string x}')code,size:asksize,price:askprice from quotes \end{snippet} \begin{snippet}{output} dt shortcode size price
2016.04.07D10:15:00.010000000 C 3 38.35 2016.04.07D11:02:16.663000000 G 1 1241.5 2016.04.07D12:05:00.303000000 C 13 38.13 2016.04.07D12:22:00.486000000 C 8 38.12 2016.04.07D13:00:00.205000000 G 7 1238.7 2016.04.07D15:00:00.051000000 C 18 38.53 2016.04.07D15:20:02.224000000 G 1 1241 2016.04.08D10:53:00.002000000 C 66 40.84 2016.04.08D13:56:30.070000000 C 58 40.56 2016.04.08D15:20:02.000000000 C 44 40.79 2016.04.08D15:21:43.786000000 C 28 40.77 \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{A challenge~(v)} This works since \begin{snippet}{q} {`$1#string x}each quotes[`code] \end{snippet} and \begin{snippet}{q} ({`$1#string x}')quotes[`code] \end{snippet} give the desired list \begin{snippet}{output} `C`G`C`C`G`C`G`C`C`C`C \end{snippet} which is what the above \code{select} returns for that column (remember: columns are lists). \end{frame}
\begin{frame} \frametitle{Exercise} Select from the \code{quotes} table \code{cf}, which is a symbol consisting of two characters: the first character of the \code{code} and the first character of the \code{mktflag}. \end{frame}
\begin{frame}[fragile] \frametitle{Solution~(i)} \begin{itemize} \item We can form \code{(code,mktflag)} pairs with \code{code,'mktflag} and then apply a function to each (\code{'}) such pair: \begin{snippet}{q} select cf:({`$(1#string x[0]),1#string x[1]}')(code,'mktflag) from quotes \end{snippet} \begin{snippet}{output} cf -- CE GE CE CE GE CE GE CE CE CE CE \end{snippet} \item However, we can do without the `pairing' code \code{code,'mktflag}. While the keyword \code{each} does not work on functions of multiple arguments and the following throws \code{'type}, \begin{snippet}{q} select cf:{`$(1#string x),1#string y}each[code;mktflag] from quotes \end{snippet} \item Each (\code{'}) does work on functions of multiple arguments: \begin{snippet}{q} select cf:{`$(1#string x),1#string y}'[code;mktflag] from quotes \end{snippet} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Solution~(ii)} The following, somewhat less terse, solution works too. But here we iterate twice, whereas we could iterate only once, as in the previous solution: \begin{snippet}{q} select cf:`$(({1#string x}each code),'{1#string x}each mktflag) from quotes \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{q-sql select: \textbf{where} specifications (constraints)~(i)} \begin{itemize} \item As in classical SQL, in q-sql we can add to our \code{select} the \code{where} specifications (constraints). Thus, if we want to return only those columns where price is greater than \code{1000f}... \begin{snippet}{q} select date,time,price:askprice from quotes where price>1000f \end{snippet} \item Oh, wait, this returns an error: \begin{snippet}{output} 'price [0] select date,time,price:askprice from quotes where price>1000f
^
\end{snippet} \item That's because the constraints are applied first, before one of the resulting columns is given the name \code{price}. We must apply the constraint to \code{askprice} in the original table instead: \begin{snippet}{q} select date,time,price:askprice from quotes where askprice>1000f \end{snippet} \begin{snippet}{output} date time price
2016.04.07 11:02:16.663 1241.5 2016.04.07 13:00:00.205 1238.7 2016.04.07 15:20:02.224 1241 \end{snippet} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{q-sql select: \textbf{where} specifications (constraints)~(ii)} \begin{itemize} \item We can add multiple constraints separated by commas: \begin{snippet}{q} select date,time,price:askprice from quotes where askprice>1000f,time>=12:00:00.000 \end{snippet} \begin{snippet}{output} date time price
2016.04.07 13:00:00.205 1238.7 2016.04.07 15:20:02.224 1241 \end{snippet} \item \nb{NB!} Unlike classical SQL, \textbf{the order of the constraints matters!} The constraints are applied in order. From the point of view of performance, you should apply the most `expensive' constraints, which shrink the result the most, first. \item In particular, when dealing with partitioned tables, include the constraint on the partitioning column (usually \textbf{date}) first. Since partitioned tables are stored on disc in directories by date, this constraint allows kdb+/q to zoom in on the right directory. If it is not the first constraint, an expensive scan of the entire database on disc may result. \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{q-sql select: indices} Tables are stored in deterministic order. If we want to return the indices of the rows returned by the query, we can add \code{i} to the select: \begin{snippet}{q} select i,date,time,price:askprice from quotes where askprice>1000f,time>=12:00:00.000 \end{snippet} \begin{snippet}{output} x date time price
4 2016.04.07 13:00:00.205 1238.7 6 2016.04.07 15:20:02.224 1241 \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{Exercise} Select from \code{trades} those rows where \code{price} is closer to the smallest integer greater than it than the greatest integer that is less than or equal to the \code{price}. \end{frame}
\begin{frame}[fragile] \frametitle{Solution} \begin{snippet}{q} select from trades where ((ceiling price)-price)<price-floor price \end{snippet} \begin{snippet}{output} code date time price volume mktflag comments ..
..
GCM16 2016.04.07 12:00:00.055 1239.7 6 E "" .. VXK16 2016.04.07 13:22:05.617 18.85 5 E "" .. VXK16 2016.04.08 11:34:53.417 18.53 1 E "Transaction represents a.. CLM16 2016.04.08 12:00:00.227 40.61 3 E "NYMEX/CME Group trade" .. CLM16 2016.04.08 15:20:02.000 40.78 3 E "NYMEX/CME Group trade" .. GCM16 2016.04.11 16:00:00.044 1257.9 1 E "" .. GCM16 2016.04.11 16:28:34.311 1258.7 1 E "" .. \end{snippet} The following solution is terser (and involves only a call to \code{floor}, no call to \code{ceiling}): \begin{snippet}{q} select from trades where (price-floor price)>0.5 \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{q-sql select: more complex constraints} \begin{itemize} \item More complex constraints can be formed using boolean operations: \begin{snippet}{q} select i,date,time,price:askprice from quotes where (((askprice>1000f)or(time>=12:00:00.000))and(time <= 15:30:00.000)) \end{snippet} \begin{snippet}{output} x date time price
1 2016.04.07 11:02:16.663 1241.5 2 2016.04.07 12:05:00.303 38.13 3 2016.04.07 12:22:00.486 38.12 4 2016.04.07 13:00:00.205 1238.7 5 2016.04.07 15:00:00.051 38.53 6 2016.04.07 15:20:02.224 1241 8 2016.04.08 13:56:30.070 40.56 9 2016.04.08 15:20:02.000 40.79 10 2016.04.08 15:21:43.786 40.77 \end{snippet} \item Remember that constraints also operate on lists (columns) and result in boolean lists: \begin{snippet}{q} ((quotes[`askprice]>1000f)or(quotes[`time]>=12:00:00.000))and(quotes[`time]<=15:30:00.000) \end{snippet} \begin{snippet}{output} 01111110111b \end{snippet} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Exercise} Select from the \code{quotes} table those rows where the time is either less than 11 a.m. or greater than or equal to 3 p.m. \end{frame}
\begin{frame}[fragile] \frametitle{Solution} \begin{snippet}{q} select from quotes where (time<11:00:00.000)or time>=15:00:00.000 \end{snippet} \begin{snippet}{output} code date time bidprice bidsize askprice asksize mktflag
CLM16 2016.04.07 10:15:00.010 38.34 86 38.35 3 E CLM16 2016.04.07 15:00:00.051 38.52 9 38.53 18 E GCM16 2016.04.07 15:20:02.224 1240.9 6 1241 1 E CLM16 2016.04.08 10:53:00.002 40.83 6 40.84 66 E CLM16 2016.04.08 15:20:02.000 40.77 26 40.79 44 E CLM16 2016.04.08 15:21:43.786 40.76 3 40.77 28 E \end{snippet} Don't forget the parentheses. The query won't work correctly without them: \begin{snippet}{q} select from quotes where time<11:00:00.000 or time>=15:00:00.000 \end{snippet} \begin{snippet}{output} code date time bidprice bidsize askprice asksize mktflag
CLM16 2016.04.07 10:15:00.010 38.34 86 38.35 3 E CLM16 2016.04.08 10:53:00.002 40.83 6 40.84 66 E \end{snippet} (Which is not what we want.) \end{frame}
\begin{frame}[fragile] \frametitle{q-sql select: group-by specifications (\textbf{by} phrase)~(i)} \begin{itemize} \item We can group the results by the value of a particular column: \begin{snippet}{q} select i,date,time,price:askprice by date from quotes where askprice>=40f \end{snippet} \begin{snippet}{output} date | x date time ..
| -----------------------------------------------------------------..
2016.04.07| 1 4 6 2016.04.07 2016.04.07 2016.04.07 11:02:16.663.. 2016.04.08| 7 8 9 10 2016.04.08 2016.04.08 2016.04.08 2016.04.08 10:53:00.002.. \end{snippet} \item The result is a keyed table (in this case, keyed by date). \item Apart from the key column, all the columns in the result are lists of lists, collecting values for each partition: \begin{snippet}{q} (select i,date,time,price:askprice by date from quotes where askprice>=40f)[2016.04.07][`time] \end{snippet} \begin{snippet}{output} 11:02:16.663 13:00:00.205 15:20:02.224 \end{snippet} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{q-sql select: group-by specifications (\textbf{by} phrase)~(ii)} Notice that, if you omit the list of selected columns from a \code{select} with \code{by}, somewhat counterintuitively, we will, for every single column, get the last value in each bucket, instead of a list of all values in the bucket: \begin{snippet}{q} q)select by date from quotes where askprice>=40f date | code time bidprice bidsize askprice asksize mktflag
| ------------------------------------------------------------
2016.04.07| GCM16 15:20:02.224 1240.9 6 1241 1 E 2016.04.08| CLM16 15:21:43.786 40.76 3 40.77 28 E \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{q-sql select: group-by specifications (\textbf{by} phrase)~(iii)} Group-by specifications make it easy to produce summary statistics: \begin{snippet}{q} select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by date,code from quotes where askprice<1250f \end{snippet} \begin{snippet}{output} date code | first_bidprice avg_askprice max_bidsize number_of_quotes
| --------------------------------------------------------
2016.04.07 CLM16| 38.34 38.2825 86 4 2016.04.07 GCM16| 1241.4 1240.4 22 3 2016.04.08 CLM16| 40.83 40.74 38 4 \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{q-sql select: group-by specifications (\textbf{by} phrase)~(iv)} \begin{itemize} \item We can also group by the result of a function, e.g.: \begin{snippet}{q} select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by{[x] `hh$x}each time from quotes where askprice<1250f \end{snippet} \begin{snippet}{output} time| first_bidprice avg_askprice max_bidsize number_of_quotes
| --------------------------------------------------------
10 | 38.34 39.595 86 2 11 | 1241.4 1241.5 22 1 12 | 38.12 38.125 16 2 13 | 1238.6 639.63 38 2 15 | 38.52 340.2725 26 4 \end{snippet} \item Although in this case the use of \code{each} is superfluous: \code{`hh$...} works just as well on lists as on scalars: \begin{snippet}{q} select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by`hh$time from quotes where askprice<1250f \end{snippet} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Exercise} Group the data from \code{trades} by the number of digits in the \code{price}. \end{frame}
\begin{frame}[fragile] \frametitle{Solution~(i)} \begin{itemize} \item We could do \begin{snippet}{q} select code,date,time,price,volume,mktflag,comments by count each string price from trades \end{snippet} \begin{snippet}{output} price| code date ..
| ----------------------------------------------------------------------..
4 | `CLM16`GCM16 2016.04.0.. 5 | `CLM16`VXK16`VXK16`CLM16`VXK16`VXK16`CLM16`CLM16`VXK16`VXK16 2016.04.0.. 6 | `GCM16`GCM16`GCM16`GCM16`GCM16 2016.04.0.. \end{snippet}
but this counts the decimal point as a digit (which it isn't).
\item To remedy this, we could use \begin{snippet}{q} select code,date,time,price,volume,mktflag,comments by {count sv[""]vs["."]x} each string price from trades \end{snippet} \begin{snippet}{output} price| code dat..
| ----------------------------------------------------------------------..
3 | ,`CLM16 ,20.. 4 | `CLM16`VXK16`GCM16`VXK16`CLM16`VXK16`VXK16`CLM16`CLM16`VXK16`VXK16 201.. 5 | `GCM16`GCM16`GCM16`GCM16`GCM16 201.. \end{snippet} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Solution~(ii)} If we care only about the digits before the decimal point, we could use \begin{snippet}{q} select code,date,time,price,volume,mktflag,comments by (1+floor log[price]%log 10) from trades \end{snippet} \begin{snippet}{output} x| code date .. -| --------------------------------------------------------------------------.. 2| `CLM16`CLM16`VXK16`VXK16`CLM16`VXK16`VXK16`CLM16`CLM16`VXK16`VXK16 2016.04.. 4| `GCM16`GCM16`GCM16`GCM16`GCM16`GCM16 2016.04.. \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{\textbf{xbar}} \begin{itemize} \item Consider the following example: \begin{snippet}{q} q)5 xbar 23 21 56 39 10 9 81 20 45 101 15 20 20 55 35 10 5 80 20 45 100 15 \end{snippet} \item For each integer in the list on the right \code{xbar} outputs a multiple of 5 that is less than or equal to it. \item \code{xbar} works with floats, too: \begin{snippet}{q} q)5 xbar 23.7 21.9 56.3 39.2 10.1 9.8 81.3 20.4 45.4 101.0 15.3 20 20 55 35 10 5 80 20 45 100 15f \end{snippet} \item And, importantly, it works with temporal types: \begin{snippet}{q} q)(1000*60*5)xbar 00:03:09.022 00:05:58.921 00:21:00.000 00:29:00.000 00:00:00.000 00:05:00.000 00:20:00.000 00:25:00.000 \end{snippet} On the left we have the number of milliseconds\footnote{When working with timestamps we should use nanoseconds.}. Thus we have produced five-minute bars. \item \code{xbar} is defined in k as follows: \begin{snippet}{q} q)xbar k){x*y div x:$[16h=abs[@x];"j"$x;x]} \end{snippet} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{q-sql select: \textbf{xbar} in group-by specifications (\textbf{by} phrase)} \begin{itemize} \item \code{xbar} is particularly useful in group-by specifications. \item Thus, if we want to group-by 15-minute intervals, we can apply it like so: \begin{snippet}{q} select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by(15 * 60 * 1000000000)xbar(quotes[`date]+quotes[`time]) from quotes where askprice<1250f \end{snippet} \begin{snippet}{output} x | first_bidprice avg_askprice max_bidsize number_of_quotes
|--------------------------------------------------------
2016.04.07D10:15:00.000000000| 38.34 38.35 86 1 2016.04.07D11:00:00.000000000| 1241.4 1241.5 22 1 2016.04.07D12:00:00.000000000| 38.12 38.13 7 1 2016.04.07D12:15:00.000000000| 38.11 38.12 16 1 2016.04.07D13:00:00.000000000| 1238.6 1238.7 8 1 2016.04.07D15:00:00.000000000| 38.52 38.53 9 1 2016.04.07D15:15:00.000000000| 1240.9 1241 6 1 2016.04.08D10:45:00.000000000| 40.83 40.84 6 1 2016.04.08D13:45:00.000000000| 40.54 40.56 38 1 2016.04.08D15:15:00.000000000| 40.77 40.78 26 2 \end{snippet} \end{itemize} \end{frame}
\begin{frame} \frametitle{Exercise} Produce a table of last trade prices, last trade volumes, average trade prices, and average trade volumes in each one hour-long bar. \end{frame}
\begin{frame}[fragile] \frametitle{Solution~(i)} The question is whether we want to lump different dates into the same bucket. If we do, then we could use \begin{snippet}{q} select lastp:last price,lastv:last volume,avgp:avg price,avgv:avg volume by (1000*60*60)xbar time from trades \end{snippet} \begin{snippet}{output} time | lastp lastv avgp avgv
| -------------------------
10:00:00.000| 1240 3 639.15 3.5 11:00:00.000| 41.43 2 29.98 1.5 12:00:00.000| 18.49 1 511.548 2.6 13:00:00.000| 18.85 5 28.445 4 14:00:00.000| 18.35 1 629.775 1 15:00:00.000| 19.05 2 29.915 2.5 16:00:00.000| 1258.7 1 1258.3 1 \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{Solution~(ii)} More likely than not, we want to distinguish the different dates, so we apply \code{xbar} as follows: \begin{snippet}{q} q)select lastp:last price,lastv:last volume,avgp:avg price,avgv:avg volume by (1000000000*60*60)xbar date+time from trades \end{snippet} \begin{snippet}{output} date | lastp lastv avgp avgv
| --------------------------
2016.04.07D10:00:00.000000000| 38.3 4 38.3 4 2016.04.07D12:00:00.000000000| 1240.5 1 1240.1 3.5 2016.04.07D13:00:00.000000000| 18.85 5 28.445 4 2016.04.07D14:00:00.000000000| 1241.2 1 1241.2 1 2016.04.08D10:00:00.000000000| 1240 3 1240 3 2016.04.08D11:00:00.000000000| 18.53 1 18.53 1 2016.04.08D12:00:00.000000000| 18.49 1 25.84667 2 2016.04.08D15:00:00.000000000| 40.78 3 40.78 3 2016.04.11D11:00:00.000000000| 41.43 2 41.43 2 2016.04.11D14:00:00.000000000| 18.35 1 18.35 1 2016.04.11D15:00:00.000000000| 19.05 2 19.05 2 2016.04.11D16:00:00.000000000| 1258.7 1 1258.3 1 \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{\textbf{exec}} \begin{itemize} \item \code{exec} behaves similarly to but differently from \code{select}. Unlike \code{select}, it does not return tables. \item The following \begin{snippet}{q} exec from quotes \end{snippet} will return the first row of the table as a dictionary: \begin{snippet}{output} code | `CLM16 date | 2016.04.08 time | 15:21:43.786 bidprice| 40.76 bidsize | 3h askprice| 40.77 asksize | 28h mktflag | `E \end{snippet} \item The following will return a dictionary of two lists: \begin{snippet}{q} exec date, time from quotes \end{snippet} \begin{snippet}{output} date| 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.0.. time| 10:15:00.010 11:02:16.663 12:05:00.303 12:22:00.486 13:00:00.205 15:00:.. \end{snippet} \item Whereas the following will simply return a list: \begin{snippet}{q} exec date from quotes \end{snippet} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{\textbf{update}} \begin{itemize} \item \code{update}, whose syntax is similar to that of \code{select}, can be used to update tables. \item For example, to replace all \code{code}s with \code{`UNKNOWN} in the table \code{quotes}, we can use the following: \begin{snippet}{q} update code:`UNKNOWN from quotes \end{snippet} \item To replace the code \code{`CLM16} with \code{`crudeoil} we can use \begin{snippet}{q} update code:`crudeoil from trades where code=`CLM16 \end{snippet} \item These q-sql statements will return new tables. To update the table in place, reference it by name (\code{`quotes}) instead of using the variable \code{quotes} in the query: \begin{snippet}{q} q)update code:`crudeoil from `trades where code=`CLM16 `trades \end{snippet} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Exercise} Use \code{update} to set all instances of \code{bidprice} where \code{bidprice} is greater than \code{1000f} to \code{0n}. \end{frame}
\begin{frame}[fragile] \frametitle{Solution} \begin{snippet}{q} q)update bidprice:0n from `quotes where bidprice>1000f `quotes \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{Exercise} Use \code{update} to set all \code{bidprice}s in the table \code{quotes} to an increasing sequence of floats: \code{0f} \code{1f} \code{2f} \code{3f}... \end{frame}
\begin{frame}[fragile] \frametitle{Solution} \begin{snippet}{q} update bidprice:`float$til count quotes from `quotes \end{snippet} or \begin{snippet}{q} update bidprice:0f+til count quotes from `quotes \end{snippet} or \begin{snippet}{q} update bidprice:`float$i from `quotes \end{snippet} or \begin{snippet}{q} update bidprice:0f+i from `quotes \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{Recall the meaning of the column attributes} \begin{itemize} \item \code{`s} --- \defn{sorted}: \code{`s#2 2 3}. The elements of the list must be sorted. By marking the list as sorted, binary search can be performed instead of fully scanning the column, speeding up many operations. \item \code{`u} --- \defn{unique}: \code{`u#2 4 5}. The elements of the list must be unique. This attribute creates a unique hash table in the background, allowing constant time lookup of elements. \item \code{`p} --- \defn{parted}: \code{`p#2 2 1}. All elements of the same value must occur in one sequential block. By storing the same items continuously on-disc reading using standard hard drives is much faster. \item \code{`g} --- \defn{grouped}: \code{`g#2 1 2}. This attribute is closest to indexing in standard databases. A lookup table from each distinct value in the table is created to map to the positions where that value occurs. This enables much quicker lookup of the entries. \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Setting and unsetting the column attributes} We can use \code{update} to set and unset the column attributes: \begin{snippet}{q} q)meta quotes c | t f a
| -----
code | s date | d time | t bidprice| f bidsize | h askprice| f asksize | h mktflag | s \end{snippet} \begin{snippet}{q} q)update `s#date from `quotes `quotes \end{snippet} \begin{snippet}{q} q)meta quotes c | t f a
| -----
code | s date | d s time | t bidprice| f bidsize | h askprice| f asksize | h mktflag | s \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{Exercise} Set the grouped attribute on the column \code{code} and the sorted attribute on the column \code{date} of the \code{trades} table. \end{frame}
\begin{frame}[fragile] \frametitle{Solution} \begin{snippet}{q} q)meta trades c | t f a
| -----
code | s date | d time | t price | f volume | h mktflag | s comments| C \end{snippet} \begin{snippet}{q} q)update `g#code,`s#date from `trades `trades \end{snippet} \begin{snippet}{q} q)meta trades c | t f a
| -----
code | s g date | d s time | t price | f volume | h mktflag | s comments| C \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{Exercise} Remove the attributes from the \code{code} and \code{date} columns of the \code{trades} table. \end{frame}
\begin{frame}[fragile] \frametitle{Solution} \begin{snippet}{q} q)meta trades c | t f a
| -----
code | s g date | d s time | t price | f volume | h mktflag | s comments| C \end{snippet} \begin{snippet}{q} q)update `#code,`#date from `trades `trades \end{snippet} \begin{snippet}{q} q)meta trades c | t f a
| -----
code | s date | d time | t price | f volume | h mktflag | s comments| C \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{\textbf{delete}} \begin{itemize} \item \code{delete} also has a similar syntax to that of \code{update}. \item We can use the following to delete all quotes for \code{`GCM16}: \begin{snippet}{q} delete from `quotes where code=`GCM16 \end{snippet} \item Notice that we have again referred to the table by its name, \code{`quotes}, in order to change it in place. \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{Exercise} Delete all rows from \code{quotes} where the \code{askprice} is greater than \code{1000f}. \end{frame}
\begin{frame}[fragile] \frametitle{Solution} \begin{snippet}{q} q)delete from `quotes where askprice>1000f `quotes \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{Deleting columns} \code{delete} can be used to delete columns as well as rows: \begin{snippet}{q} delete bidprice from quotes \end{snippet} This will return a copy. To modify the table in place, use the symbol reference \code{`quotes} instead of \code{quotes}. \end{frame}
\begin{frame} \frametitle{Exercise} Delete the columns \code{mktflag} and \code{comments} from the table \code{trades}. \end{frame}
\begin{frame}[fragile] \frametitle{Solution} \begin{snippet}{q} q)delete mktflag,comments from `trades `trades \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{The order of evaluation~(i)} We could use \code{0N!} to investigate the order of evaluation in q: \begin{snippet}{q} q)0N!1;0N!2;0N!3 1 2 3 3 \end{snippet} \begin{snippet}{q} q)(0N!1;0N!2;0N!3) 3 2 1 1 2 3 \end{snippet} \begin{snippet}{q} q)add:{x+y+z};add[0N!1;0N!2;0N!3] 3 2 1 6 \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{The order of evaluation~(ii)} \begin{snippet}{q} q)select 0N!date,0N!time,0N!askprice from quotes 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 .. 10:15:00.010 11:02:16.663 12:05:00.303 12:22:00.486 13:00:00.205 15:00:00.051.. 38.35 1241.5 38.13 38.12 1238.7 38.53 1241 40.84 40.56 40.79 40.77 date time askprice
2016.04.07 10:15:00.010 38.35 2016.04.07 11:02:16.663 1241.5 2016.04.07 12:05:00.303 38.13 2016.04.07 12:22:00.486 38.12 2016.04.07 13:00:00.205 1238.7 2016.04.07 15:00:00.051 38.53 2016.04.07 15:20:02.224 1241 2016.04.08 10:53:00.002 40.84 2016.04.08 13:56:30.070 40.56 2016.04.08 15:20:02.000 40.79 2016.04.08 15:21:43.786 40.77 \end{snippet} \end{frame}
\begin{frame}[fragile] \frametitle{The order of evaluation~(iii)} \begin{snippet}{q} q)select 0N!date,0N!time,0N!askprice from quotes where 0N!date>2016.04.07,0N!askprice>40.7 00000001111b 1011b 2016.04.08 2016.04.08 2016.04.08 10:53:00.002 15:20:02.000 15:21:43.786 40.84 40.79 40.77 date time askprice
2016.04.08 10:53:00.002 40.84 2016.04.08 15:20:02.000 40.79 2016.04.08 15:21:43.786 40.77 \end{snippet} From this it is clear that the order of evaluation is as follows: \begin{itemize} \item \code{date>2016.04.07} \item \code{askprice>40.7} \item \code{date} \item \code{time} \item \code{askprice} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{The order of evaluation~(iv)} \begin{snippet}{q} q)select 0N!time,0N!askprice by 0N!date from quotes where 0N!date>2016.04.07,0N!askprice>40.7 00000001111b 1011b 2016.04.08 2016.04.08 2016.04.08 10:53:00.002 15:20:02.000 15:21:43.786 40.84 40.79 40.77 date | time askprice
| --------------------------------------------------------
2016.04.08| 10:53:00.002 15:20:02.000 15:21:43.786 40.84 40.79 40.77 \end{snippet} From this it is clear that the order of evaluation is as follows: \begin{itemize} \item \code{date>2016.04.07} \item \code{askprice>40.7} \item \code{by date} \item \code{time} \item \code{askprice} \end{itemize} \end{frame}
\begin{frame}[fragile] \frametitle{The order of evaluation~(v)} \begin{snippet}{q} q)select 0N!time by 0N!date,0N!askprice from quotes where 0N!date>2016.04.07,0N!askprice>40.7 00000001111b 1011b 2016.04.08 2016.04.08 2016.04.08 40.84 40.79 40.77 ,10:53:00.002 ,15:20:02.000 ,15:21:43.786 date askprice| time
| ------------
2016.04.08 40.77 | 15:21:43.786 2016.04.08 40.79 | 15:20:02.000 2016.04.08 40.84 | 10:53:00.002 \end{snippet} From this it is clear that the order of evaluation is as follows: \begin{itemize} \item \code{date>2016.04.07} \item \code{askprice>40.7} \item \code{by date} \item \code{by askprice} \item \code{time} \end{itemize} \end{frame}