Programming/Kdb/Q-sql
Example table schemas
Consider the following quotes table schema (empty table):
quotes:([] code:`symbol$(); date:`date$(); time:`time$(); bidprice:`float$(); bidsize:`short$(); askprice:`float$(); asksize:`short$(); mktflag:`symbol$())
...and the following trades table schema (empty table):
trades:([] code:`symbol$(); date:`date$(); time:`time$(); price:`float$(); volume:`short$(); mktflag:`symbol$(); comments:())
Insert
Let us add the first row to the quotes table using an insert:
insert[`quotes;(`CLM16;2016.04.07;10:15:00.010;38.34;86h;38.35;3h;`E)]
Notice that when we want to modify a table, we usually provide a reference to it as a symbol, `quotes, instead of passing it as a variable, quotes.
The values of the new row are provided in order, as a list.
We can now examine the table quotes:
quotes
code date time bidprice bidsize askprice asksize mktflag ----------------------------------------------------------------------- CLM16 2016.04.07 10:15:00.010 38.34 86 38.35 3 E
We can add the second row in a similar manner:
insert[`quotes;(`GCM16;2016.04.07;11:02:16.663;1241.4;22h;1241.5;1h;`E)]
Let us again examine the table quotes:
quotes
code date time bidprice bidsize askprice asksize mktflag ----------------------------------------------------------------------- CLM16 2016.04.07 10:15:00.010 38.34 86 38.35 3 E GCM16 2016.04.07 11:02:16.663 1241.4 22 1241.5 1 E
Inserting multiple rows at once
One can insert multiple rows to the quotes table using the followning syntax:
.[`quotes;();,;( (`CLM16; 2016.04.07; 12:05:00.303; 38.12 ; 7h ; 38.13 ; 13h; `E); (`CLM16; 2016.04.07; 12:22:00.486; 38.11 ; 16h; 38.12 ; 8h ; `E); (`GCM16; 2016.04.07; 13:00:00.205; 1238.6; 8h ; 1238.7; 7h ; `E); (`CLM16; 2016.04.07; 15:00:00.051; 38.52 ; 9h ; 38.53 ; 18h; `E); (`GCM16; 2016.04.07; 15:20:02.224; 1240.9; 6h ; 1241f ; 1h ; `E); (`CLM16; 2016.04.08; 10:53:00.002; 40.83 ; 6h ; 40.84 ; 66h; `E); (`CLM16; 2016.04.08; 13:56:30.070; 40.54 ; 38h; 40.56 ; 58h; `E); (`CLM16; 2016.04.08; 15:20:02.000; 40.77 ; 26h; 40.79 ; 44h; `E); (`CLM16; 2016.04.08; 15:21:43.786; 40.76 ; 3h ; 40.77 ; 28h; `E))]
Let us again examine the table quotes:
quotes
code date time bidprice bidsize askprice asksize mktflag ----------------------------------------------------------------------- CLM16 2016.04.07 10:15:00.010 38.34 86 38.35 3 E GCM16 2016.04.07 11:02:16.663 1241.4 22 1241.5 1 E CLM16 2016.04.07 12:05:00.303 38.12 7 38.13 13 E CLM16 2016.04.07 12:22:00.486 38.11 16 38.12 8 E GCM16 2016.04.07 13:00:00.205 1238.6 8 1238.7 7 E CLM16 2016.04.07 15:00:00.051 38.52 9 38.53 18 E GCM16 2016.04.07 15:20:02.224 1240.9 6 1241 1 E CLM16 2016.04.08 10:53:00.002 40.83 6 40.84 66 E CLM16 2016.04.08 13:56:30.070 40.54 38 40.56 58 E CLM16 2016.04.08 15:20:02.000 40.77 26 40.79 44 E CLM16 2016.04.08 15:21:43.786 40.76 3 40.77 28 E
We have just used dot (.) for dyadic functions. Quoting Q for Mortals, the general form of functional . for dyadic functions is
.[L;I;f;y]
L is a mapping with explicit domain, I is a list in the domain of L, f is a dyadic function and y is an atom or list of the proper shape. For a list, the result is the item-wise application to the items of L indexed at depth by I, of f and the parameter y. Over the subdomain I, the map output becomes
(L . I) f y / binary operator f[L . I; y] / dyadic function
Thus, if
L:(100 200;300 400 500 600;600 700);
then
.[L;1 2;+;42]
amounts to copying L, adding 42 to the [1;2]nd element of the copy, and returning that copy:
(100j, 200j;300j, 400j, 542j, 600j;600j 700j)
Similarly,
.[L;(1;(0;2;3));+;42]
amounts to copying L, adding 42 to the [1;0]th, [1;2]nd, and [1;3]rd elements of the copy, and returning that copy:
(100j, 200j;342j, 400j, 542j, 642j;600j, 700j)
Whereas
.[L;(1;(0;2;3));+;(42;45;44)]
amounts to copying L, adding 42 to the [1; 0]th, 45 to the [1; 2]nd, and 44 to the [1; 3]rd element of the copy, and returning that copy:
(100j, 200j;342j, 400j, 545j, 644j;600j, 700j)
And
.[L;();+;42]
amounts to copying L, adding 42 to all elements of the copy, and returning that copy:
(142j, 242j;342j, 442j, 542j, 642j;642j, 742j)
In none of the cases was the original L modified. If we wanted to modify it, we would have passed a name reference, as a symbol, instead:
.[`L;();+;42]
`L
L <pre> <pre> (142j, 242j;342j, 442j, 542j, 642j;642j, 742j)
Along the same lines, we can use . to extend a list (remember that , concatenates):
scientists:`newton`leibniz; .[`scientists;();,;`wiener`khinchin`kolmogorov]; scientists
`newton`leibniz`wiener`khinchin`kolmogorov
Just as we used to extend a table.
Deleting rows from a table
To delete all rows from the quotes table we can use the following code:
q)delete from `quotes
Notice that we referred to the quotes table by its symbol name,
`quotes
, instead of providing a variable, quotes. This is generally the case when we want to modify a table in place.
We can check that the quotes table is now indeed empty:
q)count quotes 0 q)quotes code date time bidprice bidsize askprice asksize mktflag --------------------------------------------------------
Repopulating quotes
Let us repopulate quotes again:
quotes:([]code:`symbol$();date:`date$();time:`time$();bidprice:`float$();bidsize:`short$();askprice:`float$();asksize:`short$();mktflag:`symbol$()); .[`quotes;();,;( (`CLM16; 2016.04.07; 10:15:00.010; 38.34 ; 86h; 38.35 ; 3h ; `E); (`GCM16; 2016.04.07; 11:02:16.663; 1241.4; 22h; 1241.5; 1h ; `E); (`CLM16; 2016.04.07; 12:05:00.303; 38.12 ; 7h ; 38.13 ; 13h; `E); (`CLM16; 2016.04.07; 12:22:00.486; 38.11 ; 16h; 38.12 ; 8h ; `E); (`GCM16; 2016.04.07; 13:00:00.205; 1238.6; 8h ; 1238.7; 7h ; `E); (`CLM16; 2016.04.07; 15:00:00.051; 38.52 ; 9h ; 38.53 ; 18h; `E); (`GCM16; 2016.04.07; 15:20:02.224; 1240.9; 6h ; 1241f ; 1h ; `E); (`CLM16; 2016.04.08; 10:53:00.002; 40.83 ; 6h ; 40.84 ; 66h; `E); (`CLM16; 2016.04.08; 13:56:30.070; 40.54 ; 38h; 40.56 ; 58h; `E); (`CLM16; 2016.04.08; 15:20:02.000; 40.77 ; 26h; 40.79 ; 44h; `E); (`CLM16; 2016.04.08; 15:21:43.786; 40.76 ; 3h ; 40.77 ; 28h; `E))]
The metadata of the trades table before we populate it
Let us examine the metadata of the trades table before we populate it:
meta trades
c | t f a --------| ----- code | s date | d time | t price | f volume | h mktflag | s comments|
Note that
type trades[`comments]
is 0h, so this column is a mixed list (remember that table columns are all lists). It appeared in the schema without a type cast:
trades : ([] ... volume:`short$(); mktflag:`symbol$(); comments:())
Exercise
Use dot (.) to populate the trades table with the following rows:
(`CLM16; 2016.04.07; 10:20:00.329; 38.3 ; 4h; `E; "NYMEX/CME Group trade"); (`GCM16; 2016.04.07; 12:00:00.055; 1239.7; 6h; `E; ""); (`GCM16; 2016.04.07; 12:37:02.106; 1240.5; 1h; `E; ""); (`CLM16; 2016.04.07; 13:00:00.128; 38.04 ; 3h; `E; "NYMEX/CME Group trade"); (`VXK16; 2016.04.07; 13:22:05.617; 18.85 ; 5h; `E; ""); (`GCM16; 2016.04.07; 14:35:01.241; 1241.2; 1h; `E; ""); (`GCM16; 2016.04.08; 10:13:01.048; 1240f ; 3h; `E; ""); (`VXK16; 2016.04.08; 11:34:53.417; 18.53 ; 1h; `E; "Transaction represents a trade in two contract months in the same class"); (`CLM16; 2016.04.08; 12:00:00.227; 40.61 ; 3h; `E; "NYMEX/CME Group trade"); (`VXK16; 2016.04.08; 12:44:00.684; 18.44 ; 2h; `E; "Transaction represents a trade in two contract months in the same class"); (`VXK16; 2016.04.08; 12:45:33.130; 18.49 ; 1h; `E; "Transaction represents a trade in two contract months in the same class"); (`CLM16; 2016.04.08; 15:20:02.000; 40.78 ; 3h; `E; "NYMEX/CME Group trade"); (`CLM16; 2016.04.11; 11:00:00.105; 41.43 ; 2h; `E; "NYMEX/CME Group trade"); (`VXK16; 2016.04.11; 14:00:00.829; 18.35 ; 1h; `E; ""); (`VXK16; 2016.04.11; 15:14:58.775; 19.05 ; 2h; `E; ""); (`GCM16; 2016.04.11; 16:00:00.044; 1257.9; 1h; `E; ""); (`GCM16; 2016.04.11; 16:28:34.311; 1258.7; 1h; `E; "")
Solution
.[`trades;();,;( (`CLM16; 2016.04.07; 10:20:00.329; 38.3 ; 4h; `E; "NYMEX/CME Group trade"); (`GCM16; 2016.04.07; 12:00:00.055; 1239.7; 6h; `E; ""); (`GCM16; 2016.04.07; 12:37:02.106; 1240.5; 1h; `E; ""); (`CLM16; 2016.04.07; 13:00:00.128; 38.04 ; 3h; `E; "NYMEX/CME Group trade"); (`VXK16; 2016.04.07; 13:22:05.617; 18.85 ; 5h; `E; ""); (`GCM16; 2016.04.07; 14:35:01.241; 1241.2; 1h; `E; ""); (`GCM16; 2016.04.08; 10:13:01.048; 1240f ; 3h; `E; ""); (`VXK16; 2016.04.08; 11:34:53.417; 18.53 ; 1h; `E; "Transaction represents a trade in two contract months in the same class"); (`CLM16; 2016.04.08; 12:00:00.227; 40.61 ; 3h; `E; "NYMEX/CME Group trade"); (`VXK16; 2016.04.08; 12:44:00.684; 18.44 ; 2h; `E; "Transaction represents a trade in two contract months in the same class"); (`VXK16; 2016.04.08; 12:45:33.130; 18.49 ; 1h; `E; "Transaction represents a trade in two contract months in the same class"); (`CLM16; 2016.04.08; 15:20:02.000; 40.78 ; 3h; `E; "NYMEX/CME Group trade"); (`CLM16; 2016.04.11; 11:00:00.105; 41.43 ; 2h; `E; "NYMEX/CME Group trade"); (`VXK16; 2016.04.11; 14:00:00.829; 18.35 ; 1h; `E; ""); (`VXK16; 2016.04.11; 15:14:58.775; 19.05 ; 2h; `E; ""); (`GCM16; 2016.04.11; 16:00:00.044; 1257.9; 1h; `E; ""); (`GCM16; 2016.04.11; 16:28:34.311; 1258.7; 1h; `E; ""))]
trades
code date time price volume mktflag comments .. -----------------------------------------------------------------------------.. CLM16 2016.04.07 10:20:00.329 38.3 4 E "NYMEX/CME Group trade" .. GCM16 2016.04.07 12:00:00.055 1239.7 6 E "" .. GCM16 2016.04.07 12:37:02.106 1240.5 1 E "" .. CLM16 2016.04.07 13:00:00.128 38.04 3 E "NYMEX/CME Group trade" .. VXK16 2016.04.07 13:22:05.617 18.85 5 E "" .. GCM16 2016.04.07 14:35:01.241 1241.2 1 E "" .. GCM16 2016.04.08 10:13:01.048 1240 3 E "" .. VXK16 2016.04.08 11:34:53.417 18.53 1 E "Transaction represents a.. CLM16 2016.04.08 12:00:00.227 40.61 3 E "NYMEX/CME Group trade" .. VXK16 2016.04.08 12:44:00.684 18.44 2 E "Transaction represents a.. VXK16 2016.04.08 12:45:33.130 18.49 1 E "Transaction represents a.. CLM16 2016.04.08 15:20:02.000 40.78 3 E "NYMEX/CME Group trade" .. CLM16 2016.04.11 11:00:00.105 41.43 2 E "NYMEX/CME Group trade" .. VXK16 2016.04.11 14:00:00.829 18.35 1 E "" .. VXK16 2016.04.11 15:14:58.775 19.05 2 E "" .. GCM16 2016.04.11 16:00:00.044 1257.9 1 E "" .. GCM16 2016.04.11 16:28:34.311 1258.7 1 E "" ..
The metadata of the trades table after we have populated it
Let us examine the metadata of the trades table before we populate it:
meta trades
c | t f a --------| ----- code | s date | d time | t price | f volume | h mktflag | s comments| C
Note that
type trades[`comments]
is still 0h, but the metadata above tells us that comments is a column of strings. Strings are themselves lists
trades[`comments][0][0]
"N"
and the only way to declare a list of lists is through a mixed list, there is no other way we could have declared this column in our schema.
Q-sql select
Consider the simplest possible form of q-sql select:
select from quotes
code date time bidprice bidsize askprice asksize mktflag ----------------------------------------------------------------------- CLM16 2016.04.07 10:15:00.010 38.34 86 38.35 3 E GCM16 2016.04.07 11:02:16.663 1241.4 22 1241.5 1 E CLM16 2016.04.07 12:05:00.303 38.12 7 38.13 13 E CLM16 2016.04.07 12:22:00.486 38.11 16 38.12 8 E GCM16 2016.04.07 13:00:00.205 1238.6 8 1238.7 7 E CLM16 2016.04.07 15:00:00.051 38.52 9 38.53 18 E GCM16 2016.04.07 15:20:02.224 1240.9 6 1241 1 E CLM16 2016.04.08 10:53:00.002 40.83 6 40.84 66 E CLM16 2016.04.08 13:56:30.070 40.54 38 40.56 58 E CLM16 2016.04.08 15:20:02.000 40.77 26 40.79 44 E CLM16 2016.04.08 15:21:43.786 40.76 3 40.77 28 E
We have selected all rows from the table.
Exercise
Select all rows from the trades table.
Solution
select from trades
code date time price volume mktflag comments .. -----------------------------------------------------------------------------.. CLM16 2016.04.07 10:20:00.329 38.3 4 E "NYMEX/CME Group trade" .. GCM16 2016.04.07 12:00:00.055 1239.7 6 E "" .. GCM16 2016.04.07 12:37:02.106 1240.5 1 E "" .. CLM16 2016.04.07 13:00:00.128 38.04 3 E "NYMEX/CME Group trade" .. VXK16 2016.04.07 13:22:05.617 18.85 5 E "" .. GCM16 2016.04.07 14:35:01.241 1241.2 1 E "" .. GCM16 2016.04.08 10:13:01.048 1240 3 E "" .. VXK16 2016.04.08 11:34:53.417 18.53 1 E "Transaction represents a.. CLM16 2016.04.08 12:00:00.227 40.61 3 E "NYMEX/CME Group trade" .. VXK16 2016.04.08 12:44:00.684 18.44 2 E "Transaction represents a.. VXK16 2016.04.08 12:45:33.130 18.49 1 E "Transaction represents a.. CLM16 2016.04.08 15:20:02.000 40.78 3 E "NYMEX/CME Group trade" .. CLM16 2016.04.11 11:00:00.105 41.43 2 E "NYMEX/CME Group trade" .. VXK16 2016.04.11 14:00:00.829 18.35 1 E "" .. VXK16 2016.04.11 15:14:58.775 19.05 2 E "" .. GCM16 2016.04.11 16:00:00.044 1257.9 1 E "" .. GCM16 2016.04.11 16:28:34.311 1258.7 1 E "" ..
Q-sql select: selecting columns
We may list the columns that we want to select, in the order in which we want them to appear in the resulting table:
select date,time,code,asksize,askprice from quotes
date time code asksize askprice ---------------------------------------------- 2016.04.07 10:15:00.010 CLM16 3 38.35 2016.04.07 11:02:16.663 GCM16 1 1241.5 2016.04.07 12:05:00.303 CLM16 13 38.13 2016.04.07 12:22:00.486 CLM16 8 38.12 2016.04.07 13:00:00.205 GCM16 7 1238.7 2016.04.07 15:00:00.051 CLM16 18 38.53 2016.04.07 15:20:02.224 GCM16 1 1241 2016.04.08 10:53:00.002 CLM16 66 40.84 2016.04.08 13:56:30.070 CLM16 58 40.56 2016.04.08 15:20:02.000 CLM16 44 40.79 2016.04.08 15:21:43.786 CLM16 28 40.77
By default, the output is not saved, but we can set it to a variable:
new_quotes_table:select date,time,code,asksize,askprice from quotes
Exercise
Select the date, time, volume, and price (in that order) from the trades table.
Solution
select date,time,volume,price from trades
date time volume price ------------------------------------- 2016.04.07 10:20:00.329 4 38.3 2016.04.07 12:00:00.055 6 1239.7 2016.04.07 12:37:02.106 1 1240.5 2016.04.07 13:00:00.128 3 38.04 2016.04.07 13:22:05.617 5 18.85 2016.04.07 14:35:01.241 1 1241.2 2016.04.08 10:13:01.048 3 1240 2016.04.08 11:34:53.417 1 18.53 2016.04.08 12:00:00.227 3 40.61 2016.04.08 12:44:00.684 2 18.44 2016.04.08 12:45:33.130 1 18.49 2016.04.08 15:20:02.000 3 40.78 2016.04.11 11:00:00.105 2 41.43 2016.04.11 14:00:00.829 1 18.35 2016.04.11 15:14:58.775 2 19.05 2016.04.11 16:00:00.044 1 1257.9 2016.04.11 16:28:34.311 1 1258.7
Q-sql select: selecting columns under new names
We may give the selected columns new names in the result:
select d:date,t:time,code,size:asksize,price:askprice from quotes
d t code size price ----------------------------------------- 2016.04.07 10:15:00.010 CLM16 3 38.35 2016.04.07 11:02:16.663 GCM16 1 1241.5 2016.04.07 12:05:00.303 CLM16 13 38.13 2016.04.07 12:22:00.486 CLM16 8 38.12 2016.04.07 13:00:00.205 GCM16 7 1238.7 2016.04.07 15:00:00.051 CLM16 18 38.53 2016.04.07 15:20:02.224 GCM16 1 1241 2016.04.08 10:53:00.002 CLM16 66 40.84 2016.04.08 13:56:30.070 CLM16 58 40.56 2016.04.08 15:20:02.000 CLM16 44 40.79 2016.04.08 15:21:43.786 CLM16 28 40.77
Exercise
Select the date, time, volume, and price (in that order) from the trades table but under the names d, t, v, and p.
Solution
select d:date,t:time,v:volume,p:price from trades
d t v p -------------------------------- 2016.04.07 10:20:00.329 4 38.3 2016.04.07 12:00:00.055 6 1239.7 2016.04.07 12:37:02.106 1 1240.5 2016.04.07 13:00:00.128 3 38.04 2016.04.07 13:22:05.617 5 18.85 2016.04.07 14:35:01.241 1 1241.2 2016.04.08 10:13:01.048 3 1240 2016.04.08 11:34:53.417 1 18.53 2016.04.08 12:00:00.227 3 40.61 2016.04.08 12:44:00.684 2 18.44 2016.04.08 12:45:33.130 1 18.49 2016.04.08 15:20:02.000 3 40.78 2016.04.11 11:00:00.105 2 41.43 2016.04.11 14:00:00.829 1 18.35 2016.04.11 15:14:58.775 2 19.05 2016.04.11 16:00:00.044 1 1257.9 2016.04.11 16:28:34.311 1 1258.7
Q-sql select: operations on columns
Perhaps more interestingly, selected columns may be functions of / results of operations on the existing ones:
select dt:date+time,code,size:asksize,price:askprice from quotes
dt code size price ----------------------------------------------- 2016.04.07D10:15:00.010000000 CLM16 3 38.35 2016.04.07D11:02:16.663000000 GCM16 1 1241.5 2016.04.07D12:05:00.303000000 CLM16 13 38.13 2016.04.07D12:22:00.486000000 CLM16 8 38.12 2016.04.07D13:00:00.205000000 GCM16 7 1238.7 2016.04.07D15:00:00.051000000 CLM16 18 38.53 2016.04.07D15:20:02.224000000 GCM16 1 1241 2016.04.08D10:53:00.002000000 CLM16 66 40.84 2016.04.08D13:56:30.070000000 CLM16 58 40.56 2016.04.08D15:20:02.000000000 CLM16 44 40.79 2016.04.08D15:21:43.786000000 CLM16 28 40.77
Remember that columns are lists, so the above + works on two lists:
q)quotes[`date] 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 .. q)quotes[`time] 10:15:00.010 11:02:16.663 12:05:00.303 12:22:00.486 13:00:00.205 15:00:00.051.. q)quotes[`date]+quotes[`time] 2016.04.07D10:15:00.010000000 2016.04.07D11:02:16.663000000 2016.04.07D12:05:..
Exercise
The best bid and best ask (offer) prices are denoted by and , respectively.
The midprice is the average of the best bid and best ask prices:
Write a q-sql select query selecting the timestamp (date + time) and the midprice.
Solution
select dt:date+time,mid:.5*bidprice+askprice from quotes
dt mid ------------------------------------- 2016.04.07D10:15:00.010000000 38.345 2016.04.07D11:02:16.663000000 1241.45 2016.04.07D12:05:00.303000000 38.125 2016.04.07D12:22:00.486000000 38.115 2016.04.07D13:00:00.205000000 1238.65 2016.04.07D15:00:00.051000000 38.525 2016.04.07D15:20:02.224000000 1240.95 2016.04.08D10:53:00.002000000 40.835 2016.04.08D13:56:30.070000000 40.55 2016.04.08D15:20:02.000000000 40.78 2016.04.08D15:21:43.786000000 40.765
Exercise
The microprice is often more useful than the midprice, since it weights the best bid and best ask prices with the volumes posted at the best bid and best ask prices:
where and represent the volumes posted at the best bid and best ask prices, respectively.
Modify your query to add the microprice to it.
Solution
select dt:date+time,mid:.5*bidprice+askprice,microprice:((bidsize*askprice)+asksize*bidprice)%bidsize+asksize from quotes
dt mid microprice ------------------------------------------------ 2016.04.07D10:15:00.010000000 38.345 38.34966 2016.04.07D11:02:16.663000000 1241.45 1241.496 2016.04.07D12:05:00.303000000 38.125 38.1235 2016.04.07D12:22:00.486000000 38.115 38.11667 2016.04.07D13:00:00.205000000 1238.65 1238.653 2016.04.07D15:00:00.051000000 38.525 38.52333 2016.04.07D15:20:02.224000000 1240.95 1240.986 2016.04.08D10:53:00.002000000 40.835 40.83083 2016.04.08D13:56:30.070000000 40.55 40.54792 2016.04.08D15:20:02.000000000 40.78 40.77743 2016.04.08D15:21:43.786000000 40.765 40.76097
A challenge
So what if we want to return only the first character from each code?
select dt:date+time,shortcode:code[0],size:asksize,price:askprice from quotes
does not give the desired result:
dt shortcode size price --------------------------------------------------- 2016.04.07D10:15:00.010000000 CLM16 3 38.35 2016.04.07D11:02:16.663000000 CLM16 1 1241.5 2016.04.07D12:05:00.303000000 CLM16 13 38.13 2016.04.07D12:22:00.486000000 CLM16 8 38.12 2016.04.07D13:00:00.205000000 CLM16 7 1238.7 2016.04.07D15:00:00.051000000 CLM16 18 38.53 2016.04.07D15:20:02.224000000 CLM16 1 1241 2016.04.08D10:53:00.002000000 CLM16 66 40.84 2016.04.08D13:56:30.070000000 CLM16 58 40.56 2016.04.08D15:20:02.000000000 CLM16 44 40.79 2016.04.08D15:21:43.786000000 CLM16 28 40.77
This is because the indexing, ...[0], is applied to the entire list quotes[`code], and we get a symbol scalar, `CLM16. This scalar is returned for each row in the result, as you can see above.
Indeed, this is what happens when we include scalars among selected columns:
select dt:date+time,shortcode:123,size:asksize,price:askprice from quotes
dt shortcode size price --------------------------------------------------- 2016.04.07D10:15:00.010000000 123 3 38.35 2016.04.07D11:02:16.663000000 123 1 1241.5 2016.04.07D12:05:00.303000000 123 13 38.13 2016.04.07D12:22:00.486000000 123 8 38.12 2016.04.07D13:00:00.205000000 123 7 1238.7 2016.04.07D15:00:00.051000000 123 18 38.53 2016.04.07D15:20:02.224000000 123 1 1241 2016.04.08D10:53:00.002000000 123 66 40.84 2016.04.08D13:56:30.070000000 123 58 40.56 2016.04.08D15:20:02.000000000 123 44 40.79 2016.04.08D15:21:43.786000000 123 28 40.77
Still, what if we want to return the first character of each code?
One way to do this is by applying a function to each element of the list which represents the resulting column:
select dt:date+time,shortcode:{`$1#string x}each code,size:asksize,price:askprice from quotes
dt shortcode size price --------------------------------------------------- 2016.04.07D10:15:00.010000000 C 3 38.35 2016.04.07D11:02:16.663000000 G 1 1241.5 2016.04.07D12:05:00.303000000 C 13 38.13 2016.04.07D12:22:00.486000000 C 8 38.12 2016.04.07D13:00:00.205000000 G 7 1238.7 2016.04.07D15:00:00.051000000 C 18 38.53 2016.04.07D15:20:02.224000000 G 1 1241 2016.04.08D10:53:00.002000000 C 66 40.84 2016.04.08D13:56:30.070000000 C 58 40.56 2016.04.08D15:20:02.000000000 C 44 40.79 2016.04.08D15:21:43.786000000 C 28 40.77
Instead of the keyword each, we could have used the ' adverb (don't forget the parentheses around the verb–adverb pair):
select dt:date+time,shortcode:({`$1#string x}')code,size:asksize,price:askprice from quotes
dt shortcode size price --------------------------------------------------- 2016.04.07D10:15:00.010000000 C 3 38.35 2016.04.07D11:02:16.663000000 G 1 1241.5 2016.04.07D12:05:00.303000000 C 13 38.13 2016.04.07D12:22:00.486000000 C 8 38.12 2016.04.07D13:00:00.205000000 G 7 1238.7 2016.04.07D15:00:00.051000000 C 18 38.53 2016.04.07D15:20:02.224000000 G 1 1241 2016.04.08D10:53:00.002000000 C 66 40.84 2016.04.08D13:56:30.070000000 C 58 40.56 2016.04.08D15:20:02.000000000 C 44 40.79 2016.04.08D15:21:43.786000000 C 28 40.77
This works since
{`$1#string x}each quotes[`code]
and
({`$1#string x}')quotes[`code]
give the desired list
`C`G`C`C`G`C`G`C`C`C`C
which is what the above select returns for that column (remember: columns are lists).
Exercise
Select from the quotes table cf, which is a symbol consisting of two characters: the first character of the code and the first character of the mktflag.
Solution
We can form (code,mktflag) pairs with code,'mktflag and then apply a function to each (') such pair:
select cf:({`$(1#string x[0]),1#string x[1]}')(code,'mktflag) from quotes
cf -- CE GE CE CE GE CE GE CE CE CE CE
However, we can do without the "pairing" code code,'mktflag. While the keyword each does not work on functions of multiple arguments and the following throws 'type,
select cf:{`$(1#string x),1#string y}each[code;mktflag] from quotes
Each (') does work on functions of multiple arguments:
select cf:{`$(1#string x),1#string y}'[code;mktflag] from quotes
The following, somewhat less terse, solution works too. But here we iterate twice, whereas we could iterate only once, as in the previous solution:
select cf:`$(({1#string x}each code),'{1#string x}each mktflag) from quotes
Q-sql select: where specifications (constraints)
As in classical SQL, in q-sql we can add to our select the where specifications (constraints). Thus, if we want to return only those columns where price is greater than 1000f...
select date,time,price:askprice from quotes where price>1000f
Oh, wait, this returns an error:
'price [0] select date,time,price:askprice from quotes where price>1000f ^
That's because the constraints are applied first, before one of the resulting columns is given the name price. We must apply the constraint to askprice in the original table instead:
select date,time,price:askprice from quotes where askprice>1000f
date time price ------------------------------ 2016.04.07 11:02:16.663 1241.5 2016.04.07 13:00:00.205 1238.7 2016.04.07 15:20:02.224 1241
We can add multiple constraints separated by commas:
select date,time,price:askprice from quotes where askprice>1000f,time>=12:00:00.000
date time price ------------------------------ 2016.04.07 13:00:00.205 1238.7 2016.04.07 15:20:02.224 1241
NB! Unlike classical SQL, the order of the constraints matters! The constraints are applied in order. From the point of view of performance, you should apply the most "expensive" constraints, which shrink the result the most, first.
In particular, when dealing with partitioned tables, include the constraint on the partitioning column (usually date) first. Since partitioned tables are stored on disc in directories by date, this constraint allows kdb+/q to zoom in on the right directory. If it is not the first constraint, an expensive scan of the entire database on disc may result.
Q-sql select: indices
Tables are stored in deterministic order. If we want to return the indices of the rows returned by the query, we can add i to the select:
select i,date,time,price:askprice from quotes where askprice>1000f,time>=12:00:00.000
x date time price -------------------------------- 4 2016.04.07 13:00:00.205 1238.7 6 2016.04.07 15:20:02.224 1241
Exercise
Select from trades those rows where price is closer to the smallest integer greater than it than the greatest integer that is less than or equal to the price.
Solution
select from trades where ((ceiling price)-price)<price-floor price
code date time price volume mktflag comments .. -----------------------------------------------------------------------------.. GCM16 2016.04.07 12:00:00.055 1239.7 6 E "" .. VXK16 2016.04.07 13:22:05.617 18.85 5 E "" .. VXK16 2016.04.08 11:34:53.417 18.53 1 E "Transaction represents a.. CLM16 2016.04.08 12:00:00.227 40.61 3 E "NYMEX/CME Group trade" .. CLM16 2016.04.08 15:20:02.000 40.78 3 E "NYMEX/CME Group trade" .. GCM16 2016.04.11 16:00:00.044 1257.9 1 E "" .. GCM16 2016.04.11 16:28:34.311 1258.7 1 E "" ..
The following solution is terser (and involves only a call to floor, no call to ceiling):
select from trades where (price-floor price)>0.5
Q-sql select: more complex constraints
More complex constraints can be formed using boolean operations:
select i,date,time,price:askprice from quotes where (((askprice>1000f)or(time>=12:00:00.000))and(time <= 15:30:00.000))
x date time price --------------------------------- 1 2016.04.07 11:02:16.663 1241.5 2 2016.04.07 12:05:00.303 38.13 3 2016.04.07 12:22:00.486 38.12 4 2016.04.07 13:00:00.205 1238.7 5 2016.04.07 15:00:00.051 38.53 6 2016.04.07 15:20:02.224 1241 8 2016.04.08 13:56:30.070 40.56 9 2016.04.08 15:20:02.000 40.79 10 2016.04.08 15:21:43.786 40.77
Remember that constraints also operate on lists (columns) and result in boolean lists:
((quotes[`askprice]>1000f)or(quotes[`time]>=12:00:00.000))and(quotes[`time]<=15:30:00.000)
01111110111b
Exercise
Select from the quotes table those rows where the time is either less than 11 a.m. or greater than or equal to 3 p.m.
Solution
select from quotes where (time<11:00:00.000)or time>=15:00:00.000 <pre> <pre> code date time bidprice bidsize askprice asksize mktflag ----------------------------------------------------------------------- CLM16 2016.04.07 10:15:00.010 38.34 86 38.35 3 E CLM16 2016.04.07 15:00:00.051 38.52 9 38.53 18 E GCM16 2016.04.07 15:20:02.224 1240.9 6 1241 1 E CLM16 2016.04.08 10:53:00.002 40.83 6 40.84 66 E CLM16 2016.04.08 15:20:02.000 40.77 26 40.79 44 E CLM16 2016.04.08 15:21:43.786 40.76 3 40.77 28 E
Don't forget the parentheses. The query won't work correctly without them:
select from quotes where time<11:00:00.000 or time>=15:00:00.000
code date time bidprice bidsize askprice asksize mktflag ----------------------------------------------------------------------- CLM16 2016.04.07 10:15:00.010 38.34 86 38.35 3 E CLM16 2016.04.08 10:53:00.002 40.83 6 40.84 66 E
(Which is not what we want.)
Q-sql select: group-by specifications (by phrase)
We can group the results by the value of a particular column:
select i,date,time,price:askprice by date from quotes where askprice>=40f
date | x date time .. ----------| -----------------------------------------------------------------.. 2016.04.07| 1 4 6 2016.04.07 2016.04.07 2016.04.07 11:02:16.663.. 2016.04.08| 7 8 9 10 2016.04.08 2016.04.08 2016.04.08 2016.04.08 10:53:00.002..
The result is a keyed table (in this case, keyed by date).
Apart from the key column, all the columns in the result are lists of lists, collecting values for each partition:
(select i,date,time,price:askprice by date from quotes where askprice>=40f)[2016.04.07][`time]
11:02:16.663 13:00:00.205 15:20:02.224
Notice that, if you omit the list of selected columns from a select with by, somewhat counterintuitively, we will, for every single column, get the last value in each bucket, instead of a list of all values in the bucket:
q)select by date from quotes where askprice>=40f date | code time bidprice bidsize askprice asksize mktflag ----------| ------------------------------------------------------------ 2016.04.07| GCM16 15:20:02.224 1240.9 6 1241 1 E 2016.04.08| CLM16 15:21:43.786 40.76 3 40.77 28 E
Group-by specifications make it easy to produce summary statistics:
select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by date,code from quotes where askprice<1250f
date code | first_bidprice avg_askprice max_bidsize number_of_quotes ----------------| -------------------------------------------------------- 2016.04.07 CLM16| 38.34 38.2825 86 4 2016.04.07 GCM16| 1241.4 1240.4 22 3 2016.04.08 CLM16| 40.83 40.74 38 4
We can also group by the result of a function, e.g.:
select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by{[x] `hh$x}each time from quotes where askprice<1250f
time| first_bidprice avg_askprice max_bidsize number_of_quotes ----| -------------------------------------------------------- 10 | 38.34 39.595 86 2 11 | 1241.4 1241.5 22 1 12 | 38.12 38.125 16 2 13 | 1238.6 639.63 38 2 15 | 38.52 340.2725 26 4
Although in this case the use of each is superfluous: `hh$... works just as well on lists as on scalars:
select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by`hh$time from quotes where askprice<1250f
Exercise
Group the data from trades by the number of digits in the price.
Solution
We could do
select code,date,time,price,volume,mktflag,comments by count each string price from trades
price| code date .. -----| ----------------------------------------------------------------------.. 4 | `CLM16`GCM16 2016.04.0.. 5 | `CLM16`VXK16`VXK16`CLM16`VXK16`VXK16`CLM16`CLM16`VXK16`VXK16 2016.04.0.. 6 | `GCM16`GCM16`GCM16`GCM16`GCM16 2016.04.0..
but this counts the decimal point as a digit (which it isn't).
To remedy this, we could use
select code,date,time,price,volume,mktflag,comments by {count sv[""]vs["."]x} each string price from trades
price| code dat.. -----| ----------------------------------------------------------------------.. 3 | ,`CLM16 ,20.. 4 | `CLM16`VXK16`GCM16`VXK16`CLM16`VXK16`VXK16`CLM16`CLM16`VXK16`VXK16 201.. 5 | `GCM16`GCM16`GCM16`GCM16`GCM16 201..
If we care only about the digits before the decimal point, we could use
select code,date,time,price,volume,mktflag,comments by (1+floor log[price]%log 10) from trades
x| code date .. -| --------------------------------------------------------------------------.. 2| `CLM16`CLM16`VXK16`VXK16`CLM16`VXK16`VXK16`CLM16`CLM16`VXK16`VXK16 2016.04.. 4| `GCM16`GCM16`GCM16`GCM16`GCM16`GCM16 2016.04..
xbar
Consider the following example:
q)5 xbar 23 21 56 39 10 9 81 20 45 101 15 20 20 55 35 10 5 80 20 45 100 15
For each integer in the list on the right xbar outputs a multiple of 5 that is less than or equal to it. xbar works with floats, too:
q)5 xbar 23.7 21.9 56.3 39.2 10.1 9.8 81.3 20.4 45.4 101.0 15.3 20 20 55 35 10 5 80 20 45 100 15f
And, importantly, it works with temporal types:
q)(1000*60*5)xbar 00:03:09.022 00:05:58.921 00:21:00.000 00:29:00.000 00:00:00.000 00:05:00.000 00:20:00.000 00:25:00.000
On the left we have the number of milliseconds. (When working with timestamps we should use nanoseconds.) Thus we have produced five-minute bars.
xbar is defined in k as follows:
q)xbar k){x*y div x:$[16h=abs[@x];"j"$x;x]}
Q-sql select: xbar in group-by specifications (by phrase)
xbar is particularly useful in group-by specifications.
Thus, if we want to group-by 15-minute intervals, we can apply it like so:
select first_bidprice:first bidprice,avg_askprice:avg askprice,max_bidsize:max bidsize,number_of_quotes:count i by(15 * 60 * 1000000000)xbar(quotes[`date]+quotes[`time]) from quotes where askprice<1250f
x | first_bidprice avg_askprice max_bidsize number_of_quotes -----------------------------|-------------------------------------------------------- 2016.04.07D10:15:00.000000000| 38.34 38.35 86 1 2016.04.07D11:00:00.000000000| 1241.4 1241.5 22 1 2016.04.07D12:00:00.000000000| 38.12 38.13 7 1 2016.04.07D12:15:00.000000000| 38.11 38.12 16 1 2016.04.07D13:00:00.000000000| 1238.6 1238.7 8 1 2016.04.07D15:00:00.000000000| 38.52 38.53 9 1 2016.04.07D15:15:00.000000000| 1240.9 1241 6 1 2016.04.08D10:45:00.000000000| 40.83 40.84 6 1 2016.04.08D13:45:00.000000000| 40.54 40.56 38 1 2016.04.08D15:15:00.000000000| 40.77 40.78 26 2
Exercise
Produce a table of last trade prices, last trade volumes, average trade prices, and average trade volumes in each one hour-long bar.
Solution
The question is whether we want to lump different dates into the same bucket. If we do, then we could use
select lastp:last price,lastv:last volume,avgp:avg price,avgv:avg volume by (1000*60*60)xbar time from trades
time | lastp lastv avgp avgv ------------| ------------------------- 10:00:00.000| 1240 3 639.15 3.5 11:00:00.000| 41.43 2 29.98 1.5 12:00:00.000| 18.49 1 511.548 2.6 13:00:00.000| 18.85 5 28.445 4 14:00:00.000| 18.35 1 629.775 1 15:00:00.000| 19.05 2 29.915 2.5 16:00:00.000| 1258.7 1 1258.3 1
More likely than not, we want to distinguish the different dates, so we apply xbar as follows:
q)select lastp:last price,lastv:last volume,avgp:avg price,avgv:avg volume by (1000000000*60*60)xbar date+time from trades
date | lastp lastv avgp avgv -----------------------------| -------------------------- 2016.04.07D10:00:00.000000000| 38.3 4 38.3 4 2016.04.07D12:00:00.000000000| 1240.5 1 1240.1 3.5 2016.04.07D13:00:00.000000000| 18.85 5 28.445 4 2016.04.07D14:00:00.000000000| 1241.2 1 1241.2 1 2016.04.08D10:00:00.000000000| 1240 3 1240 3 2016.04.08D11:00:00.000000000| 18.53 1 18.53 1 2016.04.08D12:00:00.000000000| 18.49 1 25.84667 2 2016.04.08D15:00:00.000000000| 40.78 3 40.78 3 2016.04.11D11:00:00.000000000| 41.43 2 41.43 2 2016.04.11D14:00:00.000000000| 18.35 1 18.35 1 2016.04.11D15:00:00.000000000| 19.05 2 19.05 2 2016.04.11D16:00:00.000000000| 1258.7 1 1258.3 1
exec
exec behaves similarly to but differently from select. Unlike select, it does not return tables.
The following
exec from quotes
will return the first row of the table as a dictionary:
code | `CLM16 date | 2016.04.08 time | 15:21:43.786 bidprice| 40.76 bidsize | 3h askprice| 40.77 asksize | 28h mktflag | `E
The following will return a dictionary of two lists:
exec date, time from quotes
date| 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.0.. time| 10:15:00.010 11:02:16.663 12:05:00.303 12:22:00.486 13:00:00.205 15:00:..
Whereas the following will simply return a list:
exec date from quotes
update
update, whose syntax is similar to that of select, can be used to update tables.
For example, to replace all codes with `UNKNOWN in the table quotes, we can use the following:
update code:`UNKNOWN from quotes
To replace the code `CLM16 with `crudeoil we can use
update code:`crudeoil from trades where code=`CLM16
These q-sql statements will return new tables. To update the table in place, reference it by name (`quotes) instead of using the variable quotes in the query:
q)update code:`crudeoil from `trades where code=`CLM16 `trades
Exercise
Use update to set all instances of bidprice where bidprice is greater than 1000f to 0n.
Solution
q)update bidprice:0n from `quotes where bidprice>1000f `quotes
Exercise
Use update to set all bidprices in the table quotes to an increasing sequence of floats: 0f 1f 2f 3f...
Solution
update bidprice:`float$til count quotes from `quotes
or
update bidprice:0f+til count quotes from `quotes
or
update bidprice:`float$i from `quotes
or
update bidprice:0f+i from `quotes
Column attributes
Recall the meaning of the column attributes:
- `s — sorted: `s#2 2 3. The elements of the list must be sorted. By marking the list as sorted, binary search can be performed instead of fully scanning the column, speeding up many operations.
- `u — unique: `u#2 4 5. The elements of the list must be unique. This attribute creates a unique hash table in the background, allowing constant time lookup of elements.
- `p — parted: `p#2 2 1. All elements of the same value must occur in one sequential block. By storing the same items continuously on-disc reading using standard hard drives is much faster.
- `g — grouped: `g#2 1 2. This attribute is closest to indexing in standard databases. A lookup table from each distinct value in the table is created to map to the positions where that value occurs. This enables much quicker lookup of the entries.
We can use update to set and unset the column attributes:
q)meta quotes c | t f a --------| ----- code | s date | d time | t bidprice| f bidsize | h askprice| f asksize | h mktflag | s
q)update `s#date from `quotes `quotes
q)meta quotes c | t f a --------| ----- code | s date | d s time | t bidprice| f bidsize | h askprice| f asksize | h mktflag | s
Exercise
Set the grouped attribute on the column code and the sorted attribute on the column date of the trades table.
Solution
q)meta trades c | t f a --------| ----- code | s date | d time | t price | f volume | h mktflag | s comments| C
q)update `g#code,`s#date from `trades `trades
q)meta trades c | t f a --------| ----- code | s g date | d s time | t price | f volume | h mktflag | s comments| C
Exercise
Remove the attributes from the code and date columns of the trades table.
Solution
q)meta trades c | t f a --------| ----- code | s g date | d s time | t price | f volume | h mktflag | s comments| C
q)update `#code,`#date from `trades `trades
q)meta trades c | t f a --------| ----- code | s date | d time | t price | f volume | h mktflag | s comments| C
delete
delete also has a similar syntax to that of update.
We can use the following to delete all quotes for `GCM16:
delete from `quotes where code=`GCM16
Notice that we have again referred to the table by its name, `quotes, in order to change it in place.
Exercise
Delete all rows from quotes where the askprice is greater than 1000f.
Solution
q)delete from `quotes where askprice>1000f `quotes
Deleting columns
delete can be used to delete columns as well as rows:
delete bidprice from quotes
This will return a copy. To modify the table in place, use the symbol reference `quotes instead of quotes.
Exercise
Delete the columns mktflag and comments from the table trades.
Solution
q)delete mktflag,comments from `trades `trades
The order of evaluation
We could use 0N! to investigate the order of evaluation in q:
q)0N!1;0N!2;0N!3 1 2 3 3
q)(0N!1;0N!2;0N!3) 3 2 1 1 2 3
q)add:{x+y+z};add[0N!1;0N!2;0N!3] 3 2 1 6
q)select 0N!date,0N!time,0N!askprice from quotes 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 2016.04.07 .. 10:15:00.010 11:02:16.663 12:05:00.303 12:22:00.486 13:00:00.205 15:00:00.051.. 38.35 1241.5 38.13 38.12 1238.7 38.53 1241 40.84 40.56 40.79 40.77 date time askprice -------------------------------- 2016.04.07 10:15:00.010 38.35 2016.04.07 11:02:16.663 1241.5 2016.04.07 12:05:00.303 38.13 2016.04.07 12:22:00.486 38.12 2016.04.07 13:00:00.205 1238.7 2016.04.07 15:00:00.051 38.53 2016.04.07 15:20:02.224 1241 2016.04.08 10:53:00.002 40.84 2016.04.08 13:56:30.070 40.56 2016.04.08 15:20:02.000 40.79 2016.04.08 15:21:43.786 40.77
q)select 0N!date,0N!time,0N!askprice from quotes where 0N!date>2016.04.07,0N!askprice>40.7 00000001111b 1011b 2016.04.08 2016.04.08 2016.04.08 10:53:00.002 15:20:02.000 15:21:43.786 40.84 40.79 40.77 date time askprice -------------------------------- 2016.04.08 10:53:00.002 40.84 2016.04.08 15:20:02.000 40.79 2016.04.08 15:21:43.786 40.77
From this it is clear that the order of evaluation is as follows:
date>2016.04.07
askprice>40.7
date
time
askprice
q)select 0N!time,0N!askprice by 0N!date from quotes where 0N!date>2016.04.07,0N!askprice>40.7 00000001111b 1011b 2016.04.08 2016.04.08 2016.04.08 10:53:00.002 15:20:02.000 15:21:43.786 40.84 40.79 40.77 date | time askprice ----------| -------------------------------------------------------- 2016.04.08| 10:53:00.002 15:20:02.000 15:21:43.786 40.84 40.79 40.77
From this it is clear that the order of evaluation is as follows:
- date>2016.04.07
- askprice>40.7
- by date
- time
- askprice
q)select 0N!time by 0N!date,0N!askprice from quotes where 0N!date>2016.04.07,0N!askprice>40.7 00000001111b 1011b 2016.04.08 2016.04.08 2016.04.08 40.84 40.79 40.77 ,10:53:00.002 ,15:20:02.000 ,15:21:43.786 date askprice| time -------------------| ------------ 2016.04.08 40.77 | 15:21:43.786 2016.04.08 40.79 | 15:20:02.000 2016.04.08 40.84 | 10:53:00.002
From this it is clear that the order of evaluation is as follows:
- date>2016.04.07
- askprice>40.7
- by date
- by askprice
- time