Difference between revisions of "Programming/Kdb/Labs/Big data in kdb"

From Thalesians Wiki
< Programming‎ | Kdb‎ | Labs
(Created page with "To be written.")
 
Line 1: Line 1:
To be written.
=Example table=
 
Let us create and populate the <tt>trades</tt> table:
<pre>
trades : ([] code:`symbol$(); date:`date$(); time:`time$(); price:`float$(); volume:`short$(); mktflag:`symbol$(); comments:());
.[`trades; (); ,; (
    (`CLM16; 2016.04.07; 10:20:00.329; 38.3  ; 4h; `E; "NYMEX/CME Group trade");
    (`GCM16; 2016.04.07; 12:00:00.055; 1239.7; 6h; `E; "");
    (`GCM16; 2016.04.07; 12:37:02.106; 1240.5; 1h; `E; "");
    (`CLM16; 2016.04.07; 13:00:00.128; 38.04 ; 3h; `E; "NYMEX/CME Group trade");
    (`VXK16; 2016.04.07; 13:22:05.617; 18.85 ; 5h; `E; "");
    (`GCM16; 2016.04.07; 14:35:01.241; 1241.2; 1h; `E; "");
    (`GCM16; 2016.04.08; 10:13:01.048; 1240f ; 3h; `E; "");
    (`VXK16; 2016.04.08; 11:34:53.417; 18.53 ; 1h; `E; "Transaction represents a trade in two contract months in the same class");
    (`CLM16; 2016.04.08; 12:00:00.227; 40.61 ; 3h; `E; "NYMEX/CME Group trade");
    (`VXK16; 2016.04.08; 12:44:00.684; 18.44 ; 2h; `E; "Transaction represents a trade in two contract months in the same class");
    (`VXK16; 2016.04.08; 12:45:33.130; 18.49 ; 1h; `E; "Transaction represents a trade in two contract months in the same class");
    (`CLM16; 2016.04.08; 15:20:02.000; 40.78 ; 3h; `E; "NYMEX/CME Group trade");
    (`CLM16; 2016.04.11; 11:00:00.105; 41.43 ; 2h; `E; "NYMEX/CME Group trade");
    (`VXK16; 2016.04.11; 14:00:00.829; 18.35 ; 1h; `E; "");
    (`VXK16; 2016.04.11; 15:14:58.775; 19.05 ; 2h; `E; "");
    (`GCM16; 2016.04.11; 16:00:00.044; 1257.9; 1h; `E; "");
    (`GCM16; 2016.04.11; 16:28:34.311; 1258.7; 1h; `E; ""))]
</pre>
 
Each row represents a trade in a futures contract. The value in the <tt>code</tt> column identifies that contract. The <tt>date</tt> and <tt>time</tt> columns give the dates and times of the trades as reported by the exchange. The <tt>price</tt> column contains the trade prices per contract. The <tt>volume</tt> column contains the number of contracts traded. The name of the <tt>mktflag</tt> column stands for "market flag", which distinguishes the pit trades (in which case the value in the column is <tt>`P</tt>) from the electronic trades (in which case the value is <tt>`E</tt>). In our table, which represents a hypothetical subsample from some large data set, all the trades are electronic trades. The <tt>comments</tt> column contains arbitrary comments on the data rows as strings &mdash; not as symbols. In practice, the specific comments in our example ("NYMEX/CME Group trade", "Transaction represents a trade in two contract months in the same class") would be implemented as symbol or numeric flags rather than as strings. We chose to use the string data type to demonstrate its use in a table.
 
We can use the following command to save this table as a binary file:
<pre>
`:tradesfile set trades
</pre>
The resulting file will have 965 bytes and will reside in the current directory, which you can check with
<pre>
\cd
</pre>
By default this will be the directory containing the <tt>q.exe</tt> executable. To save it in a different directory, we could use something like this:
<pre>
(`$":C:/Users/Paul Bilokon/Documents/tradesfile") set trades
</pre>
The brackets are needed as
<pre>
`$":C:/Users/Paul Bilokon/Documents/tradesfile" set trades
</pre>
is interpreted by q, from right to left, as
<pre>
`$(":C:/Users/Paul Bilokon/Documents/tradesfile" set trades)
</pre>
which isn't what we want.
 
In the case of success the return is the file path as a symbol:
<pre>
q)(`$":C:/Users/Paul Bilokon/Documents/tradesfile") set trades
`$":C:/Users/Paul Bilokon/Documents/tradesfile"
</pre>

Revision as of 12:39, 20 June 2021

Example table

Let us create and populate the trades table:

trades : ([] code:`symbol$(); date:`date$(); time:`time$(); price:`float$(); volume:`short$(); mktflag:`symbol$(); comments:());
.[`trades; (); ,; (
    (`CLM16; 2016.04.07; 10:20:00.329; 38.3  ; 4h; `E; "NYMEX/CME Group trade");
    (`GCM16; 2016.04.07; 12:00:00.055; 1239.7; 6h; `E; "");
    (`GCM16; 2016.04.07; 12:37:02.106; 1240.5; 1h; `E; "");
    (`CLM16; 2016.04.07; 13:00:00.128; 38.04 ; 3h; `E; "NYMEX/CME Group trade");
    (`VXK16; 2016.04.07; 13:22:05.617; 18.85 ; 5h; `E; "");
    (`GCM16; 2016.04.07; 14:35:01.241; 1241.2; 1h; `E; "");
    (`GCM16; 2016.04.08; 10:13:01.048; 1240f ; 3h; `E; "");
    (`VXK16; 2016.04.08; 11:34:53.417; 18.53 ; 1h; `E; "Transaction represents a trade in two contract months in the same class");
    (`CLM16; 2016.04.08; 12:00:00.227; 40.61 ; 3h; `E; "NYMEX/CME Group trade");
    (`VXK16; 2016.04.08; 12:44:00.684; 18.44 ; 2h; `E; "Transaction represents a trade in two contract months in the same class");
    (`VXK16; 2016.04.08; 12:45:33.130; 18.49 ; 1h; `E; "Transaction represents a trade in two contract months in the same class");
    (`CLM16; 2016.04.08; 15:20:02.000; 40.78 ; 3h; `E; "NYMEX/CME Group trade");
    (`CLM16; 2016.04.11; 11:00:00.105; 41.43 ; 2h; `E; "NYMEX/CME Group trade");
    (`VXK16; 2016.04.11; 14:00:00.829; 18.35 ; 1h; `E; "");
    (`VXK16; 2016.04.11; 15:14:58.775; 19.05 ; 2h; `E; "");
    (`GCM16; 2016.04.11; 16:00:00.044; 1257.9; 1h; `E; "");
    (`GCM16; 2016.04.11; 16:28:34.311; 1258.7; 1h; `E; ""))]

Each row represents a trade in a futures contract. The value in the code column identifies that contract. The date and time columns give the dates and times of the trades as reported by the exchange. The price column contains the trade prices per contract. The volume column contains the number of contracts traded. The name of the mktflag column stands for "market flag", which distinguishes the pit trades (in which case the value in the column is `P) from the electronic trades (in which case the value is `E). In our table, which represents a hypothetical subsample from some large data set, all the trades are electronic trades. The comments column contains arbitrary comments on the data rows as strings — not as symbols. In practice, the specific comments in our example ("NYMEX/CME Group trade", "Transaction represents a trade in two contract months in the same class") would be implemented as symbol or numeric flags rather than as strings. We chose to use the string data type to demonstrate its use in a table.

We can use the following command to save this table as a binary file:

`:tradesfile set trades

The resulting file will have 965 bytes and will reside in the current directory, which you can check with

\cd

By default this will be the directory containing the q.exe executable. To save it in a different directory, we could use something like this:

(`$":C:/Users/Paul Bilokon/Documents/tradesfile") set trades

The brackets are needed as

`$":C:/Users/Paul Bilokon/Documents/tradesfile" set trades

is interpreted by q, from right to left, as

`$(":C:/Users/Paul Bilokon/Documents/tradesfile" set trades)

which isn't what we want.

In the case of success the return is the file path as a symbol:

q)(`$":C:/Users/Paul Bilokon/Documents/tradesfile") set trades
`$":C:/Users/Paul Bilokon/Documents/tradesfile"