Difference between revisions of "Programming/Kdb/Factorial"

From Thalesians Wiki
 
(2 intermediate revisions by the same user not shown)
Line 244: Line 244:
* returns a function as its result.
* returns a function as its result.


This is what Jeffry A. Borror says about iterators in <span class="plainlinks">[https://code.kx.com/q4m3/ Q for Mortals]</span>:
This is what Jeffry A. Borror says about iterators in [https://code.kx.com/q4m3/ Q for Mortals]:
<blockquote>
<blockquote>
If you are new to functional programming, you my think, "Big deal, I write <tt>for</tt> loops in my sleep." Granted. But the advantage of the higher-order function approach is that there is no chance of being off by one in the loop counter or accidentally running off the end of a data structure. More importantly, you can focus on ''what'' you want done without the irrelevant scaffolding of ''how'' to set up control structures. This is called '''declarative programming'''.
If you are new to functional programming, you my think, "Big deal, I write <tt>for</tt> loops in my sleep." Granted. But the advantage of the higher-order function approach is that there is no chance of being off by one in the loop counter or accidentally running off the end of a data structure. More importantly, you can focus on ''what'' you want done without the irrelevant scaffolding of ''how'' to set up control structures. This is called '''declarative programming'''.
</blockquote>
</blockquote>
==An implementation using <tt>prd</tt>==
Our latest implementation is more idiomatic than the previous ones.
However, there is still the question of '''code reuse'''&mdash;the use of existing software, or software knowledge, to build new software following the reusability principles.
There is something that we are not reusing, but could reuse, to our advantage.
<tt>prd x</tt> returns the product of the numeric list <tt>x</tt>. Moreover, unlike, for example, <tt>ij</tt>, which is implemented in k,
<pre>
q)ij
k){.Q.ft[{x[j],'(. y)i j:&(#y)>i:(!y)?(!+!y)#x}[;y]]x}
</pre>
<tt>prd</tt> is implemented in low-level C:
<pre>
q)prd
prd
</pre>
We confirm this&mdash;<tt>prd</tt> is in <tt>.Q.res</tt>:
<pre>
q).Q.res
`abs`acos`asin`atan`avg`bin`binr`cor`cos`cov`delete`dev`div`do`enlist`exec`exit`exp`getenv`hopen`if`in`insert`last`like`log`max`min`prd`select`setenv`sin`sqrt`ss`sum`tan`update`var`wavg`while`within`wsum`xexp
</pre>
Here is the resulting implementation:
<pre>
fact6:{prd 1+til x}
</pre>
=Assessing the performance of our implementations=
Let us list all of our implementations of the factorial:
<pre>
fact:{$[x;x*.z.s x-1;1]};
fact1:{if[x>0;:x*.z.s x-1];1};
fact2:{do[-1+f:r:x;r*:f-:1];r};
fact3:{r:1;do[x;r*:x;x-:1];r};
fact4:{r:1;while[x;r*:x;x-:1];r};
fact5:{(*/)1+til x};
fact6:{prd 1+til x}
</pre>
Which one is the most efficient?
Let's time them:
<pre>
q)\t:100000 fact first 1?26
1154                       
q)\t:100000 fact1 first 1?26
1314                       
q)\t:100000 fact2 first 1?26
934                       
q)\t:100000 fact3 first 1?26
963                       
q)\t:100000 fact4 first 1?26
1020                       
q)\t:100000 fact5 first 1?26
408                       
q)\t:100000 fact6 first 1?26
378                       
</pre>
Unsurprisingly, <tt>fact6</tt> emerges as a leader, followed closely by <tt>fact5</tt>. We can see that there is a performance reason for preferring iterators over loops, in addition to the previously stated observation that iterators are more idiomatic that loops. However, the fastest results are obtained by using an optimized function, <tt>prd</tt>, written in low-level C.
=Error handling=
Let us try breaking our implementation <tt>fact6</tt>:
<pre>
q)fact6 -1
'domain
  [1]  fact6:{prd 1+til x}
                    ^
q))\
q)fact6"foo"
'type
  [2]  (.q.til)
  [1]  fact6:{prd 1+til x}
                    ^
q))\
</pre>
We can recover from these errors using '''protected evaluation''', for example:
<pre>
q).[fact6;"foo";[0N!"recovering";0N]]
"recovering"
0N
</pre>
Both error messages are perfectly good by q standards. Suppose that we want to produce a more informative error message instead of <tt>'domain</tt>. That is possible too:
<pre>
fact7:{$[x<0;'"Argument must be nonnegative";prd 1+til x]}
</pre>
Then:
<pre>
q)fact7 5
120
q)fact7 0
1
q)fact7 -1
'Argument must be nonnegative
  [0]  fact7 -1
      ^
</pre>

Latest revision as of 22:12, 21 December 2021

The factorial

The factorial of a non-negative integer , denoted by , is the product of all positive integers less than or equal to :

For example,

The value of is 1, according to the convention for an empty product.

The factorial operation is encountered in many areas of mathematics, notably in combinatorics, algebra, and mathematical analysis. Its most basic use counts the number of distinct sequences—the permutations—of distinct objects: there are .

Implementing the factorial in q

A dummy implementation

We'll implement the factorial as a q function. That function will take a single parameter, x, and return the result.

We'll start with a dummy, incorrect implementation that always returns 1:

fact:{[x]1}

This defines a function, {...}, which takes a single argument, [x], which always (irrespective of the argument) returns 1. We assign, :, a name to that function, fact, so we can call it again and again, as required.

This is a function of a single argument, in other words, a monadic or unary function. We could have a dyadic (binary), triadic (ternary), etc. function. Here is an example of a dyadic function:

add:{[x;y]x+y}

Incidentally, it is possible to have a niladic (nullary) function. Such functions are not necessarily constant and may have useful side effects. The following function will print out the current time:

tm:{[].z.p}

When q encounters the variables named x, y, and z in the body of a function, it knows that these are the function's arguments. So we could skip [x;y] above:

add:{x+y}

We wouldn't be able to do this if we named the arguments differently:

add:{[a;b]a+b}

A recursive implementation using the conditional evaluation

The most basic implementation of a factorial is the recursive one.

In general, recursion is a method of solving a problem where the solution depends on the solutions to smaller instances of the same problem. Such problems can generally be solved by iteration, but this needs to identify and index the smaller instances at programming time. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

—Niklaus Wirth, Algorithms + Data Structures = Programs, 1976.

Most computer programming languages support recursion by allowing a function to call itself from within its own code. Q is no exception:

fact:{$[x=0;1;x*fact x-1]}

We have just used the conditional evaluation, the ternary overload of $:

$[expr_cond; expr_true; expr_false]

Here expr_cond is an expression that evaluates to a boolean atom. The result of expr_cond can be any type whose underlying value is an integer. The result of the conditional is the evaluation of expr_true when expr_cond is not zero and expr_false if it is zero.

Notice that we don't need to have a boolean zero in expr_cond for expr_false to be evaluated; it could well be a long zero. Thus we can rewrite fact more tersely:

fact:{$[x;x*fact x-1;1]}

How would the function call itself if it weren't assigned the name fact? In q, there is a provision for that. The function can call itself using .z.s:

fact:{$[x;x*.z.s x-1;1]}

This works even if the function is anonymous, for example:

q){$[x;x*.z.s x-1;1]}[5]
120

Let's check the first few values:

q)fact[0]
1
q)fact[1]
1
q)fact[2]
2
q)fact[3]
6
q)fact[4]
24
q)fact[5]
120

Let us look at our recursive function again:

fact:{$[x;x*.z.s x-1;1]}

A recursive function definition has one or more base cases, meaning input(s) for which the function produces a result trivially (without recurring), and one or more recursive cases, meaning input(s) for which the program recurs (calls itself).

In our case the base case is . You can see it here:

fact:{$[x;...;1]}

Our recursive case is, for all , . You can see it here:

fact:{$[...;x*.z.s x-1;...]}

Because the base case breaks the chain of recursion, it is sometimes also called the "terminating case".

A recursive implementation using the imperative if statement

Instead of the conditional evaluation we could have used the imperative if statement, which conditionally evaluates a sequence of expressions:

if[expr_cond;expr_1;...;expr_n]

The expr_cond is evaluated and, if it is nonzero, the expressions expr_1, ..., expr_n are evaluated in left-to-right order. As with other conditionals, the brackets do not create a lexical scope, so variables defined in the body exist in the same scope as the if.

Unlike $[...], there is no "else" to go with if.

Also unlike $[...], if[...] is not a function and does not return a value. Therefore we need to force the return using ::

fact1:{if[x>0;:x*.z.s x-1];1}

A problem with the recursive implementations

Recursive implementations tend to underperform their iterative peers.

In addition, they may run out of stack space:

q)fact 2001
'stack
  [1799fact:{$[x;x*.z.s x-1;1f]}
                   ^

This doesn't matter that much specifically for the factorial, because the factorial of 26 is already enormous and overflows the long:

q)fact 25
7034535277573963776
q)fact 26
-1569523520172457984

For many other recursive functions 'stack is a serious issue.

An iterative implementation using the imperative do statement

The imperative do statement allows repeated execution of a block of statements. It has the form,

do[expr_count;expr_1;...;expr_n]

where expr_count must evaluate to a nonnegative integer. The expressions expr_1, ..., expr_n are evaluated expr_count times in left-to-right order.

Note that do is a statement, not a function, and does not have an explicit result.

Here is a (rather awkward) implementation of a factorial using do:

fact2:{do[-1+f:r:x;r*:f-:1];r}

Here is another:

fact3:{r:1;do[x;r*:x;x-:1];r}

An iterative implementation using the imperative while statement

The imperative while statement is an iterator of the form

while[expr_cond;expr_1;...;expr_n]

where expr_cond is evaluated and the expressions expr_1, ..., expr_n are evaluated repeatedly in left-to-right order as long as expr_cond is nonzero.

The while statement is not a function, does not have an explicit result, and does not introduce a lexical scope.

Here is an iterative implementation of a factorial using while:

fact4:{r:1;while[x;r*:x;x-:1];r}

The following implementation is seemingly superior, as it is terser, but it fails at x = 0:

fact4a:{f:r:x;while[f-:1;r*:f];r}

An iterative implementation using the Over iterator

Iterators, which used to be called adverbs, modify the meaning of verbs.

For example, the verb Join (,) ordinarily concatenates lists. The result of

("foo";"bar";"baz"),("qux";"quux";"quuz")

is the list

("foo";"bar";"baz";"qux";"quux";"quuz")

However, if we modify this verb with Each ('), we get a pairwise application of Join:

("foo";"bar";"baz"),'("qux";"quux";"quuz")

produces

("fooqux";"barquux";"bazquuz")

Each is what is known as a map iterator. There are other map iterators, and there are accumulator iterators, such as Over (/):

(+/)1+til 5

means

(((1+2)+3)+4)+5

which equals 15.

If we replace the verb + with the verb * and likewise modify it with the iterator (adverb) /, we obtain the foundation of our next implementation of the factorial:

(*/)1+til 5

means

(((1*2)*3)*4)*5

which equals 120.

Hence here is our next implementation of the factorial—using the Over iterator:

fact5:{(*/)1+til x}

Iterators are examples of higher-order functions of functional programming. In functional programming, a higher-order function is a function that does at least one of the following:

  • takes one or more functions as arguments,
  • returns a function as its result.

This is what Jeffry A. Borror says about iterators in Q for Mortals:

If you are new to functional programming, you my think, "Big deal, I write for loops in my sleep." Granted. But the advantage of the higher-order function approach is that there is no chance of being off by one in the loop counter or accidentally running off the end of a data structure. More importantly, you can focus on what you want done without the irrelevant scaffolding of how to set up control structures. This is called declarative programming.

An implementation using prd

Our latest implementation is more idiomatic than the previous ones.

However, there is still the question of code reuse—the use of existing software, or software knowledge, to build new software following the reusability principles.

There is something that we are not reusing, but could reuse, to our advantage.

prd x returns the product of the numeric list x. Moreover, unlike, for example, ij, which is implemented in k,

q)ij
k){.Q.ft[{x[j],'(. y)i j:&(#y)>i:(!y)?(!+!y)#x}[;y]]x}

prd is implemented in low-level C:

q)prd
prd

We confirm this—prd is in .Q.res:

q).Q.res
`abs`acos`asin`atan`avg`bin`binr`cor`cos`cov`delete`dev`div`do`enlist`exec`exit`exp`getenv`hopen`if`in`insert`last`like`log`max`min`prd`select`setenv`sin`sqrt`ss`sum`tan`update`var`wavg`while`within`wsum`xexp

Here is the resulting implementation:

fact6:{prd 1+til x}

Assessing the performance of our implementations

Let us list all of our implementations of the factorial:

fact:{$[x;x*.z.s x-1;1]};
fact1:{if[x>0;:x*.z.s x-1];1};
fact2:{do[-1+f:r:x;r*:f-:1];r};
fact3:{r:1;do[x;r*:x;x-:1];r};
fact4:{r:1;while[x;r*:x;x-:1];r};
fact5:{(*/)1+til x};
fact6:{prd 1+til x}

Which one is the most efficient?

Let's time them:

q)\t:100000 fact first 1?26 
1154                        
q)\t:100000 fact1 first 1?26
1314                        
q)\t:100000 fact2 first 1?26
934                         
q)\t:100000 fact3 first 1?26
963                         
q)\t:100000 fact4 first 1?26
1020                        
q)\t:100000 fact5 first 1?26
408                         
q)\t:100000 fact6 first 1?26
378                         

Unsurprisingly, fact6 emerges as a leader, followed closely by fact5. We can see that there is a performance reason for preferring iterators over loops, in addition to the previously stated observation that iterators are more idiomatic that loops. However, the fastest results are obtained by using an optimized function, prd, written in low-level C.

Error handling

Let us try breaking our implementation fact6:

q)fact6 -1
'domain
  [1]  fact6:{prd 1+til x}
                    ^
q))\
q)fact6"foo"
'type
  [2]  (.q.til)

  [1]  fact6:{prd 1+til x}
                    ^
q))\

We can recover from these errors using protected evaluation, for example:

q).[fact6;"foo";[0N!"recovering";0N]]
"recovering"
0N

Both error messages are perfectly good by q standards. Suppose that we want to produce a more informative error message instead of 'domain. That is possible too:

fact7:{$[x<0;'"Argument must be nonnegative";prd 1+til x]}

Then:

q)fact7 5
120
q)fact7 0
1
q)fact7 -1
'Argument must be nonnegative
  [0]  fact7 -1
       ^