This content originally appeared on DEV Community and was authored by Paula Gearon
I was reading Tory Anderson's blog post on writing a fast Fibonacci calculator in Clojure. He uses a logarithmic algorithm to find the nth
Fibonacci number, which was interesting, and something from SICP that I'd forgotten about.
Tory's code took about 56 minutes, though on my local machine here it was more like 10 minutes. That made me curious to think about what slows it down.
Tory went on to consider some optimizations, and looked at using memoize
to cache function results. This is a simple improvement when using a naïve implementation, as per the tree recursion section in SICP, since it avoids recalculating the same values multiple times. However, the logarithmic algorithm only calculates each value once. We can see this if we add (println n)
to the top of Tory's fib*
function:
=> (fib 100)
100
50
25
12
6
3
1
0
354224848179261915075
This only executed the fib*
function 8 times, with different arguments each time. Consequently, caching results won't change the performance at all.
Similarly, executing (fib 1000000000)
will only execute fib*
31 times, which tells us that the recursive calls to fib*
are not what takes time. Looking inside the function, we can see that all of the other operations are math, with no loops at all:
(let [[a b] (fib* (quot n 2))
c (*' a (-' (*' 2 b) a))
d (+' (*' b b) (*' a a))]
(if (even? n)
[c d]
[d (+' c d)]))
This is all just division (via quot
), multiplication, subtraction, and addition. These are typically fast operations, so why is it slow?
The answer is based in the size of the numbers being calculated. As we saw above, the 100th Fibonacci number is 21 digits long. This grows rapidly, and by the billionth number the size is hundreds of millions of digits long! 208,987,640 digits, in fact. This can't be calculated by standard number types, and needs a type like Java's BigInteger
type, or Clojure's wrapper, BigInt
.
The math operations on these numbers are taking a long time, so is there anything faster? Unless we want to build out own math library, then we should look for something that already exists. This made me think of dc
, which is an arbitrary precision calculator that comes with most Unix-based systems. I seem to recall that this application was supposed to be fast. How would it compare?
Shells
The easiest way to try this would be to shell out of the JVM to execute the dc
program, then communicate via a pipe. This is typically slow, but if the fib*
function only gets called 31 times, and there are only a few functions in that function, then this time will be negligible when compared to the time for the math operations.
My first thought was, "Well, I know how to do that, but it will be fiddly. I'm sure it will take too long." And that was going to be that.
But after a minute, I started thinking, "I haven't tried to call anything through a shell for a long time. Maybe I can just try something simple?"
First of all, we want to see this operating on the command line. It uses reverse polish notation, which is a bit obtuse, but it does usually make for easier programming. The operations for dc
can be provided in a script to execute via the -e
operation, and the operations in the script here are:
- Provide 2 numbers to place on the stack
- Provide the operation
+
. This removes the top 2 numbers from the stack, adds them, and places the result on the top of the stack. - The
p
operation, which prints the value on the top of the stack
$ dc -e '2 3 +p'
5
Now, we just need to execute it from Clojure. We do this with the shell
namespace, and each element of the command line is provided as a separate string parameter:
(require '[clojure.java.shell :as shell])
(shell/sh "dc" "-e" "2 3 +p")
The result of this is a structure containing stdout
, stderr
(as strings), and the exit code:
{:exit 0, :out "5\n", :err ""}
Everything is done with strings, and the final character needs to be truncated, but this is all tractable. So an addition function for BigInt
is reasonably short:
(defn +n
[a b]
(let [{out :out} (shell/sh "dc" "-e" (str a " " b " +p"))]
(bigint (subs out 0 (dec (count out))))))
I named it with n
on the end, since Clojure BigInt
values are printed with a trailing N
.
Trying it out:
=> (+n 2 3)
5N
But why bother wrapping in a BigInt
? All the math operations are go over the pipe, so the numbers have to be converted to strings anyway. It makes sense to just leave them as strings.
Using the above, we can generalize it to all of the required operations:
(defn op
[o]
(fn [a b]
(let [{out :out} (sh/sh "dc" "-e" (str a " " b " " o "p"))]
(subs out 0 (dec (count out))))))
(def +n (op "+"))
(def *n (op "*"))
(def -n (op "-"))
(def quotn (op "~r"))
We can also add in the string tests for even?
and zero?
:
(defn ev? [s] (even? (int (last s))))
(defn z? [n] (= "0" n))
(Note that the test for ev?
is checking the ASCII value of the final digit, which is even for all even digits).
We can use these in Tory's function by tweaking it slightly:
(defn fibn* [n]
(if (z? n)
["0" "1"]
(let [[a b] (fibn* (quotn n "2"))
c (*n a (-n (*n "2" b) a))
d (+n (*n b b) (*n a a))]
(if (ev? n)
[c d]
[d (+n c d)]))))
(defn fibn [n] (first (fibn* (str n))))
Since there are so many math operations in a row, then this could all be turned into a single dc
script, but that is really taking us into dc
programming, which isn't the point here. Instead, we're just trying to leverage the simple math operations on large numbers.
Testing it looks promising:
=> (map fibn (range 10))
("0" "1" "1" "2" "3" "5" "8" "13" "21" "34")
It takes a second, which I'd expect with so many calls through the shell, but it works.
Output problem
In fact, it works, all the way up to 332, when we get:
108245926205643306387794020096663813380901526766531123754208267893890\
9
It turns out that dc only prints out numbers to a width of 70 characters. Any numbers wider than that will broken up into multiple lines, with each line ending with a \
character.
We can flatten this back into a single line by using a string replacement:
(require '[clojure.java.shell :as shell])
(require '[clojure.string :as str])
(defn op
[o]
(fn [a b]
(let [{out :out} (shell/sh "dc" "-e" (str a " " b " " o "p"))
flat (str/replace out "\\\n" "")]
(subs flat 0 (dec (count flat))))))
Comparison
Looking at the billionth Fibonacci takes a long time, so let's just compare the millionth number:
=> (def result1 (time (fibn 1000000)))
"Elapsed time: 773389.633083 msecs"
=> (def result2 (time (fib 1000000)))
"Elapsed time: 172.583417 msecs"
=> (= result2 (bigint result1))
true
At 773s compared to 172ms, the dc
backed version is nearly 4500 times slower 🤣
There are 8 calls to math operations per invocation of fibn*
, and 1,000,000 executes fibn*
20 times, for a total of 160 calls to the shell. Shelling out like this, along with the string processing involved in moving data across the pipe does take some time. Overall, it is taking a few seconds, but this is inconsequential when seeing that the majority of the 773 seconds are being taken up by dc
doing the work.
Calling out a few of the later operations and running them directly in dc
demonstrates that it just takes a long time when working with large numbers.
BigInteger
Clojure's clojure.lang.BigInt
class wraps the Java java.math.BigInteger
class. The main reason for this appears to be that small values will keep an equivalent long
value cached, for fast operations on small values.
I've been looking at Java's BigInteger
class recently, considering how to implement the same functionality for ClojureScript, so I have some sense of what it's trying to do.
I once saw an arbitrary precision math library manipulate numbers as strings of decimal digits. This worked well intuitively, since it means that operations like addition and multiplication can be performed using standard pencil-and-paper algorithms that are taught in school. But the principal benefit is that numbers can be parsed from a string or written back out trivially. However, decimal digits are stored in 8 bits, but only represent 3.25 bits of information. That's only about 40% storage efficiency.
By contrast, Java's BigInteger
stores large numbers in binary. This stores 8 bits of information per byte, which is far denser. The same pencil-and-paper algorithms also work on numbers stored this way, but use symbols that range from 0-255, rather than 0-9. This is significantly faster to process, which is a major advantage.
However, parsing from and conversion back to a string are very expensive operations, as these are conversions between binary and decimal. For instance, executing (fib billion)
on my machine takes 10.4 minutes to get the answer, but converting that answer into a string takes 25.6 minutes. That's nearly 2.5 times longer.
I suspect that parsing and printing are where dc
is spending all of its time, especially since this is happening 20 times for each invocation of the fib*
function. I should learn to write more complex dc
operations, so I can run the entire algorithm in it.
Takeaways
What was the outcome of all this? Well, I lost an evening of my life. 😊
More seriously though, it helped me solidify my usage of calling out to a shell from Clojure.
Not mentioned above, I also tried using the spit
function to emit the operations to files which I then passed to dc
using the -f
flag. This was useful for seeing what specific operations had been executed. It also helped me gauge exactly where in the operation the function had reached. This was what first alerted me to the fact that dc
was taking a long time to execute basic operations. As mentioned, I expect that this was due to the parsing/printing phases of each invocation.
I also learned a bit more about performance when it comes to parsing and printing large numbers. Implementing this in JavaScript has bogged me down, due to the sheer complexity of it, and it's interesting to see that the computer is having as hard a time doing it as I am having in trying to write it.
I also had the opportunity to pull apart the logarithmic Fibonacci algorithm. I had skimmed this in my last reading of SICP, so I'm glad that I've taken the time to work through it now.
This content originally appeared on DEV Community and was authored by Paula Gearon
Paula Gearon | Sciencx (2022-07-12T20:02:41+00:00) How fast is dc?. Retrieved from https://www.scien.cx/2022/07/12/how-fast-is-dc/
Please log in to upload a file.
There are no updates yet.
Click the Upload button above to add an update.