This content originally appeared on DEV Community and was authored by Paula Gearon
This is the second part to my post on immutability and Clojure.
Missing State
One advantage to having mutability is the ability to accumulate state. For instance, let's look at some data to process in JavaScript:
const data = [
{label: "twilight", values: [10.0, 15.3]},
{label: "apple", values: [12.0, 11.5, 18.1]},
{label: "dash", values: [18.0]},
{label: "pie", values: [13.0, 13.2, 44.8, 15.5]}
]
Consider how we might get the average of all of those values. This can be done in multiple ways, but a straightforward approach is to accumulate the sum of all the values, while also accumulating the count.
let total = 0
let count = 0
data.forEach(function(d) {
d.values.forEach(v => total += v)
count += d.values.length
})
console.log("Average: " + (total / count))
Average: 17.14
These accumulating values represent state, which the principles of immutability are trying to avoid. How might the same task be done without accumulating values like this?
Immutable in JavaScript
Rather than jumping straight to Clojure, let's look at this in JavaScript.
We need a total, and we need a total count. One way to get these is with the reduce
operation to process an array into a single value. Let's start by creating a function that can add all the numbers in an array:
function sum(array) {
return array.reduce((accumulator, element) => accumulator += element, 0)
}
The accumulator
here gets initialized with the 0
parameter, and is passed in as an argument for the arrow function at each step.
We can test this on a short array:
console.log(sum([1, 2, 3, 4, 5]))
15
This can be used to sum the numbers across all the arrays:
const total = data.reduce((acc, element) => acc += sum(element.values), 0)
We can add the counts similarly:
const count = data.reduce((acc, element) => acc += element.values.length, 0)
This gives us the same answer:
console.log("Average: " + (total / count))
Average: 17.14
However, the data had to be processed twice, once for each calculation. This could have serious ramifications if the data were particularly large, or if some other processing operation required time-consuming I/O.
Immutable in Clojure
Before addressing this, let's look at the same operations and data in Clojure:
(def data [
{:label "twilight", :values [10.0, 15.3]}
{:label "apple", :values [12.0, 11.5, 18.1]}
{:label "dash", :values [18.0]}
{:label "pie", :values [13.0, 13.2, 44.8, 15.5]}])
The sum
function can be done a couple of ways, but if we stick to reduce
:
(defn sum [array] (reduce + array))
Now we can do the total and count. Clojure has some options that could make this easier, but transliterating the JavaScript gives:
(let [total (reduce (fn [acc element] (+ acc (sum (:values element)))) 0 data)
total-count (reduce (fn [acc element] (+ acc (count (:values element)))) 0 data)]
(println "Average:" (/ total total-count)))
Average: 17.14
Regaining State
As mentioned above, each call to reduce
is processing the data sequence again, which leads to redundant work. Instead, we want to be accumulating both values while processing the data. This can be done by accepting and returning both values at each step.
(let [[total total-count]
(reduce (fn [[previous-total previous-count] element]
[(+ previous-total (sum (:values element)))
(+ previous-count (count (:values element)))])
[0 0] data)]
(println "Average:" (/ total total-count)))
There are a few things to notice here:
- The function passed to
reduce
still takes 2 arguments. - The first argument to the function is "unpacked" into a pair of values.
- The value returned from the function is a 2 element vector.
- The "initial value" provided to
reduce
is a 2 element vector containing the 2 initial values (both zero). - The final result of the
reduce
is a 2 element vector, which gets unpacked into thetotal
andtotal-count
.
This approach to returning multiple values is something that JavaScript also supports, and is similar to using tuples to return multiple values in Python. Other languages, like Java, struggle with this, resorting to cumbersome constructs like:
return new double[] { newTotal, newCount };
And without an unpacking syntax, this becomes:
var result = callFunction();
double total = result[0];
double count = result[1];
It works, but it's also one of the many reasons why I stopped coding Java.
Regardless of the language, the state of the process is being accumulated in multiple values. The entire state is passed in as a parameter, along with the new data to process, and the new state is returned.
A More Structured Approach
Vectors and tuples are not the only data structure that can be used to return multiple items that represent state. Clojure's map syntax is also useful, particularly as it allows each part of the state to be named. Since the accumulated state is being destructured, we can also destructure the element being processed:
(let [{:keys [total total-count]}
(reduce (fn [{:keys [total total-count]} {values :values}]
{:total (+ previous-total (sum values))
:total-count (+ previous-count (count values))})
{:total 0 :total-count 0} data)]
(println "Average:" (/ total total-count)))
In a simple case like this, a vector makes more sense to accumulate the state, but as the state becomes larger and more complex, then using a map to encapsulate the state begins to gain value.
When Mutation Makes Sense
While mutation leads to greater complexity of state, there are occasions when mutation can be useful. The most common structures for this are Atom
s and Volatile
s.
Counter
A counter could theoretically be kept in an immutable state map, but passing this around everywhere would be both cumbersome and would slow down the program. It is also possible to accidentally pass in an old state, thereby returning a duplicate number. A global mutating counter would be ideal for this task.
The Clojure Runtime has an internal nextID
function to return a number that is always greater than previously returned values. The function is used internally by the gensym
function. This could instead be implemented in Clojure using an Atom:
(def ^:private last-id (atom 0))
(defn next-id [] (swap! last-id inc))
While Clojure actually implements this in Java, ClojureScript does it in ClojureScript with an atom.
The code above initializes the atom to 0, then every time the next-id
function is called the number in the atom gets incremented and returned.
Atoms have very important properties that make them thread safe, meaning that multiple threads could be accessing them at the same time without problems like duplicate values, or bad data.
Caching
Similarly, a mutating map containing a cache can be easier to manage than passing the cache in and out of every function that needs it.
Clojure's memoize
function attaches an atom to a function to return previously calculated values. This can be valuable for expensive operations.
For instance, a function that naively calculates the nth
Fibonacci number has exponential complexity:
(defn fib [n] (if (< n 2) n (+ (fib (dec n)) (fib (- n 2)))))
This will work fine up to 30 or so, but after that you will start seeing delays. On my notebook, calling (fib 45)
will take over 9 seconds. If I run it with an argument of 50 then it takes over 10 minutes. Take a look at the "exponential complexity" link above to see why this happens.
But if the function is memoized, then large numbers can be evaluated quickly without having to change the algorithm. We can do this by changing from defn
to using an equivalent def
, and calling memoize
on the function definition.
(def fib (memoize (fn [n] (if (< n 2) n (+ (fib (dec n)) (fib (- n 2)))))))
The code for memoize is relatively short:
(defn memoize
[f]
(let [mem (atom {})]
(fn [& args]
(if-let [e (find @mem args)]
(val e)
(let [ret (apply f args)]
(swap! mem assoc args ret)
ret)))))
It returns a new function within the scope of an atom called mem
. This atom contains a map of function arguments as a sequence to previous values. If an entry for a seq of arguments doesn't exist, then the original function is run on the arguments, and then that result is stored in the atom before being returned.
Understanding how to attach an atom to a function like this is very useful if you want a more intelligent cache, such as with core.cache
. This library provides immutable objects, much like an immutable map, which can be held in an atom in a similar way to how memoize
does it.
core.cache Example
This is stepping up a level in complexity, but since I'm here, this is a function that performs a similar job as memoize
, but with an LRU cache from the core.cache library.
(require '[clojure.core.cache :as cache])
(defn lru-memoize
[f]
(let [mem (atom (cache/lru-cache-factory {}))]
(fn [& args]
(let [e (cache/lookup @mem args ::nil)]
(if (= ::nil e)
(let [ret (apply f args)]
(swap! mem cache/miss args ret)
ret)
(do
(swap! mem cache/hit args)
e))))))
By default, this holds up to 32 items in the cache, though it can be customized with an extra parameter to the lru-cache-factory
function.
Local Mutation
The above examples show mutation at a global scale (or, potentially global, if a memoized function is to be accessible from anywhere). Because the scope of access is open, then the mutating objects must be thread-safe. But there are occasions where mutation might occur within a controlled block of code. In such cases, the small overhead associated with thread safety may be avoided by using a Volatile object.
Volatiles are implemented as a simple object in the host system (Java, JavaScript, C#, Dart, etc) with a single mutable field. To see how simple the implementation is, the Java version is:
final public class Volatile implements IDeref {
volatile Object val;
public Volatile(Object val) {
this.val = val;
}
public Object deref() {
return val;
}
public Object reset(Object newval) {
return this.val = newval;
}
}
There's not much to it: You can create it, read the value in it, and change the value in it. This is the same as an Atom, but is not thread safe.
Since volatiles are not thread safe, it is important that they are only used in a context that cannot be accessed from outside the current thread. A typical example is within the confines of a function or let
block.
One example is to use a volatile to record information while processing data for another purpose. Let's start by creating a sequence of 1000 random numbers, from 0 to 100:
(def data (repeatedly 1000 #(rand-int 100)))
Let's determine the sum of these numbers, but on the way, count how many of them are divisible by 7:
(let [sevens (volatile! 0)
total (reduce (fn [acc n]
(when (zero? (mod n 7))
(vswap! sevens inc))
(+ acc n))
0 data)]
(println "total:" total "sevens:" @sevens))
Can this be done with extra state in the accumulator? Certainly. But as complexity of the code increases, then having state on the side like this can help to simplify. It can also be faster than creating temporary objects that contain the updated state.
Transducers
One of the most important reasons for using volatile values is for maintaining state throughout a process. The most common example of this is in Transducers. These are a group of functions that are used for processing a sequence, with one function handling the start, another handling the end, and a function for handling each element in the sequence.
If there is a need for a sequence processor to remember what it has already done, then the transducer can use a volatile to do it. One example of this is the dedupe
function. It looks like this:
(defn dedupe
([]
(fn [rf]
(let [pv (volatile! ::none)]
(fn
([] (rf))
([result] (rf result))
([result input]
(let [prior @pv]
(vreset! pv input)
(if (= prior input)
result
(rf result input))))))))
([coll] (sequence (dedupe) coll)))
Most of this is boilerplate for a transducer, but notice the let
block that declares the pv
var as a volatile.
The 2-argument form of the transducer does the following:
- Looks up the prior value, which is stored in the volatile.
- Update the prior value to the current value.
- If it equals the current value, then return the result unchanged.
- The current value is different to the prior value, so add it to the result.
Other built-in transducers use volatiles to store state, as well. take
and drop
use a volatile to count how many items they have iterated over. map-indexed
uses a volatile to remember the current index value. distinct
uses a volatile to remember everything that it has already returned so that it never returns the same value twice. Be sure to reference this code if you ever need to
create a transducer that can remember state between elements.
Wrap Up
This was a much more advanced look at immutability and mutability in Clojure.
We started by looking at how state can be remembered by accepting and returning state at each stage of processing. State like that can be stored in any kind of structure, and is typically a vector when the state is small, or a map when the complexity of the state increases. State like this is often destructured by the code that processes it.
We then started looking at cases where mutability is useful, and some of the tools that Clojure provides for this. When data is globally accessible, or can be accessed in multiple threads, then an Atom is usually ideal for mutable data. But when the scope of this data is constrained, the a Volatile can be used instead. Volatiles work just like Atoms, but are not thread safe.
Afterword
This is much more complicated than my original introduction of immutability. I felt that it was necessary to describe the more complex cases that inevitably arise in a project, and also the flexibility that Clojure offers when mutability really is the best tool for the job.
Again, I would appreciate any questions or complaints about this, to help improve the explanation.
This content originally appeared on DEV Community and was authored by Paula Gearon
Paula Gearon | Sciencx (2022-06-19T02:47:44+00:00) More Immutability and Clojure. Retrieved from https://www.scien.cx/2022/06/19/more-immutability-and-clojure/
Please log in to upload a file.
There are no updates yet.
Click the Upload button above to add an update.