Posted on 2016-11-05
Surprises with side effects
Let's map a side-effecting function like prn over a list:
(take 1 (map prn '(1 2 3 4 5)))
You may expect that, since there is a (take 1 ...), only one number will be printed. This is indeed the case:
1 (nil)
But try doing the same to a vector:
(take 1 (map prn [1 2 3 4 5]))
1 2 3 4 5 (nil)
What's going on?
In the definition of take, seq is called on the collection. seq forces a lazy sequence to realize at least its first element, i.e. the body of (map prn ...) is evaluated.
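We can watch seq force a lazy sequence with realized?. This is only a small REPL sketch; the name lazy-result is made up for illustration, and a list is used as input so that chunking does not get in the way yet.

```clojure
;; map returns an unrealized lazy sequence; nothing has been computed yet.
(def lazy-result (map inc '(1 2 3)))

(realized? lazy-result) ; => false

;; Calling seq forces the lazy sequence to compute its first element.
(seq lazy-result)

(realized? lazy-result) ; => true
```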
Now we need to take a look at the definition of map; it is sufficient to look at one arity:
([f coll]
 (lazy-seq
  (when-let [s (seq coll)]
    (if (chunked-seq? s)
      (let [c (chunk-first s)
            size (int (count c))
            b (chunk-buffer size)]
        (dotimes [i size]
          (chunk-append b (f (.nth c i))))
        (chunk-cons (chunk b) (map f (chunk-rest s))))
      (cons (f (first s)) (map f (rest s)))))))
The difference is that [1 2 3 4 5] goes down the chunked-seq? path, while '(1 2 3 4 5) does not.
(seq [1 2 3 4 5]) returns a clojure.lang.PersistentVector$ChunkedSeq, which is a chunked seq (an instance of clojure.lang.IChunkedSeq).
(seq '(1 2 3 4 5)) simply returns a clojure.lang.PersistentList, which is not.
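This is easy to check at the REPL. The sketch below just inspects the two seqs; the names from-vector and from-list are made up for illustration.

```clojure
;; Inspect what seq returns for a vector versus a quoted list.
(def from-vector (seq [1 2 3 4 5]))
(def from-list (seq '(1 2 3 4 5)))

(println (class from-vector)) ; prints clojure.lang.PersistentVector$ChunkedSeq
(println (class from-list))   ; prints clojure.lang.PersistentList

;; chunked-seq? is the test that map uses to pick its code path.
(chunked-seq? from-vector) ; => true
(chunked-seq? from-list)   ; => false
```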
From above, we see that if the collection is a chunked seq, map uses chunk-first to take elements from it. For performance reasons, chunk-first takes a whole chunk of up to 32 elements at once. Therefore, prn is called up to 32 times, or 5 times for our five-element vector.
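To see the chunk size directly, we can call chunk-first ourselves. This is only a sketch for poking around at the REPL; the names range-seq and vector-seq are made up for illustration, and the values assume a Clojure version where range and vector seqs are chunked in blocks of 32.

```clojure
;; seq of a range and seq of a vector both give chunked seqs.
(def range-seq (seq (range 100)))
(def vector-seq (seq [1 2 3 4 5]))

;; chunk-first returns the first chunk; count tells us its size.
(count (chunk-first range-seq))  ; => 32
(count (chunk-first vector-seq)) ; => 5, the vector has fewer than 32 elements

;; chunk-rest is the seq that continues after the first chunk.
(first (chunk-rest range-seq))   ; => 32
```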
We can see this in the following:
(take 1 (map prn (range 100)))
0 1 2 ... 31 (nil)
If the collection is not a chunked seq, map realizes only one element of it.
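Instead of reading printed output, we can count the calls. The helper count-calls below is a hypothetical name, not something from clojure.core; it is a minimal sketch that maps a counting function over a collection, realizes one element of the result, and reports how often the function ran.

```clojure
(defn count-calls
  "Maps a counting function over coll, realizes the first element of
  the result, and returns the number of times the function ran."
  [coll]
  (let [calls (atom 0)
        f     (fn [x] (swap! calls inc) x)]
    ;; doall forces the one-element result of take.
    (doall (take 1 (map f coll)))
    @calls))

(count-calls (range 100))  ; => 32, one full chunk
(count-calls [1 2 3 4 5])  ; => 5, the whole vector fits in one chunk
(count-calls '(1 2 3 4 5)) ; => 1, lists are not chunked
```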
To avoid chunking, we can explicitly "unchunk" the lazy sequence:
(defn unchunk [s] (when (seq s) (lazy-seq (cons (first s) (unchunk (next s))))))
unchunk turns the collection into something that's not chunkable (a lazy sequence of cons cells).
We see that only one element is printed:
(take 1 (map prn (unchunk (range 100))))
0 (nil)
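The same check can be done by counting calls instead of printing. In this sketch unchunk is repeated so the snippet stands on its own; calls and record! are made-up names for illustration.

```clojure
(defn unchunk [s]
  (when (seq s)
    (lazy-seq (cons (first s) (unchunk (next s))))))

;; Counter for how many times the mapped function runs.
(def calls (atom 0))

(defn record!
  "Side-effecting function: bumps the counter and returns x unchanged."
  [x]
  (swap! calls inc)
  x)

;; Realize exactly one element of the mapped, unchunked range.
(doall (take 1 (map record! (unchunk (range 100)))))

@calls ; => 1
```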
Laziness and side effects
In general, mixing laziness and side effects is a bad idea. It makes it difficult to reason about when things will be evaluated.
If it has to be done, however, it is important to understand chunking and when it occurs. Most of the time, the side effects are much costlier than printing a value, so running a whole chunk of them by accident can hurt.
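One way to keep the two apart, sketched here as a suggestion rather than a rule, is to select the elements lazily without side effects and then perform the side effect eagerly with doseq or run!. Both are in clojure.core (run! assumes Clojure 1.7 or later).

```clojure
;; take only selects elements here; no side effect runs during selection.
(doseq [x (take 1 [1 2 3 4 5])]
  (prn x))
;; prints 1

;; run! calls prn on each selected element for its side effect.
(run! prn (take 1 (range 100)))
;; prints 0
```

Since prn now runs only on elements that were actually selected, chunking of the underlying collection no longer decides how many times it is called.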