Surprises with side effects

One of Clojure's more surprising behaviors is the interaction between functions like map 1 and functions with side effects2.

Let's map a side-effecting function like prn over a list3:

(take 1 (map prn '(1 2 3 4 5)))

You may expect that, since there is a (take 1 ...), only one number will be printed. This is indeed the case:

1
(nil)

But try doing the same to a vector:

(take 1 (map prn [1 2 3 4 5]))

We get:

1
2
3
4
5
(nil)

What's going on?

Chunking

In the definition of take, seq is called on the collection. seq causes a lazy sequence to realize its value, i.e. the (map prn ...) is evaluated.

Now we need to take a look at the definition of map; it is sufficient to look at one arity:

([f coll]
   (lazy-seq
    (when-let [s (seq coll)]
      (if (chunked-seq? s)
        (let [c (chunk-first s)
              size (int (count c))
              b (chunk-buffer size)]
          (dotimes [i size]
              (chunk-append b (f (.nth c i))))
          (chunk-cons (chunk b) (map f (chunk-rest s))))
        (cons (f (first s)) (map f (rest s)))))))

The difference is that [1 2 3 4 5] goes down the chunked-seq? path, while (1 2 3 4 5) does not.

  • (seq [1 2 3 4 5]) returns a clojure.lang.PersistentVector$ChunkedSeq, which is a chunked seq (an instance of IChunkedSeq).
  • (seq '(1 2 3 4 5)) simply returns a clojure.lang.PersistentList, which is not.

From above, we see that if the collection is a chunked seq, map uses chunk-first to take elements from it. For performance reasons, chunk-first takes 32 elements. Therefore, prn is called 32 times.

We can see this in the following:

(take 1 (map prn (range 100)))

which outputs:

0
1
2
...
31
(nil)

If the collection is not a chunked seq, map realizes only one element of it.

Avoiding chunking

To avoid chunking, we can explicity "unchunk" the lazy sequence:

(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))

In short, unchunk turns the collection into something that's not chunkable (a cons cell).

We see that:

(take 1 (map prn (unchunk (range 100))))

produces

0
(nil)

Laziness and side effects

In general, mixing laziness and side effects is a bad idea. It makes reasoning about when thing will be evaluated difficult.

If it has to be done, however, understanding chunking and when it occurs is important. Most of the time, the side effects are much costlier than printing a value.

Footnotes:

1

And concat, reduce, filter, and many others.

3

For some interactive examples, see Ian Truslove's presentation.