Posted on 2016-09-17
Now that I'm programming Clojure professionally, I wanted to learn more about Java and the JVM. Much of the work we do involves Java interop with third-party libraries.
Creating thin wrappers around Java libraries is a common practice in the Clojure community, which is too small to write everything it needs in Clojure (especially when Java libraries are readily available). What's more, libraries written in Java can be shared with other JVM languages like Scala.
I started by tackling a topic about which I knew very little: the JVM garbage collector (GC). I knew that the GC is a program that occasionally runs to free memory, relieving programmers of the burden of memory management, but little beyond that. I suspect that most developers are at the same level: they know that the GC exists and what it's supposed to do, but not the specifics of how it does it.
The way the JVM manages memory is not as complex as I thought. Basically, memory is divided into several regions:
- Young generation
  - Region that is GC'ed more frequently
- Old (tenured) generation
  - Region that is GC'ed less frequently
  - Where objects are promoted once they have survived more than a threshold number of GCs
- Permanent generation / metaspace
  - Where class metadata lives; it is GC'ed only rarely (e.g. when classes are unloaded)
  - The permanent generation was replaced by metaspace in Java 8
The young generation is further divided into several spaces:
- Eden space
  - Where new objects are allocated
- Survivor spaces
  - Usually two; objects that survive a GC are copied here before being promoted to the old generation
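One way to see these regions on a running JVM is to ask the standard `java.lang.management` API for its memory pools. The sketch below (the class name `MemoryPools` is hypothetical) prints each pool's name, whether it belongs to the heap, and its current usage. Pool names depend on the JVM and the collector in use; with HotSpot's parallel collector they typically look like "PS Eden Space", "PS Survivor Space", "PS Old Gen" and "Metaspace", but treat that as illustrative rather than guaranteed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class MemoryPools {
    // Builds one line per memory pool reported by the running JVM.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Eden, survivor and old-generation pools are HEAP; metaspace is NON_HEAP.
            String kind = pool.getType() == MemoryType.HEAP ? "heap" : "non-heap";
            // getUsage() returns null if the pool is no longer valid.
            MemoryUsage usage = pool.getUsage();
            long usedKiB = usage == null ? -1 : usage.getUsed() / 1024;
            lines.add(String.format("%s [%s] used=%d KiB", pool.getName(), kind, usedKiB));
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

Running it with different collectors selected shows the same generational layout under different pool names.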
The high-level memory allocation-deallocation process goes something like this:
- A new object is allocated in Eden space
- Some time later, a minor collection begins in the young generation
  - Starting from a set of root references, the GC follows references and marks the objects it reaches as "live"
  - Live objects in Eden space are copied to a survivor space (call it \( S_1 \))
  - Live objects in the other survivor space (call it \( S_2 \)) are also copied to \( S_1 \)
  - Objects that survive the GC have their "survival counter" increased by 1
  - Eden space and \( S_2 \) are then swept clean
  - The survivor spaces swap roles, so that the next round of GC copies from \( S_1 \) to \( S_2 \)
- Objects whose survival counter is above a certain threshold are copied to the old generation
  - The heuristic is that an object that has survived a few GCs is likely to stick around for a long time, and therefore does not need to be checked at every GC
- If the old generation has too many objects, a major collection begins
  - For Oracle's JVM, a major collection is always followed by a minor collection; the combination is a full GC
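The minor-collection steps above can be modeled with a toy heap. This is a minimal sketch, not how HotSpot is implemented: the class `ToyGenerationalHeap`, its `Obj` type and the threshold of 2 are all hypothetical, the roots are passed in explicitly, and references from the old generation into the young generation are ignored. The `to` list plays the role of \( S_1 \) and `from` plays \( S_2 \); in HotSpot the real threshold is tuned with `-XX:MaxTenuringThreshold`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ToyGenerationalHeap {
    // Hypothetical heap object: outgoing references plus a survival counter.
    static final class Obj {
        final String id;
        final List<Obj> refs = new ArrayList<>();
        int age = 0; // number of minor GCs survived
        Obj(String id) { this.id = id; }
    }

    final int tenuringThreshold;
    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // survivor space holding the previous round's survivors (S2)
    List<Obj> to = new ArrayList<>();   // empty survivor space that receives this round's survivors (S1)
    final List<Obj> old = new ArrayList<>();

    ToyGenerationalHeap(int tenuringThreshold) { this.tenuringThreshold = tenuringThreshold; }

    // New objects always start in Eden.
    Obj allocate(String id) {
        Obj o = new Obj(id);
        eden.add(o);
        return o;
    }

    // Mark phase: follow references from the roots and collect every reachable object.
    static Set<Obj> mark(List<Obj> roots) {
        Set<Obj> live = new HashSet<>();
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (live.add(o)) pending.addAll(o.refs); // visit each object only once
        }
        return live;
    }

    void minorGc(List<Obj> roots) {
        Set<Obj> live = mark(roots);
        List<Obj> young = new ArrayList<>(eden);
        young.addAll(from);
        for (Obj o : young) {
            if (!live.contains(o)) continue;             // unreachable: simply not copied
            o.age++;                                     // survived one more GC
            if (o.age > tenuringThreshold) old.add(o);   // promote to the old generation
            else to.add(o);                              // copy into the empty survivor space
        }
        // Sweep Eden and the old survivor space, then swap the survivor spaces' roles.
        eden.clear();
        from.clear();
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }

    public static void main(String[] args) {
        ToyGenerationalHeap heap = new ToyGenerationalHeap(2);
        Obj a = heap.allocate("a");
        Obj b = heap.allocate("b");
        heap.allocate("c"); // never referenced, so it is garbage at the first GC
        a.refs.add(b);      // b is live only because a points to it
        for (int i = 1; i <= 3; i++) {
            heap.minorGc(Arrays.asList(a));
            System.out.printf("after GC %d: survivor=%d old=%d%n", i, heap.from.size(), heap.old.size());
        }
    }
}
```

With a threshold of 2, `a` and `b` sit in a survivor space after the first two collections and are promoted on the third, so the program prints `survivor=2 old=0` twice and then `survivor=0 old=2`.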
The JVM has a number of GC algorithms available:
- Parallel for young space, serial for old space
- Parallel for young and old space
- Concurrent mark-sweep (CMS) with serial young space
- Concurrent mark-sweep with parallel young space
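On Java 8-era HotSpot these combinations are chosen with command-line flags such as `-XX:+UseParallelGC`, `-XX:+UseParallelOldGC`, `-XX:+UseConcMarkSweepGC` and `-XX:+UseParNewGC` (CMS was later deprecated and removed). A quick way to check which collectors a JVM is actually running is to list its `GarbageCollectorMXBean`s. The sketch below (class name `WhichCollectors` is hypothetical) does that; HotSpot typically reports names like "PS Scavenge"/"PS MarkSweep" for the parallel collectors or "ParNew"/"ConcurrentMarkSweep" for CMS, but the exact names are an implementation detail.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class WhichCollectors {
    // One line per collector: its name, how many collections it has run, and total time spent.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() and getCollectionTime() return -1 if undefined for this collector.
            lines.add(gc.getName() + ": " + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        System.gc(); // only a hint; the JVM is free to ignore it
        describeCollectors().forEach(System.out::println);
    }
}
```

Usually one bean corresponds to the young-generation collector and another to the old-generation one, which mirrors the pairings listed above.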
These collectors make different trade-offs, which are explained in detail in Kevin's course and in Oracle's documentation.
Overall, I learned a lot in 2 hours' worth of lessons. It would be nice if Kevin had mentioned how other programming languages handle GC (reference counting was mentioned only in the context of COM), whether superior algorithms are available, etc., but I can't really complain, since the course is clearly JVM-specific.