Generational Garbage Collection
Satisfying your intellectual curiosity as a software engineer is a valid reason to learn how GC
works, but understanding it can also help you write much better Java applications.
This is a very personal and subjective opinion of mine, but I believe that a person well versed in
GC tends to be a better Java developer. If you are interested in the GC process, that means you
have experience in developing applications of a certain size. If you have thought carefully about
choosing the right GC algorithm, that means you completely understand the features of the
application you have developed. Of course, this may not be a common standard for a good
developer. However, few would object when I say that understanding GC is a requirement for
being a great Java developer.
This is the first in a series of "Become a Java GC Expert" articles. I will introduce GC this
time, and in the next article, I will talk about analyzing GC status and GC tuning examples
from NHN.
The purpose of this article is to introduce GC to you in an easy way. I hope this article proves to
be very helpful. Actually, my colleagues have already published a few great articles on Java
Internals which became quite popular on Twitter. You may refer to them as well.
Returning to Garbage Collection, there is a term that you should know before learning
about GC. The term is "stop-the-world." Stop-the-world will occur no matter which GC
algorithm you choose. Stop-the-world means that the JVM stops the application from
running in order to execute a GC. When stop-the-world occurs, every thread except for the threads
needed for the GC stops its tasks. The interrupted tasks resume only after the GC task
has completed. GC tuning often means reducing this stop-the-world time.
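To get a rough feel for how much time a running JVM spends in GC, the standard `java.lang.management` API can be queried. The sketch below is only an illustration: the class name `GcStatsSketch` and the allocation loop are made up for this example, and the reported time is an approximate accumulated collection time, which for concurrent collectors is not the same thing as pure stop-the-world time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original article.
final class GcStatsSketch {

    /** One line per collector: its name, how often it ran, and its accumulated time. */
    static List<String> summary() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Create short-lived garbage so that at least a few collections are likely to run.
        byte[][] junk = new byte[1_000][];
        for (int i = 0; i < 100_000; i++) {
            junk[i % junk.length] = new byte[10_240];
        }
        summary().forEach(System.out::println);
    }
}
```

The collector names that are printed depend on the JVM and on the GC algorithm selected at startup.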
References from old objects to young objects only exist in small numbers.
These hypotheses are called the weak generational hypothesis. To take advantage of this
hypothesis, the heap in the HotSpot VM is physically divided into two areas: the young
generation and the old generation.
Young generation: Most of the newly created objects are located here. Since most objects soon
become unreachable, many objects are created in the young generation, then disappear. When
objects disappear from this area, we say a "minor GC" has occurred.
Old generation: Objects that remained reachable and survived the young generation are copied
here. It is generally larger than the young generation. Because it is larger, GC occurs less
frequently here than in the young generation. When objects disappear from the old
generation, we say a "major GC" (or a "full GC") has occurred.
Let's look at this in a chart.
generation, it is recorded in this table. When a GC is executed for the young generation, only this
card table is searched to determine whether an object is subject to GC, instead of checking the
references of all the objects in the old generation. The card table is maintained by a write
barrier, a mechanism that makes minor GC faster. It adds a little overhead to each reference
write, but the overall GC time is reduced.
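The idea of a card table kept up to date by a write barrier can be sketched in a few lines. This is a toy model, not HotSpot's implementation: the class `CardTableSketch`, its method names, and the use of plain integer offsets as addresses are assumptions made for illustration. The 512-byte card size is HotSpot's default, but is fixed here as a constant.

```java
import java.util.BitSet;

// Toy model of a card table; names and layout are hypothetical.
final class CardTableSketch {
    // Assumed card size: each card covers 512 bytes of the old generation.
    static final int CARD_SIZE = 512;

    private final BitSet dirtyCards;

    CardTableSketch(int oldGenBytes) {
        this.dirtyCards = new BitSet((oldGenBytes + CARD_SIZE - 1) / CARD_SIZE);
    }

    /**
     * Write barrier: called when a reference to a young object is stored
     * into a field located at oldGenOffset in the old generation.
     */
    void writeBarrier(int oldGenOffset) {
        dirtyCards.set(oldGenOffset / CARD_SIZE);
    }

    boolean isDirty(int cardIndex) {
        return dirtyCards.get(cardIndex);
    }

    /** Number of cards a minor GC would have to scan instead of the whole old generation. */
    int dirtyCardCount() {
        return dirtyCards.cardinality();
    }

    /** After a minor GC the table can be reset. */
    void clear() {
        dirtyCards.clear();
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(1 << 20); // 1 MiB old generation
        table.writeBarrier(100);   // falls into card 0
        table.writeBarrier(300);   // card 0 again, no extra work for the GC
        table.writeBarrier(5000);  // falls into card 9
        System.out.println("cards to scan: " + table.dirtyCardCount()); // prints "cards to scan: 2"
    }
}
```

The small cost is paid on every tracked write (one bit set), while the minor GC only has to look at the dirty cards.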
There are 3 spaces in total, two of which are Survivor spaces. The spaces are used in the
following order:
1. The majority of newly created objects are located in the Eden space.
2. After one GC in the Eden space, the surviving objects are moved to one of the Survivor
spaces.
3. After each further GC in the Eden space, the surviving objects pile up in the Survivor
space that already holds earlier survivors.
4. Once a Survivor space is full, surviving objects are moved to the other Survivor space.
Then, the Survivor space that was full is emptied so that it holds no data at all.
5. Objects that are still alive after these steps have been repeated a number of times are
moved to the old generation.
As you can see by checking these steps, one of the Survivor spaces must remain empty. If data
exists in both Survivor spaces, or the usage is 0 for both spaces, then take that as a sign that
something is wrong with your system.
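The movement of objects between the three spaces can be illustrated with a small simulation. This is a simplified sketch under stated assumptions: the class `YoungGenSketch`, the `Obj` record, and the threshold of 3 are hypothetical, spaces are modeled as unbounded lists, and the two Survivor spaces swap roles on every minor GC (a simplification of steps 3 and 4 that still keeps one Survivor space empty at all times).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy simulation of Eden, two Survivor spaces, and the old generation.
final class YoungGenSketch {
    /** Hypothetical object record: a name plus the number of minor GCs survived. */
    static final class Obj {
        final String name;
        int age;
        Obj(String name) { this.name = name; }
    }

    // Assumed value for illustration: promote after surviving 3 minor GCs.
    static final int TENURING_THRESHOLD = 3;

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // Survivor space currently holding data
    List<Obj> to = new ArrayList<>();   // Survivor space kept empty between GCs
    final List<Obj> old = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o); // step 1: new objects start in Eden
        return o;
    }

    /** Runs one minor GC; isReachable stands in for the real reachability analysis. */
    void minorGc(Predicate<Obj> isReachable) {
        copySurvivors(eden, isReachable);
        copySurvivors(from, isReachable);
        eden.clear();
        from.clear();
        // Swap roles: the space just filled becomes "from", the emptied one becomes "to".
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }

    private void copySurvivors(List<Obj> space, Predicate<Obj> isReachable) {
        for (Obj o : space) {
            if (!isReachable.test(o)) {
                continue; // unreachable objects are simply not copied
            }
            o.age++;
            if (o.age >= TENURING_THRESHOLD) {
                old.add(o); // step 5: long-lived objects are promoted
            } else {
                to.add(o);  // steps 2 to 4: younger survivors go to the empty Survivor space
            }
        }
    }

    public static void main(String[] args) {
        YoungGenSketch heap = new YoungGenSketch();
        Obj cache = heap.allocate("cache"); // stays reachable throughout
        for (int gc = 1; gc <= 3; gc++) {
            heap.allocate("temp" + gc);     // becomes garbage immediately
            heap.minorGc(o -> o == cache);
            System.out.printf("after GC %d: eden=%d from=%d to=%d old=%d%n",
                    gc, heap.eden.size(), heap.from.size(), heap.to.size(), heap.old.size());
        }
    }
}
```

In this model the `to` list is empty after every collection, which mirrors the rule that one Survivor space must always remain empty.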
The process of data piling up in the old generation through minor GCs is shown in the
chart below:
the size of the object is suitable for the Eden space. If the object fits, it is
placed in the Eden space and becomes the new top. So, when new objects are created, only
the most recently added object needs to be checked, which allows much faster memory allocation.
However, it is a different story in a multithreaded environment. To allocate objects from
multiple threads in the Eden space in a thread-safe way, a lock is unavoidable, and
performance drops due to lock contention. Thread-Local Allocation Buffers (TLABs) are the
solution to this problem in the HotSpot VM. Each thread gets a small portion of the Eden space
as its own share. As each thread can only access its own TLAB, even the bump-the-pointer
technique allows memory allocation without a lock.
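The two techniques can be sketched together. The code below is an illustration only, not how HotSpot is implemented: `TlabSketch`, `Eden`, and `Tlab` are hypothetical names, addresses are plain integer offsets, and the shared Eden uses an atomic compare-and-set as a stand-in for whatever synchronization a real VM uses when handing out buffers.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of bump-the-pointer allocation with thread-local buffers.
final class TlabSketch {

    /** Shared Eden: several threads reserve chunks here, so the top pointer must be synchronized. */
    static final class Eden {
        private final int capacity;
        private final AtomicInteger top = new AtomicInteger(0);

        Eden(int capacity) { this.capacity = capacity; }

        /** Reserves size bytes and returns the start offset, or -1 if Eden is full. */
        int reserve(int size) {
            while (true) {
                int start = top.get();
                if (start + size > capacity) {
                    return -1;
                }
                if (top.compareAndSet(start, start + size)) {
                    return start;
                }
            }
        }
    }

    /** Thread-local buffer: only the owning thread touches it, so no lock is needed. */
    static final class Tlab {
        private final int end;
        private int top;

        Tlab(int start, int size) {
            this.top = start;
            this.end = start + size;
        }

        /** Bump-the-pointer: check the space left after top, then advance top. Returns -1 if full. */
        int allocate(int size) {
            if (top + size > end) {
                return -1;
            }
            int address = top;
            top += size;
            return address;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Eden eden = new Eden(1024);
        Runnable worker = () -> {
            // One synchronized reservation per thread, then lock-free allocations inside the TLAB.
            Tlab tlab = new Tlab(eden.reserve(256), 256);
            int a = tlab.allocate(64);
            int b = tlab.allocate(64);
            System.out.println(Thread.currentThread().getName() + " allocated at " + a + " and " + b);
        };
        Thread t1 = new Thread(worker, "t1");
        Thread t2 = new Thread(worker, "t2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

Which thread receives which chunk depends on scheduling, but the two buffers never overlap, so the allocations inside each buffer need no coordination.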
This has been a quick overview of the GC in the young generation. You do not necessarily have
to remember the two techniques that I have just mentioned. You will not go to jail for not
knowing them. But please remember that objects are first created in the Eden space, and
that long-surviving objects are moved to the old generation through the Survivor spaces.
Serial GC (-XX:+UseSerialGC)
The GC in the young generation uses the type we explained in the previous paragraph. The GC
in the old generation uses an algorithm called "mark-sweep-compact."
1. The first step of this algorithm is to mark the surviving objects in the old generation.
2. Then, it checks the heap from the front and leaves only the surviving ones behind
(sweep).
3. In the last step, it fills up the heap from the front with the objects so that the objects are
piled up consecutively, and divides the heap into two parts: one with objects and one
without objects (compact).
The serial GC is suitable for small amounts of memory and a small number of CPU cores.
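The three steps of mark-sweep-compact can be sketched on a toy heap. This is a minimal illustration under assumptions, not the collector's actual code: the heap is an array of slots holding object names, references are given as a map from a name to the names it points to, and `MarkSweepCompactSketch` and `collect` are hypothetical names. It needs Java 9 or later for `List.of` and `Map.of`.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy mark-sweep-compact over an array of named slots (null = free slot).
final class MarkSweepCompactSketch {

    static String[] collect(String[] heap, Map<String, List<String>> refs, List<String> roots) {
        // 1. Mark: follow references from the roots and remember every object reached.
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String name = pending.pop();
            if (marked.add(name)) {
                pending.addAll(refs.getOrDefault(name, List.of()));
            }
        }

        // 2. Sweep: walk the heap from the front and free every unmarked slot.
        String[] swept = heap.clone();
        for (int i = 0; i < swept.length; i++) {
            if (swept[i] != null && !marked.contains(swept[i])) {
                swept[i] = null;
            }
        }

        // 3. Compact: slide the survivors to the front so the free space is contiguous.
        String[] compacted = new String[swept.length];
        int next = 0;
        for (String slot : swept) {
            if (slot != null) {
                compacted[next++] = slot;
            }
        }
        return compacted;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", null, "C", "D"};
        // A references C, B references D; only A is a root, so B and D are garbage.
        Map<String, List<String>> refs = Map.of("A", List.of("C"), "B", List.of("D"));
        System.out.println(Arrays.toString(collect(heap, refs, List.of("A"))));
        // prints [A, C, null, null, null]
    }
}
```

After compaction the heap is split into a part with objects and a part without, which is the layout step 3 describes.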
Parallel GC (-XX:+UseParallelGC)
CMS GC (-XX:+UseConcMarkSweepGC)
You need to review this type carefully before using it. Also, if the compaction task needs to be
carried out because of heavy memory fragmentation, the stop-the-world time can be longer than
with any other GC type. You need to check how often and how long the compaction task is
carried out.
G1 GC
Finally, let's learn about the garbage first (G1) GC.
In this issue, we have only glanced at the GC for Java. Please look forward to our next issue,
where I will talk about how to monitor the Java GC status and tune GC.