DataStax Help Center

Long GC times from GC failures

“I keep seeing the following warnings in my logs, what should I do?”

INFO [ScheduledTasks:1] 2013-03-07 18:44:46,795 (line 122) GC for ConcurrentMarkSweep: 1835 ms for 3 collections, 2606015656 used; max is 10611589120
INFO [ScheduledTasks:1] 2013-03-07 19:45:08,029 (line 122) GC for ParNew: 9866 ms for 8 collections, 2910124308 used; max is 6358564864
INFO [ScheduledTasks:1] 2013-03-07 19:50:08,029 (line 122) GC for ParNew: 866 ms for 1 collections, 3480197610 used; max is 6358564864

The first step in debugging Garbage Collection issues is to turn on GC logging. Un-comment the following lines in your cassandra-env.sh and then restart the service.

JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
JVM_OPTS="$JVM_OPTS -XX:PrintFLSStatistics=1"

If you are on JDK 6u34 or later, or JDK 7u2 or later, you should enable GC log rotation:

JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+UseGCLogFileRotation"
JVM_OPTS="$JVM_OPTS -XX:NumberOfGCLogFiles=10"

Otherwise, stick the date in the file name (but you should really upgrade your JVM if you are on a version older than JDK 6u34). GC logs can get pretty big, so if you can’t enable rotation, make sure to remove old ones, or put them somewhere with a lot of space.

JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"

Now that you have GC logs, you want to look through them and see what is really going on. For an in-depth look at Garbage Collection, you can read the following articles; this article is only going to scratch the surface.

Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning

Everything I Ever Learned About JVM Performance Tuning @Twitter

Promotion Failure? Concurrent Mode Failure?

Lots of other GC links

The first thing to do is find the places in the GC logs where the collections are taking a long time. Find a line which says:

Total time for which application threads were stopped: X.XXXXX seconds

Where X.XXXXX is more than 1 second. (Ignore such lines at the very start of the file, though; there is usually at least one large GC right at startup.)
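Pauses over a second can also be picked out mechanically. The following one-liner is a sketch; it assumes the gc.log path configured above, and that each stopped-time line ends in the pause duration followed by the word “seconds”:

```shell
# Print every stop-the-world pause longer than 1 second.
# $(NF-1) is the next-to-last field, i.e. the pause duration in seconds.
awk '/Total time for which application threads were stopped/ {
    if ($(NF-1) + 0 > 1.0) print
}' /var/log/cassandra/gc.log
```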

Once you find that line, look above it for a block of output in {}’s. Most likely you will find a message about “ParNew (promotion failed)” or “CMS (concurrent mode failure)” in there.

ParNew (promotion failed)

ParNew promotion failure means that there wasn’t a chunk of memory free in the Old Gen space large enough to hold an object which was being promoted out of the Young Gen space. One of the largest causes of this issue is having columns which are too large to be put into the arena allocator[1]. For writes larger than 128k, Cassandra doesn’t use the arena allocator, so over time, unless all of your columns are the exact same larger-than-128k size, this will cause CMS fragmentation.

The following settings can sometimes help with promotion failures:
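The original list of settings did not survive in this copy of the article. As an illustration only, options commonly suggested for promotion failures look like the following sketch; the flag choices and values are assumptions to be tuned for your workload, not recommendations from this article:

```shell
# Illustrative values only -- tune for your workload.
# Keep short-lived objects in the survivor spaces longer,
# so fewer of them are promoted into a fragmented Old Gen:
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
```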


CMS (concurrent mode failure)

A concurrent mode failure means that the JVM thinks the current concurrent collection won’t finish before the heap becomes full. This usually results from objects being promoted from Young Gen to Old Gen, only to be freed shortly afterwards. Some ways to fix this are:

  • Lower your “-XX:CMSInitiatingOccupancyFraction” so that CMS kicks in earlier
  • Reduce your total heap size.
  • Increase the size of your Young Gen, so that objects will have more time in Young Gen, and thus a larger chance of being cleaned up before being promoted.
  • If you have a small number of cores, enable incremental CMS “-XX:+CMSIncrementalMode”
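Expressed as cassandra-env.sh settings, the list above might look like the following sketch; the concrete numbers are placeholders, not recommendations:

```shell
# Kick off CMS earlier (without this flag the JVM picks a
# threshold of roughly 92% occupancy):
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=60"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
# Smaller total heap and larger Young Gen, via the standard
# cassandra-env.sh variables:
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="800M"
# On boxes with few cores, spread CMS work into small increments:
JVM_OPTS="$JVM_OPTS -XX:+CMSIncrementalMode"
```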

  1. Information on the arena allocator can be found in the “Writes” section of this blog post and in CASSANDRA-2252.
