Full ERROR Message Example:
ERROR [Native-Transport-Requests-477] 2020-08-12 14:19:40,354 JVMStabilityInspector.java:74 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:695) ~[na:1.8.0_252]
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_252]
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_252]
    at org.apache.cassandra.utils.memory.BufferPool.allocate(BufferPool.java:114) ~[apache-cassandra-3.11.6.jar:3.11.6]
    at org.apache.cassandra.utils.memory.BufferPool.access$1000(BufferPool.java:50) ~[apache-cassandra-3.11.6.jar:3.11.6]
    at org.apache.cassandra.utils.memory.BufferPool$LocalPool.allocate(BufferPool.java:408) ~[apache-cassandra-3.11.6.jar:3.11.6]
    at org.apache.cassandra.utils.memory.BufferPool$LocalPool.access$000(BufferPool.java:335) ~[apache-cassandra-3.11.6.jar:3.11.6]
    at org.apache.cassandra.utils.memory.BufferPool.takeFromPool(BufferPool.java:126) ~[apache-cassandra-3.11.6.jar:3.11.6]
    at org.apache.cassandra.utils.memory.BufferPool.get(BufferPool.java:98) ~[apache-cassandra-3.11.6.jar:3.11.6]
    at org.apache.cassandra.io.util.BufferManagingRebufferer.<init>(BufferManagingRebufferer.java:45) ~[apache-cassandra-3.11.6.jar:3.11.6]
What does this ERROR message mean?
The DSE process has consumed all of the memory allocated to it by the JVM. There are many root causes of Java out-of-memory conditions, such as large result sets, high application load, use of the field cache in Solr queries, bad queries, and so on. This leads the JVM to throw a more specific error about exactly how it is failing, and that subsequent error from the JVM will provide more useful information.
Why does this ERROR occur?
This error occurs when the DSE process is consuming too much heap. This can be the result of a large or inefficient workload, a configuration error, or an under-provisioned cluster. Heap tuning varies per cluster and workload, but if it is not performed correctly, you will likely run into one of these errors.
There are 5 JVM errors that almost always follow this initial message:
java.lang.OutOfMemoryError: Direct buffer memory
This can occur when there are too many Netty connections and native memory is eventually exhausted.
This can be caused by many large requests that require more memory and eventually exhaust the buffer pool.
This can sometimes occur when loading a very large snapshot that requires more memory than provided by the default sstableloader heap settings.
This can be a result of a known issue with the java.nio cache, as detailed in this article: https://support.datastax.com/hc/en-us/articles/360000863663-JVM-OOM-direct-buffer-errors-affected-by-unlimited-java-nio-cache
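The article above concerns the per-thread java.nio temporary buffer cache, which can be capped with a JVM system property. As a hedged sketch only (the jdk.nio.maxCachedBufferSize property exists in later JDK 8 updates, but the value shown here is an example, not a tested recommendation; check the linked article for guidance), the entry could be added to jvm.options:
# Cap the per-thread java.nio temporary direct buffer cache at 1 MB (example value only)
-Djdk.nio.maxCachedBufferSize=1048576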
java.lang.OutOfMemoryError: GC overhead limit exceeded
The java.lang.OutOfMemoryError: GC overhead limit exceeded error means the JVM cannot reclaim enough space over several consecutive GC cycles. It can happen when the heap size does not accommodate the workload, when there is a memory leak in the application code, and so on.
This can occur when a Solr query returns too large a result set; it can be corrected with pagination.
This can also show up in DataStax agents when they do not have enough memory.
java.lang.OutOfMemoryError: unable to create new native thread
The unable to create new native thread error means the JVM is constrained by user resource limits and cannot create the processes, threads, or open files it needs to function.
https://support.datastax.com/hc/en-us/articles/204225939-Errors-in-Apache-Cassandra-DSE-and-OpsCenter-Due-to-Insufficent-Resource-Limit-Settings-
java.lang.OutOfMemoryError: Java heap space
This error will occur when the JVM does not have sufficient heap allocated for the operations it is currently attempting. By default, a .hprof file is generated, which can be reviewed in detail to determine exactly what consumed all of the heap. By default, the database puts the hprof file in a subdirectory of the working (root) directory when running as a service.
You can change the hprof directory by following the directions here:
https://docs.datastax.com/en/ddac/doc/datastax_enterprise/config/configHeapDump.html
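For example, when the node is started through cassandra-env.sh, the heap dump location can be controlled with the CASSANDRA_HEAPDUMP_DIR variable. A minimal sketch, assuming the stock cassandra-env.sh handling of that variable (the directory path is an example only):
# In cassandra-env.sh (or an environment file sourced before it):
export CASSANDRA_HEAPDUMP_DIR=/var/lib/cassandra/dumps
# The stock script then appends a flag along the lines of:
# JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-<timestamp>-pid<pid>.hprof"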
java.lang.OutOfMemoryError: Map failed
This error occurs when the JVM fails to memory-map a file. It is usually caused by an incorrect max_map_count setting rather than by the size of the mapped files.
How do you fix this ERROR?
For all of these error messages, you can try increasing the amount of memory allocated to the Cassandra heap if your system can handle it, but more often than not you will need a more specific fix.
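For example, heap size is usually raised by setting it explicitly in cassandra-env.sh rather than letting it be auto-calculated. A minimal sketch with example values only (8 GB is not a sizing recommendation; size the heap for your own hardware and workload):
# cassandra-env.sh: set an explicit heap instead of the auto-calculated one
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="2G"   # young generation size used with CMS; ignored when using G1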
If the JVM error message is java.lang.OutOfMemoryError: Direct buffer memory, we can try the following:
1) Disable AIO (a jvm.options sketch follows this list).
2) Set -Dsun.nio.PageAlignDirectMemory=true in jvm.options.
3) Increase the amount of direct memory in cassandra-env.sh:
JVM_OPTS="$JVM_OPTS -XX:MaxDirectMemorySize=1024M"
4) Ensure cassandra-env.sh has this code block:
memory_remaining_in_mb="$((${system_memory_in_mb} - ${heap_size_in_mb}))"
# Calculate direct memory as 1/2 of memory available after heap:
MAX_DIRECT_MEMORY="$((memory_remaining_in_mb / 2))M"
JVM_OPTS="$JVM_OPTS -XX:MaxDirectMemorySize=$MAX_DIRECT_MEMORY"
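Steps 1 and 2 are typically applied as jvm.options entries. A hedged sketch (the dse.io.aio.enabled property name for disabling AIO is an assumption for DSE; confirm the exact property and file name for your version before using it):
# jvm.options (examples only):
-Ddse.io.aio.enabled=false            # step 1: disable AIO (DSE-specific; verify for your version)
-Dsun.nio.PageAlignDirectMemory=true  # step 2: page-align direct memory allocations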
If this issue occurs when running sstableloader, you can run it with more memory. The default setting for sstableloader is 2G. To increase the amount of memory, run
export MAX_DIRECT_MEMORY="8G"
before running sstableloader.
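For example, a hedged end-to-end invocation (the host address and the SSTable directory path are placeholders for your own environment):
# Raise direct memory for this shell, then stream the SSTables (example host/path)
export MAX_DIRECT_MEMORY="8G"
sstableloader -d 10.0.0.1 /path/to/keyspace_name/table_name/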
If the JVM error message is java.lang.OutOfMemoryError: GC overhead limit exceeded, we can try the following:
This error is often associated with large Solr queries or Spark jobs. At a high level, the best fix is to reduce the size of the query or the amount of data the Spark job processes at one time.
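For example, with standard Solr HTTP queries the result set can be paged using the rows and start parameters instead of requesting everything at once. A minimal sketch against a hypothetical node address and core name (adjust host, port, and keyspace_name.table_name for your cluster):
# Page through results 100 rows at a time rather than fetching the full result set
curl "http://10.0.0.1:8983/solr/keyspace_name.table_name/select?q=*:*&rows=100&start=0"
curl "http://10.0.0.1:8983/solr/keyspace_name.table_name/select?q=*:*&rows=100&start=100"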
If the JVM error message is java.lang.OutOfMemoryError: unable to create new native thread, we can try the following:
1) Ensure disk_access_mode: mmap_index_only is set in cassandra.yaml.
2) Ensure user resource limits are set up per recommended production settings:
https://docs.datastax.com/en/dse/6.8/dse-admin/datastax_enterprise/config/configRecommendedSettings.html#Setuserresourcelimits
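A quick way to verify is to compare the limits of the running process against the recommended values. A sketch (the limits.d values shown mirror the recommended production settings at the link above, but confirm them there before applying; file name and user name are examples):
# Check the limits the Cassandra/DSE process is actually running with
cat /proc/$(pgrep -f CassandraDaemon)/limits

# /etc/security/limits.d/cassandra.conf (values per the recommended settings doc)
cassandra - memlock unlimited
cassandra - nofile 100000
cassandra - nproc 32768
cassandra - as unlimited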
If the JVM error message is java.lang.OutOfMemoryError: Java heap space, we can try the following:
Increase heap size if possible.
Analyze the heap dump to determine what's consuming so much heap and address the root cause.
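For example, the generated .hprof file can be inspected with standard JDK tooling; a minimal sketch (the pid and heap dump path are placeholders, and Eclipse MAT is an alternative to jhat for large dumps):
# Quick histogram of what is on the heap of a live node (replace <pid>)
jmap -histo:live <pid> | head -n 30

# Browse a generated heap dump with the JDK 8 jhat tool
jhat -port 7000 /var/lib/cassandra/java_pid12345.hprof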
If the JVM error message is java.lang.OutOfMemoryError: Map failed, we can try the following:
Check the output of sysctl -a and ensure the value of vm.max_map_count is set to 1048575 as detailed in our recommended production settings document.
https://docs.datastax.com/en/dse/6.8/dse-admin/datastax_enterprise/config/configRecommendedSettings.html#Setuserresourcelimits
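A hedged sketch of checking and applying the setting (the drop-in file name is an example; persist the value so it survives reboots):
# Check the current value
sysctl vm.max_map_count

# Apply the recommended value at runtime
sudo sysctl -w vm.max_map_count=1048575

# Persist it across reboots
echo "vm.max_map_count = 1048575" | sudo tee -a /etc/sysctl.d/99-cassandra.conf
sudo sysctl -p /etc/sysctl.d/99-cassandra.conf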