
Avoiding out-of-memory exceptions with large queries against the DSE Thrift server

Summary

The Thrift server was failing with a JVM OutOfMemoryError, either Java heap space or GC overhead limit exceeded, caused by a client application requesting the full contents of a table.

Symptoms

The following errors may be observed in the Thrift server log (thrift.log):

2017-04-11 00:41:55,017 org.apache.spark.util.Utils: Uncaught exception in thread task-result-getter-0 
        java.lang.OutOfMemoryError: GC overhead limit exceeded 

2017-04-11 15:01:52,277 org.apache.spark.util.Utils: Uncaught exception in thread task-result-getter-0
    java.lang.OutOfMemoryError: Java heap space

Cause

The problem was due to a client application pulling the entire contents of a given table from DSE. Note: while this is not a common operation, there are scenarios where it is necessary.

Spark runs all of the job's tasks in parallel and, by default, the Thrift server collects every task's results into the driver JVM at once. With a full table read the collected data set does not fit into the JVM heap, causing the out-of-memory conditions above.
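
To illustrate the mechanism, here is a minimal sketch for the DSE Spark shell (dse spark), where spark is predefined; the keyspace and table names are hypothetical. Collecting a full table materializes every row in the driver JVM at once, while an incremental iterator bounds memory to one partition at a time, which is the same idea the workaround below applies inside the Thrift server:

    // Keyspace and table names below are hypothetical.
    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_ks", "table" -> "my_table"))
      .load()

    // collect() materializes the entire table in the driver heap,
    // which is what triggers the OutOfMemoryError:
    // val rows = df.collect()

    // toLocalIterator() pulls one partition at a time, so driver
    // memory is bounded by the largest single partition:
    val it = df.toLocalIterator()
    while (it.hasNext) println(it.next())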

Workaround

The Thrift server can be configured so that Spark collects task results one partition at a time. While this is slower, the results are returned in an incremental fashion and the complete data set is never held in heap memory at once. The following parameter may be passed on the Thrift server command line:

--conf spark.sql.thriftServer.incrementalCollect=true
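
For example, assuming the Thrift server is managed through the dse command, the option can be appended when the server is started:

    dse spark-sql-thriftserver start --conf spark.sql.thriftServer.incrementalCollect=true

Incremental collection trades speed for stability: each partition is fetched in its own Spark job, so large result sets take longer to return but no longer need to fit in the heap all at once.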

Solution

The longer-term solution is to pull back a smaller data set from DSE, using the distributed computation that Spark offers to do the “heavy lifting” (filtering, aggregation) on the cluster itself rather than in an upstream client application.
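
As a sketch of this approach, again for the DSE Spark shell and with hypothetical keyspace, table, and column names, filter and aggregate on the cluster and bring back only the summary:

    import org.apache.spark.sql.functions.col

    // The filter and aggregation are executed on the cluster; only the
    // small aggregated result is returned to the driver.
    val summary = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_ks", "table" -> "my_table"))
      .load()
      .filter(col("event_date") >= "2017-04-01")
      .groupBy(col("customer_id"))
      .count()

    summary.show()   // a handful of rows rather than the whole table

The same principle applies to Thrift server clients: submitting an aggregating query such as SELECT customer_id, count(*) FROM my_ks.my_table WHERE event_date >= '2017-04-01' GROUP BY customer_id over JDBC returns a small summary instead of the raw rows.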
