
High number of connections can exhaust available native threads for Cassandra JVM

Summary

This article discusses an issue where the JVM can run out of native threads as a result of a high number of connections to the cluster.

Symptoms

In situations where there is a high number of client connections to the Cassandra client (Thrift) port (default 9160), a node under heavy load can start to report numerous connection errors in the system.log similar to the following:

ERROR [Thrift:36300] 2017-04-21 08:06:37,938  CustomTThreadPoolServer.java:224 - Error occurred during processing of message.
java.lang.RuntimeException: Failed to open server transport: unknown
        at com.datastax.bdp.transport.server.TNegotiatingServerTransport$Factory.getTransport(TNegotiatingServerTransport.java:507) ~[dse-4.7.5.jar:4.7.5]
        at com.datastax.bdp.transport.server.TNegotiatingServerTransport$Factory.getTransport(TNegotiatingServerTransport.java:395) ~[dse-4.7.5.jar:4.7.5]
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:197) ~[cassandra-all-2.1.11.908.jar:4.7.5]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) ~[libthrift-0.9.3.jar:0.9.3]
        at com.datastax.bdp.transport.server.TPreviewableTransport.readUntilEof(TPreviewableTransport.java:66) ~[dse-4.7.5.jar:4.7.5]
        at com.datastax.bdp.transport.server.TPreviewableTransport.preview(TPreviewableTransport.java:42) ~[dse-4.7.5.jar:4.7.5]
        at com.datastax.bdp.transport.server.TNegotiatingServerTransport.open(TNegotiatingServerTransport.java:174) ~[dse-4.7.5.jar:4.7.5]
        at com.datastax.bdp.transport.server.TNegotiatingServerTransport$Factory.getTransport(TNegotiatingServerTransport.java:499) ~[dse-4.7.5.jar:4.7.5]
        ... 5 common frames omitted
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:209) ~[na:1.8.0_60]
        at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[na:1.8.0_60]
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[na:1.8.0_60]
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[na:1.8.0_60]
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_60]
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) ~[libthrift-0.9.3.jar:0.9.3]
        ... 9 common frames omitted

A DSE node can become overwhelmed and eventually stop processing requests. The node may appear down or unresponsive to other nodes and/or clients, with an OutOfMemoryError reported in the system.log:

ERROR [Thread-11] 2017-04-21 08:26:02,850  CassandraDaemon.java:227 - Exception in thread Thread[Thread-11,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method) ~[na:1.8.0_60]
        at java.lang.Thread.start(Thread.java:714) ~[na:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950) ~[na:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) ~[na:1.8.0_60]
        at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:113) ~[cassandra-all-2.1.11.908.jar:4.7.5]
        at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.run(ThriftServer.java:137) ~[cassandra-all-2.1.11.908.jar:4.7.5]
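
A quick way to confirm that a node is hitting this condition is to search the log for the error, for example (the log location assumes a default package installation):

$ grep -c "unable to create new native thread" /var/log/cassandra/system.log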

Cause

Each client connection results in a thread being allocated from the pool of available threads. When a request is completed, the thread is released back to the pool for subsequent requests.

If the node is unable to keep up with requests, a sustained high number of connections can eventually exhaust the available native threads, at which point the node stops accepting new requests and appears unresponsive.
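
To confirm that native threads are being exhausted, check the thread count of the DSE/Cassandra process while the connections are open. This is a minimal sketch assuming Linux, where <cassandra_pid> is a placeholder for the process ID of the DSE Java process (obtainable with, for example, ps -ef | grep cassandra):

$ cat /proc/<cassandra_pid>/status | grep Threads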

Solution

Ensure that the nodes are configured as per the Recommended production settings for Apache Cassandra. In particular, ensure that the maximum number of processes (nproc) for the user running DSE is set to 32768.
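
The limits in effect for a running node can be verified directly against the process. This is a minimal sketch assuming Linux, with <cassandra_pid> again standing in for the DSE process ID:

$ cat /proc/<cassandra_pid>/limits | grep -E "Max (processes|open files)"

If the process limit is too low, it is typically raised in /etc/security/limits.conf (or a file under /etc/security/limits.d/) for the account running DSE, for example (the cassandra user name is an assumption; use the account that actually runs the service):

cassandra - nproc 32768
cassandra - nofile 100000

DSE must be restarted on the node for new limits to take effect.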

Monitor the connections coming into each node using utilities such as netstat, for example:

$ netstat -an | grep 9160
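
To see how the connections are distributed across clients, the netstat output can be aggregated per remote address, for example (a rough sketch assuming Linux netstat output, where the fifth column is the foreign address):

$ netstat -an | grep 9160 | grep ESTABLISHED | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn

This reports the number of established connections to the Thrift port for each client IP, which makes unexpected or runaway clients easier to spot.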

 

Validate the source clients to ensure the connections are as expected, i.e. application-related and not a rogue process.

Finally, if the number of client connections consistently exceeds what the existing nodes can service, consider scaling out the cluster by adding more nodes to increase capacity.
