ERROR Message Example
ERROR [RMI TCP Connection(32971)-127.0.0.1] 2020-03-10 15:39:38,654 StorageService.java:1536 - Error while rebuilding node org.apache.cassandra.streaming.StreamException: Stream failed
This error typically produces a stack trace like the following:
org.apache.cassandra.streaming.StreamException: Stream failed
    at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88)
    at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
    at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
    at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
    at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
    at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:220)
    at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:196)
    at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:493)
    at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:583)
    at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:326)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
What does this ERROR message mean?
This error means that the node rebuild did not complete. It is logged when data streaming ends abruptly while a node is being rebuilt.
Why does this ERROR occur and how do you fix it?
This error is typically associated with one of three exceptions: IllegalArgumentException, ExecutionException, or RuntimeException.
IllegalArgumentException
This means a data validation error has occurred, and the exception should report the details. Here are two examples:
Buffer.limit
java.lang.IllegalArgumentException at java.nio.Buffer.limit(Buffer.java:267)
This indicates a problem with the JVM NIO buffer configuration. The buffer could be set too small or to an invalid value.
The JVM NIO buffer size is set in the jvm.options file with this option:
-Djdk.nio.maxCachedBufferSize=<value>
The default value for this parameter is 1M (1,048,576). To fix this, try doubling the value to 2M (2,097,152), then restart DSE on the node and retry the node rebuild operation.
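As a rough illustration of the doubling step, the following Python sketch reads the text of a jvm.options file and prints the option line with the value doubled. It assumes the value is written as a plain byte count, and the sample file content and the function names are hypothetical, not part of DSE.

```python
import re

# Matches the option named in this article. Assumption: the value is a plain
# byte count (no "k"/"m" suffix) and the line is not commented out.
_OPTION = re.compile(r"^\s*-Djdk\.nio\.maxCachedBufferSize=(\d+)\s*$", re.MULTILINE)


def cached_buffer_size(jvm_options_text):
    """Return the configured maxCachedBufferSize in bytes, or None if it is not set."""
    match = _OPTION.search(jvm_options_text)
    return int(match.group(1)) if match else None


def doubled_setting(jvm_options_text, default=1_048_576):
    """Build the option line with the value doubled, using the 1M default if unset."""
    current = cached_buffer_size(jvm_options_text) or default
    return f"-Djdk.nio.maxCachedBufferSize={current * 2}"


# Hypothetical jvm.options excerpt, for illustration only.
sample = "-Xss256k\n-Djdk.nio.maxCachedBufferSize=1048576\n"
print(doubled_setting(sample))
```

The printed line is what you would place in jvm.options before restarting DSE. The sketch does not edit the file itself.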
Unknown type 0
java.lang.IllegalArgumentException: Unknown type 0
This indicates that schema changes may be occurring while the node rebuild process is active.
To fix this, stop the rebuild by stopping DSE on the node that is rebuilding. Next, check that your schema is in agreement. You can check using the nodetool command:
nodetool describecluster
Check the Schema versions section and make sure every node reports the same schema version. If only one schema version exists, start DSE on the node and run the nodetool rebuild process again.
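The schema agreement check can also be scripted. The sketch below is a minimal Python example that parses captured nodetool describecluster output and counts the distinct schema versions. It assumes each schema version is printed as a UUID followed by a bracketed list of node addresses; the sample output, its UUIDs and addresses, and the function names are made up for illustration.

```python
import re

# Assumption: a schema version line looks like "<uuid>: [ip, ip, ...]".
# Any other line in the output is ignored.
_VERSION_LINE = re.compile(
    r"^\s*([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}):\s*\[(.*?)\]\s*$"
)


def schema_versions(describecluster_output):
    """Map each schema version UUID to the list of nodes reporting it."""
    versions = {}
    for line in describecluster_output.splitlines():
        match = _VERSION_LINE.match(line)
        if match:
            nodes = [ip.strip() for ip in match.group(2).split(",") if ip.strip()]
            versions[match.group(1)] = nodes
    return versions


def schema_in_agreement(describecluster_output):
    """True when exactly one schema version is reported."""
    return len(schema_versions(describecluster_output)) == 1


# Illustrative output only; the UUIDs and addresses are invented.
sample_output = """Cluster Information:
    Name: Test Cluster
    Schema versions:
        86afa796-d883-3932-aa73-6b017cef0d19: [10.0.0.1, 10.0.0.2]
        3f2c9a10-1b2d-3c4e-8f90-0a1b2c3d4e5f: [10.0.0.3]
"""

for version, nodes in schema_versions(sample_output).items():
    print(version, nodes)
print("in agreement:", schema_in_agreement(sample_output))
```

To run this against a live node, the output could be captured with subprocess.run(["nodetool", "describecluster"], capture_output=True, text=True).stdout and passed to schema_in_agreement.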
ExecutionException
These exceptions are thrown through JMX during the rebuild process. Check the log for the reason they were thrown, for example:
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Not enough space for compaction, estimated sstables = 1, expected write size = 2904846805
This shows that the cause of the ExecutionException is actually a RuntimeException, raised because there is not enough space left on the disk for compaction. To fix this, make more disk space available for compaction on the rebuilding node. You can either delete unused files such as old snapshots, or allocate more disk space, and then restart the nodetool rebuild process.
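Before restarting the rebuild, it can help to compare the expected write size from the log with the free space on the data volume. The Python sketch below does this with the standard library. The function names are hypothetical and the directory passed in is a placeholder; substitute the data directory configured on your node.

```python
import re
import shutil

# Pulls the byte count out of the "Not enough space for compaction" message.
_EXPECTED = re.compile(r"expected write size = (\d+)")


def expected_write_size(log_line):
    """Return the expected write size in bytes, or None if the line has none."""
    match = _EXPECTED.search(log_line)
    return int(match.group(1)) if match else None


def has_room_for(log_line, data_dir):
    """Compare free space under data_dir with the expected write size.

    Returns None when the log line carries no expected write size.
    """
    needed = expected_write_size(log_line)
    if needed is None:
        return None
    free = shutil.disk_usage(data_dir).free
    return free >= needed


log_line = (
    "Caused by: java.util.concurrent.ExecutionException: "
    "java.lang.RuntimeException: Not enough space for compaction, "
    "estimated sstables = 1, expected write size = 2904846805"
)

# "." is a placeholder; point this at the node's data directory instead.
print("bytes needed:", expected_write_size(log_line))
print("enough free space:", has_room_for(log_line, "."))
```

This only checks the single figure reported in the log. Leaving extra headroom beyond that figure is the safer choice, since other writes continue while the rebuild runs.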
RuntimeException
These exceptions are thrown at JVM runtime during the rebuild process. Check the log for the specific reason each RuntimeException was thrown, for example:
error: Error while rebuilding node: Stream failed -- StackTrace -- java.lang.RuntimeException: Error while rebuilding node: Stream failed
This indicates that streaming failed during the rebuild process. Check the system.log to find out why the stream failed. The system.log may show which streaming session failed and on which node, for example:
ERROR [STREAM-IN-/10.73.103.5] 2019-11-23 00:13:11,101 StreamSession.java:534 - [Stream #54623740-0da1-11ea-93ac-c51d3fcd12fe] Streaming error occurred
This tells us that stream session 54623740-0da1-11ea-93ac-c51d3fcd12fe from node 10.73.103.5 has failed. To investigate, go through the system.log and debug.log on node 10.73.103.5 and look for what caused stream session 54623740-0da1-11ea-93ac-c51d3fcd12fe to fail, for example:
DEBUG [STREAM-OUT-/10.73.103.1:33252] 2019-11-23 00:13:11,101 ConnectionHandler.java:408 - [Stream #54623740-0da1-11ea-93ac-c51d3fcd12fe] Error org.apache.cassandra.io.FSReadError: java.io.IOException: Broken pipe …
This indicates that while streaming to the rebuilding node 10.73.103.1, the source node 10.73.103.5 encountered a network issue that caused the IOException: Broken pipe. In other words, the connection was broken at the network level.
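The log search described above can be sketched in Python: first pull the failed session ID and peer address out of the rebuilding node's system.log, then filter the peer's logs for that session ID. The sketch assumes IPv4 addresses and the line layout shown in the examples above. The function names are hypothetical, and the sample strings are the log lines quoted in this article.

```python
import re

# Assumption: thread names look like [STREAM-IN-/a.b.c.d] or
# [STREAM-OUT-/a.b.c.d:port], followed later by [Stream #<session id>].
_STREAM_LINE = re.compile(
    r"\[STREAM-(?:IN|OUT)-/(?P<peer>\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?\]"
    r".*?\[Stream #(?P<session>[0-9a-f-]{36})\]"
)


def failed_sessions(log_text):
    """Return (session_id, peer_ip) pairs for lines reporting a streaming error."""
    found = []
    for line in log_text.splitlines():
        if "Streaming error occurred" not in line:
            continue
        match = _STREAM_LINE.search(line)
        if match:
            found.append((match.group("session"), match.group("peer")))
    return found


def lines_for_session(log_text, session_id):
    """Return every log line that mentions the given stream session."""
    marker = f"[Stream #{session_id}]"
    return [line for line in log_text.splitlines() if marker in line]


system_log = (
    "ERROR [STREAM-IN-/10.73.103.5] 2019-11-23 00:13:11,101 StreamSession.java:534 - "
    "[Stream #54623740-0da1-11ea-93ac-c51d3fcd12fe] Streaming error occurred"
)
peer_debug_log = (
    "DEBUG [STREAM-OUT-/10.73.103.1:33252] 2019-11-23 00:13:11,101 ConnectionHandler.java:408 - "
    "[Stream #54623740-0da1-11ea-93ac-c51d3fcd12fe] Error "
    "org.apache.cassandra.io.FSReadError: java.io.IOException: Broken pipe …"
)

for session_id, peer in failed_sessions(system_log):
    print("failed session", session_id, "from", peer)
    for line in lines_for_session(peer_debug_log, session_id):
        print("  ", line)
```

In practice, the first function would be fed the contents of system.log from the rebuilding node, and the second the system.log and debug.log collected from the peer it names.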
If you encounter the Error while rebuilding node error and none of the above situations are applicable or the above recommendations do not fix this issue, please contact the DataStax Support team.