Example ERROR message
ERROR [STREAM-IN-/172.31.47.213] 2016-04-19 12:30:28,531 StreamSession.java:621 - [Stream #743d44e0-060e-11e6-985c-c1820b05e9ae] Remote peer 172.31.47.213 failed stream session.
What does this ERROR message mean?
The message indicates that a streaming session with the named peer failed. Streaming takes place during bootstrap, repair, and any other operation that moves data between nodes, including restores with sstableloader.
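The two values in the message that matter for troubleshooting are the stream ID and the address of the remote peer. The following is a minimal sketch of pulling them out of the line, assuming the message layout shown in the example above; the helper names stream_id and failed_peer are hypothetical and not part of any Cassandra tooling.

```shell
# Sample line copied from the example ERROR message above.
line='ERROR [STREAM-IN-/172.31.47.213] 2016-04-19 12:30:28,531 StreamSession.java:621 - [Stream #743d44e0-060e-11e6-985c-c1820b05e9ae] Remote peer 172.31.47.213 failed stream session.'

# stream_id (hypothetical helper): print the UUID that follows "[Stream #".
stream_id() {
    printf '%s\n' "$1" | sed -n 's/.*\[Stream #\([0-9a-f-]*\)].*/\1/p'
}

# failed_peer (hypothetical helper): print the address that follows "Remote peer".
failed_peer() {
    printf '%s\n' "$1" | sed -n 's/.*Remote peer \([^ ]*\) failed stream session.*/\1/p'
}

stream_id "$line"     # prints 743d44e0-060e-11e6-985c-c1820b05e9ae
failed_peer "$line"   # prints 172.31.47.213
```

The stream ID can then be used as a search key in the logs of the peer, and the peer address tells you which node's logs to collect.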
Why does this ERROR occur?
The error only tells you that streaming with the node named in the message failed. It does not say why. The failure can have any number of causes, and they can only be identified by analyzing the logs from that node.
How do you fix this ERROR?
Pull the logs from the node specified in the error message to see what was happening around the time of the error. Was there a long GC pause, or were pending threads shown in StatusLogger? Was there a network issue, or some other external problem that could have interrupted communication? The system.log and/or debug.log files should give some indication of why the transmission failed.
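One way to narrow the logs down to the time of the error is to filter on a timestamp prefix and keep only the lines most likely to explain the failure. This is a rough sketch, not a definitive procedure: the function name suspects_near is hypothetical, and it assumes that GC pauses are reported on lines containing GCInspector and thread pool status on lines containing StatusLogger.

```shell
# suspects_near (hypothetical helper): print GCInspector, StatusLogger, WARN and
# ERROR lines whose text contains the given timestamp prefix.
# A prefix such as '2016-07-08 04:5' covers 04:50 through 04:59.
suspects_near() {
    prefix=$1
    shift
    # First keep lines from the time window, then keep the suspicious ones.
    grep -h -F -- "$prefix" "$@" | grep -E 'GCInspector|StatusLogger|^(WARN|ERROR)'
}

# Usage, run on the node named in the error (log file paths are assumptions):
#   suspects_near '2016-07-08 04:5' /var/log/cassandra/system.log /var/log/cassandra/debug.log
```

Widening or narrowing the prefix changes the size of the time window, so start broad and tighten it once the interesting lines show up.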
For example, searching the system logs for the stream ID shows the underlying exception:
/var/log/cassandra$ grep -A10 'ERROR.*fe6ff550-4488-11e6-85be-e77a66119977' system-*.log
ERROR [STREAM-IN-/10.64.48.131] 2016-07-08 04:52:21,348 StreamSession.java:524 - [Stream #fe6ff550-4488-11e6-85be-e77a66119977] Streaming error occurred
java.net.SocketTimeoutException: null
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:211) ~[na:1.8.0_91]
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.8.0_91]
    at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) ~[na:1.8.0_91]
    at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:53) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:268) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
WARN [CompactionExecutor:1014] 2016-07-08 04:57:00,626 BigTableWriter.java:184 - Writing large partition system/hints:c42d329c-68d5-4d1b-b886-90aa752fb288 (459981414 bytes)
WARN [HintedHandoffManager:1] 2016-07-08 04:57:01,777 HintedHandoffMetrics.java:80 - /10.192.32.125 has 1150 dropped hints, because node is down past configured hint window.
INFO [STREAM-IN-/10.64.48.131] 2016-07-08 05:04:45,446 StreamResultFuture.java:182 - [Stream #fe6ff550-4488-11e6-85be-e77a66119977] Session with /10.64.48.131 is complete
In this example, the socket timed out during the stream (java.net.SocketTimeoutException), and the node was writing a large partition (459981414 bytes) while the stream session was still open. In this case the problem may therefore be related to the sstable file size, since the default timeout for file streaming is 1 hour. You can try increasing streaming_socket_timeout_in_ms in the cassandra.yaml configuration file to a larger value (e.g. 72000000 ms, which is 20 hours).
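Because the setting is expressed in milliseconds, it is easy to get the number of zeros wrong. The sketch below shows the arithmetic behind the values mentioned above and prints the resulting cassandra.yaml line; the helper name hours_to_ms is hypothetical, and the 20 hour figure is only the example value from the text, not a general recommendation.

```shell
# hours_to_ms (hypothetical helper): convert whole hours to milliseconds.
hours_to_ms() {
    echo $(( $1 * 60 * 60 * 1000 ))
}

hours_to_ms 1    # prints 3600000, the 1 hour default mentioned above
hours_to_ms 20   # prints 72000000

# Print the cassandra.yaml line for a 20 hour timeout:
printf 'streaming_socket_timeout_in_ms: %s\n' "$(hours_to_ms 20)"
# prints streaming_socket_timeout_in_ms: 72000000
```

Set the value on the nodes taking part in the stream, then retry the operation that triggered the streaming.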