Full ERROR Message Example:
ERROR [main] xxxx-xx-xx xx:xx:xx,xxx CommitLogReplayer.java:397 - Replay stopped. If you wish to override this error and continue starting the node ignoring commit log replay problems, specify -Dcassandra.commitlog.ignorereplayerrors=true on the command line
ERROR [main] xxxx-xx-xx xx:xx:xx,xxx JVMStabilityInspector.java:179 - JVM state determined to be unstable. Exiting forcefully due to: org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Encountered bad header at position 1848666 of commit log /var/lib/cassandra/commitlog/CommitLog-6-1578895178085.log, with bad position but valid CRC
What does this ERROR message mean?
This error is reported when a node restarts. As part of the startup process the node replays its existing commit log files, and startup fails if it encounters a corrupt commit log file.
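To identify which commit log file the node rejected, the path and position can be pulled out of the log text. The following is a minimal Python sketch, assuming the message wording matches the example above (other versions may phrase it differently). The function name find_corrupt_commit_logs and the regular expression are illustrative and not part of Cassandra or DSE.

```python
import re

# Pattern derived from the example message above; this wording is an
# assumption and may differ between Cassandra/DSE versions.
BAD_HEADER_RE = re.compile(
    r"Encountered bad header at position (?P<position>\d+) "
    r"of commit log (?P<path>\S+?\.log)"
)


def find_corrupt_commit_logs(log_text):
    """Return (path, position) pairs for each bad-header error in log_text.

    Duplicate entries are reported only once, in order of first appearance.
    """
    found = []
    for match in BAD_HEADER_RE.finditer(log_text):
        entry = (match.group("path"), int(match.group("position")))
        if entry not in found:
            found.append(entry)
    return found
```

Passing the contents of the node's log file to this hypothetical helper returns the commit log paths named in any matching error lines, which are the files to inspect before choosing a fix.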
Why does this ERROR occur?
The most common causes for this issue are:
- There is a hardware failure such as bad sectors on the disk
- The server crashed while the file system was in the middle of syncing the file to disk
- The Cassandra process got interrupted mid-transaction
- The file system holding the commit log directory became full, so the commit log was not completely written to disk
How do you fix this ERROR?
There are a couple of options to resolve the issue:
1. If there is only one corrupted commit log, move out or delete the reported corrupt commit log (in this example, /var/lib/cassandra/commitlog/CommitLog-6-1578895178085.log) and then restart the node. If no other corrupt commit logs are reported, the node should start. Once the node is up and running, run repairs on the node (nodetool repair) to pick up the missing mutations that were not replayed from the corrupt commit log.
2. If there are many corrupt commit logs, set the JVM option mentioned in the error message, -Dcassandra.commitlog.ignorereplayerrors=true. This skips the replay of any corrupted commit logs and lets you restart the node without having to identify and move out or delete each corrupt commit log individually.
Add this setting to the jvm.options file, which is located by default in /etc/dse/cassandra/ for package installations and in installation_location/resources/cassandra/conf/ for tarball installations.
Edit the jvm.options file and add the -Dcassandra.commitlog.ignorereplayerrors=true parameter as in the example below:
######################
# STARTUP PARAMETERS #
######################
-Dcassandra.commitlog.ignorereplayerrors=true
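If the same edit has to be made on several nodes, it can be scripted. The following is a minimal Python sketch, assuming the script has write access to jvm.options and that the option should appear exactly once. The helper name ensure_jvm_option is hypothetical, and the path to the file is passed in by the caller rather than assumed.

```python
from pathlib import Path

# The option named in the error message above.
IGNORE_REPLAY_FLAG = "-Dcassandra.commitlog.ignorereplayerrors=true"


def ensure_jvm_option(options_file, flag=IGNORE_REPLAY_FLAG):
    """Append flag to options_file unless an uncommented copy already exists.

    Returns True if the file was modified, False if the flag was present.
    """
    path = Path(options_file)
    lines = path.read_text().splitlines()
    # A commented-out copy ("# -D...") does not count as present.
    if any(line.strip() == flag for line in lines):
        return False
    lines.append(flag)
    path.write_text("\n".join(lines) + "\n")
    return True
```

Running the helper twice leaves a single copy of the option in the file, so it is safe to re-run.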
Restart the node once the file has been updated. Then run repairs on the node (nodetool repair) to pick up the missing mutations which did not get replayed from the corrupt commit log.
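For option 1, moving the corrupt commit log aside rather than deleting it keeps the file available for later inspection. The following is a minimal Python sketch, assuming the node is stopped and the script runs as a user that can write to both directories. The function name quarantine_commit_log and the quarantine directory are hypothetical and chosen by the operator.

```python
import shutil
from pathlib import Path


def quarantine_commit_log(commit_log, quarantine_dir):
    """Move commit_log into quarantine_dir and return its new path.

    Raises FileNotFoundError if the commit log does not exist and
    FileExistsError if a file of the same name is already quarantined.
    """
    source = Path(commit_log)
    if not source.is_file():
        raise FileNotFoundError(f"No such commit log: {source}")
    target_dir = Path(quarantine_dir)
    # Create the quarantine directory on first use.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    if target.exists():
        raise FileExistsError(f"Refusing to overwrite {target}")
    shutil.move(str(source), str(target))
    return target
```

The quarantine directory should sit outside the commit log directory so that the moved file is not picked up again on the next start.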