Full ERROR Message Example
ERROR [PERIODIC-COMMIT-LOG-SYNCER] 2019-03-08 15:31:07,253 StorageService.java:434 - Stopping native transport
INFO [GossipStage:1] 2019-03-08 15:31:07,254 Gossiper.java:1812 - Announcing shutdown
INFO [PERIODIC-COMMIT-LOG-SYNCER] 2019-03-08 15:31:07,254 PluginManager.java:460 - Deactivating plugins which require native transport
INFO [GossipStage:1] 2019-03-08 15:31:07,255 StorageService.java:2684 - Node /2607:f160:8a02:805d:a0:111:0:102 state jump to shutdown
INFO [PERIODIC-COMMIT-LOG-SYNCER] 2019-03-08 15:31:07,256 PluginManager.java:470 - Deactivation of plugins which require native transport finished
INFO [PERIODIC-COMMIT-LOG-SYNCER] 2019-03-08 15:31:07,257 Server.java:175 - Stop listening for CQL clients
ERROR [PERIODIC-COMMIT-LOG-SYNCER] 2019-03-08 15:31:07,293 CommitLog.java:512 - Failed to persist commits to disk. Commit disk failure policy is stop; terminating thread
org.apache.cassandra.io.FSWriteError: java.io.IOException: Input/output error
... skipped for readability ...
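To check whether a node's log contains this failure, a small script can scan for the CommitLog line shown above. The following is a minimal sketch, not a DataStax tool: the function name `find_commit_failures` is hypothetical, and the pattern is modelled only on the wording in this excerpt, so other DSE versions may log the message differently.

```python
import re

# Pattern based on the CommitLog.java line in the excerpt above (assumption:
# the wording is the same in the log being scanned).
FAILURE_RE = re.compile(
    r"^ERROR\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"CommitLog\.java:\d+ - Failed to persist commits to disk\. "
    r"Commit disk failure policy is (?P<policy>\w+)"
)


def find_commit_failures(lines):
    """Return (timestamp, thread, policy) for each commit log write failure found."""
    hits = []
    for line in lines:
        match = FAILURE_RE.match(line.strip())
        if match:
            hits.append(
                (match.group("timestamp"), match.group("thread"), match.group("policy"))
            )
    return hits
```

The function accepts any iterable of lines, so an open log file can be passed directly. The location of system.log depends on the installation.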
What does this ERROR message mean?
During DSE startup, a disk problem has prevented a mutation from being written to the commitlog. The underlying disk may be unmounted, corrupt, or simply full.
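A quick first check is whether the commitlog directory exists, is writable, and has free space. The sketch below uses only the Python standard library. The function name `check_commitlog_dir` and the 1 GiB threshold are hypothetical choices for illustration, and the check cannot detect disk corruption, which usually shows up in the operating system logs instead.

```python
import os
import shutil


def check_commitlog_dir(path, min_free_bytes=1024 ** 3):
    """Report basic health indicators for a directory (does not detect corruption)."""
    report = {"path": path, "exists": os.path.isdir(path)}
    if not report["exists"]:
        # A missing directory may mean the volume is not mounted.
        return report
    usage = shutil.disk_usage(path)
    # True only if the directory itself is a mount point; this is meaningful
    # only when the commitlog lives on its own dedicated volume.
    report["is_mount_point"] = os.path.ismount(path)
    report["writable"] = os.access(path, os.W_OK)
    report["free_bytes"] = usage.free
    # The threshold is an arbitrary illustration, not a DSE requirement.
    report["low_space"] = usage.free < min_free_bytes
    return report
```

Pass the directory configured as commitlog_directory in cassandra.yaml and run the check as the user that runs DSE, because the writable result reflects the permissions of the calling user.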
Why does this ERROR occur?
Sometimes it is preferable to keep the node running but unavailable, so the administrator can deal with the corrupted file or disk failure while the node is offline and not serving application traffic. However, in most cases it is advisable to stop the node completely to avoid any backlog of mutations building up, so the default policy is stop, as in the example above.
You can configure commit_failure_policy in cassandra.yaml to define what DSE should do when an error is encountered during a write to the commitlog. There are multiple options, depending on how you want to handle the error. Frequently the failure results in, or is triggered by, sstable corruption or missing sstables.
Configurable options
die
Shut down the node and kill the JVM, so the node can be replaced.
stop
Shut down the node, leaving the node effectively dead, available for inspection using JMX.
stop_commit
Shut down the commit log, letting writes collect but continuing to service reads. Use with caution.
ignore
Ignore fatal errors and let the batches fail. Use with caution.
Note: stop is the default behavior on a new DSE node.
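To confirm which policy a node is configured with, the setting can be read from cassandra.yaml. The following is a rough sketch under stated assumptions: it uses a line-based regular expression rather than a YAML parser, it only handles an unquoted top-level `commit_failure_policy: value` line, and the function name `read_commit_failure_policy` is hypothetical. It falls back to stop when the setting is absent or commented out, following the note above.

```python
import re

# The four options listed above.
VALID_POLICIES = {"die", "stop", "stop_commit", "ignore"}

# Matches an uncommented, unquoted top-level setting, with an optional
# trailing comment on the same line.
POLICY_RE = re.compile(r"^commit_failure_policy\s*:\s*([A-Za-z_]+)\s*(#.*)?$")


def read_commit_failure_policy(yaml_text, default="stop"):
    """Return the configured commit_failure_policy, or the default if unset."""
    for line in yaml_text.splitlines():
        match = POLICY_RE.match(line)
        if match:
            policy = match.group(1)
            if policy not in VALID_POLICIES:
                raise ValueError(f"unknown commit_failure_policy: {policy!r}")
            return policy
    return default
```

Pass the contents of the node's cassandra.yaml as a string. For anything beyond a quick check, a full YAML parser is the more robust choice.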
How do you fix this ERROR?
There is no single fix for this error, because it can have several causes, for example disk corruption or a corrupted commitlog. In the event of disk failure, the resolution is to replace the disk and re-bootstrap the node.
Note: While it may be tempting to change the policy as a workaround to get a node back up (for example, switching to ignore), we do not recommend this.
If you are faced with this error and require further assistance, please raise a case with DataStax Support.