Summary
In some instances, Apache Cassandra encounters an exception during startup, in particular when attempting to read a temporary (*-tmp-*) data file.
Symptoms
During startup, CassandraDaemon.java logs an exception while reading a "tmp" file.
Here is a sample exception stack trace from DSE 4.6.1:
ERROR [main] 2015-05-14 22:08:52,560 CassandraDaemon.java (line 513) Exception encountered during startup
FSReadError in /var/lib/cassandra/data/appKS/userCF-tmp-jb-2-Statistics.db
    at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:549)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:285)
    at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:423)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
    at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:641)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.cassandra.utils.EstimatedHistogram$EstimatedHistogramSerializer.deserialize(EstimatedHistogram.java:334)
    at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:462)
    at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:448)
    at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:432)
    at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:545)
    ... 4 more
INFO [Thread-2] 2015-05-14 22:08:52,564 DseDaemon.java (line 502) DSE shutting down...
Cause
Temporary data files are created during normal compaction operations. In this instance, the temporary file was left over from a failed compaction. The compaction failure could be the result of any of the following:
- unclean shutdown of DSE, e.g. forced kill
- unexpected node outage, e.g. hardware failure or power outage
- server reboot without DSE being shut down first
In addition, the offending files are zero (0) bytes in size, which results in the java.io.EOFException when Cassandra tries to read them.
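To confirm this cause before changing anything, it can help to list the temporary files that are zero bytes in size. The following is a minimal read-only sketch, not a DataStax-provided tool. The function name find_empty_tmp_files is hypothetical, and the data directory passed to it (for example /var/lib/cassandra/data, taken from the sample trace) is an assumption that may differ on your node.

```python
from pathlib import Path


def find_empty_tmp_files(data_dir):
    """Return zero-byte files under data_dir whose names contain '-tmp-'.

    Hypothetical helper for illustration; it only reads file metadata
    and never modifies or deletes anything.
    """
    root = Path(data_dir)
    if not root.is_dir():
        # Nothing to scan if the directory does not exist.
        return []
    return sorted(
        p for p in root.rglob("*-tmp-*")
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # Assumed data directory; adjust to match the node's configuration.
    for path in find_empty_tmp_files("/var/lib/cassandra/data"):
        print(path)
```

Any file printed by this sketch is a candidate for the offending file named in the FSReadError line of the log.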
Workaround
Follow the steps below to recover the node:
Step 1 - Delete the offending file and its associated files. In the example above, the file is userCF-tmp-jb-2-Statistics.db, so delete all "sequence 2" files (userCF-tmp-jb-2-*.db).
Step 2 - Start DSE.
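Step 1 can be sketched as a small script that derives the file prefix from the offending file reported in the log and collects the matching files. This is an illustrative sketch only: the names sibling_files and remove_leftovers are hypothetical, the script assumes DSE is stopped, and it defaults to a dry run so that nothing is deleted until the list has been reviewed.

```python
from pathlib import Path


def sibling_files(offending_file):
    """Return files sharing the offending file's prefix, e.g. 'userCF-tmp-jb-2-'."""
    path = Path(offending_file)
    # 'userCF-tmp-jb-2-Statistics.db' -> 'userCF-tmp-jb-2-'
    # The trailing '-' keeps sequence 2 from also matching sequence 21.
    prefix = path.name.rsplit("-", 1)[0] + "-"
    return sorted(p for p in path.parent.glob(prefix + "*.db") if p.is_file())


def remove_leftovers(offending_file, dry_run=True):
    """List the matching files and delete them only when dry_run is False."""
    targets = sibling_files(offending_file)
    for p in targets:
        print(("would delete " if dry_run else "deleting ") + str(p))
        if not dry_run:
            p.unlink()
    return targets


if __name__ == "__main__":
    # Path taken from the sample stack trace; replace with the file in your log.
    remove_leftovers(
        "/var/lib/cassandra/data/appKS/userCF-tmp-jb-2-Statistics.db",
        dry_run=True,
    )
```

Review the dry-run output first, then rerun with dry_run=False and continue with Step 2.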
Solution
Where a node outage is planned, ensure that DSE is shut down cleanly before shutting down or rebooting the node.
For unplanned outages, recover the node by following the steps in the Workaround above.