DataStax Help Center

DSE startup removeUnfinishedCompactionLeftovers() fails with FSReadError and NullPointerException

Summary

This article provides a solution for situations where DataStax Enterprise fails to start up with an FSReadError and a NullPointerException.

Symptoms

After DSE is shut down, a subsequent restart fails with an error like the following sample from the system.log of a DSE 4.8.0 cluster node:

ERROR [main] 2015-12-30 01:42:00,653 CassandraDaemon.java:554 - Exception encountered during startup
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:669) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:335) ~[dse-core-4.8.0.jar:4.8.0]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
at com.datastax.bdp.DseModule.main(DseModule.java:75) [dse-core-4.8.0.jar:4.8.0]
Caused by: java.lang.NullPointerException: null
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:661) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
... 4 common frames omitted

Cause

This is a known issue, tracked as CASSANDRA-10501, in which not all temporary compaction files are renamed after a shutdown or server crash.

Because not all relevant files (in particular the *-ka-*-Statistics.db file) have been renamed, startup fails when it tries to compute the ancestors of SSTables whose compactions did not complete.

Workaround

To allow DSE to start up successfully, the offending SSTable generation(s) must be removed from the data directory. To determine which SSTables did not complete compaction, follow these steps:

Step 1 - Enable Cassandra debug logging by setting the following lines in logback.xml:

<root level="DEBUG">
  <appender-ref ref="FILE" />
  <appender-ref ref="STDOUT" />
</root>

Step 2 - Attempt to start DSE. Note that startup will still fail; the debug logging enabled in the previous step only serves to identify which SSTables failed compaction.

Here is an example entry from system.log:

DEBUG [main] 2015-12-30 05:43:30,148 MetadataSerializer.java:84 - No sstable stats for /var/lib/cassandra/data/music/albums/music-album-ka-5678 
ERROR [main] 2015-12-30 05:43:30,155 CassandraDaemon.java:579 - Exception encountered during startup
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
...
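When many generations are affected, scanning system.log by hand can be tedious. The following is a minimal Python sketch, not a DataStax tool, that pulls the SSTable descriptor and generation number out of "No sstable stats for" lines. It assumes the DEBUG message has exactly the format shown in the sample above; the function name find_offending_sstables is hypothetical.

```python
import re

# Assumption: the DEBUG line looks like the sample above, i.e.
# "... No sstable stats for <path>/<keyspace>-<table>-ka-<generation>".
NO_STATS = re.compile(r"No sstable stats for (\S+-ka-(\d+))\s*$")


def find_offending_sstables(log_lines):
    """Return (descriptor, generation) pairs in first-seen order, without duplicates."""
    found = []
    for line in log_lines:
        match = NO_STATS.search(line)
        if match:
            pair = (match.group(1), int(match.group(2)))
            if pair not in found:
                found.append(pair)
    return found
```

To use it, pass the lines of system.log, for example find_offending_sstables(open(log_path)), where log_path is the location of system.log on the node. Review each reported generation against the log before removing any files.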

Step 3 - Remove all the SSTable files that belong to the generation indicated in the logs. In the example above, delete all files from generation 5678 in the /var/lib/cassandra/data/music/albums directory.
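The removal in step 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not an official procedure: it moves the files to a separate directory instead of deleting them, so they can be restored if the wrong generation was chosen. It assumes that every file of a generation carries "-ka-<generation>-" in its name (this also matches temporary files whose names contain "tmp"). The function name quarantine_generation and its parameters are hypothetical. Run it only while DSE is stopped.

```python
import shutil
from pathlib import Path


def quarantine_generation(table_dir, generation, quarantine_dir, dry_run=True):
    """Move all files of one SSTable generation out of table_dir.

    Assumption: file names contain "-ka-<generation>-", e.g.
    music-album-ka-5678-Data.db. With dry_run=True nothing is moved;
    the matching file names are only returned for review.
    """
    table_dir = Path(table_dir)
    quarantine_dir = Path(quarantine_dir)
    # The trailing dash keeps generation 5678 from matching 56789.
    marker = f"-ka-{generation}-"
    matches = sorted(
        p for p in table_dir.iterdir() if p.is_file() and marker in p.name
    )
    if not dry_run:
        quarantine_dir.mkdir(parents=True, exist_ok=True)
        for path in matches:
            shutil.move(str(path), str(quarantine_dir / path.name))
    return [p.name for p in matches]
```

Calling the function with the default dry_run=True first lists what would be moved; passing dry_run=False performs the move. The quarantine directory should be outside the Cassandra data directories so the files are not picked up again at startup.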

Step 4 - Depending on the number of affected files, it may be necessary to repeat steps 2 and 3 until all offending files have been identified and removed.

Step 5 - Disable the debug logging enabled in step 1 by restoring the root logger level in logback.xml to its previous value. This prevents performance and disk space issues on the node.

NOTE - It is necessary to restart DSE again for changes to logback.xml to take effect.

Step 6 - Run a repair on the node to ensure data is consistent with other replicas.

Solution

The fix for CASSANDRA-10501 is included in DataStax Enterprise 4.7.5 and 4.8.2. Upgrade to one of these releases, or a later one, as appropriate.

See also

DSE doc - Configuring logging levels in Cassandra 2.1

Cassandra JIRA - CASSANDRA-10501 Don't try to get ancestors from half-renamed sstables
