Summary
A DSE node fails to bootstrap with a "divide by zero" exception thrown by a FlushWriter thread in CassandraDaemon.java.
Symptoms
DSE is unable to start up; the failure is triggered while a table is being flushed to disk.
A sample exception stack trace from DSE 4.6 is shown below:
INFO [FlushWriter:1] 2015-05-19 02:56:59,134 Memtable.java (line 356) Writing Memtable-cfUsers@1739261327(10997452/109974520 serialized/live bytes, 12835 ops)
ERROR [FlushWriter:1] 2015-05-19 02:56:59,139 CassandraDaemon.java (line 199) Exception in thread Thread[FlushWriter:1,5,main]
java.lang.ArithmeticException: / by zero
at org.apache.cassandra.io.sstable.IndexSummaryBuilder.<init>(IndexSummaryBuilder.java:43)
at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.<init>(SSTableWriter.java:437)
at org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:102)
at org.apache.cassandra.db.Memtable$FlushRunnable.createFlushWriter(Memtable.java:418)
at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:360)
at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:343)
at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Cause
In the sample stack trace above, the exception is thrown by the IndexSummaryBuilder constructor when it calculates the expected number of index entries using an index interval that has a value of zero:
public IndexSummaryBuilder(long expectedKeys, int indexInterval)
{
    this.indexInterval = indexInterval;

    long expectedEntries = expectedKeys / indexInterval; // <----- indexInterval has a zero value
    if (expectedEntries > Integer.MAX_VALUE)
    ...
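The arithmetic itself is easy to reproduce in isolation. This is a minimal sketch (the class and values below are illustrative, not Cassandra code) showing that Java integer division by zero throws exactly the exception seen in the stack trace:

```java
public class IndexIntervalDemo {
    // Mirrors the failing computation in IndexSummaryBuilder's constructor:
    // long expectedEntries = expectedKeys / indexInterval;
    static long expectedEntries(long expectedKeys, int indexInterval) {
        return expectedKeys / indexInterval; // throws ArithmeticException when indexInterval == 0
    }

    public static void main(String[] args) {
        try {
            expectedEntries(12835L, 0); // 0 stands in for the table's bad index_interval
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage()); // prints "/ by zero"
        }
    }
}
```

Unlike floating-point division, integer division by zero in Java always raises ArithmeticException, which is why the flush fails hard rather than producing a NaN-like value.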
The value for the index interval gets populated in SSTableWriter.IndexWriter():
...
IndexWriter(long keyCount)
{
    indexFile = SequentialWriter.open(new File(descriptor.filenameFor(SSTable.COMPONENT_INDEX)),
                                      !metadata.populateIoCacheOnFlush());
    builder = SegmentedFile.getBuilder(DatabaseDescriptor.getIndexAccessMode());
    summary = new IndexSummaryBuilder(keyCount, metadata.getIndexInterval()); // <----- indexInterval here
    bf = FilterFactory.getFilter(keyCount, metadata.getBloomFilterFpChance(), true);
}
...
In Apache Cassandra 2.0, the CFMetaData.getIndexInterval() method returns the index_interval value defined for a particular table. In this instance, the table was defined with an incorrect index_interval value of zero.
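The bad value can also be confirmed directly from the schema tables. The query below is a sketch assuming Cassandra 2.0's system.schema_columnfamilies layout (the index_interval column was replaced by min_index_interval/max_index_interval in later versions) and the keyspace and table names from this example:

```sql
cqlsh> SELECT index_interval
       FROM system.schema_columnfamilies
       WHERE keyspace_name = 'appKS' AND columnfamily_name = 'cfUsers';
```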
Solution
Follow these steps to resolve this issue:
Step 1 - Determine which table has the incorrect property value. In system.log, search for the string indexInterval=0, which should return an entry similar to this example:
INFO [InternalResponseStage:1] 2015-05-19 02:45:47,321 DefsTables.java (line 309) Loading org.apache.cassandra.config.CFMetaData@3ea8eaa[cfId=d42a44e1-fc64-348e-99ca-52b2dcc1c481,ksName=appKS,cfName=cfUsers,cfType=Standard, ... ,compressionOptions={},bloomFilterFpChance=<null>,memtable_flush_period_in_ms=0,caching=KEYS_ONLY,defaultTimeToLive=0,speculative_retry=NONE,indexInterval=0,populateIoCacheOnFlush=false,droppedColumns={},triggers={},isDense=true]
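The search in Step 1 can be done with a standard grep. The sketch below writes a sample line to a temporary file so the command is self-contained; in practice, point grep at your actual system.log (the packaged default location is typically /var/log/cassandra/system.log, but this varies by install):

```shell
# Extract the keyspace, table, and offending property from matching log lines.
log=$(mktemp)
echo 'INFO ... CFMetaData@3ea8eaa[...,ksName=appKS,cfName=cfUsers,...,indexInterval=0,...]' > "$log"
grep -Eo 'ksName=[^,]*|cfName=[^,]*|indexInterval=0' "$log"
# -> ksName=appKS
#    cfName=cfUsers
#    indexInterval=0
rm -f "$log"
```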
Step 2 - Change the index_interval property back to the value defined in your data model (the default is 128):
cqlsh> ALTER TABLE "appKS"."cfUsers" WITH index_interval = 128;
Step 3 - Delete the following directories. This ensures that all of their contents are cleaned out completely:
data
commitlog
saved_caches
Step 4 - Recreate the directories above and make sure Cassandra has full permissions.
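Steps 3 and 4 can be scripted as follows. This is a minimal sketch assuming the packaged-install default base directory /var/lib/cassandra and a cassandra service user; both are assumptions, so check cassandra.yaml for your actual data_file_directories, commitlog_directory, and saved_caches_directory before running:

```shell
# Assumed default base directory; override to match cassandra.yaml.
CASSANDRA_BASE="${CASSANDRA_BASE:-/var/lib/cassandra}"

for d in data commitlog saved_caches; do
    rm -rf "$CASSANDRA_BASE/$d"      # Step 3: remove the directory and all of its contents
    mkdir -p "$CASSANDRA_BASE/$d"    # Step 4: recreate it empty
done

# Step 4 (permissions): the user running DSE/Cassandra needs full access.
# chown -R cassandra:cassandra "$CASSANDRA_BASE"
```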
Step 5 - Attempt to bootstrap the node again by starting DSE.
See also
For more information on the index_interval property, see the CQL documentation on Table properties.