Summary
Following a node failure, DSE is unable to start and the system log shows the following error:
ERROR [main] 2019-09-10 06:53:54,116 CassandraDaemon.java:854 - Exception encountered during startup
java.lang.RuntimeException: java.nio.charset.MalformedInputException: Input length = 1
Applies to
- DataStax Enterprise (DSE) 6.7
- DataStax Enterprise 6.0
- DataStax Enterprise 5.1
Symptoms
After restarting a node which previously failed, an error is thrown just after the list of tokens from other nodes is received, without further information as to the cause of the issue.
The full stack trace, found in both system.log and debug.log, is as follows:
ERROR [main] 2019-09-10 06:53:54,116 CassandraDaemon.java:854 - Exception encountered during startup
java.lang.RuntimeException: java.nio.charset.MalformedInputException: Input length = 1
    at org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:638)
    at org.apache.cassandra.db.lifecycle.LogReplica.readLines(LogReplica.java:74)
    at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321)
    at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at org.apache.cassandra.db.lifecycle.LogReplicaSet.readRecords(LogReplicaSet.java:105)
    at org.apache.cassandra.db.lifecycle.LogFile.verify(LogFile.java:160)
    at org.apache.cassandra.db.lifecycle.LogTransaction$LogFilesByName.removeUnfinishedLeftovers(LogTransaction.java:479)
    at java.util.stream.ReferencePipeline$3$accept(ReferencePipeline.java:193)
    at java.util.HashMap$EntrySpliterator.tryAdvance(HashMap.java:1720)
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
    at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.allMatch(ReferencePipeline.java:521)
    at org.apache.cassandra.db.lifecycle.LogTransaction$LogFilesByName.removeUnfinishedLeftovers(LogTransaction.java:472)
    at org.apache.cassandra.db.lifecycle.LogTransaction.removeUnfinishedLeftovers(LogTransaction.java:439)
    at org.apache.cassandra.db.lifecycle.LogTransaction.removeUnfinishedLeftovers(LogTransaction.java:431)
    at org.apache.cassandra.db.lifecycle.LifecycleTransaction.removeUnfinishedLeftovers(LifecycleTransaction.java:553)
    at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:676)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:282)
    at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:524)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:694)
    at com.datastax.bdp.DseModule.main(DseModule.java:96)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.io.BufferedReader.readLine(BufferedReader.java:324)
    at java.io.BufferedReader.readLine(BufferedReader.java:389)
    at java.nio.file.Files.readAllLines(Files.java:3205)
    at org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:631)
    ... 32 common frames omitted
Cause
During normal compaction or a memtable flush to SSTables, temporary files are created inside the data file directory. These files are normally short-lived: once the compaction or flush completes, they are cleaned up.
However, if a problem occurs while these files are being written or during compaction (e.g. a hardware failure), corrupted files can remain in the data directory and cause DSE to fail on the next startup.
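The stack trace shows the failure arising in Files.readAllLines while a transaction log is being read, which suggests the log contains bytes that cannot be decoded as text. As a rough check, the sketch below lists transaction log files that are not valid UTF-8. It assumes the logs are decoded as UTF-8, that iconv is installed on the node, and that transaction logs match the pattern *txn_*.log; the function name find_undecodable_txn_logs is hypothetical and not part of DSE.

```shell
#!/bin/sh
# Sketch: list transaction log files that are not valid UTF-8.
# Assumptions (not from the article): logs are decoded as UTF-8, iconv is
# installed, and transaction logs match the pattern *txn_*.log.
find_undecodable_txn_logs() {
    data_dir="$1"
    find "$data_dir" -type f -name '*txn_*.log' | while IFS= read -r f; do
        # iconv exits non-zero when the input contains an invalid byte sequence.
        if ! iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, find_undecodable_txn_logs <data_file_directories> prints one path per suspect file. An empty result does not rule out other forms of corruption.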
Solution
WARNING: Depending on how long the node was down and the smallest gc_grace_seconds value among the tables, it may be preferable to rebuild the node rather than use the following solution, to avoid resurrecting deleted data. Consider this carefully before bringing the node back into the cluster.
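The comparison behind this warning is simple arithmetic: the node's downtime against the smallest gc_grace_seconds table setting (the grace period referred to above). A minimal sketch follows, in which both values are supplied by the operator and nothing is read from the cluster. The function name downtime_exceeds_gc_grace and its output labels are hypothetical.

```shell
#!/bin/sh
# Sketch: compare node downtime with the smallest gc_grace_seconds.
# Inputs are supplied by the operator; nothing is queried from the cluster.
downtime_exceeds_gc_grace() {
    down_since="$1"            # epoch seconds when the node went down
    gc_grace="$2"              # smallest gc_grace_seconds across the tables
    now="${3:-$(date +%s)}"    # optional third argument, defaults to current time
    downtime=$((now - down_since))
    if [ "$downtime" -ge "$gc_grace" ]; then
        # Tombstones may already have been purged on the other replicas.
        echo "consider-rebuild"
    else
        echo "within-gc-grace"
    fi
}
```

For instance, with the Cassandra default gc_grace_seconds of 864000 (10 days), a node that has been down for 11 days would be reported as consider-rebuild.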
To resolve the issue, first identify the keyspace/table in which the problem occurs.
- Add the following logger to logback.xml (e.g. for a package install: /etc/dse/cassandra/logback.xml):
<logger name="org.apache.cassandra.db.ColumnFamilyStore" level="TRACE"/>
- Start DSE:
$ sudo service dse start
As on previous attempts, it should fail to start.
- Open debug.log and identify the table that is causing the problem:
TRACE [main] 2019-09-10 08:47:06,063 ColumnFamilyStore.java:675 - Removing temporary or obsoleted files from unfinished operations for table <TABLE_NAME>
ERROR [main] 2019-09-10 08:47:06,217 CassandraDaemon.java:854 - Exception encountered during startup
java.lang.RuntimeException: java.nio.charset.MalformedInputException: Input length = 1
- Go to the DSE data directory on that node, as defined by the data_file_directories parameter in cassandra.yaml. Example for a package install:
$ grep -A 5 data_file_directories /etc/dse/cassandra/cassandra.yaml
$ ls -ltr <data_file_directories>/<keyspace>/<table>-<id>
- The latest .db files and a txn_compact-<id>.log or txn_flush-<id>.log file should appear at the bottom of the list, as in the example below (the prefix `aa` is an example for DSE 6.x and may differ on earlier versions). Move these files outside the <data_file_directories> directory:
-rw-r--r-- 1 cassandra cassandra   296 May 14 23:22 aa-txn_flush_<id>.log
-rw-r--r-- 1 cassandra cassandra   296 May 14 23:22 aa-5-bti-Rows.db
-rw-r--r-- 1 cassandra cassandra    60 May 14 23:22 aa-5-bti-Partitions.db
-rw-r--r-- 1 cassandra cassandra    16 May 14 23:22 aa-5-bti-Filter.db
-rw-r--r-- 1 cassandra cassandra    10 May 14 23:22 aa-5-bti-Digest.crc32
-rw-r--r-- 1 cassandra cassandra 33191 May 14 23:22 aa-5-bti-Data.db
-rw-r--r-- 1 cassandra cassandra    55 May 14 23:22 aa-5-bti-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra  5238 May 14 23:22 aa-5-bti-Statistics.db
-rw-r--r-- 1 cassandra cassandra    94 May 14 23:22 aa-5-bti-TOC.txt
- Start DSE:
$ sudo service dse start
- When DSE is up and running:
a. If using NodeSync, it will take care of the keyspaces/tables under its supervision without further interaction. Use nodetool repair for the other keyspaces/tables.
b. If using the OpsCenter Repair Service, disable the repair in OpsCenter if one is ongoing, then run a full manual repair.
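The identification and move steps above can be sketched as two shell helpers. This is only an illustration under stated assumptions: the function names last_scrubbed_table and quarantine_leftovers are hypothetical, the debug.log format is assumed to match the sample shown earlier, and the SSTable prefix (such as aa-5-bti) must be chosen by the operator after inspecting the ls -ltr output. Run the move only while DSE is stopped.

```shell
#!/bin/sh
# Sketch of the identification and move steps. Run only while DSE is stopped.

# Print the table named in the last matching TRACE line of debug.log.
# Assumes the ColumnFamilyStore TRACE logger is enabled and that the most
# recent startup attempt is the one that failed.
last_scrubbed_table() {
    debug_log="$1"
    grep 'Removing temporary or obsoleted files from unfinished operations for table' "$debug_log" \
        | tail -n 1 \
        | sed 's/.* for table //'
}

# Move one SSTable's component files and any transaction logs out of the
# table directory. The prefix is supplied by the operator; it is not guessed.
quarantine_leftovers() {
    table_dir="$1"        # <data_file_directories>/<keyspace>/<table>-<id>
    sstable_prefix="$2"   # e.g. aa-5-bti
    dest_dir="$3"         # must be outside the data file directories
    mkdir -p "$dest_dir" || return 1
    for f in "$table_dir/$sstable_prefix"-* "$table_dir"/*txn_*.log; do
        # Skip glob patterns that matched nothing.
        [ -e "$f" ] || continue
        mv -- "$f" "$dest_dir"/ && printf 'moved %s\n' "$(basename "$f")"
    done
}
```

A possible invocation would be quarantine_leftovers <data_file_directories>/<keyspace>/<table>-<id> aa-5-bti /var/tmp/dse-quarantine, where the destination path is an arbitrary example. Moving the files rather than deleting them keeps them available should they be needed for later analysis.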