Full ERROR Message Example
=================
ERROR [CoreThread-0] 2020-09-09 09:10:54,909 VerbHandlers.java:77 - Unexpected error during execution of request READS.RANGE_READ (19673): /10.101.32.244 -> /10.101.32.244
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /var/lib/cassandra/data/keyspace1/table1-46e38cb1c78f11ea86f8a7fd4edea859/aa-45-bti-Data.db
    at org.apache.cassandra.io.util.CompressedChunkReader$Standard.error(CompressedChunkReader.java:281)
    at org.apache.cassandra.io.util.CompressedChunkReader$Standard$1.completed(CompressedChunkReader.java:246)
    at org.apache.cassandra.io.util.CompressedChunkReader$Standard$1.completed(CompressedChunkReader.java:194)
<snip>
Caused by: org.apache.cassandra.io.compress.CorruptBlockException: (/var/lib/cassandra/data/keyspace1/table1-46e38cb1c78f11ea86f8a7fd4edea859/aa-45-bti-Data.db result = 13512): corruption detected, chunk at 39948101 of length 13226.
    at org.apache.cassandra.io.util.CompressedChunkReader$Standard$1.completed(CompressedChunkReader.java:200)
    ... 14 common frames omitted
... skipped for readability ...
What does this ERROR message mean?
DSE is trying to read an sstable from disk and, while opening the file, has encountered some form of corruption in it.
Why does this ERROR occur?
The call that opens the sstable is:
sstable = open(entry.getKey(), entry.getValue(), metadata);
During the open process a mismatch has occurred, for example an incomplete record or an invalid key-value pair. An sstable flush may have been incomplete, for example if DSE crashed while writing the sstable. The problem may also be caused by a hardware issue, such as disk corruption or failure.
How to fix
Some analysis of the logs is required to identify a likely cause of the corruption. For example, if many sstables are corrupted, there may be an underlying disk issue.
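As a rough aid for that log analysis, the sketch below lists each distinct sstable reported as corrupted, together with how many times it appears. It assumes the log lines contain the text "CorruptSSTableException: Corrupted: <path>", as in the example above. The function name and the log location used in the usage comment are illustrative assumptions, not part of DSE.

```shell
#!/bin/sh
# Sketch: summarise which sstables are reported as corrupted in a log file.
# Assumption: each report contains "CorruptSSTableException: Corrupted: <path>"
# and the path itself contains no spaces.
list_corrupt_sstables() {
    log_file=$1
    grep -o 'CorruptSSTableException: Corrupted: [^ ]*' "$log_file" \
        | sed 's/^.*Corrupted: //' \
        | sort | uniq -c | sort -rn
}

# Example (log path is an assumption; adjust for your installation):
#   list_corrupt_sstables /var/log/cassandra/system.log
```

Many different files in the output, especially across several tables, point towards a disk problem rather than a single damaged sstable.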
If you are satisfied that the problem is confined to individual tables, first run nodetool scrub against the affected table. This process runs online, while the node stays up, and is multi-threaded.
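A minimal sketch of the online scrub follows. The wrapper function name is hypothetical; the keyspace and table names in the usage comment are taken from the path in the example error and should be replaced with your own.

```shell
#!/bin/sh
# Sketch: online scrub of a single table on the local node.
# nodetool scrub takes a keyspace and, optionally, one or more table names.
online_scrub() {
    keyspace=$1
    table=$2
    nodetool scrub "$keyspace" "$table"
}

# Example, using the names from the error message above:
#   online_scrub keyspace1 table1
```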
If the online scrub doesn't resolve the corruption, the next option is an offline scrub using sstablescrub. You must stop DSE on the node before running it. Because it runs offline, it can perform a more thorough scrub, so use it when an online scrub fails. Be aware that the process is single-threaded, so it takes longer than an online scrub; with large sstables the node may be offline for an extended period.
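The offline scrub could be sketched as below. This assumes a package installation where DSE is managed as a service named "dse"; on other installations the stop command will differ. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: stop DSE on this node, then scrub one table offline.
# Assumption: DSE runs as a system service called "dse".
offline_scrub() {
    keyspace=$1
    table=$2
    # Do not scrub unless the node was stopped successfully.
    sudo service dse stop || return 1
    sstablescrub "$keyspace" "$table"
}

# Example, using the names from the error message above:
#   offline_scrub keyspace1 table1
```

The sketch deliberately leaves DSE stopped afterwards, so that you can check the scrub result before deciding whether to restart the node or continue to the next step.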
If the offline scrub still doesn't resolve the issue, the final option is to stop DSE, move the corrupted sstable out of the data directory, restart DSE, and then run a repair on that node against the table (column family) the sstable belongs to. Before doing this, make sure the replication factor (RF) is high enough that other replicas hold the data, so that no data is lost. Move the corrupted sstable to a different location rather than deleting it.
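An sstable is stored as several component files that share a common name prefix, so all of them need to move together. The sketch below moves every component that shares the prefix of the given Data.db file into a separate directory. The function name and the quarantine directory are hypothetical, and the function is meant to be run only while DSE is stopped.

```shell
#!/bin/sh
# Sketch: move all component files of one sstable out of the data directory.
# Run only while DSE is stopped on this node.
quarantine_sstable() {
    data_file=$1   # full path to the corrupted ...-Data.db file
    dest_dir=$2    # hypothetical holding directory outside the data directory
    # Strip the "-Data.db" suffix to get the prefix shared by all components.
    prefix=${data_file%-Data.db}
    mkdir -p "$dest_dir" || return 1
    for f in "$prefix"-*; do
        if [ -e "$f" ]; then
            mv "$f" "$dest_dir"/ || return 1
        fi
    done
}

# Example (destination path is an assumption):
#   quarantine_sstable /path/to/table-dir/aa-45-bti-Data.db /var/tmp/corrupt-sstables
# After restarting DSE, repair the affected table on this node:
#   nodetool repair keyspace1 table1
```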