This article discusses the downside of forcing a major compaction on a table and recommends a best practice for cases where it cannot be avoided.
When a major compaction is forced on a table configured with the SizeTieredCompactionStrategy (STCS), all the SSTables on the node are compacted together into a single large SSTable. Because of its size, the resulting SSTable will likely never be compacted again, since no similar-sized SSTables are available as compaction candidates. This creates additional issues for the node: tombstones held in that SSTable are not evicted and keep accumulating, which affects the cluster's performance.
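To illustrate why the oversized SSTable is left behind, here is a minimal sketch of size-tiered bucketing. It is a simplified model, not Cassandra's actual code: the function names are hypothetical, the min_sstable_size handling is omitted, and the values 0.5, 1.5 and 4 are assumed to match the STCS defaults for bucket_low, bucket_high and min_threshold.

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of similar size (simplified STCS-style model)."""
    buckets = []  # each bucket is a list of SSTable sizes
    for size in sorted(sizes):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            # An SSTable joins a bucket only if it is close to the bucket's average size.
            if average * bucket_low < size < average * bucket_high:
                bucket.append(size)
                break
        else:
            # No existing bucket is similar enough, so start a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4):
    """Return only the buckets that hold enough SSTables to trigger a compaction."""
    return [b for b in bucket_by_size(sizes) if len(b) >= min_threshold]


# Sizes in MB: one 500 GB SSTable left by a major compaction, plus four recent flushes.
sizes_mb = [500_000, 160, 150, 170, 155]
print(bucket_by_size(sizes_mb))         # [[150, 155, 160, 170], [500000]]
print(compaction_candidates(sizes_mb))  # [[150, 155, 160, 170]]
```

In this model the 500 GB SSTable sits alone in its bucket and never reaches the threshold, so any tombstones it contains stay on disk.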
We understand that cluster administrators use major compaction as a way of evicting tombstones that have accumulated as a result of high-delete workloads, which in most cases are caused by an incorrect data model.
The recommendation in this article does not constitute a solution to the underlying issue customers face. It should not be considered a long-term fix to the data model problem.
In Apache Cassandra 2.2, CASSANDRA-7272 introduced a significant improvement for tables using STCS: an option that splits the output of nodetool compact into multiple files that are 50%, then 25%, then 12.5% of the original table size, and so on, until the smallest chunk is 50MB.
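The halving pattern can be sketched with a few lines of arithmetic. This is only a rough model under stated assumptions, not Cassandra's implementation: estimate_split_sizes is a hypothetical helper, the 50MB floor is taken as 50,000,000 bytes, and the leftover data is assumed to go into the last file. Real output sizes also depend on partition boundaries and compression.

```python
def estimate_split_sizes(total_bytes, smallest_bytes=50_000_000):
    """Estimate output file sizes for a split major compaction (rough model only)."""
    if total_bytes <= 0:
        return []
    sizes = []
    remaining = total_bytes
    # Keep writing half of what is left while that half is still at least the floor.
    while remaining // 2 >= smallest_bytes:
        chunk = remaining // 2
        sizes.append(chunk)
        remaining -= chunk
    # Assumption: whatever is left over ends up in the final file.
    sizes.append(remaining)
    return sizes


# An 800 MB table: 50%, 25%, 12.5%, then the chunks reach the 50 MB floor.
print(estimate_split_sizes(800_000_000))
# [400000000, 200000000, 100000000, 50000000, 50000000]
```

Compared with a single 800 MB file, the smaller output files fall into size tiers where later flushes can join them, so they remain eligible for future compactions.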
When using major compaction as a last resort for evicting tombstones, use the --split-output option (or its shorthand, -s) to take advantage of this feature:
$ nodetool compact --split-output -- <keyspace> <table>
NOTE - This feature is only available starting with DataStax Enterprise (DSE) 5.0, which ships with Cassandra 3.0. It is not available in earlier versions such as DSE 4.8, which shipped with Cassandra 2.1.