FAQ: Is it necessary to keep half of the disk empty for STCS?

The 50% figure is quoted as a worst-case scenario.  To need that much space, all of the following conditions must hold:

  1. There is only one table.
  2. All the sstables are merged into two super sstables.
  3. Those sstables are then merged together.
  4. There is no duplicate or expired data that can be removed from these sstables.

In practice it's fairly unlikely that all of these conditions would apply at once.

The actual worst case can be calculated more accurately.  A single compaction operation can merge up to max_threshold sstables together, and up to concurrent_compactors operations can run simultaneously.  Thus the most space that could be in use for compaction at any one time is sstable_size x max_threshold x concurrent_compactors.

As an example: with concurrent_compactors set to 2, sstables of 64 MB and the default max_threshold (32), Cassandra will use at most 64 MB x 32 x 2 = 4096 MB, or 4 GB, of space for processing.
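The same arithmetic can be sketched in a few lines of Python.  This is only an illustration of the formula above: the function name is hypothetical, and the sstable size is whatever typical size you observe on your own nodes.

```python
def stcs_peak_compaction_space_mb(sstable_size_mb, max_threshold=32, concurrent_compactors=2):
    """Upper bound, in MB, on space used by compactions running at the same time.

    Hypothetical helper: it simply multiplies the three factors from the
    formula sstable_size x max_threshold x concurrent_compactors.
    """
    return sstable_size_mb * max_threshold * concurrent_compactors


# The example from the text: 64 MB sstables, max_threshold 32, 2 compactors.
peak_mb = stcs_peak_compaction_space_mb(64, max_threshold=32, concurrent_compactors=2)
print(f"{peak_mb} MB = {peak_mb / 1024:.0f} GB")  # prints "4096 MB = 4 GB"
```

Changing any one factor scales the bound linearly, so doubling concurrent_compactors doubles the space that may be needed.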

Because this bound depends on sstable sizes, which vary over time under STCS, it is hard to provide an accurate estimate of the amount of space required for processing.  In general it's best to benchmark your system with a workload similar to production and monitor how the free space changes over time.
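One rough way to watch free space during such a benchmark is to sample the filesystem at intervals and record the lowest value seen.  The sketch below uses only the Python standard library; the function names are hypothetical, and the path you pass in (for example your Cassandra data directory) is an assumption you need to adjust for your installation.

```python
import shutil
import time


def sample_free_space(path, samples=3, interval_s=1.0):
    """Return a list of (timestamp, free_bytes) for the filesystem holding `path`."""
    readings = []
    for i in range(samples):
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        readings.append((time.time(), usage.free))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings


def lowest_free_fraction(path, readings):
    """Smallest observed free space as a fraction of the filesystem's total size."""
    total = shutil.disk_usage(path).total
    return min(free for _, free in readings) / total


if __name__ == "__main__":
    # Assumption: "." stands in for your data directory; in a real benchmark
    # use many more samples and a longer interval.
    data_dir = "."
    readings = sample_free_space(data_dir, samples=2, interval_s=0.1)
    print(f"lowest free fraction: {lowest_free_fraction(data_dir, readings):.2%}")
```

The lowest free fraction seen over a long run with heavy compaction activity gives a more realistic picture of the headroom you need than the theoretical worst case.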


