nodetool garbagecollect is a tool that cleans unneeded data out of sstables. It works with LeveledCompactionStrategy and SizeTieredCompactionStrategy.
In Cassandra, when data is updated or deleted, the existing records on disk are not modified immediately. Instead, the new data or "tombstones" (deletion markers) are written to new files on disk. When reading data, Cassandra reads through the relevant files on disk (and data in memory) and returns results based on the most recent writes. Over time, compaction can clean up out-of-date or logically deleted data.
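The read behaviour described above can be illustrated with a toy model. This is only a sketch, not Cassandra's actual read path: each sstable is assumed to be a plain dictionary mapping a key to a (timestamp, value) pair, a value of None stands in for a tombstone, and the names read, old_sstable and new_sstable are hypothetical.

```python
# Toy model (an assumption for illustration, not Cassandra internals):
# each "sstable" maps key -> (write_timestamp, value).
# A value of None stands in for a tombstone (deletion marker).
TOMBSTONE = None


def read(key, sstables):
    """Return the most recently written value for key, or None if it is
    absent or its most recent write is a tombstone."""
    newest = None
    for table in sstables:
        entry = table.get(key)
        if entry is not None and (newest is None or entry[0] > newest[0]):
            newest = entry
    if newest is None:
        return None
    return newest[1]


# The older file still holds the original records; the update and the
# delete were written to a newer file instead of touching the old one.
old_sstable = {"a": (1, "v1"), "b": (1, "x")}
new_sstable = {"a": (2, "v2"), "b": (3, TOMBSTONE)}

print(read("a", [old_sstable, new_sstable]))  # v2
print(read("b", [old_sstable, new_sstable]))  # None
```

In this model the stale entries for "a" and "b" in old_sstable never affect results, but they still occupy space until something rewrites that file.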
nodetool garbagecollect uses single-sstable compactions to do the same: clean up out-of-date records (old versions of records that have been updated) or logically deleted data. For each sstable, it will write a new sstable with unneeded data cleaned out, if possible.
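As a rough illustration of a single-sstable rewrite, the sketch below drops entries from one sstable when a newer write for the same key exists in another sstable. It is a toy model under stated assumptions, not the real implementation: sstables are plain dictionaries of key -> (timestamp, value), None stands in for a tombstone, and the function name garbage_collect is hypothetical. The model leaves tombstones in place and does not attempt to show when a tombstone itself becomes removable.

```python
# Toy model (an assumption for illustration, not Cassandra internals):
# each "sstable" maps key -> (write_timestamp, value); None is a tombstone.


def garbage_collect(target, others):
    """Return a rewritten copy of target without entries that are
    shadowed by a newer write (or tombstone) in any other sstable."""
    kept = {}
    for key, (timestamp, value) in target.items():
        shadowed = any(
            key in other and other[key][0] > timestamp for other in others
        )
        if not shadowed:
            kept[key] = (timestamp, value)
    return kept


old_sstable = {"a": (1, "v1"), "b": (1, "x"), "c": (1, "keep")}
new_sstable = {"a": (2, "v2"), "b": (3, None)}

# "a" was overwritten and "b" was deleted by newer writes, so only "c"
# survives when the old sstable is rewritten.
print(garbage_collect(old_sstable, [new_sstable]))  # {'c': (1, 'keep')}

# Nothing in the newer sstable is shadowed, so it is returned unchanged.
print(garbage_collect(new_sstable, [old_sstable]) == new_sstable)  # True
```

Each sstable is processed on its own, which mirrors the "one new sstable per input sstable" behaviour described above.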
By default, garbagecollect cleans out rows or partitions that have been deleted or updated with newer data. To also clean up deleted or updated cell values, specify the '-g CELL' option. (This is more I/O- and CPU-intensive.)
Also by default, garbagecollect currently processes two sstables at a time. This can be adjusted using the '-j' or '--jobs' option. Increasing this value speeds up the operation but requires more system resources.
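The options above can be combined on one command line. The following is a minimal sketch of a helper that assembles such a command. The helper name garbagecollect_command and the keyspace and table names are hypothetical, and ROW is assumed to be the name of the default granularity. The helper only builds the argument list and does not run nodetool.

```python
import shlex


def garbagecollect_command(keyspace=None, tables=(), granularity=None, jobs=None):
    """Build an argument list for 'nodetool garbagecollect'.

    granularity: "ROW" (assumed default) or "CELL"; omitted when None.
    jobs: number of sstables to process at a time; omitted when None.
    """
    argv = ["nodetool", "garbagecollect"]
    if granularity is not None:
        if granularity not in ("ROW", "CELL"):
            raise ValueError("granularity must be 'ROW' or 'CELL'")
        argv += ["-g", granularity]
    if jobs is not None:
        if jobs < 0:
            raise ValueError("jobs must not be negative")
        argv += ["-j", str(jobs)]
    if keyspace is not None:
        # Optional keyspace followed by zero or more table names.
        argv += [keyspace, *tables]
    return argv


# Hypothetical keyspace and table names, for illustration only.
cmd = garbagecollect_command("my_keyspace", ["my_table"], granularity="CELL", jobs=4)
print(shlex.join(cmd))  # nodetool garbagecollect -g CELL -j 4 my_keyspace my_table
```

To actually run the command, the list could be passed to subprocess.run on a node where nodetool is installed. Start with a small '-j' value and watch disk and CPU load before raising it.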
In some cases, garbagecollect can be a useful first step in cleaning out unneeded tombstones. For more details on this, and cleaning tombstones in general, see "cleaning tombstones".