Overview
This article explains the significance of the compression ratio statistic.
Answer
One of the statistics reported by nodetool tablestats is the SSTable compression ratio. It indicates how well the table's data is compressed, expressed as the ratio of the compressed SSTable data size to the original (uncompressed) size.
The SSTable compression ratio is calculated as:
compressionRatio = (double) compressed/uncompressed;
Values range from 0 to 1:
- A low value reflects a high rate of compression.
- A ratio close to 1 indicates the data is hardly compressed.
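The formula above can be sketched as a small helper to make the arithmetic concrete. This is only an illustration: the function name `compression_ratio` and the byte counts are hypothetical and are not part of nodetool or Cassandra.

```python
def compression_ratio(compressed_bytes: int, uncompressed_bytes: int) -> float:
    """Return the compressed size divided by the uncompressed size."""
    if uncompressed_bytes <= 0:
        raise ValueError("uncompressed_bytes must be positive")
    return compressed_bytes / uncompressed_bytes


# Hypothetical sizes chosen for illustration only.
ratio = compression_ratio(25, 100)
space_saved = 1 - ratio
print(f"ratio={ratio:.2f}, space saved={space_saved:.0%}")
# prints "ratio=0.25, space saved=75%"
```

A ratio of 0.25 means the data on disk occupies a quarter of its uncompressed size, so a lower ratio corresponds to more space saved.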
For tables with a high compression ratio (a value close to 1), the cost of compression may outweigh the benefits. What counts as high is subjective and depends on the cluster use case, performance characteristics, and access patterns. DataStax recommends testing with and without compression enabled to determine the configuration that best suits the application requirements.
Examples
In the output below, the compressed data is 12.9% of the original size, which indicates a very good compression ratio:
SSTable Compression Ratio: 0.1287426576442238
In contrast, this output shows a ratio of 89.6%, indicating that the compressed data is nearly as large as the original:
SSTable Compression Ratio: 0.8964684393508305
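When reviewing many tables, it can help to pull the ratio lines out of saved tablestats output and express them as percentages. The sketch below assumes the output has already been captured as text and that each ratio appears on a line in the same form as the two samples above. The helper name `extract_ratios` and the regular expression are illustrative assumptions, not a nodetool feature.

```python
import re
from typing import List

# Matches lines such as "SSTable Compression Ratio: 0.1287426576442238".
# Also accepts a negative sign or exponent notation, in case the value is printed that way.
RATIO_LINE = re.compile(
    r"SSTable Compression Ratio:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def extract_ratios(tablestats_output: str) -> List[float]:
    """Return every SSTable compression ratio found in the given text."""
    return [float(m.group(1)) for m in RATIO_LINE.finditer(tablestats_output)]


# The two sample lines from this article.
sample = """\
SSTable Compression Ratio: 0.1287426576442238
SSTable Compression Ratio: 0.8964684393508305
"""

for ratio in extract_ratios(sample):
    print(f"{ratio:.1%} of original size")
# prints "12.9% of original size" then "89.6% of original size"
```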
Further info
- See the nodetool tablestats documentation for DSE 6.7 | 6.0 | 5.1
- The nodetool cfstats command was renamed to nodetool tablestats in Apache Cassandra 2.2 (CASSANDRA-8872).