Summary
Given a sufficiently high log event rate, compression of a log file when using FixedWindowRollingPolicy may cause significant tail-end latency spikes, since the rolled file is compressed synchronously.
For example, if the audit log is enabled with default parameters, this causes all CQL transactions to get audited (including SELECT and modification statements). Since the default audit implementation is the SLF4JAuditWriter, this translates into a high rate of events being logged to a local file. The default FixedWindowRollingPolicy rolls the file when its size exceeds 200MB and then compresses it. On some disks it may take a while to compress a 200MB file, potentially disrupting the throughput.
Another notable example is the debug.log, which can often have a high log event rate. However, since the debug.log file is by default configured with a smaller maxFileSize (20MB), its rollover may be less impactful.
Applies to
- DataStax Enterprise 5.1 and newer
- Apache Cassandra 3.x and newer
Symptoms
Periodic spikes in Max Latency, coinciding with log file rollover.
Cause
In order to handle the Windows scenario, where renaming of an open file is not allowed, the RollingFileAppender first closes the file, handles the rollover tasks (including compression), and only then re-opens the active file for writing. During this rollover step all logging is blocked. Furthermore, when using FixedWindowRollingPolicy, compression of the rolled file is invoked synchronously in logback:1.2.3 and logback:1.1.3 - the two versions of the logback library used in most releases of Cassandra/DSE.
If the configured maxFileSize is sufficiently large, compression may take a long time, causing a noticeable impact.
Workaround
Disable compression, by removing .zip or .gz suffix from specified file pattern in logback.xml:
<fileNamePattern>${cassandra.logdir}/audit/audit.log.%i.zip</fileNamePattern>
Improvement DB-4155 seeks to disable audit log file compression by default, until this behavior in Logback is optimized.
Solution
Delegate handling of log rollover and retention to a third-party utility, such as logrotate.
This means logback.xml would be configured with the simple FileAppender (red - to be removed):
<appender name="SLF4JAuditWriterAppender" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${cassandra.logdir}/audit/audit.log</file>
<encoder>
<pattern>%-5level [%thread] %date{ISO8601} %X{service} %F:%L - %msg%n</pattern>
<immediateFlush>true</immediateFlush>
</encoder>
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<fileNamePattern>${cassandra.logdir}/audit/audit.log.%i.zip</fileNamePattern>
<minIndex>1</minIndex>
<maxIndex>5</maxIndex>
</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<maxFileSize>200MB</maxFileSize>
</triggeringPolicy>
</appender>
An equivalent logrotate configuration file would be required in /etc/logrotate.d/dse:
/var/log/cassandra/audit/audit.log {
size 200m
rotate 5
compress
missingok
notifempty
}