Full ERROR Message Example
ERROR [MemtableFlushWriter:9] 2020-09-18 06:44:57,339 LogReplica.java:108 - Failed to sync file /var/lib/cassandra/data/cycling/cyclist_name-5aaf0500f97911eab0288dbf1859164c/na_txn_flush_763c4890-f97a-11ea-b028-8dbf1859164c.log
org.apache.cassandra.io.FSWriteError: java.nio.file.AccessDeniedException: /var/lib/cassandra/data/cycling/cyclist_name-5aaf0500f97911eab0288dbf1859164c/na_txn_flush_763c4890-f97a-11ea-b028-8dbf1859164c.log
    at org.apache.cassandra.io.util.FileUtils.write(FileUtils.java:745)
    at org.apache.cassandra.io.util.FileUtils.appendAndSync(FileUtils.java:690)
    at org.apache.cassandra.db.lifecycle.LogReplica.append(LogReplica.java:104)
    at org.apache.cassandra.db.lifecycle.LogReplicaSet.lambda$null$5(LogReplicaSet.java:225)
    at org.apache.cassandra.utils.Throwables.perform(Throwables.java:128)
    at org.apache.cassandra.utils.Throwables.perform(Throwables.java:118)
    at org.apache.cassandra.db.lifecycle.LogReplicaSet.append(LogReplicaSet.java:225)
    at org.apache.cassandra.db.lifecycle.LogFile.addRecord(LogFile.java:363)
    at org.apache.cassandra.db.lifecycle.LogFile.abort(LogFile.java:282)
    at org.apache.cassandra.utils.Throwables.perform(Throwables.java:128)
    at org.apache.cassandra.utils.Throwables.perform(Throwables.java:118)
    at org.apache.cassandra.utils.Throwables.perform(Throwables.java:113)
    at org.apache.cassandra.db.lifecycle.LogTransaction$TransactionTidier.run(LogTransaction.java:280)
    at org.apache.cassandra.db.lifecycle.LogTransaction$TransactionTidier.tidy(LogTransaction.java:262)
    at org.apache.cassandra.utils.concurrent.Ref$GlobalState.release(Ref.java:325)
    at org.apache.cassandra.utils.concurrent.Ref$State.ensureReleased(Ref.java:203)
    at org.apache.cassandra.utils.concurrent.Ref.ensureReleased(Ref.java:123)
    at org.apache.cassandra.db.lifecycle.LogTransaction.complete(LogTransaction.java:399)
    at org.apache.cassandra.db.lifecycle.LogTransaction.doAbort(LogTransaction.java:416)
    at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:141)
    at org.apache.cassandra.db.lifecycle.LifecycleTransaction.doAbort(LifecycleTransaction.java:250)
    at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:141)
    at org.apache.cassandra.db.ColumnFamilyStore$Flush.flushMemtable(ColumnFamilyStore.java:1094)
    at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1040)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
What does this ERROR message mean and why does it occur?
The source code:
void append(LogRecord record)
{
    boolean existed = exists();
    try
    {
        FileUtils.appendAndSync(file, record.toString());
    }
    catch (FSError e)
    {
        logger.error("Failed to sync file {}", file, e);
        FileUtils.handleFSErrorAndPropagate(e);
    }

    // If the file did not exist before appending the first
    // line, then sync the directory as well since now it must exist
    if (!existed)
        syncDirectory();
}
Below are some more details about the structure of Cassandra transaction log files:
- LogRecord
A decoded line in a transaction log file replica
- Transaction log file
A transaction log file stores transaction records (LogRecords for flushes, compactions, streaming, etc.). It is then copied into multiple identical replicas on different disks (if multiple disks are used to store Cassandra data)
- LogReplica
A column family may have sstables on different disks, and disks can be removed, so log files are duplicated into replicas in order to have a file in each directory where sstables exist
Based on the source code above (from LogReplica.java) and the structure of Cassandra transaction logs, the error message "Failed to sync file <transaction_log_file>" is logged when a file system error (FSError) is caught while the LogReplica is appending and syncing a log record to the transaction log file.
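To make "append and sync" concrete, here is a minimal sketch using only java.nio. It is not Cassandra's FileUtils.appendAndSync implementation; the class name, the helper, and the sample records are hypothetical. It shows where a file system exception (such as AccessDeniedException) would surface and get wrapped with the same kind of "Failed to sync file" context.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendAndSyncSketch {

    // Hypothetical helper: appends one line to the file and forces it to disk.
    // Any IOException raised by the open, write, or force call is rethrown
    // with the file name attached.
    static void appendAndSync(Path file, String line) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buffer = ByteBuffer.wrap((line + "\n").getBytes(StandardCharsets.UTF_8));
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // Flush file content and metadata to the storage device.
            channel.force(true);
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to sync file " + file, e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("txn_sketch", ".log");
        appendAndSync(file, "record-1"); // placeholder text, not a real LogRecord
        appendAndSync(file, "record-2");
        System.out.println(Files.readAllLines(file)); // prints [record-1, record-2]
        Files.delete(file);
    }
}
```

If the process cannot open or write the file, the exception is thrown from FileChannel.open or channel.write, which corresponds to the point where the stack trace above enters FileUtils.write.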
How do you fix this ERROR?
There are three major causes of file system exceptions during the log record append and sync:
- I/O error
- Too many open files
- Access denied
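The three causes can usually be told apart by walking the exception's cause chain. The sketch below is illustrative only: the class, enum, and method names are hypothetical, and the message substrings are the usual Linux error texts, which is an assumption about the platform.

```java
import java.nio.file.AccessDeniedException;

public class FsErrorCauseSketch {

    enum Cause { ACCESS_DENIED, TOO_MANY_OPEN_FILES, IO_ERROR, UNKNOWN }

    // Walks the cause chain and maps the first recognisable exception
    // to one of the three causes listed above.
    static Cause classify(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof AccessDeniedException) {
                return Cause.ACCESS_DENIED;
            }
            String message = t.getMessage();
            if (message == null) {
                continue;
            }
            // Assumed Linux error texts; other platforms may word these differently.
            if (message.contains("Too many open files")) {
                return Cause.TOO_MANY_OPEN_FILES;
            }
            if (message.contains("Input/output error")) {
                return Cause.IO_ERROR;
            }
        }
        return Cause.UNKNOWN;
    }

    public static void main(String[] args) {
        Throwable sample = new RuntimeException(new AccessDeniedException("/tmp/example.log"));
        System.out.println(classify(sample)); // prints ACCESS_DENIED
    }
}
```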
The solution depends on the underlying cause indicated by the "Caused by" exception:
- For an "Input/output error", check the syslog and dmesg to see if the Linux kernel is reporting any underlying problems with your disk. You may also wish to use a SMART monitoring utility to check the health of your disk drives.
- For a "Too many open files in system" error, check the ulimits applied to the Cassandra process and make sure they are sufficient for the number of data files you have. Apply the recommended limits from the DSE documentation: https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/config/configRecommendedSettings.html#Setuserresourcelimits.
- For an AccessDeniedException, make sure that the user running the Cassandra process has read and write permission to the data directory and all the files within it. This error commonly occurs if you accidentally started Cassandra as root and then later restarted it as a less privileged user. Cassandra will have created files owned by root and can no longer read or write those files as another user.
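For the AccessDeniedException case, a small scan of the data directory can list the files whose ownership or permissions look wrong. This is a rough sketch, not a DataStax tool: the class and method names are hypothetical, the default user name "cassandra" is an assumption, and the readable/writable checks apply to whichever user runs the program, so it should be run as the same user that runs Cassandra.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class DataDirPermissionCheck {

    // Returns one line per file or directory under dataDir that is not owned
    // by expectedOwner, or that the current user cannot read or write.
    static List<String> findProblems(Path dataDir, String expectedOwner) throws IOException {
        List<String> problems = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(dataDir)) {
            paths.forEach(path -> {
                try {
                    String owner = Files.getOwner(path).getName();
                    if (!owner.equals(expectedOwner)) {
                        problems.add(path + ": owned by " + owner);
                    }
                    if (!Files.isReadable(path) || !Files.isWritable(path)) {
                        problems.add(path + ": not readable/writable by current user");
                    }
                } catch (IOException e) {
                    problems.add(path + ": cannot inspect (" + e + ")");
                }
            });
        } catch (UncheckedIOException e) {
            // Files.walk reports unreadable directories this way while iterating.
            problems.add("walk aborted: " + e.getCause());
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // Defaults are assumptions; pass your own data directory and user name.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "/var/lib/cassandra/data");
        String owner = args.length > 1 ? args[1] : "cassandra";
        findProblems(dataDir, owner).forEach(System.out::println);
    }
}
```

Any path reported as owned by root is a candidate for the situation described above, where Cassandra was once started as root and later as a less privileged user.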