OSS vs DSE source code on the error message
According to the OSS Cassandra source code:
void append(LogRecord record)
{
    Throwable err = Throwables.perform(null, replicas().stream().map(r -> () -> r.append(record)));
    if (err != null)
    {
        if (!record.isFinal() || err.getSuppressed().length == replicas().size() - 1)
            Throwables.maybeFail(err);
        logger.error("Failed to add record '{}' to some replicas '{}'", record, this);
    }
}
The exact error message "Failed to add record <record_name> to some replicas <list_of_replicas>" is rare; it is logged only when all of the following are true:
- Appending the record to at least one replica failed
- The record being appended is the final record of the transaction
- The append failed on only some of the replicas, not all of them (if every replica fails, the exception is re-thrown instead of logged)
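The third condition follows from how `Throwables.perform` accumulates failures: the first exception becomes the primary `Throwable` and each later failure is attached via `addSuppressed()`, so when every replica fails, `err.getSuppressed().length == replicas().size() - 1` and `maybeFail` re-throws. The following is a minimal, self-contained sketch (not Cassandra code; the class and method names are made up for illustration) of that suppressed-exception accounting:

```java
import java.util.List;

// Sketch of the suppressed-exception bookkeeping behind the OSS append():
// the first failure is kept as the primary Throwable, later failures are
// attached with addSuppressed(). With N replicas all failing, the primary
// Throwable carries N - 1 suppressed exceptions, which is exactly the
// condition under which append() re-throws instead of logging ERROR.
public class SuppressedCountDemo {
    // Mirrors Throwables.merge(): keep the first error, suppress the rest.
    static Throwable merge(Throwable existing, Throwable t) {
        if (existing == null)
            return t;
        existing.addSuppressed(t);
        return existing;
    }

    // Simulate appending one record to each replica; 'true' marks a replica
    // whose append fails. Returns the accumulated error, or null if none.
    static Throwable appendToAll(List<Boolean> replicaFails) {
        Throwable err = null;
        for (boolean fails : replicaFails) {
            if (fails)
                err = merge(err, new RuntimeException("append failed"));
        }
        return err;
    }

    public static void main(String[] args) {
        // All three replicas fail: 2 suppressed -> size - 1 -> would re-throw.
        System.out.println(appendToAll(List.of(true, true, true)).getSuppressed().length);  // prints 2
        // Only one of three fails: 0 suppressed -> final record logs ERROR.
        System.out.println(appendToAll(List.of(true, false, false)).getSuppressed().length); // prints 0
    }
}
```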
In DSE, an extra WARN entry is logged for each replica append that fails:
void append(LogRecord record)
{
    assertNotClosed();
    Throwable err = null;
    int failed = 0;
    for (LogReplica replica : replicas())
    {
        try
        {
            replica.append(record);
        }
        catch (Throwable t)
        {
            logger.warn("Failed to add record to a replica: {}", t.getMessage());
            err = Throwables.merge(err, t);
            failed++;
        }
    }
    if (err != null)
    {
        if (!record.isFinal() || failed == replicas().size())
            Throwables.maybeFail(err);
        logger.error("Failed to add record '{}' to some replicas '{}'", record, this, err);
    }
}
Full ERROR Message Example
Below are examples of record append failures, showing the WARN log entries together with the related ERROR log entries. The WARN messages preceding an ERROR message indicate the actual cause of the failed log record append.
- Access denied:
WARN  [MemtableFlushWriter:736] 2020-09-21 05:40:04,177 LogReplicaSet.java:229 - Failed to add record to a replica: java.nio.file.AccessDeniedException: /var/lib/cassandra/data/cycling/cyclist_name-b49e39f0fbcc11eab00a17a789d3fdc4/ba_txn_flush_e629f130-fbcc-11ea-b00a-17a789d3fdc4.log
ERROR [MemtableFlushWriter:736] 2020-09-21 05:40:04,177 LogTransaction.java:298 - Transaction log [ba_txn_flush_e629f130-fbcc-11ea-b00a-17a789d3fdc4.log in /var/lib/cassandra/data/cycling/cyclist_name-b49e39f0fbcc11eab00a17a789d3fdc4] indicates txn was not completed, trying to abort it now
WARN  [MemtableFlushWriter:736] 2020-09-21 05:40:04,179 LogReplicaSet.java:229 - Failed to add record to a replica: java.nio.file.AccessDeniedException: /var/lib/cassandra/data/cycling/cyclist_name-b49e39f0fbcc11eab00a17a789d3fdc4/ba_txn_flush_e629f130-fbcc-11ea-b00a-17a789d3fdc4.log
ERROR [MemtableFlushWriter:736] 2020-09-21 05:40:04,179 LogTransaction.java:302 - Failed to abort transaction log [ba_txn_flush_e629f130-fbcc-11ea-b00a-17a789d3fdc4.log in /var/lib/cassandra/data/cycling/cyclist_name-b49e39f0fbcc11eab00a17a789d3fdc4]
org.apache.cassandra.io.FSWriteError: java.nio.file.AccessDeniedException: /var/lib/cassandra/data/cycling/cyclist_name-b49e39f0fbcc11eab00a17a789d3fdc4/ba_txn_flush_e629f130-fbcc-11ea-b00a-17a789d3fdc4.log
- Too many open files in system:
WARN  [MemtableFlushWriter:7] 2019-05-21 14:02:05,731 LogReplicaSet.java:220 - Failed to add record to a replica: java.nio.filency_histograms-92cda67846ce366f9af5af9ec805c0b6/aa_txn_flush_03ef0c50-7bd1-11e9-b2cf-b91f6efefffd.log: Too many open files in sy
ERROR [MemtableFlushWriter:7] 2019-05-21 14:02:05,732 LogTransaction.java:276 - Failed to abort transaction log [aa_txn_flush_0a/dse_perf/range_latency_histograms-92cda67846ce366f9af5af9ec805c0b6]
org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: /app/cassandra/dse-6.0.4/dse-data/data/dse_perf/range_le9-b2cf-b91f6efefffd.log: Too many open files in system
        at org.apache.cassandra.io.util.FileUtils.write(FileUtils.java:592)
- I/O error:
WARN  [MemtableFlushWriter:6] 2020-03-05 10:06:43,863 LogReplicaSet.java:197 - Failed to add record to a replica: java.io.IOException: Input/output error
ERROR [MemtableFlushWriter:6] 2020-03-05 10:06:43,863 LogTransaction.java:261 - [/database/Dataspace/cassandra/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/mc_txn_flush_04198440-5ec9-11ea-9522-c1de20c1ea6b.log] was not completed, trying to abort it now
WARN  [MemtableFlushWriter:6] 2020-03-05 10:06:43,870 LogReplicaSet.java:197 - Failed to add record to a replica: java.io.IOException: Input/output error
ERROR [MemtableFlushWriter:6] 2020-03-05 10:06:43,874 LogTransaction.java:264 - Failed to abort [/database/Dataspace/cassandra/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/mc_txn_flush_04198440-5ec9-11ea-9522-c1de20c1ea6b.log]
java.lang.RuntimeException: java.io.IOException: Input/output error
What does this ERROR message mean and why does it occur
- Transaction log file:
A transaction log file stores transaction records (flushes, compactions, streaming operations, etc.). It is copied into multiple identical replicas on different disks when multiple disks are used to store Cassandra data.
- Log Record:
A decoded line in a transaction log file replica.
- Log Replica:
A column family may have SSTables on different disks, and disks can be removed, so log files are duplicated into multiple replicas so that a copy exists in each directory where SSTables are present.
- Log Replicaset:
A set of log replicas. It mostly iterates over replicas when writing or reading, ensuring consistency among them and hiding replication details from LogFile.
In the log replica set, the append(LogRecord record) function shown in the source code above adds a log record to all of the replicas. If any error occurs while appending a record, the WARN or ERROR messages above are triggered. As the log examples above show, there are three major causes:
- Access denied
- I/O error
- Too many open files
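The cause can be read straight off the exception class and message in the "Failed to add record to a replica" WARN line. As a minimal sketch (this is a hypothetical triage helper, not part of Cassandra; the class name `AppendFailureTriage` and its messages are made up for illustration), the mapping from exception to cause looks like this:

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.FileSystemException;

// Hypothetical triage helper (not Cassandra code): map the exception seen in
// the "Failed to add record to a replica" WARN line to one of the three
// common causes listed above. AccessDeniedException extends
// FileSystemException, which extends IOException, so the checks must run
// from most to least specific.
public class AppendFailureTriage {
    static String classify(Throwable t) {
        if (t instanceof AccessDeniedException)
            return "access denied: check ownership/permissions of the data directory";
        if (t instanceof FileSystemException fse
                && String.valueOf(fse.getReason()).contains("Too many open files"))
            return "too many open files: raise the process resource limits";
        if (t instanceof IOException)
            return "I/O error: check disk health (syslog, dmesg, SMART)";
        return "unknown cause";
    }

    public static void main(String[] args) {
        System.out.println(classify(new AccessDeniedException("/var/lib/cassandra/data/...")));
        System.out.println(classify(new IOException("Input/output error")));
    }
}
```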
How do you fix this ERROR
The solution depends on the underlying cause indicated by the "Caused by" exception:
- For an "Input/output error", check the syslog and dmesg to see if the Linux kernel is reporting any underlying problems with your disk. You may also wish to use a SMART monitoring utility to check the health of your disk drives.
- For a "Too many open files in system" error, check the ulimits applied to the Cassandra process and make sure it is sufficient for the number of data files you have. Apply the recommended limits from the DSE documentation: https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/config/configRecommendedSettings.html#Setuserresourcelimits.
- For an AccessDeniedException, make sure that the user running the Cassandra process has read and write permission to the data directory and all the files within it. This error commonly occurs when Cassandra was accidentally started as root and later restarted as a less privileged user: the files created while running as root can no longer be read or written by the other user.
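To spot the AccessDeniedException case before it bites, you can walk the data directory and flag entries the current user cannot both read and write. A minimal sketch (a hypothetical standalone helper, not part of Cassandra; the class name and default path are assumptions):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: walk a Cassandra data directory and count entries
// that the user running this JVM cannot both read and write -- the
// situation that produces AccessDeniedException after a root-owned restart.
public class PermissionAudit {
    // Returns the number of entries under 'dataDir' that are not both
    // readable and writable by the current user.
    static long countInaccessible(Path dataDir) throws IOException {
        try (Stream<Path> paths = Files.walk(dataDir)) {
            return paths
                .filter(p -> !Files.isReadable(p) || !Files.isWritable(p))
                .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Default location is an assumption; pass your data_file_directories path.
        Path dataDir = Path.of(args.length > 0 ? args[0] : "/var/lib/cassandra/data");
        long bad = countInaccessible(dataDir);
        if (bad > 0)
            System.out.println(bad + " entries are not fully accessible; fix ownership, e.g. chown -R cassandra:cassandra " + dataDir);
        else
            System.out.println("All entries are readable and writable.");
    }
}
```

Run it as the same OS user that runs the Cassandra process; a non-zero count pinpoints the files whose ownership needs fixing.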