Summary
This article discusses an issue where running offline tools can cause commit logs to be written with the wrong permissions, which in turn leads to gossip failures.
Symptoms
Certain nodes in a cluster have trouble gossiping with other nodes, and the following entries are reported in system.log:
WARN [GossipTasks:1] 2016-01-11 16:59:38,294 Gossiper.java:744 - Gossip stage has 290 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2016-01-11 16:59:39,395 Gossiper.java:744 - Gossip stage has 292 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2016-01-11 16:59:40,495 Gossiper.java:744 - Gossip stage has 296 pending tasks; skipping status check (no nodes will be marked down)
Further investigation showed that the gossip failures were caused by permission problems on the commit logs: commit logs could not be flushed to disk, which prevented updates to the system tables:
ERROR [COMMIT-LOG-ALLOCATOR] 2016-01-11 16:20:48,886 CommitLog.java:397 - Failed managing commit log segments. Commit disk failure policy is stop; terminating thread
org.apache.cassandra.io.FSWriteError: java.io.FileNotFoundException: /var/lib/cassandra/commitlog/CommitLog-4-1452554366748.log (Permission denied)
    at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:177) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
    at org.apache.cassandra.db.commitlog.CommitLogSegmentManager$4.call(CommitLogSegmentManager.java:397) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
    at org.apache.cassandra.db.commitlog.CommitLogSegmentManager$4.call(CommitLogSegmentManager.java:394) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
    at org.apache.cassandra.db.commitlog.CommitLogSegmentManager$1.runMayThrow(CommitLogSegmentManager.java:152) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [cassandra-all-2.1.8.689.jar:2.1.8.689]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
Caused by: java.io.FileNotFoundException: /var/lib/cassandra/commitlog/CommitLog-4-1452554366748.log (Permission denied)
    at java.io.RandomAccessFile.open0(Native Method) ~[na:1.8.0_51]
    at java.io.RandomAccessFile.open(RandomAccessFile.java:316) ~[na:1.8.0_51]
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243) ~[na:1.8.0_51]
    at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:155) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
    ... 5 common frames omitted
Cause
There is a known issue, reported in CASSANDRA-8616, where running offline tools such as sstable2json can inadvertently generate commit logs with incorrect permissions or owned by the root user.
When the tools load the schema, a side effect is that the schema version on the node is updated, which in turn triggers a commit log write. If a commit log is generated with root ownership, it blocks flushing because the DSE/Cassandra process does not have permission to access the affected logs.
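One way to confirm this cause on a node is to list the commit log segments that the service account does not own. The following is a minimal sketch, not an official tool: the function name find_foreign_commitlogs is made up for illustration, and the directory and account name passed to it are assumptions that depend on the installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DSE/Cassandra): list commit log segments
# in a directory that are NOT owned by the expected service account.
#   $1 = commit log directory, e.g. /var/lib/cassandra/commitlog (assumed path)
#   $2 = expected owner, as a user name or numeric UID (e.g. cassandra)
find_foreign_commitlogs() {
    local dir="$1" owner="$2"
    if [ ! -d "$dir" ]; then
        echo "commit log directory not found: $dir" >&2
        return 1
    fi
    # Segments follow the CommitLog-* naming seen in the error above.
    find "$dir" -maxdepth 1 -type f -name 'CommitLog*' ! -user "$owner"
}
```

For example, find_foreign_commitlogs /var/lib/cassandra/commitlog cassandra prints nothing when every segment is owned by cassandra; any path it prints is a candidate for the problem described above.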
Workaround
To bring an affected node back online, follow these steps:
Step 1 - Ensure that no DSE/Cassandra process is running; if one is, shut it down.
Step 2 - Change the ownership/permission of all affected logs in the commitlog/ directory. For example:
$ cd /var/lib/cassandra/commitlog
$ sudo chown cassandra:cassandra CommitLog*
Step 3 - Start DSE.
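Step 2 can also be scripted so that only segments with the wrong owner are touched. This is a sketch under assumptions rather than a supported procedure: the function name fix_commitlog_ownership is hypothetical, the owner and group must match the account that runs DSE/Cassandra on the node, and it should be run (with sufficient privileges, for example via sudo) only after Step 1 is complete.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DSE/Cassandra): reassign ownership of
# commit log segments whose owner differs from the expected service account.
#   $1 = commit log directory (assumed, e.g. /var/lib/cassandra/commitlog)
#   $2 = expected owner (user name or UID), $3 = expected group (name or GID)
fix_commitlog_ownership() {
    local dir="$1" owner="$2" group="$3"
    if [ ! -d "$dir" ]; then
        echo "commit log directory not found: $dir" >&2
        return 1
    fi
    # -print reports each segment that is about to be changed;
    # segments already owned by $owner are left alone.
    find "$dir" -maxdepth 1 -type f -name 'CommitLog*' ! -user "$owner" \
        -print -exec chown "$owner:$group" {} +
}
```

Called as fix_commitlog_ownership /var/lib/cassandra/commitlog cassandra cassandra, it prints the segments it changed and prints nothing if ownership was already correct.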
See also
Cassandra JIRA - CASSANDRA-8616 sstable tools may result in commit log segments to be written