DataStax Help Center

Nodes showing DN in nodetool status with "invalid gossip generation" warning in logs

Summary

This article describes an issue where restarted nodes are unable to gossip with other nodes that have been running for a long time.

Symptoms

A cluster with long-running DSE processes reports nodes as down (e.g. DN in nodetool status output) after DSE has been restarted. The following sample warning is reported in the system.log:

WARN [GossipStage:1] 2017-03-01 13:28:19,589 Gossiper.java:1105 - received an invalid gossip generation for peer /10.1.2.3; local generation = 1455182503, received generation = 1488365632

Cause

CASSANDRA-8113 changed gossip so that a generation number set too far into the future (i.e. corrupted gossip from a node) is ignored, to prevent the corruption from bringing down the rest of the cluster. This enhancement was introduced in Apache Cassandra 2.1.1.

This behaviour inadvertently prevented long-running nodes from gossiping with nodes which have just been restarted or joined the cluster. Specifically, the problem occurs when the difference between the gossip generation a node holds locally for a peer and the generation it receives from that peer exceeds the one-year threshold, as identified in CASSANDRA-10969.
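
The arithmetic behind the sample warning in Symptoms can be checked with a short Python sketch. It is illustrative only and is not the Cassandra source. It assumes that the one-year threshold equals 365 days in seconds, that a generation is rejected when it is more than that far ahead of the local value, and that generations are start times in epoch seconds. The names MAX_GENERATION_DIFFERENCE and is_rejected are hypothetical.

```python
from datetime import datetime, timezone

# Assumption: the one-year threshold is 365 days expressed in seconds.
MAX_GENERATION_DIFFERENCE = 86400 * 365


def is_rejected(local_generation: int, received_generation: int) -> bool:
    """Return True if the received generation is more than one year ahead of the local one.

    Hypothetical helper that mirrors the behaviour described in this article.
    """
    return received_generation > local_generation + MAX_GENERATION_DIFFERENCE


# Values copied from the sample warning in the Symptoms section.
local_generation = 1455182503
received_generation = 1488365632

difference = received_generation - local_generation
print("difference in seconds:", difference)  # 33183129
print("threshold in seconds: ", MAX_GENERATION_DIFFERENCE)  # 31536000
print("rejected:", is_rejected(local_generation, received_generation))  # True

# Assumption: a generation is a node start time in epoch seconds.
for label, generation in (("local", local_generation), ("received", received_generation)):
    started = datetime.fromtimestamp(generation, tz=timezone.utc)
    print(label, "generation corresponds to", started.isoformat())
```

Under these assumptions the two generations are 33183129 seconds apart, which is more than 31536000 seconds (365 days), so the received generation is treated as invalid.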

Workaround

Perform a rolling restart of all nodes in all data centres to force the nodes' gossip generation to reset to a lower value.

NOTE - In some situations, newly restarted nodes will already have gossiped with nodes that still hold very old generations, which "contaminates" the gossip pool again before those remaining nodes are restarted. In that case it will be necessary to perform several rolling restarts until the old generations are purged from gossip.
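
One possible way to judge whether another rolling restart is needed is to check which peers are still named in the warning after a restart. The Python sketch below is a minimal example that assumes the warning text matches the sample shown in Symptoms exactly; other versions may word it differently. The function name peers_with_invalid_generation is hypothetical.

```python
import re

# Pattern built from the sample warning in the Symptoms section.
# Assumption: the log message format matches that sample.
WARNING = re.compile(
    r"received an invalid gossip generation for peer /(?P<peer>[^;\s]+); "
    r"local generation = (?P<local>\d+), received generation = (?P<received>\d+)"
)


def peers_with_invalid_generation(lines):
    """Map each peer address to the last (local, received) generation pair logged for it."""
    found = {}
    for line in lines:
        match = WARNING.search(line)
        if match:
            found[match["peer"]] = (int(match["local"]), int(match["received"]))
    return found


# The sample warning from this article, used here in place of a real log file.
sample = [
    "WARN [GossipStage:1] 2017-03-01 13:28:19,589 Gossiper.java:1105 - "
    "received an invalid gossip generation for peer /10.1.2.3; "
    "local generation = 1455182503, received generation = 1488365632",
    "INFO [main] 2017-03-01 13:28:20,000 an unrelated log line",
]

for peer, (local, received) in peers_with_invalid_generation(sample).items():
    print(peer, "local =", local, "received =", received)
```

To scan a real log, pass an open file handle for system.log instead of the sample list; its location depends on the installation. If recent entries no longer name any peer, the old generations have probably been purged.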

Solution

CASSANDRA-10969 was fixed in Apache Cassandra 2.1.13 and is shipped in DSE 4.7.8+ and DSE 4.8.5+. Upgrade to the latest DSE 4.8 release to resolve the issue permanently.

See also

Cassandra JIRA - CASSANDRA-8113 Gossip should ignore generation numbers too far in the future

Cassandra JIRA - CASSANDRA-10969 Fix bad gossip generation seen in long-running clusters
