Summary
DSE has a “FailureDetector” mechanism that lets a node check itself regularly to detect whether it has paused for too long. This prevents the node from producing false positives and wrongly marking other nodes down in gossip. If you see these messages at any time other than node startup, they could be a sign of high load or of the underlying OS being paused, as can occur on a virtual machine.
Applies to
- All versions of DSE
Symptoms
The following warning may be observed in the system.log:
WARN [GossipTasks:1] 2019-09-06 12:02:16,779 FailureDetector.java:288 - Not marking nodes down due to local pause of 7994062116 > 5000000000
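The two numbers in this message appear to be nanosecond values: 5000000000 ns corresponds to the 5-second threshold described under Cause. As a minimal sketch, assuming that unit, the following Python snippet extracts both values from a log line and converts them to seconds. The function name parse_local_pause is hypothetical and not part of DSE.

```python
import re

# Matches the FailureDetector warning; both captured numbers are
# assumed to be nanoseconds (an assumption, see the note above).
PAUSE_RE = re.compile(
    r"Not marking nodes down due to local pause of (\d+) > (\d+)"
)


def parse_local_pause(line):
    """Return (pause_seconds, threshold_seconds), or None if no match."""
    match = PAUSE_RE.search(line)
    if match is None:
        return None
    pause_ns, threshold_ns = (int(group) for group in match.groups())
    return pause_ns / 1e9, threshold_ns / 1e9


sample = (
    "WARN [GossipTasks:1] 2019-09-06 12:02:16,779 FailureDetector.java:288 - "
    "Not marking nodes down due to local pause of 7994062116 > 5000000000"
)
result = parse_local_pause(sample)
print(result)
```

For the sample line, the pause works out to roughly 8 seconds against a 5-second threshold.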
Cause
The failure detector checks the time every second as part of the GossipTasks thread. If more than 5 seconds have elapsed between the previous check and the current system time, the local node does not mark other nodes down in gossip, to avoid false positives.
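The check described above can be sketched as follows. This is an illustrative model only, not the DSE implementation: the class LocalPauseCheck, its method names, and the use of a monotonic clock are assumptions made for the example. The threshold constant mirrors the 5000000000 value shown in the log message.

```python
import time

# 5 seconds in nanoseconds, mirroring the threshold in the log message.
MAX_LOCAL_PAUSE_NS = 5_000_000_000


class LocalPauseCheck:
    """Hypothetical model of a periodic self-check for local pauses."""

    def __init__(self, now_ns=None):
        # Remember when the last check ran (injectable for testing).
        self.last_check_ns = time.monotonic_ns() if now_ns is None else now_ns

    def paused_too_long(self, now_ns=None):
        """Return True if the gap since the previous check exceeds the limit."""
        now = time.monotonic_ns() if now_ns is None else now_ns
        elapsed = now - self.last_check_ns
        self.last_check_ns = now
        # When True, the caller would skip marking other nodes down,
        # because the delay is local rather than a remote failure.
        return elapsed > MAX_LOCAL_PAUSE_NS


check = LocalPauseCheck(now_ns=0)
normal = check.paused_too_long(1_000_000_000)                # 1 s since last check
stalled = check.paused_too_long(1_000_000_000 + 7_994_062_116)  # ~8 s gap
```

In this sketch, a normal one-second interval passes the check, while a gap of about 8 seconds is treated as a local pause.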
You will always see one of these messages during node startup as the JVM allocates resources to DSE. Subsequent messages are usually accompanied by other signs of load, most commonly long GC pauses. For example:
INFO [Service Thread] 2019-09-19 15:59:36,756 GCInspector.java:245 - G1 Young Generation GC in 270ms. G1 Eden Space: 90177536 -> 0; G1 Old Gen: 1931083912 -> 1946812552;
INFO [Service Thread] 2019-09-19 15:59:38,985 GCInspector.java:245 - G1 Young Generation GC in 417ms. G1 Eden Space: 90177536 -> 0; G1 Old Gen: 1962541208 -> 2075263120; G1 Survivor Space: 13631488 -> 8388608;
WARN [GossipTasks:1] 2019-09-19 15:59:47,198 FailureDetector.java:288 - Not marking nodes down due to local pause of 8211495979 > 5000000000
INFO [Service Thread] 2019-09-19 15:59:47,199 GCInspector.java:245 - G1 Young Generation GC in 281ms. G1 Eden Space: 1048576 -> 0; G1 Old Gen: 2075263120 -> 2084700304; G1 Survivor Space: 8388608 -> 0;
WARN [Service Thread] 2019-09-19 15:59:47,201 GCInspector.java:243 - G1 Old Generation GC in 7920ms. G1 Old Gen: 2084700304 -> 1773677016;
However, if these messages are observed without signs of overload, the DSE node is typically being starved of resources. For example, a VM host may be giving CPU cycles to other VM instances that share the same hardware.
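One way to tell the two cases apart is to look for long GC pauses logged around the same time as the warning. The following sketch scans log lines for GCInspector entries and reports those at or above a chosen duration. The function name long_gc_pauses and the 1000 ms default are hypothetical choices for illustration, not DSE settings.

```python
import re

# Captures the collector name and the pause duration in milliseconds
# from GCInspector lines in the format shown in the excerpt above.
GC_RE = re.compile(r"GCInspector\.java:\d+ - (.+?) GC in (\d+)ms")


def long_gc_pauses(lines, min_ms=1000):
    """Return (collector, duration_ms) for each GC pause >= min_ms."""
    found = []
    for line in lines:
        for match in GC_RE.finditer(line):
            duration_ms = int(match.group(2))
            if duration_ms >= min_ms:
                found.append((match.group(1), duration_ms))
    return found


sample_lines = [
    "INFO [Service Thread] 2019-09-19 15:59:47,199 GCInspector.java:245 - "
    "G1 Young Generation GC in 281ms. G1 Eden Space: 1048576 -> 0;",
    "WARN [Service Thread] 2019-09-19 15:59:47,201 GCInspector.java:243 - "
    "G1 Old Generation GC in 7920ms. G1 Old Gen: 2084700304 -> 1773677016;",
]
pauses = long_gc_pauses(sample_lines)
print(pauses)
```

If a local pause warning has no long GC pause near it, resource starvation at the host level becomes the more likely explanation.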
Solution
If you observe these messages on virtualized instances, ensure that DSE instances are not competing with other applications for hardware resources.