Summary
This article discusses the importance of NTP for spark streaming and why NTP needs to be enabled and working across the cluster
Applies to
DSE4.8.x, 5.0.x, 5.1.x
Symptoms
A long running streaming application was observed to restart at various times that didn’t seem to occur at any particular time or under any particular circumstances. The application had already utilised the spark connector keepalive connection.keep_alive_ms
setting, too.
Cause
The application was using Spark 1.6.3 (included with DSE5.0.x).
In the example below, the application logs indicate that the streaming block cleanup might be happening before it’s possible (timewise). The driver was running on the same DSE cluster. The following error give a hint on where this occurs in the scala code:
Caused by: java.lang.IllegalArgumentException: requirement failed at scala.Predef$.require(Predef.scala:221) at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.cleanupOldBatches(ReceivedBlockTracker.scala:166) <<-- here ...
The scala code itself shows that there is a timestamp check
require(cleanupThreshTime.milliseconds < clock.getTimeMillis())
Solution
Enabling NTP ensures all cluster nodes are synced to the same time. As outlined above, time syncing will mitigate the possibility of unexpected actions that can occur when cluster nodes and client application nodes have time differences.