Summary
Spark applications running on a DataStax Enterprise Analytics node shut down unexpectedly after a job completes.
Symptoms
When a Spark job completes, errors are reported in the logs. Sample entries from a DSE 4.6 node are as follows:
INFO 2015-04-21 23:19:30 org.apache.spark.deploy.worker.DseSparkWorker: Asked to kill executor app-20150421231756-0001/1
INFO 2015-04-21 23:19:30 org.apache.spark.deploy.worker.ExecutorRunner: Runner thread for executor app-20150421231756-0001/1 interrupted
INFO 2015-04-21 23:19:30 org.apache.spark.deploy.worker.ExecutorRunner: Killing process!
ERROR 2015-04-21 23:19:30 org.apache.spark.util.logging.FileAppender: Error writing stream to file /var/lib/spark/work/app-20150421231756-0001/1/stdout
java.io.IOException: Stream closed
        at java.io.BufferedInputStream.getBufIfOpen(Unknown Source)
        at java.io.BufferedInputStream.read1(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at java.io.FilterInputStream.read(Unknown Source)
        at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
INFO 2015-04-21 23:19:30 org.apache.spark.deploy.
If a Spark shell is running at the time, errors are displayed on the console before the shell is terminated. A sample error looks like:
15/04/21 23:19:30 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED
15/04/21 23:19:30 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: Master removed our application: FAILED
Cause
The issue is due to a known defect in open-source Spark (SPARK-4300, SPARK-9844).
When a job completes, the worker attempts to write the executor's output to its stdout log file, but the I/O stream has already been closed: the executor process is destroyed in ExecutorRunner.killProcess() before the write is performed, so the FileAppender fails with java.io.IOException: Stream closed.
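The following is a minimal, standalone sketch of that ordering problem; it is not DSE or Spark code. One thread copies a child process's stdout to a file (as Spark's FileAppender does for executor logs) while another thread destroys the process (as ExecutorRunner.killProcess() does). It assumes a Unix-like system where the yes command is available to stand in for an executor, and because this is a race, any single run may or may not reproduce the exception.

import java.io.{FileOutputStream, IOException}

object FileAppenderRaceSketch {
  def main(args: Array[String]): Unit = {
    // Child process that keeps producing output, standing in for a Spark executor.
    val process = new ProcessBuilder("yes").start()
    val in      = process.getInputStream
    val out     = new FileOutputStream("executor-stdout.log")

    // Appender thread: stream the child's stdout to the log file until EOF,
    // mirroring what FileAppender.appendStreamToFile does for executor logs.
    val appender = new Thread(new Runnable {
      override def run(): Unit = {
        val buf = new Array[Byte](4096)
        try {
          var n = in.read(buf)
          while (n != -1) {
            out.write(buf, 0, n)
            n = in.read(buf)
          }
        } catch {
          // If the process is destroyed while we are still reading, the stream
          // is closed underneath us and read() throws "Stream closed".
          case e: IOException => println(s"Appender failed: $e")
        } finally {
          out.close()
        }
      }
    })
    appender.start()

    // "killProcess": destroy the child while the appender may still be reading.
    Thread.sleep(100)
    process.destroy()
    appender.join()
  }
}

The sketch only illustrates the shutdown ordering that triggers the exception; the upstream tickets below track how Spark resolved it.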
Solution
SPARK-4300 has been fixed in Apache Spark 1.2.2. Note that DSE 4.6 ships with Spark 1.1.0 and DSE 4.7.1 ships with Spark 1.2.2.
The issue resurfaced in Apache Spark 1.3/1.4 and was resolved in Spark 1.6.1 and 2.0 by SPARK-9844.
Note that the issue is benign: the race condition is triggered only once the worker is already shutting down, and it does not affect the completion of jobs.
See also
Defect - [SPARK-4300] "Race condition during SparkWorker shutdown".
Defect - [SPARK-9844] "File appender race condition during SparkWorker shutdown".