
Spark fails with error "SparkDeploySchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED"

Summary

Spark applications running on a DataStax Enterprise Analytics node shut down unexpectedly after a job completes.

Symptoms

When a Spark job completes, errors are reported in the logs. Sample entries from a DSE 4.6 node are as follows:

INFO 2015-04-21 23:19:30 org.apache.spark.deploy.worker.DseSparkWorker: Asked to kill executor app-20150421231756-0001/1
INFO 2015-04-21 23:19:30 org.apache.spark.deploy.worker.ExecutorRunner: Runner thread for executor app-20150421231756-0001/1 interrupted
INFO 2015-04-21 23:19:30 org.apache.spark.deploy.worker.ExecutorRunner: Killing process!
ERROR 2015-04-21 23:19:30 org.apache.spark.util.logging.FileAppender: Error writing stream to file /var/lib/spark/work/app-20150421231756-0001/1/stdout
java.io.IOException: Stream closed
 at java.io.BufferedInputStream.getBufIfOpen(Unknown Source)
 at java.io.BufferedInputStream.read1(Unknown Source)
 at java.io.BufferedInputStream.read(Unknown Source)
 at java.io.FilterInputStream.read(Unknown Source)
 at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
 at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
 at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
 at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
 at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
INFO 2015-04-21 23:19:30 org.apache.spark.deploy.

If a Spark shell is running at the time, errors are displayed on the console before the shell terminates. A sample error looks like this:

15/04/21 23:19:30 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED 
15/04/21 23:19:30 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: Master removed our application: FAILED

Cause

The issue is due to a known defect in open-source Apache Spark, tracked as SPARK-4300 and SPARK-9844.

When a job completes, the worker's FileAppender thread attempts to write the executor's stdout to its log file, but the I/O stream is already closed: ExecutorRunner.killProcess() destroys the executor process before the appender has finished reading, creating a race condition during shutdown.
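
For illustration only, here is a minimal, hypothetical Scala sketch of the same race (not DSE or Spark source code): one thread copies a child process's stdout to a file, much like Spark's FileAppender, while the main thread destroys the process mid-read. On a Unix system (the sketch uses the "yes" command as a stand-in for the executor), unlucky timing produces the same "java.io.IOException: Stream closed" seen in the worker log above.

import java.io.{BufferedInputStream, FileOutputStream, IOException}

object StreamClosedRace {
  def main(args: Array[String]): Unit = {
    // Child process standing in for the executor JVM; it writes to
    // stdout continuously until it is killed.
    val process = new ProcessBuilder("yes").start()
    val in = new BufferedInputStream(process.getInputStream)
    val out = new FileOutputStream("/tmp/executor-stdout-copy")

    // Appender thread, analogous to FileAppender.appendStreamToFile:
    // copy the child's stdout into a log file until end-of-stream.
    val appender = new Thread(new Runnable {
      def run(): Unit = {
        val buf = new Array[Byte](8192)
        try {
          var n = in.read(buf)
          while (n != -1) {
            out.write(buf, 0, n)
            n = in.read(buf)
          }
        } catch {
          // Destroying the process closes its stdout pipe; a read that
          // is in flight can then fail just like the worker log above.
          case e: IOException => println("Error writing stream to file: " + e)
        } finally {
          out.close()
        }
      }
    })
    appender.start()

    Thread.sleep(100)
    // Analogue of ExecutorRunner.killProcess(): destroy the child while
    // the appender thread may still be reading its stdout.
    process.destroy()
    appender.join()
  }
}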

Solution

SPARK-4300 was fixed in Apache Spark 1.2.2. Note that DSE 4.6 ships with Spark 1.1.0 and is therefore affected, while DSE 4.7.1 ships with Spark 1.2.2, which includes the fix.
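
To confirm which Spark version a given DSE node is running, one quick check (shown here as an illustrative shell session) is to start the Spark shell with dse spark and inspect sc.version; on a DSE 4.6 node this should report 1.1.0:

$ dse spark
...
scala> sc.version
res0: String = 1.1.0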

The issue resurfaced in Apache Spark 1.3 and 1.4 and was resolved in Spark 1.6.1 and 2.0 by SPARK-9844.

Note that the issue is benign: the race condition is triggered only once the worker is already being shut down, and it does not affect the completion of jobs.
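
As a sanity check, a trivial job run from the Spark shell still returns the correct result even when these shutdown errors appear afterwards. For example (illustrative session):

scala> sc.parallelize(1 to 100).sum()
res0: Double = 5050.0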

See also

Defect - [SPARK-4300] "Race condition during SparkWorker shutdown" (https://issues.apache.org/jira/browse/SPARK-4300)

Defect - [SPARK-9844] "File appender race condition during SparkWorker shutdown" (https://issues.apache.org/jira/browse/SPARK-9844)
