
Spark Master UI shows no workers or an incorrect number of workers

Summary

This article explains why Spark workers may be unavailable in the cluster and how to resolve the issue.

Symptoms

Depending on the status of the workers, it may not be possible to submit jobs because the cluster reports no available resources.

The Spark master user interface (UI) reports either no workers or an incorrect number of available workers. For example:

URL: spark://10.1.2.3:7077
REST URL: spark://10.1.2.3:6066 (cluster mode)
Workers: 0
Cores: 0 Total, 0 Used
Memory: 0.0 B Total, 0.0 B Used
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE

In some instances, attempting to start a Spark shell returns errors such as the following:

ERROR 2016-05-13 06:39:58,209 org.apache.spark.util.Utils: Failed to create dir in /var/lib/spark/rdd. Ignoring this directory.
ERROR 2016-05-13 06:39:58,210 org.apache.spark.storage.DiskBlockManager: Failed to create any local dir.

Cause

A review of the system.log on the affected nodes shows that the workers fail during initialisation because the DSE process is unable to access the Spark directories:

ERROR [SPARK-WORKER-INIT-0] 2016-05-13 03:57:08,510  SparkWorkerRunner.java:118 - Failed to configure Spark Worker
java.nio.file.AccessDeniedException: /var/lib/spark/worker/worker.configuration
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) ~[na:1.8.0_91]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[na:1.8.0_91]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[na:1.8.0_91]
        at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244) ~[na:1.8.0_91]
        at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108) ~[na:1.8.0_91]
        at java.nio.file.Files.deleteIfExists(Files.java:1165) ~[na:1.8.0_91]
        at com.datastax.bdp.transport.server.DigestAuthUtils.saveFile(DigestAuthUtils.java:126) ~[dse-core-4.8.6.jar:4.8.6]
        at com.datastax.bdp.spark.util.Utils$.createConfigurationFile(Utils.scala:111) ~[dse-spark-4.8.6.jar:4.8.6]
        at com.datastax.bdp.spark.util.Utils.createConfigurationFile(Utils.scala) ~[dse-spark-4.8.6.jar:4.8.6]
        at com.datastax.bdp.spark.SparkWorkerRunner.args(SparkWorkerRunner.java:114) ~[dse-spark-4.8.6.jar:4.8.6]
        at com.datastax.bdp.spark.AbstractSparkRunner.initService(AbstractSparkRunner.java:55) [dse-spark-4.8.6.jar:4.8.6]
        at com.datastax.bdp.spark.AbstractSparkRunner.initService(AbstractSparkRunner.java:19) [dse-spark-4.8.6.jar:4.8.6]
        at com.datastax.bdp.hadoop.mapred.ServiceRunner.run(ServiceRunner.java:126) [dse-hadoop-4.8.6.jar:4.8.6]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
ERROR [SPARK-WORKER-INIT-0] 2016-05-13 03:57:08,511  AbstractSparkRunner.java:126 - SparkWorker-0 threw exception in state STARTING: 
java.lang.RuntimeException: java.nio.file.AccessDeniedException: /var/lib/spark/worker/worker.configuration
        at com.datastax.bdp.spark.SparkWorkerRunner.args(SparkWorkerRunner.java:119) ~[dse-spark-4.8.6.jar:4.8.6]
        at com.datastax.bdp.spark.AbstractSparkRunner.initService(AbstractSparkRunner.java:55) ~[dse-spark-4.8.6.jar:4.8.6]
        at com.datastax.bdp.spark.AbstractSparkRunner.initService(AbstractSparkRunner.java:19) ~[dse-spark-4.8.6.jar:4.8.6]
        at com.datastax.bdp.hadoop.mapred.ServiceRunner.run(ServiceRunner.java:126) ~[dse-hadoop-4.8.6.jar:4.8.6]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Caused by: java.nio.file.AccessDeniedException: /var/lib/spark/worker/worker.configuration
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) ~[na:1.8.0_91]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[na:1.8.0_91]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[na:1.8.0_91]
        at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244) ~[na:1.8.0_91]
        at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108) ~[na:1.8.0_91]
        at java.nio.file.Files.deleteIfExists(Files.java:1165) ~[na:1.8.0_91]
        at com.datastax.bdp.transport.server.DigestAuthUtils.saveFile(DigestAuthUtils.java:126) ~[dse-core-4.8.6.jar:4.8.6]
        at com.datastax.bdp.spark.util.Utils$.createConfigurationFile(Utils.scala:111) ~[dse-spark-4.8.6.jar:4.8.6]
        at com.datastax.bdp.spark.util.Utils.createConfigurationFile(Utils.scala) ~[dse-spark-4.8.6.jar:4.8.6]
        at com.datastax.bdp.spark.SparkWorkerRunner.args(SparkWorkerRunner.java:114) ~[dse-spark-4.8.6.jar:4.8.6]
        ... 4 common frames omitted
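
A quick way to check each node for this failure is to search the system log for the exception. This is a minimal check assuming the default log location for package installations; adjust the path if your installation differs:

$ grep -n 'AccessDeniedException' /var/log/cassandra/system.log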

Solution

Check the permissions on the Spark directories and ensure that the user running the DSE process (cassandra by default) has read and write access.

The Spark directories are defined in spark-env.sh (typically /etc/dse/spark/spark-env.sh on package installations):

  • SPARK_WORKER_DIR (default /var/lib/spark/worker)
  • SPARK_LOCAL_DIRS (default /var/lib/spark/rdd)
  • SPARK_WORKER_LOG_DIR (default /var/log/spark/worker)
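
To confirm that permissions are the problem, inspect the ownership of each directory. A minimal check, assuming the default locations listed above and the default cassandra service account:

$ ls -ld /var/lib/spark/worker /var/lib/spark/rdd /var/log/spark/worker
$ sudo -u cassandra test -w /var/lib/spark/worker && echo writable || echo 'not writable'

The second command verifies write access as the cassandra user; substitute the account that runs DSE if it differs.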

Reset the ownership and permissions as appropriate. For example:

$ sudo chown -R cassandra:cassandra /var/lib/spark/worker
$ sudo chown -R cassandra:cassandra /var/lib/spark/rdd
$ sudo chown -R cassandra:cassandra /var/log/spark/worker
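
After resetting ownership, restart DSE on the affected nodes so the Spark worker re-initialises. On package installations this is typically:

$ sudo service dse restart

Once the node is back up, confirm in the Spark master UI that the worker registers and that the core and memory totals reflect the expected resources.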