This article explains why DSE Spark jobs fail during initialisation with java.lang.NoSuchFieldException: SHUTDOWN_HOOK_PRIORITY.
When attempting to run a Spark job in DataStax Enterprise, e.g. with
dse spark-submit, the job fails during initialisation. Below is a sample stack trace from DSE 4.8.3:
java.lang.ExceptionInInitializerError: null
...
Caused by: java.lang.NoSuchFieldException: SHUTDOWN_HOOK_PRIORITY
    at java.lang.Class.getField(Class.java:1695)
    at org.apache.spark.util.SparkShutdownHookManager.install(ShutdownHookManager.scala:222)
    at org.apache.spark.util.ShutdownHookManager$.shutdownHooks$lzycompute(ShutdownHookManager.scala:50)
    at org.apache.spark.util.ShutdownHookManager$.shutdownHooks(ShutdownHookManager.scala:48)
    at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191)
    at org.apache.spark.util.ShutdownHookManager$.<init>(ShutdownHookManager.scala:58)
    at org.apache.spark.util.ShutdownHookManager$.<clinit>(ShutdownHookManager.scala)
...
The issue stems from an incompatible class in the application's classpath. During execution, Spark locates a match for the class org.apache.hadoop.util.ShutdownHookManager in the classpath. The SHUTDOWN_HOOK_PRIORITY field of the org.apache.hadoop.fs.FileSystem class was only introduced in Hadoop 2.x. Since DataStax Enterprise 4.8.3 ships with Hadoop 1.0.4, the field does not exist, resulting in the failure.
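The failure mode can be sketched with a minimal, self-contained example of the reflective Class.getField() lookup seen in the stack trace. The Hadoop1FileSystem and Hadoop2FileSystem stand-in classes below are hypothetical illustrations, not the real Hadoop classes:

```java
public class ShutdownHookProbe {
    // Stand-in mimicking a Hadoop 1.x class: no SHUTDOWN_HOOK_PRIORITY field.
    static class Hadoop1FileSystem {}

    // Stand-in mimicking the Hadoop 2.x API, which added the public constant.
    static class Hadoop2FileSystem {
        public static final int SHUTDOWN_HOOK_PRIORITY = 10;
    }

    /** True if cls exposes a public SHUTDOWN_HOOK_PRIORITY field (Hadoop 2.x style). */
    static boolean hasShutdownHookPriority(Class<?> cls) {
        try {
            cls.getField("SHUTDOWN_HOOK_PRIORITY"); // same lookup as in the stack trace
            return true;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Against a Hadoop 1.x-style class the lookup throws NoSuchFieldException,
        // which is what surfaces as ExceptionInInitializerError during start-up.
        System.out.println(hasShutdownHookPriority(Hadoop1FileSystem.class)); // false
        System.out.println(hasShutdownHookPriority(Hadoop2FileSystem.class)); // true
    }
}
```

In the real failure, Spark performs this lookup against classes that resolve to a mix of Hadoop 2.x and Hadoop 1.x JARs, so the field it expects is missing at runtime.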
Review the application's dependencies for any Hadoop 2.x artifacts, for example in a Gradle dependency tree:

+--- org.apache.hadoop:hadoop-client:2.2.0
|    +--- org.apache.hadoop:hadoop-common:2.2.0
|    |    +--- org.apache.hadoop:hadoop-annotations:2.2.0
Remove the offending JAR from the application's classpath so that the application runs against the Hadoop version embedded in DataStax Enterprise.
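To confirm which JAR is actually supplying a Hadoop class at runtime, a small diagnostic can print the class's code source. This is an illustrative sketch (ClasspathLocator is a hypothetical name, not part of DSE or Spark):

```java
import java.security.CodeSource;

public class ClasspathLocator {
    /** Returns the location (JAR or directory) that supplies className, if resolvable. */
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src != null ? src.getLocation().toString() : "(bootstrap/unknown)";
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // On a node exhibiting the error, this typically points at the
        // application-supplied Hadoop 2.x JAR rather than the DSE-embedded one.
        System.out.println(locate("org.apache.hadoop.util.ShutdownHookManager"));
        System.out.println(locate("org.apache.hadoop.fs.FileSystem"));
    }
}
```

Running this inside the failing application (or in a driver with the same classpath) shows exactly which artifact needs to be removed.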
DSE doc - DataStax Enterprise 4.8 release notes
Spark on GitHub - org.apache.spark.util.ShutdownHookManager.install() method
Hadoop API doc - org.apache.hadoop.fs.FileSystem SHUTDOWN_HOOK_PRIORITY field