DataStax Help Center

Spark SQL throws IllegalArgumentException when hostname is not resolvable

Summary

When using Spark Shell or Spark SQL, a java.lang.IllegalArgumentException is thrown due to a hostname resolution failure. The cause of this error is not immediately obvious from the exception message; this article explains the underlying issue and how to resolve it.

Symptoms

When running a simple query in Spark SQL, the following error may be seen in the shell:

spark-sql> select * from mykeyspace.mytable limit 10;
WARN 2017-05-02 01:01:05,870 org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 9.0 (TID 57, 10.1.2.3): java.io.IOException: java.lang.RuntimeException: java.lang.IllegalArgumentException: myhostname01
at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:212)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: myhostname01
at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:122)
at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.initialize(CqlHiveRecordReader.java:91)
at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:207)
... 20 more

Cause

Spark SQL and Spark Shell use the DataStax Java Driver to connect to Apache Cassandra™. The error originates in the Cluster.Builder addContactPoint method, which throws an IllegalArgumentException when the supplied hostname cannot be resolved. The Javadoc for this exception states:

Thrown to indicate that a method has been passed an illegal or inappropriate argument.

Spark defaults to using the hostname of the local node when connecting to Apache Cassandra™. However, in some cases (e.g. system misconfiguration, multiple network interfaces) that hostname may not resolve to the correct IP address, or may not resolve at all.
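A quick diagnostic sketch for checking whether the local hostname resolves (this assumes a Linux system where `getent` is available; it is not part of the DSE tooling):

```shell
# Print the hostname Spark will default to for the local node
hostname

# getent prints the resolved address and exits 0 on success;
# it prints nothing and exits non-zero when the name does not resolve
getent hosts "$(hostname)" || echo "hostname does not resolve"
```

If the second command prints nothing before the fallback message, the node's hostname is the likely cause of the IllegalArgumentException.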

Workaround

Adding an entry for the host to the /etc/hosts file on each node will work around this problem. This is, however, not scalable, since the entry must be created and maintained manually on every node.
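As a sketch, the workaround amounts to something like the following on each node; the IP address and hostname below are placeholders taken from the stack trace above, not values to copy verbatim:

```shell
# Entry mapping the node's hostname to its routable IP
# (10.1.2.3 / myhostname01 are placeholder values)
entry="10.1.2.3   myhostname01"

# Append it to /etc/hosts (requires root), e.g.:
#   echo "$entry" | sudo tee -a /etc/hosts
# Then verify that the name resolves:
#   getent hosts myhostname01
echo "$entry"
```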

Solution

The spark-env.sh file contains various settings for programs such as dse spark and dse spark-sql. The following parameter controls the IP address or hostname used to connect to Cassandra; if set, it must be set on every node in the cluster that is running DSE Spark.

# The hostname or IP address Cassandra rpc/native protocol is bound to:
export SPARK_CASSANDRA_CONNECTION_HOST="10.1.2.3"
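As a quick sanity check after editing spark-env.sh on a node, the variable can be exported and echoed in a shell session; 10.1.2.3 here is a placeholder for the node's actual routable address:

```shell
# Placeholder: the IP the Cassandra rpc/native protocol is bound to
export SPARK_CASSANDRA_CONNECTION_HOST="10.1.2.3"

# Confirm the variable is set as expected
echo "$SPARK_CASSANDRA_CONNECTION_HOST"
```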
