Overview
This article provides information on configuring multiple contact points (hosts) for Bring-Your-Own-Spark configurations.
Background
Since DataStax Enterprise 5.0, the Bring-Your-Own-Spark (BYOS) feature supports connecting to a DSE cluster from an external Spark cluster.
When connecting to DSE Analytics using an external Hadoop client, one of the configuration items is the spark.hadoop.cassandra.host property. The value of this property is set to the IP address of one of the DSE nodes running in Analytics mode. For example, a generated BYOS configuration file, byos.properties, will contain the following entry:
spark.hadoop.cassandra.host=10.1.2.3
Implementation
Apache Hadoop does not natively support multiple connections for high availability, so this property cannot accept a list of IP addresses or hostnames. Apache Spark has kept the implementation as is for backward compatibility.
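Because the property accepts only a single host, it can be useful to check a generated byos.properties file before submitting a job. The following is a minimal sketch in Python, not part of DSE or BYOS: the helper names parse_properties and check_single_host are hypothetical, and the parser assumes simple key=value lines rather than the full Java properties syntax.

```python
import re

# Property named in this article; everything else below is illustrative.
HOST_KEY = "spark.hadoop.cassandra.host"


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments.

    Assumption: the file uses plain key=value entries, as in the example
    entry shown in this article.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_single_host(props, key=HOST_KEY):
    """Return the configured host, or raise ValueError if the value is
    missing or looks like a list of hosts (comma, semicolon, or whitespace
    separated)."""
    value = props.get(key, "")
    hosts = [h for h in re.split(r"[,;\s]+", value) if h]
    if len(hosts) != 1:
        raise ValueError(
            f"{key} must hold exactly one host, found {len(hosts)}: {value!r}"
        )
    return hosts[0]
```

For example, check_single_host(parse_properties("spark.hadoop.cassandra.host=10.1.2.3")) returns the single address, while a value such as 10.1.2.3,10.1.2.4 raises ValueError.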
Solution
The DSE Analytics team is reviewing the feasibility of implementing this enhancement. The internal feature request IDs are DSP-10873, DSP-13151 and DSP-16183.