Connecting to DSE Spark Hadoop from a client machine

This article aims to clarify the following documentation topic:

Running Spark commands against a remote cluster

Normally, the Spark shell is run directly on one of the Analytics nodes:

$ dse spark [-u username -p password]

However, this assumes the user has an actual Unix login on the node, which in some cases is not allowed.

In that case, the end user can work from a client machine instead, provided there is unrestricted network access to and from the DSE Analytics nodes. If security is enabled, the user still needs to authenticate.

First, the DSE nodes need to be configured as Spark+Hadoop nodes (Hadoop mode is needed to use the Cassandra File System, CFS).
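
A minimal sketch of starting a node in Spark+Hadoop mode, assuming a tarball install (on package installs, set SPARK_ENABLED=1 and HADOOP_ENABLED=1 in /etc/default/dse and restart the service instead):

$ dse cassandra -k -t    # -k enables Spark (Analytics), -t enables DSE Hadoop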

The client machine requires the following:

  • Run a Linux-based OS
  • Have Java installed - the same version as the nodes, if possible (see the version check below)
  • Install the exact same DSE version running on the nodes
  • Configure DSE security just as if it were an actual node
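
A quick way to compare versions is to run the following on both a node and the client (output omitted; dse -v prints the DSE version):

$ java -version
$ dse -v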

Depending on the type of install on your client machine (tarball or package, respectively), rename the existing Hadoop configuration directory:

mv ${DSE_HOME}/resources/hadoop/conf ${DSE_HOME}/resources/hadoop/conf.original
OR
mv /etc/dse/hadoop /etc/dse/hadoop.original

Collect the DSE Hadoop configuration from the nodes and copy it into the corresponding location on your client machine.
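
For example, pulling the configuration from the node at 10.0.0.2 onto a package-style client install (the admin account and paths are illustrative):

$ scp -r admin@10.0.0.2:/etc/dse/hadoop /etc/dse/hadoop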

Test that dse commands work from the client. In the examples below, the Spark+Hadoop node is at 10.0.0.2 and the client machine is at 10.0.0.8. When starting the Spark shell, spark.driver.host is set to the client machine's own address so that the executors on the cluster can connect back to the Spark driver:

iMac:dse Pepe$ ls -la /tmp/*.txt
-rw-r--r--  1 Pepe  wheel  38 Dec 18 15:25 /tmp/text.txt

iMac:dse Pepe$ bin/dse hadoop fs -put  /tmp/text.txt /

iMac:dse Pepe$ bin/dse hadoop fs -ls /
Found 3 items
-rwxrwxrwx   1 Pepe                staff         38 2015-12-18 18:43 /text.txt
drwxrwxrwx   - josemartinezpoblete staff          0 2015-12-14 11:46 /tmp
drwxrwxrwx   - josemartinezpoblete staff          0 2015-12-14 11:46 /user

iMac:dse Pepe$ bin/dse spark --conf spark.driver.host=10.0.0.8  --master spark://10.0.0.2:7077 [-u username -p password]
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.2
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_75)
Type in expressions to have them evaluated.
Type :help for more information.
Creating SparkContext...
Initializing SparkContext with MASTER: spark://10.0.0.2:7077
Created spark context..
Spark context available as sc.
Hive context available as hc.
CassandraSQLContext available as csc.

scala> 
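
As a quick sanity check from the remote shell, you can read back the file uploaded to CFS earlier (the /text.txt from the -put example above; output omitted):

scala> val testFile = sc.textFile("/text.txt")
scala> testFile.count()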

Rather than the client machine's filesystem, the Cassandra File System (CFS) should be used for file processing.

Operations such as these will fail, because the executors run on the cluster nodes and cannot see the client machine's local filesystem:

scala> val inputFile = sc.textFile("file:///my/local/machine/file.txt")
scala> rdd1.saveAsTextFile("file:///my/local/file.txt")

First copy your file from your client machine to CFS, reusing the dse hadoop fs -put command shown earlier (paths are illustrative):
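
$ bin/dse hadoop fs -put /my/local/machine/file.txt /my/CFS/file.txt

Then load and save using CFS paths: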

scala> val inputFile = sc.textFile("/my/CFS/file.txt")
scala> rdd1.saveAsTextFile("/my/CFS/output")

Then copy the output back to your local machine using:

$ bin/dse hadoop fs -get  <CFS file>  <Local file>
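
Note that saveAsTextFile writes a directory of part files rather than a single file. For example (paths are illustrative):

$ bin/dse hadoop fs -get /my/CFS/output/part-00000 /tmp/output.txt

Alternatively, hadoop fs -getmerge combines all part files into a single local file.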

Alternatively, you could write to your client machine's filesystem from the driver using Java I/O utilities, rather than the RDD method saveAsTextFile:

import java.io._
// Write to the client machine's local filesystem with plain Java I/O;
// toLocalIterator streams the RDD's elements back to the driver one partition at a time.
val pw = new PrintWriter(new File("LocalTextFile"))
for (line <- sc.parallelize(1 to 100000).map(num => s"$num::Line").toLocalIterator) {
  pw.println(line)
}
pw.close()
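
Because toLocalIterator pulls every element through the driver JVM on the client, this approach is best suited to modestly sized results.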