I am trying to use the Spark Cassandra Connector with Spark 1.1.0.
I have successfully built the jar file from the master branch on GitHub and have gotten the included demos working.
The following steps describe how to setup a server with both a Spark Node and a Cassandra Node.
Setting Up Open Source Spark
This assumes you already have Cassandra set up.
Step 1: Download and setup Spark
Go to http://spark.apache.org/downloads.html.
a) To make things simple, we will use one of the prebuilt Spark packages. Choose Spark version 1.2.0 and Pre-built for Hadoop 2.4, then Direct Download. This will download an archive with the built binaries for Spark.
b) Extract this to a directory of your choosing. I will put mine in ~/apps/spark-1.2
c) Test that Spark is working by opening the shell (see Step 2 below).
Step 2: Test that Spark Works
a) cd into the Spark directory and run "./bin/spark-shell". This will open the Spark interactive shell.
b) If everything worked it should display this prompt: "scala>"
Run a simple calculation:
sc.parallelize(1 to 50).sum() which should output 1275.0.
c) Congratulations, Spark is working! Exit the Spark shell with the command "exit".
The Spark Cassandra Connector
To connect Spark to a Cassandra cluster, the Cassandra Connector will need to be added to the Spark project. DataStax provides their own Cassandra Connector on GitHub and we will use that.
Clone the Spark Cassandra Connector repository:
https://github.com/datastax/spark-cassandra-connector
cd into "spark-cassandra-connector" Build the Spark Cassandra Connector by executing the command
./sbt/sbt Dscala-2.11=true assembly
This should output compiled jar files to the directory named "target". There will be two jar files, one for Scala and one for Java. The jar we are interested in is "spark-cassandra-connector-assembly-1.1.1-SNAPSHOT.jar", the one for Scala. Move the jar file into an easy-to-find directory; I put mine in ~/apps/spark-1.2/jars.
To load the connector into the Spark Shell:
start the shell with this command:
./bin/spark-shell --jars ~/apps/spark-1.2/jars/spark-cassandra-connector-assembly-1.1.1-SNAPSHOT.jar
To connect the Spark Context to the Cassandra cluster, first stop the default context:
sc.stop
Import the necessary classes:
import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf
Make a new SparkConf with the Cassandra connection details:
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
Create a new Spark Context:
val sc = new SparkContext(conf)
You now have a new SparkContext which is connected to your Cassandra cluster.
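To verify the connection, you can read from and write to a Cassandra table through the new context. The keyspace "test" and table "kv" below are hypothetical examples; substitute a keyspace and table that actually exist in your cluster (for instance, a table with a text "key" column and an int "value" column).
// Read a table as an RDD of CassandraRow objects (assumes keyspace "test" and table "kv" exist)
val rdd = sc.cassandraTable("test", "kv")
println(rdd.count)
println(rdd.first)
// Write a small RDD back to the same table, mapping the tuple fields to the "key" and "value" columns
val data = sc.parallelize(Seq(("key3", 3), ("key4", 4)))
data.saveToCassandra("test", "kv", SomeColumns("key", "value"))
Both cassandraTable and saveToCassandra come from the implicit conversions pulled in by the "import com.datastax.spark.connector._" line above.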