I am trying to use the Spark Cassandra Connector with Spark 1.1.0.
I have successfully built the jar file from the master branch on GitHub and have gotten the included demos working.
The following steps describe how to setup a server with both a Spark Node and a Cassandra Node.
Setting Up Open Source Spark
This assumes you already have Cassandra set up.
Step 1: Download and setup Spark
Go to http://spark.apache.org/downloads.html.
a) To make things simple, we will use one of the prebuilt Spark packages. Choose Spark version 1.2.0 and Pre-built for Hadoop 2.4, then Direct Download. This will download an archive with the built binaries for Spark.
b) Extract this to a directory of your choosing. I will put mine in ~/apps/spark-1.2
c) Test that Spark is working by opening the shell (see Step 2 below).
Step 2: Test that Spark Works
a) cd into the Spark directory and run "./bin/spark-shell". This will open the Spark interactive shell.
b) If everything worked it should display this prompt: "scala>"
Run a simple calculation:
sc.parallelize(1 to 50).sum() which should output 1275.0.
c) Congratulations, Spark is working! Exit the Spark shell with the command "exit".
The Spark Cassandra Connector
To connect Spark to a Cassandra cluster, the Cassandra Connector will need to be added to the Spark project. DataStax provides their own Cassandra Connector on GitHub and we will use that.
Clone the Spark Cassandra Connector repository:
https://github.com/datastax/spark-cassandra-connector
cd into "spark-cassandra-connector" Build the Spark Cassandra Connector by executing the command
./sbt/sbt Dscala-2.11=true assembly
This should output compiled jar files to the directory named "target". There will be two jar files, one for Scala and one for Java. The jar we are interested in is "spark-cassandra-connector-assembly-1.1.1-SNAPSHOT.jar", the one for Scala. Move the jar file into an easy-to-find directory; I put mine in ~/apps/spark-1.2/jars.
To load the connector into the Spark Shell:
start the shell with this command:
./bin/spark-shell --jars ~/apps/spark-1.2/jars/spark-cassandra-connector-assembly-1.1.1-SNAPSHOT.jar
To connect the Spark Context to the Cassandra cluster, first stop the default context:
sc.stop
Import the necessary classes:
import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf
Make a new SparkConf with the Cassandra connection details:
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
Create a new Spark Context:
val sc = new SparkContext(conf)
You now have a new SparkContext which is connected to your Cassandra cluster.
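To verify the connection, you can read from and write to a Cassandra table through the new context. The keyspace "test" and table "kv" below are hypothetical examples; substitute a keyspace and table that actually exist in your cluster (for instance, a table with a text "key" column and an int "value" column).
// Read a table as an RDD of CassandraRow objects (assumes keyspace "test" and table "kv" exist)
val rdd = sc.cassandraTable("test", "kv")
println(rdd.count)
println(rdd.first)
// Write a small RDD back to the same table, mapping the tuple fields to the "key" and "value" columns
val data = sc.parallelize(Seq(("key3", 3), ("key4", 4)))
data.saveToCassandra("test", "kv", SomeColumns("key", "value"))
Both cassandraTable and saveToCassandra come from the implicit conversions pulled in by the "import com.datastax.spark.connector._" line above.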