spark-cassandra-connector

Unable to generate UUIDs in Spark SQL

随声附和 submitted on 2019-12-20 05:18:27

Question: Below is the code block and the error received.

Creating the temporary views:

    sqlcontext.sql("""CREATE TEMPORARY VIEW temp_pay_txn_stage
      USING org.apache.spark.sql.cassandra
      OPTIONS (
        table "t_pay_txn_stage",
        keyspace "ks_pay",
        cluster "Test Cluster",
        pushdown "true")""".stripMargin)

    sqlcontext.sql("""CREATE TEMPORARY VIEW temp_pay_txn_source
      USING org.apache.spark.sql.cassandra
      OPTIONS (
        table "t_pay_txn_source",
        keyspace "ks_pay",
        cluster "Test Cluster",
        pushdown "true")""".stripMargin)
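
The error itself is truncated above, but on Spark versions that predate the built-in uuid() SQL function (added in Spark 2.3), a common workaround is to register a small UDF. A minimal sketch, assuming the same sqlcontext and views as the question (the txn_uuid alias is illustrative):

    import java.util.UUID

    // Register a UDF that returns a random UUID per row. Note it is
    // non-deterministic: recomputed partitions can get different values.
    sqlcontext.udf.register("uuid", () => UUID.randomUUID().toString)

    sqlcontext.sql(
      "SELECT uuid() AS txn_uuid, s.* FROM temp_pay_txn_stage s").show()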

Pass columnNames dynamically to cassandraTable().select()

大兔子大兔子 submitted on 2019-12-20 02:55:04

Question: I'm reading a query off of a file at run time and executing it in a Spark + Cassandra environment. I'm executing:

    sparkContext.cassandraTable("keyspaceName", "colFamilyName")
      .select("col1", "col2", "col3")
      .where("some condition = true")

Query in file:

    select col1, col2, col3 from keyspaceName.colFamilyName where somecondition = true

Here col1, col2, col3 can vary depending on the query parsed from the file. Question: how do I pick the column names from the query and pass them to select() at runtime? I…
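
select() accepts a varargs of column references, so a dynamically built sequence can be splatted into it. A sketch, where the column list stands in for whatever your file parser produced:

    import com.datastax.spark.connector._

    // Hypothetical: names parsed out of the query file at run time.
    val cols = Seq("col1", "col2", "col3")

    val rdd = sparkContext.cassandraTable("keyspaceName", "colFamilyName")
      .select(cols.map(ColumnName(_)): _*) // Seq[ColumnName] -> varargs
      .where("somecondition = true")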

Spark-Cassandra Connector : Failed to open native connection to Cassandra

徘徊边缘 submitted on 2019-12-19 22:04:10

Question: I am new to Spark and Cassandra. On trying to submit a Spark job, I am getting an error while connecting to Cassandra.

Details:

Versions:
Spark: 1.3.1 (built for Hadoop 2.6 or later: spark-1.3.1-bin-hadoop2.6)
Cassandra: 2.0
Spark-Cassandra-Connector: 1.3.0-M1
Scala: 2.10.5

Spark and Cassandra are on a virtual cluster. Cluster details:
Spark master: 192.168.101.13
Spark slaves: 192.168.101.11 and 192.168.101.12
Cassandra nodes: 192.168.101.11 (seed node) and 192.168.101.12

I am trying to…
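
The post breaks off before the stack trace, but for "Failed to open native connection" the usual first checks are that spark.cassandra.connection.host is reachable from every executor (not just the submitting machine), and that the connector and Cassandra versions line up per the connector's compatibility table; the 1.3.x connector line targets Cassandra 2.1, so running it against 2.0 is itself suspect. A minimal wiring sketch using the addresses from the question (keyspace and table names are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    val conf = new SparkConf()
      .setAppName("cassandra-smoke-test")
      .set("spark.cassandra.connection.host", "192.168.101.11")
    val sc = new SparkContext(conf)

    // If this fails, verify the native (CQL) port 9042 is open between
    // the executors and the Cassandra nodes.
    println(sc.cassandraTable("some_keyspace", "some_table").first())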

Guava version while using spark-shell

戏子无情 submitted on 2019-12-19 05:34:27

Question: I'm trying to use the spark-cassandra-connector via spark-shell on Dataproc, but I am unable to connect to my cluster. It appears there is a version mismatch, since the classpath includes a much older Guava version from somewhere else, even when I specify the proper version on startup. I suspect this is caused by all the Hadoop dependencies put on the classpath by default. Is there any way to have spark-shell use only the proper version of Guava, without getting rid of all…
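
One commonly suggested mitigation (a sketch, not Dataproc-specific advice) is to tell Spark to prefer user-supplied jars over the cluster-provided classpath. Expressed as SparkConf settings below; with spark-shell they would be passed as the corresponding --conf flags. The more durable fix, which the later questions on this page circle around, is shading Guava inside the connector assembly.

    import org.apache.spark.SparkConf

    // Experimental flags in this Spark era: give the user's jars (and the
    // Guava they bundle) precedence over the Hadoop/Spark classpath.
    val conf = new SparkConf()
      .set("spark.driver.userClassPathFirst", "true")
      .set("spark.executor.userClassPathFirst", "true")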

Why is Apache Spark performing the filters on the client?

两盒软妹~` submitted on 2019-12-19 04:39:46

Question: Being a newbie on Apache Spark, I am facing some issues fetching Cassandra data into Spark.

    List<String> dates = Arrays.asList("2015-01-21", "2015-01-22");
    CassandraJavaRDD<A> aRDD = CassandraJavaUtil.javaFunctions(sc)
        .cassandraTable("testing", "cf_text", CassandraJavaUtil.mapRowTo(A.class, colMap))
        .where("Id=? and date IN ?", "Open", dates);

This query is not filtering data on the Cassandra server. While this Java statement is executing, it shoots up the memory and finally throws spark java.lang…
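
The connector only pushes a where() clause down to Cassandra when the referenced columns are partition-key, clustering, or secondary-index columns; anything else falls back to a client-side filter over a full table scan, which matches the memory blow-up described. A Scala sketch of the pushed-down shape (the column roles are assumptions about the schema):

    import com.datastax.spark.connector._

    // Assumes `id` is the partition key and `date` a clustering column --
    // only then does this predicate execute server-side in Cassandra.
    val dates = Seq("2015-01-21", "2015-01-22")
    val rdd = sc.cassandraTable("testing", "cf_text")
      .where("id = ? AND date IN ?", "Open", dates)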

What happens - NoSuchMethodError: com.datastax.driver.core.ResultSet.fetchMoreResults

浪尽此生 submitted on 2019-12-18 13:43:05

Question: cassandra-connector-assembly-2.0.0, built from the GitHub project, with Scala 2.11.8 and cassandra-driver-core-3.1.0.

    sc.cassandraTable("mykeyspace", "mytable")
      .select("something")
      .where("key=?", key)
      .mapPartitions(par => par.map(row => (row.getString("something"), 1)))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)

The same job works fine when reading a smaller amount of data.

    java.lang.NoSuchMethodError: com.datastax.driver.core.ResultSet.fetchMoreResults()Lshade/com/datastax/spark/connector/google…
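
The shade/ prefix in the expected return type is the tell: the connector assembly was built with a shaded dependency tree, while the separately added, unshaded cassandra-driver-core-3.1.0 supplies a ResultSet whose fetchMoreResults returns the unshaded type. The usual remedy is to stop declaring the driver yourself and let the connector bring its own; a build.sbt sketch (version illustrative):

    // build.sbt: depend on the connector alone; do not also declare
    // cassandra-driver-core, or shaded and unshaded classes will mix.
    libraryDependencies +=
      "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0"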

Spark 1.5.1, Cassandra Connector 1.5.0-M2, Cassandra 2.1, Scala 2.10, NoSuchMethodError guava dependency

非 Y 不嫁゛ submitted on 2019-12-18 09:38:53

Question: I am new to the Spark environment (and fairly new to Maven), so I'm struggling with how to ship the dependencies I need correctly. It looks like Spark 1.5.1 has a guava-14.0.1 dependency which it tries to use, while isPrimitive was added in Guava 15+. What's the correct way to ensure my uber-jar wins? I've tried spark.executor.extraClassPath in my spark-defaults.conf to no avail. Duplicate of this question: Spark 1.5.1 + Scala 2.10 + Kafka + Cassandra = java.lang.NoSuchMethodError, but for Maven…
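
Making the uber-jar "win" reliably means relocating Guava inside it, so Spark's bundled guava-14.0.1 can no longer shadow the newer classes. The question is Maven-based, where the Shade plugin's <relocation> element does this job; the same idea sketched with sbt-assembly, with an illustrative target package:

    // build.sbt with sbt-assembly: rewrite Guava's packages inside the
    // uber-jar so they cannot collide with Spark's own Guava 14.
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.common.**" -> "shadedguava.@1").inAll
    )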

Apache Spark taking 5 to 6 minutes for simple count of 1 billion rows from Cassandra

泪湿孤枕 submitted on 2019-12-17 19:04:28

Question: I am using the Spark Cassandra connector. It takes 5-6 minutes to fetch data from a Cassandra table. In Spark I have seen many tasks and executors in the log. The reason might be that Spark divided the process into many tasks! Below is my code example:

    public static void main(String[] args) {
        SparkConf conf = new SparkConf(true)
            .setMaster("local[4]")
            .setAppName("App_Name")
            .set("spark.cassandra.connection.host", "127.0.0.1");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<Demo_Bean>…
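
Two things stand out: local[4] caps the job at four local cores, and a plain count() first ships every row to Spark. For a pure row count, the connector can push the counting down to Cassandra instead; a Scala sketch with placeholder names:

    import com.datastax.spark.connector._

    // cassandraCount() executes the count server-side, token range by
    // token range, rather than materializing a billion rows in Spark.
    val n = sc.cassandraTable("my_keyspace", "my_table").cassandraCount()
    println(n)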

How to resolve the conflict between Guava 11.0.2 and 16.0 when using YARN, Spark and spark-cassandra-connector?

試著忘記壹切 submitted on 2019-12-14 03:56:01

Question: My YARN version is hadoop-2.4.0.x, Spark is spark-1.5.1-bin-hadoop2.4, and the connector is spark-cassandra-connector_2.10-1.5.0-M2. I executed the following command:

    bin/spark-shell \
      --driver-class-path $(echo lib/*.jar | sed 's/ /:/g') \
      --master yarn-client --deploy-mode client \
      --conf spark.cassandra.connection.host=192.21.0.209 \
      --conf spark.cassandra.auth.username=username \
      --conf spark.cassandra.auth.password=password \
      --conf spark.sql.dialect=sql \
      --jars lib/guava-16.0.jar…
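
With Hadoop's Guava 11.0.2 and the connector's required 16.0 both on the classpath, it helps to first establish which copy actually won. A diagnostic sketch to paste into the spark-shell:

    // Print the jar that Guava classes were really loaded from.
    println(classOf[com.google.common.base.Optional[_]]
      .getProtectionDomain.getCodeSource.getLocation)

    // TypeToken#isPrimitive arrived in Guava 15; under Hadoop's 11.0.2
    // this line fails, reproducing the connector's error in isolation.
    println(com.google.common.reflect.TypeToken.of(classOf[Int]).isPrimitive)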