spark-cassandra-connector

Unable to generate UUIDs in Spark SQL

随声附和 submitted on 2019-12-20 05:18:27

Question: Below is the code block and the error received.

Creating the temporary views:

    sqlcontext.sql("""CREATE TEMPORARY VIEW temp_pay_txn_stage
      USING org.apache.spark.sql.cassandra
      OPTIONS (
        table "t_pay_txn_stage",
        keyspace "ks_pay",
        cluster "Test Cluster",
        pushdown "true")""".stripMargin)

    sqlcontext.sql("""CREATE TEMPORARY VIEW temp_pay_txn_source
      USING org.apache.spark.sql.cassandra
      OPTIONS (
        table "t_pay_txn_source",
        keyspace "ks_pay",
        cluster "Test Cluster",
        pushdown "true")""".stripMargin)
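
The error itself is truncated above, but on Spark versions that predate the built-in uuid() SQL function (added in Spark 2.3), a common workaround is to register a small UDF. A minimal sketch, assuming the same sqlcontext and views as the question (the txn_uuid alias is illustrative):

    import java.util.UUID

    // Register a UDF that returns a random UUID per row. Note it is
    // non-deterministic: recomputed partitions can get different values.
    sqlcontext.udf.register("uuid", () => UUID.randomUUID().toString)

    sqlcontext.sql(
      "SELECT uuid() AS txn_uuid, s.* FROM temp_pay_txn_stage s").show()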

Pass columnNames dynamically to cassandraTable().select()

大兔子大兔子 submitted on 2019-12-20 02:55:04

Question: I'm reading a query off of a file at run time and executing it in a Spark + Cassandra environment. I'm executing:

    sparkContext.cassandraTable("keyspaceName", "colFamilyName")
      .select("col1", "col2", "col3")
      .where("some condition = true")

Query in file:

    select col1, col2, col3 from keyspaceName.colFamilyName where somecondition = true

Here col1, col2, col3 can vary depending on the query parsed from the file. Question: how do I pick the column names from the query and pass them to select() at runtime? I…
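
select() accepts a varargs of column references, so a dynamically built sequence can be splatted into it. A sketch, where the column list stands in for whatever your file parser produced:

    import com.datastax.spark.connector._

    // Hypothetical: names parsed out of the query file at run time.
    val cols = Seq("col1", "col2", "col3")

    val rdd = sparkContext.cassandraTable("keyspaceName", "colFamilyName")
      .select(cols.map(ColumnName(_)): _*) // Seq[ColumnName] -> varargs
      .where("somecondition = true")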

Spark-Cassandra Connector : Failed to open native connection to Cassandra

徘徊边缘 submitted on 2019-12-19 22:04:10

Question: I am new to Spark and Cassandra. On trying to submit a Spark job, I am getting an error while connecting to Cassandra.

Details:

Versions:
Spark: 1.3.1 (built for Hadoop 2.6 or later: spark-1.3.1-bin-hadoop2.6)
Cassandra: 2.0
Spark-Cassandra-Connector: 1.3.0-M1
Scala: 2.10.5

Spark and Cassandra are on a virtual cluster. Cluster details:
Spark master: 192.168.101.13
Spark slaves: 192.168.101.11 and 192.168.101.12
Cassandra nodes: 192.168.101.11 (seed node) and 192.168.101.12

I am trying to…
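
The post breaks off before the stack trace, but for "Failed to open native connection" the usual first checks are that spark.cassandra.connection.host is reachable from every executor (not just the submitting machine), and that the connector and Cassandra versions line up per the connector's compatibility table; the 1.3.x connector line targets Cassandra 2.1, so running it against 2.0 is itself suspect. A minimal wiring sketch using the addresses from the question (keyspace and table names are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    val conf = new SparkConf()
      .setAppName("cassandra-smoke-test")
      .set("spark.cassandra.connection.host", "192.168.101.11")
    val sc = new SparkContext(conf)

    // If this fails, verify the native (CQL) port 9042 is open between
    // the executors and the Cassandra nodes.
    println(sc.cassandraTable("some_keyspace", "some_table").first())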

Guava version while using spark-shell

戏子无情 submitted on 2019-12-19 05:34:27

Question: I'm trying to use the spark-cassandra-connector via spark-shell on Dataproc, but I am unable to connect to my cluster. It appears there is a version mismatch, since the classpath includes a much older Guava version from somewhere else, even when I specify the proper version on startup. I suspect this is caused by all the Hadoop dependencies put on the classpath by default. Is there any way to have spark-shell use only the proper version of Guava, without getting rid of all…
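
One commonly suggested mitigation (a sketch, not Dataproc-specific advice) is to tell Spark to prefer user-supplied jars over the cluster-provided classpath. Expressed as SparkConf settings below; with spark-shell they would be passed as the corresponding --conf flags. The more durable fix, which the later questions on this page circle around, is shading Guava inside the connector assembly.

    import org.apache.spark.SparkConf

    // Experimental flags in this Spark era: give the user's jars (and the
    // Guava they bundle) precedence over the Hadoop/Spark classpath.
    val conf = new SparkConf()
      .set("spark.driver.userClassPathFirst", "true")
      .set("spark.executor.userClassPathFirst", "true")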

Why is Apache Spark performing the filters on the client?

两盒软妹~` submitted on 2019-12-19 04:39:46

Question: Being a newbie on Apache Spark, I am facing some issues fetching Cassandra data into Spark.

    List<String> dates = Arrays.asList("2015-01-21", "2015-01-22");
    CassandraJavaRDD<A> aRDD = CassandraJavaUtil.javaFunctions(sc)
        .cassandraTable("testing", "cf_text", CassandraJavaUtil.mapRowTo(A.class, colMap))
        .where("Id=? and date IN ?", "Open", dates);

This query is not filtering data on the Cassandra server. While this Java statement is executing, it shoots up the memory and finally throws spark java.lang…
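
The connector only pushes a where() clause down to Cassandra when the referenced columns are partition-key, clustering, or secondary-index columns; anything else falls back to a client-side filter over a full table scan, which matches the memory blow-up described. A Scala sketch of the pushed-down shape (the column roles are assumptions about the schema):

    import com.datastax.spark.connector._

    // Assumes `id` is the partition key and `date` a clustering column --
    // only then does this predicate execute server-side in Cassandra.
    val dates = Seq("2015-01-21", "2015-01-22")
    val rdd = sc.cassandraTable("testing", "cf_text")
      .where("id = ? AND date IN ?", "Open", dates)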

What happens - NoSuchMethodError: com.datastax.driver.core.ResultSet.fetchMoreResults

浪尽此生 submitted on 2019-12-18 13:43:05

Question: cassandra-connector-assembly-2.0.0, built from the GitHub project, with Scala 2.11.8 and cassandra-driver-core-3.1.0.

    sc.cassandraTable("mykeyspace", "mytable")
      .select("something")
      .where("key=?", key)
      .mapPartitions(par => par.map(row => (row.getString("something"), 1)))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)

The same job works fine when reading a smaller amount of data.

    java.lang.NoSuchMethodError: com.datastax.driver.core.ResultSet.fetchMoreResults()Lshade/com/datastax/spark/connector/google…
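
The shade/ prefix in the expected return type is the tell: the connector assembly was built with a shaded dependency tree, while the separately added, unshaded cassandra-driver-core-3.1.0 supplies a ResultSet whose fetchMoreResults returns the unshaded type. The usual remedy is to stop declaring the driver yourself and let the connector bring its own; a build.sbt sketch (version illustrative):

    // build.sbt: depend on the connector alone; do not also declare
    // cassandra-driver-core, or shaded and unshaded classes will mix.
    libraryDependencies +=
      "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0"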

Spark 1.5.1, Cassandra Connector 1.5.0-M2, Cassandra 2.1, Scala 2.10, NoSuchMethodError guava dependency

非 Y 不嫁゛ submitted on 2019-12-18 09:38:53

Question: I am new to the Spark environment (and fairly new to Maven), so I'm struggling with how to ship the dependencies I need correctly. It looks like Spark 1.5.1 has a guava-14.0.1 dependency which it tries to use, while isPrimitive was added in Guava 15+. What's the correct way to ensure my uber-jar wins? I've tried spark.executor.extraClassPath in my spark-defaults.conf to no avail. Duplicate of this question: Spark 1.5.1 + Scala 2.10 + Kafka + Cassandra = java.lang.NoSuchMethodError, but for Maven…
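
Making the uber-jar "win" reliably means relocating Guava inside it, so Spark's bundled guava-14.0.1 can no longer shadow the newer classes. The question is Maven-based, where the Shade plugin's <relocation> element does this job; the same idea sketched with sbt-assembly, with an illustrative target package:

    // build.sbt with sbt-assembly: rewrite Guava's packages inside the
    // uber-jar so they cannot collide with Spark's own Guava 14.
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.common.**" -> "shadedguava.@1").inAll
    )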

Apache Spark taking 5 to 6 minutes for simple count of 1 billion rows from Cassandra

泪湿孤枕 submitted on 2019-12-17 19:04:28

Question: I am using the Spark Cassandra connector. It takes 5-6 minutes to fetch data from a Cassandra table. In Spark I have seen many tasks and executors in the log. The reason might be that Spark divided the process into many tasks! Below is my code example:

    public static void main(String[] args) {
        SparkConf conf = new SparkConf(true)
            .setMaster("local[4]")
            .setAppName("App_Name")
            .set("spark.cassandra.connection.host", "127.0.0.1");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<Demo_Bean>…
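
Two things stand out: local[4] caps the job at four local cores, and a plain count() first ships every row to Spark. For a pure row count, the connector can push the counting down to Cassandra instead; a Scala sketch with placeholder names:

    import com.datastax.spark.connector._

    // cassandraCount() executes the count server-side, token range by
    // token range, rather than materializing a billion rows in Spark.
    val n = sc.cassandraTable("my_keyspace", "my_table").cassandraCount()
    println(n)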

How to resolve the conflict between Guava 11.0.2 and 16.0 when using YARN, Spark and spark-cassandra-connector?

試著忘記壹切 submitted on 2019-12-14 03:56:01

Question: My YARN version is hadoop-2.4.0.x, Spark is spark-1.5.1-bin-hadoop2.4, and the connector is spark-cassandra-connector_2.10-1.5.0-M2. I executed the following command:

    bin/spark-shell \
      --driver-class-path $(echo lib/*.jar | sed 's/ /:/g') \
      --master yarn-client --deploy-mode client \
      --conf spark.cassandra.connection.host=192.21.0.209 \
      --conf spark.cassandra.auth.username=username \
      --conf spark.cassandra.auth.password=password \
      --conf spark.sql.dialect=sql \
      --jars lib/guava-16.0.jar…
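
With Hadoop's Guava 11.0.2 and the connector's required 16.0 both on the classpath, it helps to first establish which copy actually won. A diagnostic sketch to paste into the spark-shell:

    // Print the jar that Guava classes were really loaded from.
    println(classOf[com.google.common.base.Optional[_]]
      .getProtectionDomain.getCodeSource.getLocation)

    // TypeToken#isPrimitive arrived in Guava 15; under Hadoop's 11.0.2
    // this line fails, reproducing the connector's error in isolation.
    println(com.google.common.reflect.TypeToken.of(classOf[Int]).isPrimitive)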