spark-cassandra-connector

How to resolve Guava dependency issue while submitting Uber Jar to Google Dataproc

陌路散爱 submitted on 2019-12-02 06:36:22
I am using the Maven Shade plugin to build an uber jar to submit as a job to a Google Dataproc cluster. Google has installed Apache Spark 2.0.2 and Apache Hadoop 2.7.3 on the cluster. Apache Spark 2.0.2 uses com.google.guava 14.0.1 and Apache Hadoop 2.7.3 uses 11.0.2, and both should already be on the classpath.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.0.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- <artifactSet> <includes> <include>com.google.guava:guava …
```
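The usual fix, sketched below under the assumption that the uber jar should carry its own Guava, is to have the Shade plugin relocate Guava's packages so the bundled copy cannot collide with Hadoop's older Guava 11.0.2 on the Dataproc classpath (the `repackaged` prefix is an arbitrary choice):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.0.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Rewrite com.google.common.* references inside the uber jar so
               they cannot clash with the cluster's Guava 11.0.2 -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>repackaged.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```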

Unable to generate UUIDs in Spark SQL

醉酒当歌 submitted on 2019-12-02 03:50:46
Below is the code block and the error received.

Creating the temporary views:

```scala
sqlcontext.sql("""CREATE TEMPORARY VIEW temp_pay_txn_stage
  USING org.apache.spark.sql.cassandra
  OPTIONS (
    table "t_pay_txn_stage",
    keyspace "ks_pay",
    cluster "Test Cluster",
    pushdown "true")""".stripMargin)

sqlcontext.sql("""CREATE TEMPORARY VIEW temp_pay_txn_source
  USING org.apache.spark.sql.cassandra
  OPTIONS (
    table "t_pay_txn_source",
    keyspace "ks_pay",
    cluster "Test Cluster",
    pushdown "true")""".stripMargin)
```

Querying the views as below, to get the new records from stage that are not present in source:

```
scala> val df …
```
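One workaround, sketched below on the assumption that this Spark version has no built-in uuid() SQL function, is to register a UDF; UUIDs.timeBased() is the Datastax driver's time-based UUID generator, and the txn_id join column is hypothetical:

```scala
import com.datastax.driver.core.utils.UUIDs

// Expose UUID generation to Spark SQL through a user-defined function.
sqlcontext.udf.register("uuid", () => UUIDs.timeBased().toString)

// Hypothetical usage: stamp a fresh UUID on every stage row missing from source.
val df = sqlcontext.sql("""
  SELECT uuid() AS new_id, s.*
  FROM temp_pay_txn_stage s
  LEFT JOIN temp_pay_txn_source t ON s.txn_id = t.txn_id
  WHERE t.txn_id IS NULL""")
```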

How to use Cassandra Context in Spark 2.0

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-02 03:42:57
In previous versions of Spark such as 1.6.1, I am creating a Cassandra context from the Spark context:

```scala
import org.apache.spark.{Logging, SparkContext, SparkConf}

// config
val conf: org.apache.spark.SparkConf = new SparkConf(true)
  .set("spark.cassandra.connection.host", CassandraHost)
  .setAppName(getClass.getSimpleName)
lazy val sc = new SparkContext(conf)

val cassandraSqlCtx: org.apache.spark.sql.cassandra.CassandraSQLContext =
  new CassandraSQLContext(sc)

// Query using the Cassandra context
cassandraSqlCtx.sql("select id from table ")
```

But in Spark 2.0, the Spark context is replaced by the Spark session, …
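The connector's 2.x line drops CassandraSQLContext entirely; a minimal sketch of the replacement, assuming spark-cassandra-connector 2.x and hypothetical keyspace/table names, builds a SparkSession and reads the table through the DataSource API:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName(getClass.getSimpleName)
  .config("spark.cassandra.connection.host", CassandraHost) // same setting as before
  .getOrCreate()

// Register the Cassandra table as a temporary view, then query it with Spark SQL.
spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "table")) // hypothetical names
  .load()
  .createOrReplaceTempView("table")

val ids = spark.sql("select id from table")
```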

Pass columnNames dynamically to cassandraTable().select()

a 夏天 submitted on 2019-12-02 01:21:37
I'm reading a query from a file at run-time and executing it in the Spark + Cassandra environment. I'm executing:

```scala
sparkContext.cassandraTable("keyspaceName", "colFamilyName")
  .select("col1", "col2", "col3")
  .where("some condition = true")
```

Query in file:

```sql
select col1, col2, col3 from keyspaceName.colFamilyName where somecondition = true
```

Here col1, col2, col3 can vary depending on the query parsed from the file.

Question: How do I pick the column names from the query and pass them to select() at runtime? I have tried many ways to do it:

1. The dumbest thing done (which obviously threw an error): var str = …
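One approach that works, sketched here on the assumption that the column names have already been parsed out of the file, is to map each String to a ColumnRef and expand the sequence as varargs, since select() takes ColumnRef* rather than a collection:

```scala
import com.datastax.spark.connector._

val colNames = Seq("col1", "col2", "col3") // parsed from the query file at run-time
val cols: Seq[ColumnRef] = colNames.map(ColumnName(_))

sparkContext
  .cassandraTable("keyspaceName", "colFamilyName")
  .select(cols: _*) // varargs expansion of the dynamic column list
  .where("somecondition = true")
```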

Spark-Cassandra Connector : Failed to open native connection to Cassandra

微笑、不失礼 submitted on 2019-12-01 18:39:43
I am new to Spark and Cassandra. On trying to submit a Spark job, I am getting an error while connecting to Cassandra.

Details:

Versions:
- Spark: 1.3.1 (built for Hadoop 2.6 or later: spark-1.3.1-bin-hadoop2.6)
- Cassandra: 2.0
- Spark-Cassandra-Connector: 1.3.0-M1
- Scala: 2.10.5

Spark and Cassandra are on a virtual cluster. Cluster details:
- Spark master: 192.168.101.13
- Spark slaves: 192.168.101.11 and 192.168.101.12
- Cassandra nodes: 192.168.101.11 (seed node) and 192.168.101.12

I am trying to submit a job from my client machine (laptop), 172.16.0.6. After googling for this error, I have …
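Without the full stack trace this is only a guess, but the first thing to check is that spark.cassandra.connection.host points at a Cassandra node rather than the Spark master, and that Cassandra's rpc_address is reachable from the client; a minimal sketch using the addresses above:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf(true)
  .setMaster("spark://192.168.101.13:7077")                 // Spark master
  .set("spark.cassandra.connection.host", "192.168.101.11") // Cassandra seed node, not the master
  .setAppName("cassandra-connect-test")
val sc = new SparkContext(conf)
```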

Datastax Cassandra Driver throwing CodecNotFoundException

ぐ巨炮叔叔 submitted on 2019-12-01 03:49:35
The exact exception is as follows:

```
com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [varchar <-> java.math.BigDecimal]
```

These are the versions of the software I am using:
- Spark 1.5
- Datastax-cassandra 3.2.1
- CDH 5.5.1

The code I am trying to execute is a Spark program using the Java API; it basically reads data (CSVs) from HDFS and loads it into Cassandra tables. I am using the spark-cassandra-connector. Initially I had a lot of issues with a conflict over Google's Guava library, which I was able to resolve by shading the Guava library, and …
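One common fix, sketched in Scala for brevity and assuming the target column is varchar while the CSV parser produces a java.math.BigDecimal, is to convert the value to a String before the save so the driver never needs a varchar <-> BigDecimal codec (all names below are hypothetical):

```scala
import com.datastax.spark.connector._

case class Payment(id: String, amount: String) // amount column is varchar in Cassandra

// csvLines: RDD[String] read from HDFS, e.g. sc.textFile("hdfs://...")
val payments = csvLines.map { line =>
  val f = line.split(",")
  Payment(f(0), new java.math.BigDecimal(f(1)).toPlainString) // store as text
}
payments.saveToCassandra("ks", "payments")
```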

Guava version while using spark-shell

谁说胖子不能爱 submitted on 2019-12-01 03:19:07
I'm trying to use the spark-cassandra-connector via spark-shell on Dataproc, but I am unable to connect to my cluster. It appears there is a version mismatch: the classpath includes a much older Guava version from somewhere else, even when I specify the proper version on startup. I suspect this is caused by all the Hadoop dependencies put on the classpath by default. Is there any way to have spark-shell use only the proper version of Guava, without removing all of the Hadoop-related jars that Dataproc includes? Relevant data: starting spark-shell, showing it having the …
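One mitigation to try, sketched below on the assumption that the connector assembly bundles the Guava it needs, is to put that jar at the front of the driver classpath and prefer user classes on the executors (the jar name is a placeholder):

```
spark-shell \
  --jars spark-cassandra-connector-assembly.jar \
  --driver-class-path spark-cassandra-connector-assembly.jar \
  --conf spark.executor.userClassPathFirst=true
```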

What happens - NoSuchMethodError: com.datastax.driver.core.ResultSet.fetchMoreResults

我是研究僧i submitted on 2019-11-30 10:49:25
cassandra-connector-assembly-2.0.0 built from the GitHub project, with Scala 2.11.8 and cassandra-driver-core-3.1.0:

```scala
sc.cassandraTable("mykeyspace", "mytable")
  .select("something")
  .where("key=?", key)
  .mapPartitions(par => {
    par.map({ row => (row.getString("something"), 1) })
  })
  .reduceByKey(_ + _)
  .collect()
  .foreach(println)
```

The same job works fine when reading a smaller amount of data. The failure:

```
java.lang.NoSuchMethodError: com.datastax.driver.core.ResultSet.fetchMoreResults()Lshade/com/datastax/spark/connector/google/common/util/concurrent/ListenableFuture;
	at com.datastax.spark.connector.rdd.reader …
```
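The shaded path in the error (shade/.../google/common/...) suggests the assembly's relocated driver is colliding with the unshaded cassandra-driver-core 3.1.0 on the classpath. A sketch of the usual fix in sbt, assuming the connector is allowed to supply its own driver:

```scala
// build.sbt: depend on the connector alone; do NOT also declare an explicit
// "com.datastax.cassandra" % "cassandra-driver-core" dependency, because the
// assembly already bundles a driver compiled against its shaded Guava types.
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0"
```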