spark-cassandra-connector

Spark Dataframe.cache() behavior for changing source

自闭症网瘾萝莉.ら submitted on 2019-12-13 18:04:26
Question: My use case: (1) Create a dataframe from a Cassandra table. (2) Create an output dataframe by filtering on a column and modifying that column's value. (3) Write the output dataframe to Cassandra with a TTL set, so all the modified records are deleted after a short period (2s). (4) Return the output dataframe to a caller that writes it to the filesystem after some time. I can only return a dataframe to the caller and have no further control. Also, I can't increase the TTL. By the time step 4 is executed, the
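One approach sometimes tried here is to materialize the output before the TTL write, so a later action from the caller can replay cached data instead of re-reading Cassandra. A minimal sketch, assuming a source DataFrame `sourceDf` and made-up keyspace, table and column names; note that caching is only best-effort, since evicted partitions are recomputed from the (by then expired) source:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.storage.StorageLevel

def buildOutput(sourceDf: DataFrame): DataFrame = {
  val output = sourceDf
    .filter(col("status") === "NEW")          // assumed filter column
    .withColumn("status", lit("PROCESSED"))   // assumed modification

  // Materialize before the TTL write so a later action can replay from cache.
  // Best-effort only: evicted partitions are recomputed from Cassandra.
  output.persist(StorageLevel.MEMORY_AND_DISK)
  output.count()

  output.write
    .format("org.apache.spark.sql.cassandra")
    .options(Map(
      "keyspace" -> "ks1",                    // placeholder keyspace/table
      "table" -> "out_table",
      // connector write property; depending on the connector version this
      // may need to be set on the SparkConf instead of as a writer option
      "spark.cassandra.output.ttl" -> "2"))
    .mode("append")
    .save()

  output
}
```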

org.apache.spark.sql.catalyst.parser.ParseException: in spark scala cassandra api

无人久伴 submitted on 2019-12-13 05:43:49
Question: I have written the Spark Scala code below, in which I am trying to use the Spark Cassandra API. When I try to run it, I get an exception about an input mismatch on the date field, and it is automatically populated with the data values. I am not able to work out how to solve this; please help. Below is the method that converts a long to Date format: def getTimeInMillis2Date( timeInMillis :Long):Date = { if (timeInMillis == 0l) { return null; } val calendar = Calendar
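For reference, a sketch of how the truncated helper above might be completed (the UTC time zone is an assumption; it mirrors the excerpt's null-for-zero behaviour):

```scala
import java.util.{Calendar, Date, TimeZone}

// Converts epoch millis to java.util.Date, returning null for the 0L sentinel.
def getTimeInMillis2Date(timeInMillis: Long): Date = {
  if (timeInMillis == 0L) {
    null
  } else {
    val calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC")) // time zone assumed
    calendar.setTimeInMillis(timeInMillis)
    calendar.getTime
  }
}
```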

How to query JSON data column using Spark DataFrames?

☆樱花仙子☆ submitted on 2019-12-13 05:22:57
Question: I have a Cassandra table that, for simplicity, looks something like: key: text, jsonData: text, blobData: blob. I can create a basic DataFrame for it using Spark and the spark-cassandra-connector with: val df = sqlContext.read .format("org.apache.spark.sql.cassandra") .options(Map("table" -> "mytable", "keyspace" -> "ks1")) .load() I'm struggling, though, to expand the JSON data into its underlying structure. I ultimately want to be able to filter based on the attributes within the JSON string
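Two hedged ways to get at the JSON attributes, assuming Spark 1.6+ and an invented attribute path: pull single fields with `get_json_object`, or let Spark infer a full schema by reading the column as JSON.

```scala
import org.apache.spark.sql.functions.{col, get_json_object}

// Sketch: "attributes.color" is a made-up path; adjust to the real JSON layout.
val withColor = df.withColumn("color", get_json_object(col("jsonData"), "$.attributes.color"))
val reds      = withColor.filter(col("color") === "red")

// Alternative sketch: infer the whole JSON schema from the column's contents.
val jsonDf = sqlContext.read.json(df.select("jsonData").rdd.map(_.getString(0)))
jsonDf.printSchema()
```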

Could not initialize class com.datastax.spark.connector.types.TypeConverter$ while running job on apache spark 2.0.2 using cassandra connector

夙愿已清 submitted on 2019-12-12 06:18:55
Question: I'm trying to run a simple count, from the Apache Spark shell, on a data set that was previously loaded into my Cassandra cluster. To do this I've created a simple Maven project that builds a fat jar; these are my dependencies: <!-- https://mvnrepository.com/artifact/com.cloudera.sparkts/sparkts --> <dependency> <groupId>com.cloudera.sparkts</groupId> <artifactId>sparkts</artifactId> <version>0.4.1</version> </dependency> <!-- https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector
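The `TypeConverter$` initialization error is often a sign of mismatched Scala or connector versions inside the assembled jar, so the Scala binary version of every artifact is worth checking. Independent of that, a minimal count over a Cassandra table with the connector's RDD API looks roughly like this (host, keyspace and table names are placeholders):

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: connection host, keyspace and table are placeholders.
val conf = new SparkConf()
  .setAppName("cassandra-count")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext(conf)

val rows = sc.cassandraTable("my_keyspace", "my_table")
println(rows.count())
```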

Why does spark-submit fail with “Failed to load class for data source: org.apache.spark.sql.cassandra” with Cassandra connector in --jars?

点点圈 submitted on 2019-12-12 03:07:42
Question: Spark version: 1.4.1. Cassandra version: 2.1.8. DataStax Cassandra connector: 1.4.2-SNAPSHOT.jar. Command I ran: ./spark-submit --jars /usr/local/src/spark-cassandra-connector/spark-cassandra-connector-java/target/scala-2.10/spark-cassandra-connector-java-assembly-1.4.2-SNAPSHOT.jar --driver-class-path /usr/local/src/spark-cassandra-connector/spark-cassandra-connector-java/target/scala-2.10/spark-cassandra-connector-java-assembly-1.4.2-SNAPSHOT.jar --jars /usr/local/lib/spark-1.4.1/external/kafka
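One thing worth ruling out: `--jars` appears twice in that command, and spark-submit keeps only the last value it parses, so the connector assembly may never reach the driver and executors. A sketch of a single-flag invocation (all paths and the class name are placeholders):

```
./spark-submit \
  --driver-class-path /path/to/spark-cassandra-connector-java-assembly-1.4.2-SNAPSHOT.jar \
  --jars /path/to/spark-cassandra-connector-java-assembly-1.4.2-SNAPSHOT.jar,/path/to/spark-streaming-kafka-assembly.jar \
  --class MyApp /path/to/my-app.jar
```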

Can't write to cluster if replication_factor is greater than 1

不羁的心 submitted on 2019-12-12 02:25:32
Question: I'm using Spark 1.6.1, Cassandra 2.2.3 and the Cassandra-Spark connector 1.6. I have already written to a multi-node cluster, but only with replication_factor: 1. Now I'm trying to write to a 6-node cluster with one seed node and a keyspace whose replication_factor is greater than 1, but Spark is not responding and refuses to do it. As I mentioned, it works when I write to the coordinator with the keyspace's replication factor set to 1. This is the log I'm getting; it always stops here, or after half an hour it starts to
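If the stall is consistency-related (the connector's default write consistency is typically LOCAL_QUORUM, which cannot be met when too few replicas are reachable), one knob to experiment with is the output consistency level. A sketch with a placeholder host and an assumed level:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: host and level are placeholders; pick a consistency level the
// cluster can actually satisfy (e.g. ONE / LOCAL_ONE while debugging).
val conf = new SparkConf()
  .setAppName("write-rf-gt-1")
  .set("spark.cassandra.connection.host", "10.0.0.1")
  .set("spark.cassandra.output.consistency.level", "LOCAL_ONE")
val sc = new SparkContext(conf)
```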

When does fetch happen from Cassandra

ε祈祈猫儿з submitted on 2019-12-12 00:07:54
Question: I have an application that submits the job to the Spark master. But when I check the IP address executing the job, it displays my application's IP and not the Spark worker's IP. So, from what I understand, a call on an RDD is what puts a Spark worker to work. But my question is this: CassandraSQLContext c = new CassandraSQLContext(sc); QueryExecution q = c.executeSql(cqlCommand); //-----1 q.toRDD().count(); //----2 I saw the worker doing something for 2 but nothing for 1. So does this mean fetch
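This matches Spark's lazy evaluation: line 1 only builds a query plan on the driver (hence the application's IP), and the distributed fetch from Cassandra runs in worker tasks only when an action such as count() at line 2 executes. A DataFrame-flavoured sketch of the same observation (sqlContext, keyspace, table and column names are placeholders):

```scala
// Sketch: nothing is read from Cassandra at (1); the scan is executed by
// worker tasks only when the action at (2) runs.
val df = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks1", "table" -> "mytable"))
  .load()                                     // (1) plan built on the driver, no rows fetched

val n = df.filter(df("value") > 10).count()   // (2) triggers the distributed scan
```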

How to change Datatypes of records inserting into Cassandra using Foreach Spark Structure streaming

痞子三分冷 submitted on 2019-12-11 19:45:50
Question: I am trying to insert deserialized Kafka records into DataStax Cassandra using Spark Structured Streaming with a foreach sink. For example, my deserialized DataFrame data is all in string format: id name date 100 'test' sysdate. Using the foreach sink I created a class and am trying to insert the records as below by converting them. session.execute( s"""insert into ${cassandraDriver.namespace}.${cassandraDriver.brand_dub_sink} (id,name,date) values ('${row.getAs[Long](0)}','${rowstring(1)}','$
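A hedged sketch of a ForeachWriter that binds typed values through a prepared statement instead of interpolating strings into the CQL; the contact point, keyspace, table and column handling are assumptions (id is taken to be a Cassandra bigint and date a timestamp), and the original cassandraDriver wrapper could just as well supply the session:

```scala
import com.datastax.driver.core.{Cluster, PreparedStatement, Session}
import org.apache.spark.sql.{ForeachWriter, Row}

class CassandraSinkWriter extends ForeachWriter[Row] {
  @transient private var cluster: Cluster = _
  @transient private var session: Session = _
  @transient private var insert: PreparedStatement = _

  override def open(partitionId: Long, version: Long): Boolean = {
    cluster = Cluster.builder().addContactPoint("127.0.0.1").build() // placeholder host
    session = cluster.connect()
    insert = session.prepare(
      "INSERT INTO my_ks.my_table (id, name, date) VALUES (?, ?, ?)") // placeholder keyspace/table
    true
  }

  override def process(row: Row): Unit = {
    // Convert the string-typed columns before binding; the driver maps
    // java.lang.Long -> bigint and java.util.Date -> timestamp.
    session.execute(insert.bind(
      java.lang.Long.valueOf(row.getAs[String]("id").toLong),
      row.getAs[String]("name"),
      new java.util.Date()))                   // stand-in for the 'sysdate' value
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (session != null) session.close()
    if (cluster != null) cluster.close()
  }
}
```

It would be wired in with something like df.writeStream.foreach(new CassandraSinkWriter).start().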

Unable to authenticate cassandra cluster through spark scala program

自作多情 submitted on 2019-12-11 15:55:37
Question: Please suggest how to solve the issue below, or suggest a different approach to my problem. I get data from somewhere and insert it into Cassandra on a daily basis; then I need to retrieve a whole week's data from Cassandra, do some processing, and insert the result back into Cassandra. I have a lot of records, and each record executes most of the operations below. Following the suggestion on my previous post, Repreparing preparedstatement warning, to avoid repreparing the
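For the authentication part of the title: with PasswordAuthenticator enabled on the cluster, the connector reads its credentials from two Spark properties. A sketch with placeholder host and credentials:

```scala
import org.apache.spark.SparkConf

// Sketch only: host, user name and password are placeholders.
val conf = new SparkConf()
  .setAppName("weekly-processing")
  .set("spark.cassandra.connection.host", "10.0.0.1")
  .set("spark.cassandra.auth.username", "cassandra_user")
  .set("spark.cassandra.auth.password", "********")
```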

ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition

只谈情不闲聊 submitted on 2019-12-11 15:24:04
Question: I am using Spark version 2.2.1, Scala version 2.11.8, and OpenJDK 64-Bit Server VM 1.8.0_131. I have added the jar dependency in code: JavaSparkContext sc = new JavaSparkContext(conf); sc.addJar("./target/CassandraSparkJava-1.0-SNAPSHOT-jar-with-dependencies.jar"); Executing the code below, but getting ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition Dataset<org.apache.spark.sql.Row> dataset = sparksession.read().format("org.apache.spark.sql.cassandra")
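sc.addJar ships the application jar, but the connector classes also have to reach the executors before tasks deserialize CassandraPartition. One hedged sketch is to list every required jar up front in the SparkConf (paths, master URL and connector version are placeholders), or equivalently to pass them via --jars or --packages at submit time:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext

// Sketch only: all paths, the master URL and the connector version are placeholders.
val conf = new SparkConf()
  .setAppName("cassandra-read")
  .setMaster("spark://master-host:7077")
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .setJars(Seq(
    "./target/CassandraSparkJava-1.0-SNAPSHOT-jar-with-dependencies.jar",
    "/path/to/spark-cassandra-connector_2.11-2.0.7.jar"))
val sc = new JavaSparkContext(conf)
```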