spark-cassandra-connector

Spark Dataframe.cache() behavior for changing source

自闭症网瘾萝莉.ら submitted on 2019-12-13 18:04:26
Question: My use case: (1) Create a dataframe from a Cassandra table. (2) Create an output dataframe by filtering on a column and modifying that column's value. (3) Write the output dataframe to Cassandra with a TTL set, so all the modified records are deleted after a short period (2s). (4) Return the output dataframe to a caller that writes it to the filesystem after some time. I can only return a dataframe to the caller and have no further control. Also, I can't increase the TTL. By the time step 4 is executed, the
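One approach sometimes tried here is to materialize the output before the TTL write, so a later action from the caller can replay cached data instead of re-reading Cassandra. A minimal sketch, assuming a source DataFrame `sourceDf` and made-up keyspace, table and column names; note that caching is only best-effort, since evicted partitions are recomputed from the (by then expired) source:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.storage.StorageLevel

def buildOutput(sourceDf: DataFrame): DataFrame = {
  val output = sourceDf
    .filter(col("status") === "NEW")          // assumed filter column
    .withColumn("status", lit("PROCESSED"))   // assumed modification

  // Materialize before the TTL write so a later action can replay from cache.
  // Best-effort only: evicted partitions are recomputed from Cassandra.
  output.persist(StorageLevel.MEMORY_AND_DISK)
  output.count()

  output.write
    .format("org.apache.spark.sql.cassandra")
    .options(Map(
      "keyspace" -> "ks1",                    // placeholder keyspace/table
      "table" -> "out_table",
      // connector write property; depending on the connector version this
      // may need to be set on the SparkConf instead of as a writer option
      "spark.cassandra.output.ttl" -> "2"))
    .mode("append")
    .save()

  output
}
```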

org.apache.spark.sql.catalyst.parser.ParseException: in spark scala cassandra api

无人久伴 submitted on 2019-12-13 05:43:49
Question: I have written the Spark Scala code below, in which I am trying to use the Spark Cassandra API. When I try to run it, I get an exception about an input mismatch on the date field, and it is automatically populated with the data values. I am not able to work out how to solve this; please help. Below is the method that converts a long to Date format: def getTimeInMillis2Date( timeInMillis :Long):Date = { if (timeInMillis == 0l) { return null; } val calendar = Calendar
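For reference, a sketch of how the truncated helper above might be completed (the UTC time zone is an assumption; it mirrors the excerpt's null-for-zero behaviour):

```scala
import java.util.{Calendar, Date, TimeZone}

// Converts epoch millis to java.util.Date, returning null for the 0L sentinel.
def getTimeInMillis2Date(timeInMillis: Long): Date = {
  if (timeInMillis == 0L) {
    null
  } else {
    val calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC")) // time zone assumed
    calendar.setTimeInMillis(timeInMillis)
    calendar.getTime
  }
}
```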

How to query JSON data column using Spark DataFrames?

☆樱花仙子☆ submitted on 2019-12-13 05:22:57
Question: I have a Cassandra table that, for simplicity, looks something like: key: text, jsonData: text, blobData: blob. I can create a basic DataFrame for it using Spark and the spark-cassandra-connector with: val df = sqlContext.read .format("org.apache.spark.sql.cassandra") .options(Map("table" -> "mytable", "keyspace" -> "ks1")) .load() I'm struggling, though, to expand the JSON data into its underlying structure. I ultimately want to be able to filter based on the attributes within the JSON string
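Two hedged ways to get at the JSON attributes, assuming Spark 1.6+ and an invented attribute path: pull single fields with `get_json_object`, or let Spark infer a full schema by reading the column as JSON.

```scala
import org.apache.spark.sql.functions.{col, get_json_object}

// Sketch: "attributes.color" is a made-up path; adjust to the real JSON layout.
val withColor = df.withColumn("color", get_json_object(col("jsonData"), "$.attributes.color"))
val reds      = withColor.filter(col("color") === "red")

// Alternative sketch: infer the whole JSON schema from the column's contents.
val jsonDf = sqlContext.read.json(df.select("jsonData").rdd.map(_.getString(0)))
jsonDf.printSchema()
```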

Could not initialize class com.datastax.spark.connector.types.TypeConverter$ while running job on apache spark 2.0.2 using cassandra connector

夙愿已清 submitted on 2019-12-12 06:18:55
Question: I'm trying to run a simple count, from the Apache Spark shell, on a data set that was previously loaded into my Cassandra cluster. To do this I've created a simple Maven project that builds a fat jar; these are my dependencies: <!-- https://mvnrepository.com/artifact/com.cloudera.sparkts/sparkts --> <dependency> <groupId>com.cloudera.sparkts</groupId> <artifactId>sparkts</artifactId> <version>0.4.1</version> </dependency> <!-- https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector
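The `TypeConverter$` initialization error is often a sign of mismatched Scala or connector versions inside the assembled jar, so the Scala binary version of every artifact is worth checking. Independent of that, a minimal count over a Cassandra table with the connector's RDD API looks roughly like this (host, keyspace and table names are placeholders):

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: connection host, keyspace and table are placeholders.
val conf = new SparkConf()
  .setAppName("cassandra-count")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext(conf)

val rows = sc.cassandraTable("my_keyspace", "my_table")
println(rows.count())
```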

Why does spark-submit fail with “Failed to load class for data source: org.apache.spark.sql.cassandra” with Cassandra connector in --jars?

点点圈 submitted on 2019-12-12 03:07:42
Question: Spark version: 1.4.1. Cassandra version: 2.1.8. DataStax Cassandra connector: 1.4.2-SNAPSHOT.jar. Command I ran: ./spark-submit --jars /usr/local/src/spark-cassandra-connector/spark-cassandra-connector-java/target/scala-2.10/spark-cassandra-connector-java-assembly-1.4.2-SNAPSHOT.jar --driver-class-path /usr/local/src/spark-cassandra-connector/spark-cassandra-connector-java/target/scala-2.10/spark-cassandra-connector-java-assembly-1.4.2-SNAPSHOT.jar --jars /usr/local/lib/spark-1.4.1/external/kafka
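One thing worth ruling out: `--jars` appears twice in that command, and spark-submit keeps only the last value it parses, so the connector assembly may never reach the driver and executors. A sketch of a single-flag invocation (all paths and the class name are placeholders):

```
./spark-submit \
  --driver-class-path /path/to/spark-cassandra-connector-java-assembly-1.4.2-SNAPSHOT.jar \
  --jars /path/to/spark-cassandra-connector-java-assembly-1.4.2-SNAPSHOT.jar,/path/to/spark-streaming-kafka-assembly.jar \
  --class MyApp /path/to/my-app.jar
```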

Can't write to cluster if replication_factor is greater than 1

不羁的心 submitted on 2019-12-12 02:25:32
Question: I'm using Spark 1.6.1, Cassandra 2.2.3 and the Cassandra-Spark connector 1.6. I have already written to a multi-node cluster, but only with replication_factor: 1. Now I'm trying to write to a 6-node cluster with one seed node and a keyspace whose replication_factor is greater than 1, but Spark is not responding and refuses to do it. As I mentioned, it works when I write to the coordinator with the keyspace's replication factor set to 1. This is the log I'm getting; it always stops here, or after half an hour it starts to
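If the stall is consistency-related (the connector's default write consistency is typically LOCAL_QUORUM, which cannot be met when too few replicas are reachable), one knob to experiment with is the output consistency level. A sketch with a placeholder host and an assumed level:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: host and level are placeholders; pick a consistency level the
// cluster can actually satisfy (e.g. ONE / LOCAL_ONE while debugging).
val conf = new SparkConf()
  .setAppName("write-rf-gt-1")
  .set("spark.cassandra.connection.host", "10.0.0.1")
  .set("spark.cassandra.output.consistency.level", "LOCAL_ONE")
val sc = new SparkContext(conf)
```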

When does fetch happen from Cassandra

ε祈祈猫儿з submitted on 2019-12-12 00:07:54
Question: I have an application that submits the job to the Spark master. But when I check the IP address executing the job, it displays my application's IP and not the Spark worker's IP. So, from what I understand, a call on an RDD is what puts a Spark worker to work. But my question is this: CassandraSQLContext c = new CassandraSQLContext(sc); QueryExecution q = c.executeSql(cqlCommand); //-----1 q.toRDD().count(); //----2 I saw the worker doing something for 2 but nothing for 1. So does this mean fetch
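This matches Spark's lazy evaluation: line 1 only builds a query plan on the driver (hence the application's IP), and the distributed fetch from Cassandra runs in worker tasks only when an action such as count() at line 2 executes. A DataFrame-flavoured sketch of the same observation (sqlContext, keyspace, table and column names are placeholders):

```scala
// Sketch: nothing is read from Cassandra at (1); the scan is executed by
// worker tasks only when the action at (2) runs.
val df = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks1", "table" -> "mytable"))
  .load()                                     // (1) plan built on the driver, no rows fetched

val n = df.filter(df("value") > 10).count()   // (2) triggers the distributed scan
```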

How to change Datatypes of records inserting into Cassandra using Foreach Spark Structure streaming

痞子三分冷 submitted on 2019-12-11 19:45:50
Question: I am trying to insert deserialized Kafka records into DataStax Cassandra using Spark Structured Streaming with a foreach sink. For example, my deserialized DataFrame data is all in string format: id name date 100 'test' sysdate. Using the foreach sink I created a class and am trying to insert the records as below by converting them. session.execute( s"""insert into ${cassandraDriver.namespace}.${cassandraDriver.brand_dub_sink} (id,name,date) values ('${row.getAs[Long](0)}','${rowstring(1)}','$
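A hedged sketch of a ForeachWriter that binds typed values through a prepared statement instead of interpolating strings into the CQL; the contact point, keyspace, table and column handling are assumptions (id is taken to be a Cassandra bigint and date a timestamp), and the original cassandraDriver wrapper could just as well supply the session:

```scala
import com.datastax.driver.core.{Cluster, PreparedStatement, Session}
import org.apache.spark.sql.{ForeachWriter, Row}

class CassandraSinkWriter extends ForeachWriter[Row] {
  @transient private var cluster: Cluster = _
  @transient private var session: Session = _
  @transient private var insert: PreparedStatement = _

  override def open(partitionId: Long, version: Long): Boolean = {
    cluster = Cluster.builder().addContactPoint("127.0.0.1").build() // placeholder host
    session = cluster.connect()
    insert = session.prepare(
      "INSERT INTO my_ks.my_table (id, name, date) VALUES (?, ?, ?)") // placeholder keyspace/table
    true
  }

  override def process(row: Row): Unit = {
    // Convert the string-typed columns before binding; the driver maps
    // java.lang.Long -> bigint and java.util.Date -> timestamp.
    session.execute(insert.bind(
      java.lang.Long.valueOf(row.getAs[String]("id").toLong),
      row.getAs[String]("name"),
      new java.util.Date()))                   // stand-in for the 'sysdate' value
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (session != null) session.close()
    if (cluster != null) cluster.close()
  }
}
```

It would be wired in with something like df.writeStream.foreach(new CassandraSinkWriter).start().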

Unable to authenticate cassandra cluster through spark scala program

自作多情 submitted on 2019-12-11 15:55:37
Question: Please suggest how to solve the issue below, or suggest a different approach to my problem. I get data from somewhere and insert it into Cassandra on a daily basis; then I need to retrieve a whole week's data from Cassandra, do some processing, and insert the result back into Cassandra. I have a lot of records, and each record executes most of the operations below. Following the suggestion on my previous post, Repreparing preparedstatement warning, to avoid repreparing the
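For the authentication part of the title: with PasswordAuthenticator enabled on the cluster, the connector reads its credentials from two Spark properties. A sketch with placeholder host and credentials:

```scala
import org.apache.spark.SparkConf

// Sketch only: host, user name and password are placeholders.
val conf = new SparkConf()
  .setAppName("weekly-processing")
  .set("spark.cassandra.connection.host", "10.0.0.1")
  .set("spark.cassandra.auth.username", "cassandra_user")
  .set("spark.cassandra.auth.password", "********")
```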

ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition

只谈情不闲聊 submitted on 2019-12-11 15:24:04
Question: I am using Spark version 2.2.1, Scala version 2.11.8, and OpenJDK 64-Bit Server VM 1.8.0_131. I have added the jar dependency in code: JavaSparkContext sc = new JavaSparkContext(conf); sc.addJar("./target/CassandraSparkJava-1.0-SNAPSHOT-jar-with-dependencies.jar"); Executing the code below, but getting ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition Dataset<org.apache.spark.sql.Row> dataset = sparksession.read().format("org.apache.spark.sql.cassandra")
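sc.addJar ships the application jar, but the connector classes also have to reach the executors before tasks deserialize CassandraPartition. One hedged sketch is to list every required jar up front in the SparkConf (paths, master URL and connector version are placeholders), or equivalently to pass them via --jars or --packages at submit time:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext

// Sketch only: all paths, the master URL and the connector version are placeholders.
val conf = new SparkConf()
  .setAppName("cassandra-read")
  .setMaster("spark://master-host:7077")
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .setJars(Seq(
    "./target/CassandraSparkJava-1.0-SNAPSHOT-jar-with-dependencies.jar",
    "/path/to/spark-cassandra-connector_2.11-2.0.7.jar"))
val sc = new JavaSparkContext(conf)
```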