I am trying to copy data from one Cassandra table to another Cassandra table with the same structure, and I would appreciate some help. The source table is defined below:
CREATE TABLE data2 (
    d_no text,
    d_type text,
    sn_perc int,
    tse_dt timestamp,
    f_lvl text,
    ign_f boolean,
    lk_loc text,
    lk_ts timestamp,
    mi_rem text,
    nr_fst text,
    perm_stat text,
    rec_crt_dt timestamp,
    sr_stat text,
    sor_query text,
    tp_dat text,
    tp_ts timestamp,
    tr_rem text,
    tr_type text,
    PRIMARY KEY (d_no, d_type)
) WITH CLUSTERING ORDER BY (d_type ASC);

Data was inserted using:
INSERT INTO data2 (<all column names>)
VALUES ('64FCFCFC', 'HUM', 4, '1970-01-02 05:30:00', 'NA', true, 'NA',
        '1970-01-02 05:40:00', 'NA', 'NA', 'NA', '1970-02-01 05:30:00',
        'NA', 'NA', 'NA', '1970-02-03 05:30:00', 'NA', 'NA');

Note: I insert the fourth column (tse_dt, a timestamp) as '1970-01-02 05:30:00', and the value also appears correctly in the DataFrame. But after the DataFrame is written to Cassandra, SELECT * FROM the table shows it as 1970-01-02 00:00:00.000000+0000. The same thing happens for every timestamp column.
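For reference, here is a minimal sketch of how I inspect the timestamp columns on the Spark side after the read. The connection settings are placeholders, and pinning spark.sql.session.timeZone to UTC is only something I have been experimenting with, in case the difference is a time-zone effect:

import org.apache.spark.sql.SparkSession

// Minimal sketch: check what Spark actually holds for the timestamp columns.
// Host/keyspace below are placeholders matching the snippets in this question.
val spark = SparkSession.builder()
  .appName("timestamp-check")
  .config("spark.sql.session.timeZone", "UTC") // zone used when rendering timestamps; UTC is an assumption
  .getOrCreate()

val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .option("spark.cassandra.connection.host", "hostname") // placeholder
  .option("keyspace", "hr")
  .option("table", "data2")
  .load()

df.printSchema() // confirms tse_dt, lk_ts, rec_crt_dt, tp_ts are read as TimestampType
df.select("tse_dt", "lk_ts", "rec_crt_dt", "tp_ts").show(truncate = false)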
pom.xml
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.3.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector -->
    <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector_2.11</artifactId>
        <version>2.3.1</version>
    </dependency>
</dependencies>

I want to read these values and write them into another Cassandra table using Spark with Scala. See the code below:
val df2 = spark.read
  .format("org.apache.spark.sql.cassandra")
  .option("spark.cassandra.connection.host", "hostname")
  .option("spark.cassandra.connection.port", "9042")
  .option("spark.cassandra.auth.username", "usr")
  .option("spark.cassandra.auth.password", "pas")
  .option("keyspace", "hr")
  .option("table", "data2")
  .load()

val df3 = ... // some processing on df2 (details elided)

df3.write
  .format("org.apache.spark.sql.cassandra")
  .mode("append")
  .option("spark.cassandra.connection.host", "hostname")
  .option("spark.cassandra.connection.port", "9042")
  .option("spark.cassandra.auth.username", "usr")
  .option("spark.cassandra.auth.password", "pas")
  .option("spark.cassandra.output.ignoreNulls", "true")
  .option("confirm.truncate", "true")
  .option("keyspace", "hr")
  .option("table", "data3")
  .save()

But I get the error below when I try to insert the data with this code:
java.lang.IllegalArgumentException: requirement failed: Invalid row size: 18 instead of 17.
    at scala.Predef$.require(Predef.scala:224)
    at com.datastax.spark.connector.writer.SqlRowWriter.readColumnValues(SqlRowWriter.scala:23)
    at com.datastax.spark.connector.writer.SqlRowWriter.readColumnValues(SqlRowWriter.scala:12)
    at com.datastax.spark.connector.writer.BoundStatementBuilder.bind(BoundStatementBuilder.scala:99)
    at com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.scala:106)
    at com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.scala:31)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at com.datastax.spark.connector.writer.GroupingBatchBuilder.foreach(GroupingBatchBuilder.scala:31)
    at com.datastax.spark.connector.writer.TableWriter$$anonfun$writeInternal$1.apply(TableWriter.scala:233)
    at com.datastax.spark.connector.writer.TableWriter$$anonfun$writeInternal$1.apply(TableWriter.scala:210)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:112)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:111)
    at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:145)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:111)
    at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:210)
    at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:197)
    at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:183)
    at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
    at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
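The message says the row being written has 18 values while the target expects 17. As a diagnostic, here is a minimal sketch that compares the columns Spark is about to write with the columns data3 exposes. It reuses the spark session and df3 from the code above; whether data3 really differs from data2 by one column is an assumption on my part, since I have not pasted its definition:

// Sketch: compare df3's columns against the target table's columns.
// Assumes `spark` and `df3` from the code above; host is a placeholder.
val targetDf = spark.read
  .format("org.apache.spark.sql.cassandra")
  .option("spark.cassandra.connection.host", "hostname") // placeholder
  .option("keyspace", "hr")
  .option("table", "data3")
  .load()

println(s"df3 has ${df3.columns.length} columns: ${df3.columns.mkString(", ")}")
println(s"data3 has ${targetDf.columns.length} columns: ${targetDf.columns.mkString(", ")}")

// Any name present on one side only is the likely source of the
// "Invalid row size: 18 instead of 17" failure.
val onlyInDf3 = df3.columns.toSet -- targetDf.columns.toSet
val onlyInTable = targetDf.columns.toSet -- df3.columns.toSet
println(s"Only in df3: ${onlyInDf3.mkString(", ")}")
println(s"Only in data3: ${onlyInTable.mkString(", ")}")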