spark-cassandra-connector

Inserting into cassandra table from spark dataframe results in org.codehaus.commons.compiler.CompileException: File 'generated.java' Error

Submitted by 倖福魔咒の on 2019-11-30 09:47:45
Question: I am using spark-sql 2.4.1, datastax-java-cassandra-connector_2.11-2.4.1.jar and Java 8. I create the Cassandra table like this: create table company(company_id int PRIMARY KEY, company_name text); The JavaBean is as follows: class CompanyRecord( Integer company_id; String company_name; // getters and setters // default & parameterized constructors ) The Spark code below saves the data into the Cassandra table: Dataset<Row> latestUpdatedDs = joinUpdatedRecordsDs.select("company_id", "company_name"); /// select
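
For reference, a minimal Scala sketch of the DataFrame write path the question is attempting, assuming the joinUpdatedRecordsDs Dataset from the question and a hypothetical keyspace named my_ks; only columns that exist in the company table are selected before the save:

```scala
import org.apache.spark.sql.{Dataset, Row, SaveMode}

// Sketch only: `joinUpdatedRecordsDs` comes from the question; "my_ks" is an assumed
// keyspace name. Selecting just the table's columns keeps the Dataset schema aligned
// with the Cassandra table definition before writing through the connector's source.
val latestUpdatedDs: Dataset[Row] =
  joinUpdatedRecordsDs.select("company_id", "company_name")

latestUpdatedDs.write
  .format("org.apache.spark.sql.cassandra")
  .option("keyspace", "my_ks")   // assumption: keyspace name is not given in the question
  .option("table", "company")
  .mode(SaveMode.Append)
  .save()
```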

Spark 1.5.1, Cassandra Connector 1.5.0-M2, Cassandra 2.1, Scala 2.10, NoSuchMethodError guava dependency

Submitted by 独自空忆成欢 on 2019-11-29 18:00:27
I'm new to the Spark environment (and fairly new to Maven), so I'm struggling with how to ship the dependencies I need correctly. It looks like Spark 1.5.1 has a guava-14.0.1 dependency which it tries to use, and isPrimitive was added in Guava 15+. What's the correct way to ensure my uber-jar wins? I've tried spark.executor.extraClassPath in my spark-defaults.conf to no avail. Essentially a duplicate of this [question]: Spark 1.5.1 + Scala 2.10 + Kafka + Cassandra = Java.lang.NoSuchMethodError, but for Maven (I don't have the rep to comment yet). I stripped my dependencies down to this: <dependency> <groupId>com
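
In Maven the usual remedy is the maven-shade-plugin's relocation of com.google.common, so the Guava bundled in the uber-jar cannot clash with the Guava 14 Spark ships. Purely as an illustration of that same relocation idea in Scala tooling, a hedged sbt-assembly sketch (plugin version and rename prefix are assumptions) would be:

```scala
// project/plugins.sbt (assumed plugin version)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

// build.sbt: relocate the Guava packages inside the fat jar so the classes Spark loads
// from its own Guava 14 never shadow the newer Guava the Cassandra driver needs.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "shaded.guava.@1").inAll
)
```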

NoSuchMethodError from spark-cassandra-connector with assembled jar

Submitted by 一个人想着一个人 on 2019-11-29 12:52:36
I'm fairly new to Scala and am trying to build a Spark job. I've built a job that contains the DataStax connector and assembled it into a fat jar. When I try to execute it, it fails with a java.lang.NoSuchMethodError. I've cracked open the JAR and can see that the DataStax library is included. Am I missing something obvious? Is there a good tutorial to look at regarding this process? Thanks. Console: $ spark-submit --class org.bobbrez.CasCountJob ./target/scala-2.11/bobbrez-spark-assembly-0.0.1.jar ks tn ... Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.ObjectRef.zero(
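
A NoSuchMethodError on scala.runtime.ObjectRef.zero is commonly a Scala binary-version mismatch: a jar assembled for Scala 2.11 submitted to a Spark distribution built against Scala 2.10 (or vice versa). A hedged build.sbt sketch of the usual setup, with versions borrowed from elsewhere on this page and assumed here:

```scala
// build.sbt sketch (all versions are assumptions): keep scalaVersion aligned with the
// Scala build of the Spark distribution you submit to, and mark Spark "provided" so the
// fat jar does not bundle a second, conflicting Spark/Scala runtime.
scalaVersion := "2.10.6"   // assumption: a Spark 1.5.x binary built for Scala 2.10

libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-core"                % "1.5.1" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0-M2"
)
```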

Reading from Cassandra using Spark Streaming

Submitted by 余生长醉 on 2019-11-28 21:57:55
I have a problem when I use Spark Streaming to read from Cassandra. https://github.com/datastax/spark-cassandra-connector/blob/master/doc/8_streaming.md#reading-from-cassandra-from-the-streamingcontext As in the link above, I use val rdd = ssc.cassandraTable("streaming_test", "key_value").select("key", "value").where("fu = ?", 3) to select the data from Cassandra, but it seems that Spark Streaming runs the query only once, whereas I want it to keep querying at a 10-second interval. My code is as follows; I hope for your response. Thanks! import org.apache.spark._ import org.apache.spark
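
One commonly suggested pattern for re-running a Cassandra query on every batch interval is to wrap the (lazily evaluated) Cassandra RDD in a ConstantInputDStream, so each 10-second batch recomputes it. A hedged sketch using the keyspace, table and predicate from the question (the connection host is an assumption):

```scala
import com.datastax.spark.connector._
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.ConstantInputDStream

// Sketch only: connection host is assumed; keyspace, table, columns and the
// "fu = ?" predicate come from the question.
val conf = new SparkConf()
  .setAppName("cassandra-streaming-read")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val ssc = new StreamingContext(conf, Seconds(10))

val cassandraRdd = ssc.sparkContext
  .cassandraTable("streaming_test", "key_value")
  .select("key", "value")
  .where("fu = ?", 3)

// ConstantInputDStream hands the same (lazy) RDD to every batch, so the Cassandra
// read is executed again each time a batch's job runs.
val dstream = new ConstantInputDStream(ssc, cassandraRdd)
dstream.foreachRDD(rdd => println(rdd.collect().mkString("\n")))

ssc.start()
ssc.awaitTermination()
```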

Apache Spark taking 5 to 6 minutes for simple count of 1 billion rows from Cassandra

Submitted by 岁酱吖の on 2019-11-28 09:31:37
I am using the Spark Cassandra connector. It takes 5-6 minutes to fetch data from the Cassandra table. In the Spark log I have seen many tasks and executors. The reason might be that Spark divided the process into many tasks! Below is my code example: public static void main(String[] args) { SparkConf conf = new SparkConf(true).setMaster("local[4]") .setAppName("App_Name") .set("spark.cassandra.connection.host", "127.0.0.1"); JavaSparkContext sc = new JavaSparkContext(conf); JavaRDD<Demo_Bean> empRDD = javaFunctions(sc).cassandraTable("dev", "demo"); System.out.println("Row Count: " + empRDD.count()); }
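
Two things commonly help a plain count over a large table: pushing the count down to Cassandra with cassandraCount() instead of materializing every row in Spark, and experimenting with the input split size so a full scan produces fewer, larger tasks. A hedged Scala sketch (the split size value is an illustrative assumption, and the property name varies across connector versions):

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: the split size below is an illustrative assumption, not a tuned number.
val conf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("App_Name")
  .set("spark.cassandra.connection.host", "127.0.0.1")
  // Larger splits mean fewer (bigger) Spark tasks per full-table scan.
  .set("spark.cassandra.input.split.size_in_mb", "256")
val sc = new SparkContext(conf)

// cassandraCount() pushes the counting down to Cassandra, so rows are not
// deserialized into bean objects just to be counted.
val rowCount = sc.cassandraTable("dev", "demo").cassandraCount()
println(s"Row Count: $rowCount")
```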

using prepared statement multiple times, giving a warning of Cassandra Querying Reducing Performance

Submitted by 蹲街弑〆低调 on 2019-11-27 16:28:56
I am getting data from somewhere and inserting it into Cassandra on a daily basis; then I need to retrieve the data from Cassandra for the whole week, do some processing, and insert the result back into Cassandra. I have a lot of records, and each record executes most of the operations below. To do this I have written the program below. It works fine, but I get a warning, and according to the API documentation a statement should not be prepared multiple times, as it reduces performance. Please tell me how to avoid this to improve performance, or suggest any alternative approach to achieve this in Scala. Here is some part
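
The "re-preparing an already prepared query" warning goes away when each statement is prepared once per session and only bound and executed inside the per-record loop. A hedged Scala sketch; the keyspace, table and column names are assumptions standing in for the question's real schema:

```scala
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.SparkConf

// Sketch only: schema and sample data are assumptions; the point is where prepare() sits.
val sparkConf = new SparkConf().set("spark.cassandra.connection.host", "127.0.0.1")
val records: Seq[(String, java.lang.Long)] =
  Seq(("a", java.lang.Long.valueOf(10L)), ("b", java.lang.Long.valueOf(20L)))

CassandraConnector(sparkConf).withSessionDo { session =>
  // Prepared exactly once, outside the loop.
  val insert = session.prepare(
    "INSERT INTO my_ks.weekly_result (id, total) VALUES (?, ?)")

  records.foreach { case (id, total) =>
    // Inside the loop, only bind and execute; never call prepare() again per record.
    session.execute(insert.bind(id, total))
  }
}
```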

How to fix java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List to field type scala.collection.Seq?

Submitted by 橙三吉。 on 2019-11-26 23:35:15
Question: This error has been the hardest to trace. I am not sure what is going on. I am running a Spark cluster on my local machine, so the entire Spark cluster is under one host, 127.0.0.1, and I run in standalone mode: JavaPairRDD<byte[], Iterable<CassandraRow>> cassandraRowsRDD = javaFunctions(sc).cassandraTable("test", "hello") .select("rowkey", "col1", "col2", "col3") .spanBy(new Function<CassandraRow, byte[]>() { @Override public byte[] call(CassandraRow v1) { return v1.getBytes(
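
This particular ClassCastException during deserialization is frequently reported when the driver (for example an IDE session) and the standalone executors end up with different copies of the Spark/Scala classes on their classpaths. One hedged mitigation is to keep the driver's Spark and Scala versions identical to the cluster's and to ship the application's assembly jar explicitly; the master URL and jar path below are assumptions:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: the master URL and jar path are illustrative assumptions.
val conf = new SparkConf()
  .setMaster("spark://127.0.0.1:7077")
  .setAppName("spanby-example")
  .set("spark.cassandra.connection.host", "127.0.0.1")
  // Ship the assembled application (and connector) classes to the executors, so the
  // driver and executors deserialize closures against the same class definitions.
  .setJars(Seq("target/scala-2.11/spanby-example-assembly-0.1.jar"))
val sc = new SparkContext(conf)
```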

java.lang.NoClassDefFoundError: org/apache/spark/Logging

Submitted by 为君一笑 on 2019-11-26 20:31:47
I'm always getting the following error. Can somebody help me, please? Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:763) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:467) at java.net.URLClassLoader.access$100(URLClassLoader.java:73) at java.net.URLClassLoader$1.run(URLClassLoader.java:368) at java.net.URLClassLoader$1.run(URLClassLoader.java:362) at java.security
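
org.apache.spark.Logging was removed from the public API in Spark 2.0, so this NoClassDefFoundError usually means a library built against Spark 1.x (an old spark-cassandra-connector is a frequent culprit) is on the classpath of a Spark 2.x application. A hedged build.sbt sketch of keeping the two lineages aligned (all versions are assumptions):

```scala
// build.sbt sketch: the connector's major line must match the Spark line it was built
// for, otherwise it references classes such as org.apache.spark.Logging that no longer
// exist. Versions below are illustrative assumptions.
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-core"                % "2.0.2" % "provided",
  "org.apache.spark"   %% "spark-sql"                 % "2.0.2" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0"   // built for Spark 2.0.x
)
```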
