datastax

How to fix an exception when running a Spark SQL program locally on Windows 10 with Hive support enabled?

Posted by 谁说我不能喝 on 2019-12-20 07:26:03
Question: I am working with Spark SQL 2.3.1 and trying to enable Hive support while creating a session:

    .enableHiveSupport()
    .config("spark.sql.warehouse.dir", "c://tmp//hive")

I ran the command below:

    C:\Software\hadoop\hadoop-2.7.1\bin>winutils.exe chmod 777 C:\tmp\hive

While running my program I get:

    Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
    at org.apache.hadoop.hive
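
This error usually means the winutils permission change did not actually take effect: /tmp/hive resolves against the drive the program runs from, so a chmod on C:\tmp\hive does nothing if Spark runs from another drive, and without -R the change is not recursive. A common remedy is winutils.exe chmod -R 777 \tmp\hive run from the correct drive, then winutils.exe ls \tmp\hive to verify. For reference, a minimal sketch of a Hive-enabled session, assuming HADOOP_HOME points at the winutils install; the paths and app name are illustrative:

    import org.apache.spark.sql.SparkSession

    // Minimal Hive-enabled session; the warehouse path is illustrative.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hive-support-demo")
      .config("spark.sql.warehouse.dir", "C:/tmp/hive")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SHOW DATABASES").show()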

CodecNotFoundException: Codec not found for requested operation: [date <-> java.util.Date]

Posted by 筅森魡賤 on 2019-12-20 07:18:51
Question: I am using the DataStax driver versions below with Java 8:

    <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>cassandra-driver-core</artifactId>
        <version>3.7.2</version>
    </dependency>
    <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>cassandra-driver-mapping</artifactId>
        <version>3.7.2</version>
    </dependency>

My table has a date column, as below:

    cass_table (
        data_source_id int,
        company_id text,
        create_date date
    )

When I try to save the data into the C* table as below final
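
In the 3.x driver the CQL date type maps to com.datastax.driver.core.LocalDate by default, while java.util.Date maps to timestamp, hence the missing codec. One fix is to type the mapped field as the driver's LocalDate; another is to register a codec from the optional cassandra-driver-extras artifact and use java.time.LocalDate in the entity. A sketch of the latter, assuming driver-extras is on the classpath and an illustrative contact point:

    import com.datastax.driver.core.Cluster
    import com.datastax.driver.extras.codecs.jdk8.LocalDateCodec

    val cluster = Cluster.builder()
      .addContactPoint("127.0.0.1")
      .build()

    // Map CQL `date` to java.time.LocalDate; the mapped entity field
    // must then be java.time.LocalDate rather than java.util.Date.
    cluster.getConfiguration.getCodecRegistry.register(LocalDateCodec.instance)

    val session = cluster.connect()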

Unsupported literal type class scala.runtime.BoxedUnit

Posted by 允我心安 on 2019-12-20 04:38:43
Question: I am trying to filter a column of a DataFrame read from Oracle, as below:

    import org.apache.spark.sql.functions.{col, lit, when}
    val df0 = df_org.filter(col("fiscal_year").isNotNull())

When I do this I get the error below:

    java.lang.RuntimeException: Unsupported literal type class scala.runtime.BoxedUnit ()
    at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:77)
    at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
    at org.apache
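
The trailing parentheses are the culprit: Column.isNotNull is a parameterless method, so col("fiscal_year").isNotNull() is parsed as Column.apply(()), which tries to turn the Unit value () into a literal and throws the BoxedUnit error. Dropping the parentheses fixes it, with df_org as the question's DataFrame:

    import org.apache.spark.sql.functions.col

    // isNotNull takes no argument list; the extra () was being
    // passed to Column.apply and wrapped as a Unit literal.
    val df0 = df_org.filter(col("fiscal_year").isNotNull)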

Having performance issues with DataStax Cassandra

Posted by 夙愿已清 on 2019-12-20 03:00:46
Question: I have installed DataStax Cassandra on 2 independent machines (one with 16 GB RAM, the other with 32 GB) and am going with mostly the default configuration. I have created a table with some 700 columns; when I try to insert records using Java, it is able to insert 1000 records per 30 seconds, which seems very low to me, as per the DataStax benchmark it should be around 18,000+. To my surprise, performance is the same on both the 32 GB and 16 GB RAM machines. I am new to Cassandra; can anyone help me with this
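
Single-row synchronous inserts are bound by network round-trip latency rather than by RAM, which would explain both the low rate and the identical numbers on both machines. A common pattern is to prepare the statement once and keep many asynchronous requests in flight. A sketch under assumed names (keyspace, table, and columns are illustrative, not from the question):

    import com.datastax.driver.core.Cluster

    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect("demo_ks")

    // Prepare once; re-preparing per insert wastes a server round trip.
    val insert = session.prepare(
      "INSERT INTO demo_table (id, company_id) VALUES (?, ?)")

    // executeAsync keeps many requests in flight instead of waiting
    // for each row's acknowledgement before sending the next one.
    val futures = (1 to 1000).map { i =>
      session.executeAsync(insert.bind(Int.box(i), s"company-$i"))
    }
    futures.foreach(_.getUninterruptibly()) // block until all complete

Production code should bound the number of in-flight requests, but this is enough to check whether latency, not hardware, is the bottleneck.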

Detected Guava issue #1635 which indicates that a version of Guava less than 16.01 is in use

Posted by 霸气de小男生 on 2019-12-19 19:43:07
Question: I am running a Spark job on EMR and using the DataStax connector to connect to the Cassandra cluster. I am facing issues with the Guava jar; please find the details below. I am using the Cassandra deps below:

    cqlsh 5.0.1 | Cassandra 3.0.1 | CQL spec 3.3.1

and running the Spark job on EMR 4.4 with the Maven deps below:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>1.5.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.5.0</version>
    </dependency>
    <dependency>
        <groupId>org
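
As the message says, the driver needs Guava 16.0.1 or newer, while Hadoop on EMR commonly puts an older Guava first on the classpath; the usual cure is relocating (shading) Guava inside the application jar, or aligning on the spark-cassandra-connector build that matches the Spark version in use. A quick diagnostic sketch to see which Guava jar actually wins at runtime:

    // Prints the jar the conflicting Guava class was loaded from;
    // on EMR this often points at an old Hadoop-provided Guava.
    val guavaJar = classOf[com.google.common.util.concurrent.Futures]
      .getProtectionDomain.getCodeSource.getLocation
    println(guavaJar)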

SET consistency level for Cassandra DDL

Posted by 被刻印的时光 ゝ on 2019-12-19 09:25:17
Question: In my application logs I've seen that, by default, after running create/alter table statements, the Cassandra driver seems to do processing for up to 10 seconds to bring the schema into agreement. Can I (and should I) set a consistency level, for example QUORUM, while executing DDL statements like CREATE TABLE ... IF NOT EXISTS, to make sure my table gets created and propagated to all nodes?

回答1: Schema changes (data definition) in Cassandra since 1.1+ are done via gossip. Since it uses gossip, a consistency level does not apply: consistency levels govern data reads and writes, not schema propagation. What the logs show is the driver waiting for schema agreement across the cluster after a DDL statement, up to a configurable limit that defaults to 10 seconds in the Java driver.
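
A sketch of checking schema agreement explicitly from the 3.x Java driver (API assumed; keyspace and table names are illustrative):

    import com.datastax.driver.core.Cluster

    val cluster = Cluster.builder()
      .addContactPoint("127.0.0.1")
      .withMaxSchemaAgreementWaitSeconds(20) // driver default is 10
      .build()
    val session = cluster.connect()

    session.execute(
      "CREATE TABLE IF NOT EXISTS demo_ks.demo_table (id int PRIMARY KEY)")

    // True once all reachable nodes report the same schema version.
    if (!cluster.getMetadata.checkSchemaAgreement())
      println("schema still propagating; wait and re-check")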

Pig & Cassandra & DataStax Splits Control

Posted by 两盒软妹~` on 2019-12-19 09:08:14
Question: I have been using Pig with my Cassandra data to do all kinds of amazing feats of grouping that would be almost impossible to write imperatively. I am using DataStax's integration of Hadoop & Cassandra, and I have to say it is quite impressive. Hats off to those guys!! I have a pretty small sandbox cluster (2 nodes) where I am putting this system through some tests. I have a CQL table that has ~53M rows (about 350 bytes each), and I notice that the mapper later takes a very long time to grind through

Starting Cassandra as a service does not work for 2.0.5; sudo cassandra -f works

Posted by ぃ、小莉子 on 2019-12-19 04:19:29
Question: When I try to start Cassandra on Ubuntu 12.04 (installed via DataStax's dsc20 package) as a service, as follows:

    $ sudo service cassandra start

it says *could not access pidfile for Cassandra*, with no other messages and nothing in the logs. But when I run it as the root user (sudo cassandra -f), it works properly and Cassandra starts. While trying to debug, I found that when trying to run as a non-root user I was getting these messages:

    ERROR 17:48:08,432 Exception encountered during startup

How to obtain the number of rows in a Cassandra table

Posted by 最后都变了- on 2019-12-18 10:59:27
Question: This is a super basic question, but it's actually been bugging me for days. Is there a good way to obtain the equivalent of a COUNT(*) of a given table in Cassandra? I will be moving several hundred million rows into C* for some load testing, and I'd like to at least get a row count on some sample ETL jobs before I move massive amounts of data over the network. The best idea I have is to basically loop over each row with Python and increment a counter. Is there a better way to
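
The direct route is a CQL count, which the server executes as a full table scan, so the read timeout usually needs raising on big tables. A sketch, with keyspace/table names assumed:

    import com.datastax.driver.core.{Cluster, SimpleStatement}

    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect()

    // count(*) scans the whole table server-side; give it a generous
    // per-statement read timeout instead of the global default.
    val stmt = new SimpleStatement("SELECT count(*) FROM demo_ks.demo_table")
    stmt.setReadTimeoutMillis(120000)

    val rows = session.execute(stmt).one().getLong(0)
    println(s"row count: $rows")

For hundreds of millions of rows, counting per token range in parallel (or via a Spark job) scales better than a single count(*).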

com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table schema_keyspaces

Posted by て烟熏妆下的殇ゞ on 2019-12-17 10:58:33
Question: I am trying to configure Spring Data with Cassandra, but I am getting the error below when my app is deployed in Tomcat. When I check the connection, it is available on the given port (127.0.0.1:9042). I have included the stack trace and Spring configuration below. Does anyone have an idea about this error? Full stack trace:

    2015-12-06 17:46:25 ERROR web.context.ContextLoader:331 - Context initialization failed
    org.springframework.beans.factory.BeanCreationException: Error creating bean with name
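
The usual cause of "unconfigured table schema_keyspaces" is version skew: Cassandra 3.x replaced system.schema_keyspaces with system_schema.keyspaces, so a pre-3.0 driver (often pulled in transitively by an older spring-data-cassandra) fails at startup against a 3.x cluster; upgrading to a 3.x-compatible driver and spring-data-cassandra release is the common fix. A quick sketch to confirm the server version the app is talking to:

    import com.datastax.driver.core.Cluster

    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect()

    // If this prints 3.x, a pre-3.0 driver on the classpath will fail
    // while trying to read the old system.schema_keyspaces table.
    val version = session.execute("SELECT release_version FROM system.local")
      .one().getString("release_version")
    println(version)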