datastax

Ignore nulls with DataFrames using the Spark DataStax connector

久未见 submitted on 2019-12-11 05:14:03
Question: We have a Cassandra schema with more than 50 columns, and we are inserting data into it from multiple data sources by transforming the data with Spark (DataFrames, not RDDs). We are running into the issue of many tombstones because our data is sparse. We already tried spark.cassandra.output.ignoreNulls=true, but it's not working. What would be the right config to avoid writing null values to Cassandra? I am using Zeppelin to run my Spark code and push data to C*.
Answer 1: Figured out the solution to this: A hint
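The point of `spark.cassandra.output.ignoreNulls=true` is that null fields are left unset instead of being written as explicit nulls, which Cassandra stores as tombstones. As a minimal sketch of that idea outside Spark (not the connector's implementation), the hypothetical helper `build_insert` below builds a parameterized CQL INSERT containing only the non-null columns of a row; the table and column names are illustrative assumptions.

```python
from typing import Any, Mapping, Sequence, Tuple


def build_insert(table: str, row: Mapping[str, Any]) -> Tuple[str, Sequence[Any]]:
    """Build a parameterized CQL INSERT that skips columns whose value is None.

    A column left out of an INSERT is simply not touched, whereas binding an
    explicit null writes a tombstone for that cell.
    """
    # Keep only the columns that actually carry a value.
    present = [(col, val) for col, val in row.items() if val is not None]
    if not present:
        raise ValueError("row has no non-null columns to write")
    columns = ", ".join(col for col, _ in present)
    placeholders = ", ".join("?" for _ in present)
    values = [val for _, val in present]
    return f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", values


if __name__ == "__main__":
    # Hypothetical sparse row: in the real schema most of the 50+ columns would be None.
    row = {"id": 1, "name": "a", "email": None, "phone": None}
    cql, values = build_insert("ks.sparse_table", row)
    print(cql)     # INSERT INTO ks.sparse_table (id, name) VALUES (?, ?)
    print(values)  # [1, 'a']
```

Because each sparse row yields a different column list, a real writer would end up with several statement shapes; leaving values unset in a single bound statement, as the connector option does, avoids that.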

Saving the data from SparkStreaming Workers to Database

萝らか妹 submitted on 2019-12-11 05:07:54
Question: In Spark Streaming, should we offload the saving part to another layer? The Spark Streaming context is not available when we use the Spark Cassandra Connector (our database is Cassandra). Moreover, even if we use some other database to save our data, we need to create a connection on the worker every time we process a batch of RDDs, because connection objects are not serializable. Is it recommended to create/close connections on workers? It would make our system tightly coupled with the
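The pattern the Spark Streaming programming guide describes for output operations is to create the connection inside `rdd.foreachPartition(...)`, so it is built on the worker instead of being serialized from the driver, and one connection serves a whole partition rather than a single record. The sketch below simulates that in plain Python; `FakeConnection` and `save_partition` are hypothetical stand-ins, and returning the connection exists only so the sketch can be inspected (a real `foreachPartition` function returns nothing).

```python
from typing import Iterable, List


class FakeConnection:
    """Hypothetical stand-in for a non-serializable database connection."""

    opened = 0  # counts how many connections were created overall

    def __init__(self) -> None:
        FakeConnection.opened += 1
        self.sent: List[str] = []
        self.closed = False

    def send(self, record: str) -> None:
        self.sent.append(record)

    def close(self) -> None:
        self.closed = True


def save_partition(records: Iterable[str]) -> FakeConnection:
    """Open one connection per partition, write every record, then close it."""
    conn = FakeConnection()
    try:
        for record in records:
            conn.send(record)
    finally:
        # Close even if a write fails, so workers do not leak connections.
        conn.close()
    return conn


if __name__ == "__main__":
    # Simulate one micro-batch split into three partitions.
    batch = [["a", "b"], ["c"], ["d", "e", "f"]]
    conns = [save_partition(p) for p in batch]
    print(FakeConnection.opened)  # 3: one connection per partition, not per record
```

The same guide also suggests reusing connections across batches through a lazily created, static pool on each worker, which removes the per-batch open/close cost.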

Stop Cassandra from updating automatically

浪尽此生 submitted on 2019-12-11 05:02:19
Question: How can I stop Cassandra from updating automatically? Currently, when I install Cassandra through OpsCenter, I get an error: Unable to restart DSE service. See /var/log/cassandra/system.log and /var/log/cassandra/output.log on the target node for details. system.log: ERROR [main] 2018-03-28 07:58:26,123 CassandraDaemon.java:705 - Exception encountered during startup java.lang.AbstractMethodError: org.apache.cassandra.utils.JMXServerUtils$Exporter.exportObject(Ljava/rmi/Remote;ILjava/rmi

How to map JavaBean columns with Cassandra table fields?

两盒软妹~` submitted on 2019-12-11 04:59:21
Question: I am using spark-sql 2.4.1, datastax-java-cassandra-connector_2.11-2.4.1.jar and Java 8. I have a Cassandra table like:

    CREATE TABLE company (company_id int PRIMARY KEY, company_name text);

and a JavaBean as below:

    @Table(name = "company")
    class CompanyRecord {
        @PartitionKey(0)
        @Column(name = "company_id")
        Integer companyId;

        @Column(name = "company_name")
        String companyName;

        // getters and setters
        // default and parameterized constructors
    }

I have the Spark code below to save the data into the Cassandra table. Dataset<Row>

Cassandra Datastax Optimal PoolingOptions

ぃ、小莉子 submitted on 2019-12-11 04:44:59
Question: I'm working on a Spring/Java webapp using Cassandra as the backend; the app would be used by potentially hundreds of customers simultaneously. I see that the default Cluster PoolingOptions connection pool settings (with protocol v3) are: LOCAL hosts: core = max = 1; REMOTE hosts: core = max = 1. The default maxRequestsPerConnection setting (with protocol v3) is 1024 for LOCAL hosts and 256 for REMOTE hosts. Will these default settings be sufficient for our usage requirements? If not, what
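Whether the defaults suffice is mostly arithmetic: the pool caps concurrent in-flight requests per host at connections × maxRequestsPerConnection, while the actual concurrency is roughly request rate × mean latency (Little's law). The sketch below computes both; the load figures (2,000 requests/s at 5 ms) are hypothetical placeholders to be replaced with measured numbers, and "hundreds of customers" only matters insofar as it drives that request rate.

```python
def pool_capacity(connections_per_host: int, max_requests_per_connection: int) -> int:
    """Upper bound on requests the pool lets one host have in flight at once."""
    return connections_per_host * max_requests_per_connection


def expected_in_flight(requests_per_second: float, mean_latency_ms: float) -> float:
    """Little's law: average concurrency = arrival rate x time each request takes."""
    return requests_per_second * mean_latency_ms / 1000.0


if __name__ == "__main__":
    local_cap = pool_capacity(1, 1024)   # defaults quoted in the question, LOCAL host
    remote_cap = pool_capacity(1, 256)   # defaults quoted in the question, REMOTE host
    # Hypothetical load: 2,000 requests/s at 5 ms mean latency.
    demand = expected_in_flight(2000, 5)
    print(local_cap, remote_cap, demand)  # 1024 256 10.0
```

These are per-host bounds; with several LOCAL hosts the load is spread across them, so the cluster-wide ceiling is higher still.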

Cassandra write benchmark, low (20%) CPU usage

為{幸葍}努か submitted on 2019-12-11 04:23:54
Question: I'm building a Cassandra cluster of 3 m1.large nodes on Amazon EC2. I've used the DataStax Auto-Clustering AMI 2.5.1-pv, with DataStax Community Cassandra version 2.2.0-1. When running write benchmarks on 'production' data, it seems that the cluster can handle around 3k to 5k write requests per second, with no read load. Nearly all the time, the nodes are doing: compaction of system.hints, compaction of mykeyspace.mybigtable, and compaction of the mybigtable index. However, what worries me is the low CPU usage. All of the 3 nodes

Cassandra Ec2MultiRegionSnitch or GossipingPropertyFileSnitch for AWS regions

拟墨画扇 submitted on 2019-12-11 03:21:49
Question: We have 3 Cassandra nodes in a U.S. AWS region and 3 nodes in the Singapore AWS region. If I have to build a multi-data-center cluster, is it mandatory for us to use Ec2MultiRegionSnitch, or can we use GossipingPropertyFileSnitch? And should I use only private IP addresses for both the broadcast address and the listen address? My system administrator told me that we don't need public IPs for these and that private IPs should work, since both regions can communicate with each other. But I am doubtful of that. Can someone
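With GossipingPropertyFileSnitch (set via `endpoint_snitch: GossipingPropertyFileSnitch` in cassandra.yaml), each node declares its own data center and rack in cassandra-rackdc.properties through the `dc` and `rack` keys, and the optional `prefer_local=true` asks it to use internal addresses within a data center, as Ec2MultiRegionSnitch does. The sketch below just renders that file's content per node; the node and DC names (`us_east`, `ap_southeast`) are hypothetical.

```python
from typing import Dict


def rackdc_properties(dc: str, rack: str, prefer_local: bool = True) -> str:
    """Render the cassandra-rackdc.properties content read by GossipingPropertyFileSnitch."""
    lines = [f"dc={dc}", f"rack={rack}"]
    if prefer_local:
        # Prefer the node's internal address when talking to nodes in the same DC.
        lines.append("prefer_local=true")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical node-to-DC layout for the two regions.
    layout: Dict[str, str] = {"us-node-1": "us_east", "sg-node-1": "ap_southeast"}
    for node, dc in layout.items():
        print(f"# {node}")
        print(rackdc_properties(dc, "rack1"), end="")
```

The snitch choice is separate from the private-IP question: which addresses work for `broadcast_address` depends on whether the two regions can actually reach each other's private addresses.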

Datastax Cassandra OpsCenter service not starting on Windows 7

谁说胖子不能爱 submitted on 2019-12-11 02:39:24
Question: I am new to Cassandra. I installed DataStax Cassandra Community Edition on Windows 7 64-bit by following the instructions in DataStax Community Edition. I followed exactly the same instructions, but for some reason I could not connect to OpsCenter. Then I tried to start the OpsCenter service manually and got the following error: Windows could not start the DataStax OpsCenter Community 2.0.6 on Local Computer. For more information, review the System Event Log. If this

Inserting special characters

丶灬走出姿态 submitted on 2019-12-11 01:35:11
Question: I'm trying to insert special characters into my Cassandra table, but I can't get them inserted. I tried the approach in the linked question "Inserting data in table with umlaut is not possible", and even though my character set is UTF8 as mentioned there, I'm still not able to insert. I've also tried using quotes, but it still didn't work. CREATE TABLE test.calendar ( race_id int, race_start_date timestamp, race_end_date timestamp, race_name text, PRIMARY KEY (race_id, race_start_date, race_end_date) ) WITH CLUSTERING ORDER BY
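When quoting rather than encoding is the problem, it helps to recall how CQL string constants work: they are enclosed in single quotes, a single quote inside the value is written by doubling it, and non-ASCII characters such as umlauts need no escaping because `text` is UTF-8. The hypothetical helper below applies that rule to build an INSERT for the `test.calendar` table from the question; the row values are made up.

```python
def cql_text_literal(value: str) -> str:
    """Quote a string as a CQL text constant by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


if __name__ == "__main__":
    # Hypothetical row for test.calendar; only quotes are escaped, umlauts stay as-is.
    name = "Großer Preis von Österreich"
    stmt = (
        "INSERT INTO test.calendar (race_id, race_start_date, race_end_date, race_name) "
        f"VALUES (1, '2019-06-28', '2019-06-30', {cql_text_literal(name)});"
    )
    print(stmt)
```

From application code, bound parameters in a driver avoid manual quoting altogether; this helper is only for hand-written statements such as those typed into cqlsh.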

Cannot connect to cassandra from Spark

流过昼夜 submitted on 2019-12-10 22:37:57
Question: I have some test data in Cassandra. I am trying to fetch this data from Spark, but I get an error like: py4j.protocol.Py4JJavaError: An error occurred while calling o25.load. java.io.IOException: Failed to open native connection to Cassandra at {127.0.1.1}:9042 This is what I've done so far: started ./bin/cassandra; created test data using CQL with keyspace "testkeyspace2", table "emp", and some keys and corresponding values; wrote standalone.py; and ran the following pyspark shell command.
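Because the error names 127.0.1.1 rather than 127.0.0.1, a quick diagnostic is to check which address actually accepts TCP connections on the native transport port (9042 by default). The standard-library sketch below does only that check; if only 127.0.0.1 answers, pointing the connector at that address through its `spark.cassandra.connection.host` setting would be the next thing to try.

```python
import socket


def port_is_open(host: str, port: int = 9042, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


if __name__ == "__main__":
    # 127.0.1.1 comes from the error message; 127.0.0.1 is the usual loopback address.
    for host in ("127.0.0.1", "127.0.1.1"):
        print(host, port_is_open(host))
```

An open port only shows that something is listening; the connector can still fail later on protocol or authentication problems.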