datastax | 易学教程

Ignore Nulls with Data frame using spark datastax connector

Question: We have a Cassandra schema with more than 50 columns, and we insert data into it from multiple data sources after transforming the data with Spark (DataFrames, not RDDs). Because our data is sparse, we are running into a large number of tombstones. We already tried spark.cassandra.output.ignoreNulls=true, but it is not working. What would be the right configuration to avoid writing null values to Cassandra? I am using Zeppelin to run my Spark code and push data to C*. Answer 1: Figured out the solution to this: A hint

Saving the data from SparkStreaming Workers to Database

Question: In Spark Streaming, should we offload the saving step to another layer, given that the Spark Streaming context is not available when we use the SparkCassandraConnector with Cassandra as our database? Moreover, even if we use some other database to save our data, we need to create a connection on the worker every time we process a batch of RDDs, because connection objects are not serializable. Is it recommended to create/close connections on the workers? It would make our system tightly coupled with the

Stop Cassandra update automatically

Question: How can I stop Cassandra from updating automatically? At present, when I install Cassandra through OpsCenter, I get this error: Unable to restart DSE service. See /var/log/cassandra/system.log and /var/log/cassandra/output.log on the target node for details. system.log: ERROR [main] 2018-03-28 07:58:26,123 CassandraDaemon.java:705 - Exception encountered during startup java.lang.AbstractMethodError: org.apache.cassandra.utils.JMXServerUtils$Exporter.exportObject(Ljava/rmi/Remote;ILjava/rmi

How to map JavaBean columns with Cassandra table fields?

Question: I am using spark-sql v2.4.1, datastax-java-cassandra-connector_2.11-2.4.1.jar and Java 8. I have a Cassandra table like this: create company(company_id int PRIMARY_KEY, company_name text); and a JavaBean as below: @Table(name = "company") class CompanyRecord( @PartitionKey(0) @Column(name="company_id") Integer companyId; @Column(name="company_name") String companyName; //getter and setters //default & parameterized constructors ) I have the Spark code below to save the data into the Cassandra table. Dataset<Row>

Cassandra Datastax Optimal PoolingOptions

Question: I'm working on a Spring/Java web app using Cassandra as the backend; the app could potentially be used by hundreds of customers simultaneously. I see that the default Cluster PoolingOptions connection pool settings (with protocol v3) are: LOCAL hosts: core = max = 1; REMOTE hosts: core = max = 1. The default maxRequestsPerConnection setting (with protocol v3) is 1024 for LOCAL hosts and 256 for REMOTE hosts. Will these default settings be sufficient to fulfill our usage requirements? If not, What

Cassandra write benchmark, low (20%) CPU usage

Question: I'm building a Cassandra cluster of 3 m1.large nodes on Amazon EC2. I've used the DataStax Auto-Clustering AMI 2.5.1-pv, with Cassandra DataStax Community version 2.2.0-1. When doing write benchmarks on 'production' data, it seems that the cluster can handle around 3k to 5k write requests per second, without read load. Nearly all the time the nodes are doing: compaction of system.hints; compaction of mykeyspace.mybigtable; compaction of the mybigtable index. However, what worries me is the low CPU usage. All of the 3 nodes

Cassandra Ec2MultiRegionSnitch or GossipingPropertyFileSnitch for AWS regions

Question: We have 3 Cassandra nodes in a U.S. AWS region and 3 nodes in the Singapore AWS region. If I have to build a multi-data-center cluster, is it mandatory for us to use Ec2MultiRegionSnitch? Or can we use the GossipingPropertyFileSnitch? And should I use only private IP addresses for both the broadcast address and the listen address here? My system administrator told me that we don't need public IPs for these and that private IPs should work, as both can communicate with each other. But I am doubtful of that. Can someone

Datastax Cassandra OpsCenter service not starting on Windows 7

Question: I am new to Cassandra. I installed DataStax Cassandra Community Edition on 64-bit Windows 7 by following the instructions given in DataStax Community Edition. I followed exactly the same instructions, but for some reason I could not connect to OpsCenter. I then tried to start the OpsCenter service manually, and I am getting the following error: Windows could not start the DataStax OpsCenter Community 2.0.6 on Local Computer. For more information, review the System Event Log. If this

Inserting special characters

Question: I'm trying to insert special characters into my Cassandra table, but I couldn't insert them. I tried the approach from the linked question "Inserting data in table with umlaut is not possible", and even though my character set is UTF8 as mentioned there, I'm not able to insert. I've also tried using quotes, and it still didn't work. CREATE TABLE test.calendar ( race_id int, race_start_date timestamp, race_end_date timestamp, race_name text, PRIMARY KEY (race_id, race_start_date, race_end_date) ) WITH CLUSTERING ORDER BY

Cannot connect to cassandra from Spark

Question: I have some test data in my Cassandra. I am trying to fetch this data from Spark, but I get an error like: py4j.protocol.Py4JJavaError: An error occurred while calling o25.load. java.io.IOException: Failed to open native connection to Cassandra at {127.0.1.1}:9042 This is what I've done so far: started ./bin/cassandra; created test data using CQL with keyspace ="testkeyspace2" and table="emp" and some keys and corresponding values; wrote standalone.py; ran the following pyspark shell command.