datastax

Cassandra bucket splitting for partition sizing

[亡魂溺海] submitted on 2019-12-01 13:07:42
I am quite new to Cassandra; I just learned it through the DataStax courses, but I can't find enough information on buckets here or elsewhere on the Internet, and in my application I need to use buckets to split my data. I have some instruments that will take measures, quite a lot of them, and splitting the measures daily (timestamp as partition key) might be a bit risky, as we can easily reach the 100 MB limit for a partition. Each measure concerns a specific object identified by an ID. So I would like to use a bucket, but I don't know how to do it. I'm using Cassandra 3.7. Here is roughly how my table will look:
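
The table itself is cut off in the excerpt above. As a hypothetical sketch of the usual pattern (all names below are illustrative, not taken from the question), the bucket is folded into the partition key next to the object id and the day, so that no single day's measures can push a partition past the ~100 MB mark:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])   # assumption: a local test node
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.measure_by_bucket (
        object_id   int,
        day         date,        -- calendar day of the measure
        bucket      int,         -- e.g. hour of day, or hash(measure id) % N
        measured_at timestamp,
        value       double,
        PRIMARY KEY ((object_id, day, bucket), measured_at)
    ) WITH CLUSTERING ORDER BY (measured_at DESC)
""")

The trade-off lands on reads: a query for one object's day must enumerate the buckets, either with an IN clause on the bucket column or with parallel per-bucket queries.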

Can't connect to CFS node

て烟熏妆下的殇ゞ submitted on 2019-12-01 12:26:34
Question: I removed (or decommissioned, I can't remember) a DSE Analytics node (with IP 10.14.5.50) a couple of months ago. When I now try to execute a dse shark ( CREATE TABLE ccc AS SELECT ... ) query, I receive: 15/01/22 13:23:17 ERROR parse.SharkSemanticAnalyzer: org.apache.hadoop.hive.ql.parse.SemanticException: 0:0 Error creating temporary folder on: cfs://10.14.5.50/user/hive/warehouse/mykeyspace.db. Error encountered near token 'TOK_TMP_FILE' at org.apache.hadoop.hive.ql.parse

Spark and Cassandra Java application: Exception in thread “main” java.lang.NoClassDefFoundError: org/apache/spark/sql/Dataset

混江龙づ霸主 submitted on 2019-12-01 12:19:08
Question: I have an amazingly simple Java application which I copied almost verbatim from this example: http://markmail.org/download.xqy?id=zua6upabiylzeetp&number=2 All I wanted to do was read the table data and display it in the Eclipse console. My pom.xml: <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>chat

DataStax OpsCenter can't add nodes: “Error provisioning cluster: Request ID is invalid”

浪尽此生 submitted on 2019-12-01 12:11:37
Update 2: There was a bug in OpsCenter that failed to match the dsc22 configuration with the Cassandra Community version; this solved one problem.

Update: After reading the OpsCenter log again, I think there is actually something wrong with the four authentication fields or some SSH configuration, but I still don't know exactly what should be done. The field says "Local node credentials (sudo) private key (optional)".

The scenario is as follows: I installed 4 nodes with Vagrant and Ansible, where each has dsc22, opscenter (redundant, I know), datastax-agent, cassandra-tool, and Oracle Java 8. Configuration below: nodetool status

Unable to connect to Cassandra remotely using DataStax Python driver

…衆ロ難τιáo~ submitted on 2019-12-01 11:16:48
I'm having trouble connecting to Cassandra (running on an EC2 node) remotely (from my laptop). When I use the DataStax Python driver for Cassandra: from cassandra.cluster import Cluster cluster = Cluster(['10.X.X.X'], port=9042) cluster.connect() I get: Traceback (most recent call last): File "/Users/user/virtualenvs/test/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3035, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "<ipython-input-23-dc85f20fd4f5>", line 1, in <module> session = cluster.connect() File "/Users/user/virtualenvs/test/lib/python2.7
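
For reference, a complete connection sketch using the same placeholder address (the credentials are placeholders too; auth_provider is only needed if authentication is enabled on the node):

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Substitute the node's public address and, if auth is enabled, real credentials.
auth = PlainTextAuthProvider(username='cassandra', password='cassandra')
cluster = Cluster(['10.X.X.X'], port=9042, auth_provider=auth)
session = cluster.connect()
rows = session.execute("SELECT release_version FROM system.local")
print(list(rows))

When connect() fails like this from a remote machine, the usual suspects are rpc_address / broadcast_rpc_address in cassandra.yaml still bound to localhost, or the EC2 security group not opening port 9042 to the client's IP.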

Cassandra - one big table vs many tables

醉酒当歌 submitted on 2019-12-01 10:38:00
I'm currently trying out the Cassandra database, using DataStax DevCenter and the DataStax C# driver. My current model is quite simple and consists of only:

ParameterId (int) - serves as the id of the table
Value (bigint)
MeasureTime (timestamp)

I will have exactly 1000 parameters, numbered 1-1000, and will get an entry for each parameter once per second, running for years. My question is whether it is better practice to create one table as: CREATE TABLE keyspace.measurement ( parameterId int, value bigint, measureTime timestamp, PRIMARY KEY
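
The CREATE TABLE in the excerpt is cut off exactly at the primary key, which is where the answer lives. Here is a sketch of the single-table option (column names follow the question; the day column and the compound partition key are my additions, not from it): at one reading per second, a partition keyed on parameterId alone grows without bound, while (parameterId, day) caps each partition at 86,400 rows.

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])   # assumption: a local test node
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.measurement (
        parameterId int,
        day         date,
        measureTime timestamp,
        value       bigint,
        PRIMARY KEY ((parameterId, day), measureTime)
    )
""")

Keyed this way, one physical table behaves like a thousand independent partition series (one per parameter), which is generally preferable to a thousand physical tables, since each table in Cassandra carries fixed per-table overhead.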

Any clue how to make this Spark structured-streaming join work?

天大地大妈咪最大 submitted on 2019-12-01 09:48:46
Question: I am using spark-sql-2.4.1 with spark-cassandra-connector_2.11. I am trying to join two streaming datasets as below: Dataset<Row> companyInfo_df = company_info_df .select("companyInfo.*" ) .withColumn("companyInfoEventTs", ( col("eventTs").divide(1000) ).cast(DataTypes.TimestampType)) .withWatermark("companyInfoEventTs", "60 seconds"); Dataset<Row> companyFin_df = comapany_fin_df .select("companyFin.*" ) .withColumn("eventTimeStamp", ( col("eventTs").divide(1000) ).cast(DataTypes
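
The excerpt breaks off mid-expression, so for the pattern itself here is a minimal PySpark sketch (Python rather than the question's Java, with rate sources standing in for the real company streams): each side gets an event-time watermark, and the join condition pairs the equality key with a time-range bound so Spark can expire old join state.

from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.appName("stream-join-sketch").getOrCreate()

# Rate sources as stand-ins; rename their columns to mimic the question.
info = (spark.readStream.format("rate").load()
        .withColumnRenamed("timestamp", "companyInfoEventTs")
        .withColumnRenamed("value", "infoKey")
        .withWatermark("companyInfoEventTs", "60 seconds"))

fin = (spark.readStream.format("rate").load()
       .withColumnRenamed("timestamp", "eventTimeStamp")
       .withColumnRenamed("value", "finKey")
       .withWatermark("eventTimeStamp", "60 seconds"))

# Without the BETWEEN bound, a stream-stream join would have to retain
# state indefinitely; the watermarks plus the bound let Spark drop it.
joined = info.join(
    fin,
    expr("infoKey = finKey AND "
         "eventTimeStamp BETWEEN companyInfoEventTs AND "
         "companyInfoEventTs + interval 1 minute"))

query = joined.writeStream.format("console").start()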

How is the Spark master elected in a Datastax Enterprise cluster?

旧时模样 submitted on 2019-12-01 09:43:50
Question: How is the Spark master elected in a DataStax Enterprise cluster? I have looked at the configuration in /etc/dse/dse-env.sh, /etc/dse/spark/spark-defaults.conf, and /etc/dse/spark/spark-env.sh, but I cannot find it in any of those locations. On our cluster the Spark master keeps moving from one node to another after a restart of the services. Answer 1: In DSE 4.6, the Spark Master / Hadoop Job Tracker (always on the same node) are determined by a round robin algorithm and are stored in

SET consistency level for Cassandra DDL

偶尔善良 submitted on 2019-12-01 08:43:27
In my application logs I've seen that, by default, after running CREATE/ALTER TABLE statements the Cassandra driver waits up to 10 seconds of processing to bring the schema into agreement. Can I (and should I) set a consistency level, for example QUORUM, while executing DDL statements like CREATE TABLE ... IF NOT EXISTS, to make sure my table gets created and propagated to all nodes?

Schema changes (data definition) in Cassandra since 1.1 are propagated via gossip. Since they use gossip, schema changes take a separate read/write path from typical data-manipulation requests (SELECT, DELETE, INSERT, etc.), and
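
The excerpt stops mid-sentence, but the practical upshot is clear: consistency levels apply to data reads and writes, not to schema changes, so there is no QUORUM to set for DDL. What drivers expose instead is a schema-agreement wait. A sketch with the DataStax Python driver (the question doesn't name its driver, so Python here is an assumption; max_schema_agreement_wait is the knob behind the "up to 10 seconds" in the logs, and 10 seconds is its default):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'], max_schema_agreement_wait=30)
session = cluster.connect()

# execute() on a DDL statement returns only after the driver sees
# cluster-wide schema agreement, or after the wait above times out.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.events (id uuid PRIMARY KEY, payload text)
""")

# Agreement can also be re-checked explicitly before dependent queries.
print("schema in agreement:", cluster.control_connection.wait_for_schema_agreement())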