datastax

Cassandra bucket splitting for partition sizing

[亡魂溺海] submitted on 2019-12-01 13:07:42
I am quite new to Cassandra; I just learned it through the DataStax courses, but I can't find enough information on buckets here or elsewhere on the Internet, and in my application I need to use buckets to split my data. I have some instruments that will take measures, quite a lot of them, and splitting the measures daily (timestamp as partition key) might be a bit risky, as we can easily reach the 100 MB limit for a partition. Each measure concerns a specific object identified by an ID. So I would like to use a bucket, but I don't know how to do it. I'm using Cassandra 3.7. Here is roughly how my table will look:
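
The table itself is cut off in the excerpt above. As a hypothetical sketch of the usual pattern (all names below are illustrative, not taken from the question), the bucket is folded into the partition key next to the object id and the day, so that no single day's measures can push a partition past the ~100 MB mark:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])   # assumption: a local test node
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.measure_by_bucket (
        object_id   int,
        day         date,        -- calendar day of the measure
        bucket      int,         -- e.g. hour of day, or hash(measure id) % N
        measured_at timestamp,
        value       double,
        PRIMARY KEY ((object_id, day, bucket), measured_at)
    ) WITH CLUSTERING ORDER BY (measured_at DESC)
""")

The trade-off lands on reads: a query for one object's day must enumerate the buckets, either with an IN clause on the bucket column or with parallel per-bucket queries.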

Can't connect to CFS node

て烟熏妆下的殇ゞ submitted on 2019-12-01 12:26:34
Question: I removed (or decommissioned, I can't remember) a DSE Analytics node (with IP 10.14.5.50) a couple of months ago. When I now try to execute a dse shark ( CREATE TABLE ccc AS SELECT ... ) query, I receive: 15/01/22 13:23:17 ERROR parse.SharkSemanticAnalyzer: org.apache.hadoop.hive.ql.parse.SemanticException: 0:0 Error creating temporary folder on: cfs://10.14.5.50/user/hive/warehouse/mykeyspace.db. Error encountered near token 'TOK_TMP_FILE' at org.apache.hadoop.hive.ql.parse

Spark and Cassandra Java application: Exception in thread “main” java.lang.NoClassDefFoundError: org/apache/spark/sql/Dataset

混江龙づ霸主 submitted on 2019-12-01 12:19:08
Question: I have an amazingly simple Java application which I copied almost verbatim from this example: http://markmail.org/download.xqy?id=zua6upabiylzeetp&number=2 All I wanted to do was read the table data and display it in the Eclipse console. My pom.xml: <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>chat

DataStax OpsCenter can't add nodes: “Error provisioning cluster: Request ID is invalid”

浪尽此生 submitted on 2019-12-01 12:11:37
Update 2: There was a bug in OpsCenter that failed to match the dsc22 configuration with the Cassandra Community version; this solved one problem.

Update: After reading the OpsCenter log again, I think there is actually something wrong with the four authentication fields or some SSH configuration, but I still don't know exactly what should be done. The field says "Local node credentials (sudo) private key (optional)".

The scenario is as follows: I installed 4 nodes with Vagrant and Ansible, where each has dsc22, opscenter (redundant, I know), datastax-agent, cassandra-tool, and Oracle Java 8. Configuration below: nodetool status

Unable to connect to Cassandra remotely using DataStax Python driver

…衆ロ難τιáo~ submitted on 2019-12-01 11:16:48
I'm having trouble connecting to Cassandra (running on an EC2 node) remotely (from my laptop). When I use the DataStax Python driver for Cassandra: from cassandra.cluster import Cluster cluster = Cluster(['10.X.X.X'], port=9042) cluster.connect() I get: Traceback (most recent call last): File "/Users/user/virtualenvs/test/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3035, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "<ipython-input-23-dc85f20fd4f5>", line 1, in <module> session = cluster.connect() File "/Users/user/virtualenvs/test/lib/python2.7
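
For reference, a complete connection sketch using the same placeholder address (the credentials are placeholders too; auth_provider is only needed if authentication is enabled on the node):

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Substitute the node's public address and, if auth is enabled, real credentials.
auth = PlainTextAuthProvider(username='cassandra', password='cassandra')
cluster = Cluster(['10.X.X.X'], port=9042, auth_provider=auth)
session = cluster.connect()
rows = session.execute("SELECT release_version FROM system.local")
print(list(rows))

When connect() fails like this from a remote machine, the usual suspects are rpc_address / broadcast_rpc_address in cassandra.yaml still bound to localhost, or the EC2 security group not opening port 9042 to the client's IP.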

Cassandra - one big table vs many tables

醉酒当歌 submitted on 2019-12-01 10:38:00
I'm currently trying out the Cassandra database, using DataStax DevCenter and the DataStax C# driver. My current model is quite simple and consists of only:

ParameterId (int) - serves as the id of the table
Value (bigint)
MeasureTime (timestamp)

I will have exactly 1000 parameters, numbered 1-1000, and will get an entry for each parameter once per second, running for years. My question is whether it is better practice to create one table as: CREATE TABLE keyspace.measurement ( parameterId int, value bigint, measureTime timestamp, PRIMARY KEY
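
The CREATE TABLE in the excerpt is cut off exactly at the primary key, which is where the answer lives. Here is a sketch of the single-table option (column names follow the question; the day column and the compound partition key are my additions, not from it): at one reading per second, a partition keyed on parameterId alone grows without bound, while (parameterId, day) caps each partition at 86,400 rows.

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])   # assumption: a local test node
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.measurement (
        parameterId int,
        day         date,
        measureTime timestamp,
        value       bigint,
        PRIMARY KEY ((parameterId, day), measureTime)
    )
""")

Keyed this way, one physical table behaves like a thousand independent partition series (one per parameter), which is generally preferable to a thousand physical tables, since each table in Cassandra carries fixed per-table overhead.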

Any clue how to make this Spark structured-streaming join work?

天大地大妈咪最大 submitted on 2019-12-01 09:48:46
Question: I am using spark-sql-2.4.1 with spark-cassandra-connector_2.11. I am trying to join two streaming datasets as below: Dataset<Row> companyInfo_df = company_info_df .select("companyInfo.*" ) .withColumn("companyInfoEventTs", ( col("eventTs").divide(1000) ).cast(DataTypes.TimestampType)) .withWatermark("companyInfoEventTs", "60 seconds"); Dataset<Row> companyFin_df = comapany_fin_df .select("companyFin.*" ) .withColumn("eventTimeStamp", ( col("eventTs").divide(1000) ).cast(DataTypes
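
The excerpt breaks off mid-expression, so for the pattern itself here is a minimal PySpark sketch (Python rather than the question's Java, with rate sources standing in for the real company streams): each side gets an event-time watermark, and the join condition pairs the equality key with a time-range bound so Spark can expire old join state.

from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.appName("stream-join-sketch").getOrCreate()

# Rate sources as stand-ins; rename their columns to mimic the question.
info = (spark.readStream.format("rate").load()
        .withColumnRenamed("timestamp", "companyInfoEventTs")
        .withColumnRenamed("value", "infoKey")
        .withWatermark("companyInfoEventTs", "60 seconds"))

fin = (spark.readStream.format("rate").load()
       .withColumnRenamed("timestamp", "eventTimeStamp")
       .withColumnRenamed("value", "finKey")
       .withWatermark("eventTimeStamp", "60 seconds"))

# Without the BETWEEN bound, a stream-stream join would have to retain
# state indefinitely; the watermarks plus the bound let Spark drop it.
joined = info.join(
    fin,
    expr("infoKey = finKey AND "
         "eventTimeStamp BETWEEN companyInfoEventTs AND "
         "companyInfoEventTs + interval 1 minute"))

query = joined.writeStream.format("console").start()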

How is the Spark master elected in a Datastax Enterprise cluster?

旧时模样 submitted on 2019-12-01 09:43:50
Question: How is the Spark master elected in a DataStax Enterprise cluster? I have looked at the configuration in /etc/dse/dse-env.sh, /etc/dse/spark/spark-defaults.conf, and /etc/dse/spark/spark-env.sh, but I cannot find it in any of those locations. On our cluster the Spark master keeps moving from one node to another after a restart of the services. Answer 1: In DSE 4.6, the Spark Master / Hadoop Job Tracker (always on the same node) are determined by a round robin algorithm and are stored in

SET consistency level for Cassandra DDL

偶尔善良 submitted on 2019-12-01 08:43:27
In my application logs I've seen that, by default, after running CREATE/ALTER TABLE statements the Cassandra driver waits up to 10 seconds of processing to bring the schema into agreement. Can I (and should I) set a consistency level, for example QUORUM, while executing DDL statements like CREATE TABLE ... IF NOT EXISTS, to make sure my table gets created and propagated to all nodes?

Schema changes (data definition) in Cassandra since 1.1 are propagated via gossip. Since they use gossip, schema changes take a separate read/write path from typical data-manipulation requests (SELECT, DELETE, INSERT, etc.), and
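
The excerpt stops mid-sentence, but the practical upshot is clear: consistency levels apply to data reads and writes, not to schema changes, so there is no QUORUM to set for DDL. What drivers expose instead is a schema-agreement wait. A sketch with the DataStax Python driver (the question doesn't name its driver, so Python here is an assumption; max_schema_agreement_wait is the knob behind the "up to 10 seconds" in the logs, and 10 seconds is its default):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'], max_schema_agreement_wait=30)
session = cluster.connect()

# execute() on a DDL statement returns only after the driver sees
# cluster-wide schema agreement, or after the wait above times out.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.events (id uuid PRIMARY KEY, payload text)
""")

# Agreement can also be re-checked explicitly before dependent queries.
print("schema in agreement:", cluster.control_connection.wait_for_schema_agreement())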