Cassandra

Do I need clock synchronisation for cassandra if only one client writes to cluster?

只愿长相守 submitted on 2020-01-24 10:41:05
Question: From Cassandra's documentation I learned that Cassandra uses query timestamps to resolve conflicts between two writes, and hence the clocks on all nodes of the cluster need to be synchronized. In my use case we have only one client writing to the cluster and multiple clients reading from it. So, if I use a client-side timestamp generator (which I believe is the default for driver versions > 3), do I still need the cluster node clocks synchronized with each other? Answer 1: In the context of
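For context, the client-side generator in the DataStax drivers is essentially a monotonically increasing microsecond clock. A minimal sketch of the idea (illustration only, with made-up names, not the driver's actual code):

```python
import threading
import time

class MonotonicTimestampGenerator:
    """Sketch of a client-side write-timestamp generator:
    microseconds since the epoch, forced to be strictly increasing so that
    successive writes from one client never tie or go backwards, even if
    the local wall clock stalls or steps back."""

    def __init__(self):
        self._last = 0
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            now_us = int(time.time() * 1_000_000)
            # Never reuse or regress a timestamp: take the wall clock if it
            # moved forward, otherwise just count up from the last value.
            self._last = max(now_us, self._last + 1)
            return self._last
```

Because every write timestamp then comes from this single writer, server clock skew no longer decides which write wins. Node clocks still matter for things like TTL expiry, hints, and operational tooling, so keeping NTP running on the nodes remains good practice.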

Cassandra could not create Java Virtual Machine

﹥>﹥吖頭↗ submitted on 2020-01-24 04:41:05
Question: I am on macOS and I run cassandra -f, and immediately this happens: [0.002s][warning][gc] -Xloggc is deprecated. Will use -Xlog:gc:/usr/local/apache-cassandra-3.0.10/logs/gc.log instead. Unrecognized VM option 'UseParNewGC' Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit. I have no idea why this is happening. I did the proper export CASSANDRA_HOME=/usr/local/apache-cassandra-3.0.10 export PATH=$PATH:$CASSANDRA_HOME/bin But still
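Context for this error (my reading, not from the truncated answer): Cassandra 3.x ships JVM flags such as UseParNewGC that only exist on Java 8, and a newer JDK refuses to start when it sees them, so the usual fix is to point JAVA_HOME at a Java 8 install (or move to Cassandra 4.x, which supports Java 11). As an illustration of why startup fails, this sketch flags jvm.options lines a modern JDK would reject; the removed-flag list here is partial and assumed:

```python
# Options that existed in JDK 8 but were removed in later JDKs, so a newer
# `java` binary exits with "Unrecognized VM option". Partial, illustrative list.
REMOVED_IN_MODERN_JDKS = ("-XX:+UseParNewGC", "-XX:+PrintGCDateStamps", "-Xloggc")

def incompatible_options(jvm_options_text):
    """Return the non-comment lines of a jvm.options file that a
    JDK 9+ JVM would reject at startup."""
    bad = []
    for line in jvm_options_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments, as the JVM options parser does
        if any(line.startswith(flag) for flag in REMOVED_IN_MODERN_JDKS):
            bad.append(line)
    return bad
```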

How to get tombstone count for a cql query?

◇◆丶佛笑我妖孽 submitted on 2020-01-24 02:29:13
Question: I am trying to evaluate the number of tombstones being created in one of the tables in our application. For that I am trying to use nodetool cfstats. Here is how I am doing it: create table demo.test(a int, b int, c int, primary key (a)); insert into demo.test(a, b, c) values(1,2,3); Now I make the same insert again, so I expect 3 tombstones to be created. But on running cfstats for this column family, I still see that no tombstones were created. nodetool cfstats demo.test Average live
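The likely explanation: a second INSERT with the same primary key is an overwrite, not a delete, so it produces a newer cell version rather than a tombstone; tombstones come from DELETE, TTL expiry, or writing null. A toy model of last-write-wins cell reconciliation (an illustration of the rule, not Cassandra's storage engine):

```python
def resolve(cells):
    """Toy last-write-wins reconciliation for one cell.
    Each entry is (timestamp, value); value None marks a tombstone.
    The version with the highest timestamp wins, tombstone or not."""
    return max(cells, key=lambda c: c[0])

def count_tombstones(cells):
    """Count the tombstone markers among a cell's stored versions."""
    return sum(1 for _, value in cells if value is None)

# Two inserts of the same row: two live cell versions, zero tombstones.
overwrites = [(100, 3), (200, 3)]
# An insert followed by a DELETE: the delete writes a tombstone cell.
deleted = [(100, 3), (200, None)]
```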

What is best approach to join data in spark streaming application?

我怕爱的太早我们不能终老 submitted on 2020-01-23 17:19:37
Question: Essentially it means: rather than running a join against the C* table for each streaming record, is there any way to run a join for each micro-batch of records in Spark streaming? We have almost finalized on spark-sql 2.4.x and the datastax-spark-cassandra-connector for Cassandra 3.x, but we have one fundamental question regarding efficiency in the scenario below. For the streaming data records (i.e. streamingDataSet), I need to look up for existing
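One common pattern is exactly per-micro-batch batching: collect the keys of one micro-batch and issue a single batched read per batch instead of a point lookup per record (with the connector, joinWithCassandraTable does this per Spark partition). A plain-Python sketch of the shape of that pattern, with made-up record and function names standing in for the Cassandra read:

```python
def enrich_micro_batch(batch, lookup_fn):
    """Join one micro-batch against a reference table in one round trip.
    `lookup_fn(keys)` stands in for a batched Cassandra read
    (e.g. joinWithCassandraTable); it returns {key: reference_value}."""
    keys = {record["id"] for record in batch}
    reference = lookup_fn(keys)          # one lookup per batch, not per record
    return [
        {**record, "ref": reference.get(record["id"])}
        for record in batch
    ]

# Hypothetical in-memory stand-in for the C* reference table.
table = {"a": 1, "b": 2}
batch = [{"id": "a"}, {"id": "a"}, {"id": "c"}]
out = enrich_micro_batch(batch, lambda ks: {k: table[k] for k in ks if k in table})
```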

Cassandra tombstones count multiple queries vs single query

て烟熏妆下的殇ゞ submitted on 2020-01-23 13:18:07
Question: I have a Cassandra table defined as follows: CREATE TABLE mytable ( colA text, colB text, timeCol timestamp, colC text, PRIMARY KEY ((colA, colB, timeCol), colC) ) WITH.... I want to know whether the number of tombstones would vary between the following types of queries: 1. delete from mytable where colA = '...' AND colB = '...' AND timeCol = 111 (the above query affects multiple records, i.e. multiple values of colC) 2. delete from mytable where colA = '...' AND colB = '...' AND timeCol = 111 AND colC = '...'
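With PRIMARY KEY ((colA, colB, timeCol), colC), the first DELETE names the full partition key and drops the whole partition with a single partition tombstone, while the second deletes one row and writes one row tombstone per statement. A toy count contrasting the two ways of emptying a partition (my reading of the documented tombstone semantics, not measured output):

```python
def count_tombstones(num_rows, strategy):
    """Toy model of tombstones written when removing every row of a partition.
    'partition': one DELETE on the full partition key -> 1 partition tombstone,
                 no matter how many rows it covers.
    'per_row':   one DELETE per colC value -> 1 row tombstone each."""
    if strategy == "partition":
        return 1
    if strategy == "per_row":
        return num_rows
    raise ValueError(strategy)
```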

What is the difference between scylla read path and cassandra read path?

こ雲淡風輕ζ submitted on 2020-01-23 12:16:52
Question: What is the difference between the Scylla read path and the Cassandra read path? When I stress-test Cassandra and Scylla, Scylla's read performance is about 5 times worse than Cassandra's, using 16 cores and ordinary HDDs. I expected better read performance on Scylla compared to Cassandra on ordinary HDDs, because my company doesn't provide SSDs. Can someone please confirm whether it is possible to achieve better read performance on ordinary HDDs, and if so, what changes are required in the Scylla config? Please guide me! Answer 1:

Cell versioning with Cassandra

◇◆丶佛笑我妖孽 submitted on 2020-01-23 11:49:32
Question: My application uses an AbstractFactory for the DAO layer, so once the HBase DAO family has been implemented, it would be very useful for me to create the Cassandra DAO family and compare the two from several points of view. However, in trying to do that, I saw that Cassandra doesn't support cell versioning the way HBase does (and my application makes heavy use of it), so I was wondering whether there is some table design trick (or something else) to emulate this behaviour in Cassandra. Answer 1: One common
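The usual emulation is to add the write time as a clustering column in descending order, so every write becomes a new row and "latest N versions" is a simple slice of the partition. A plain-Python sketch of that table design, i.e. something like PRIMARY KEY (key, version_ts) WITH CLUSTERING ORDER BY (version_ts DESC), with hypothetical names:

```python
class VersionedCell:
    """Emulates HBase-style cell versioning with a (key, version_ts DESC)
    layout: each put appends a new version row; reads slice the newest rows."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.rows = {}  # key -> list of (version_ts, value), newest first

    def put(self, key, ts, value):
        versions = self.rows.setdefault(key, [])
        versions.append((ts, value))
        versions.sort(reverse=True)       # keep newest first, like DESC clustering
        del versions[self.max_versions:]  # prune old versions, as HBase would

    def get(self, key, n=1):
        """Return up to the n most recent versions of a cell."""
        return self.rows.get(key, [])[:n]
```

One caveat with the real table design: pruning old versions in Cassandra itself requires deletes or a TTL, which brings tombstones into play, so the retention policy deserves thought up front.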

Is there an alternative to joinWithCassandraTable for DataFrames in Spark (Scala) when retrieving data from only certain Cassandra partitions?

五迷三道 submitted on 2020-01-23 01:13:29
Question: When extracting a small number of partitions from a large C* table using RDDs, we can use this: val rdd = … // rdd including partition data val data = rdd.repartitionByCassandraReplica(keyspace, tableName) .joinWithCassandraTable(keyspace, tableName) Do we have an equally effective approach available for DataFrames? Update (Apr 26, 2017): To be more concrete, I prepared an example. I have 2 tables in Cassandra: CREATE TABLE ids ( id text, registered timestamp, PRIMARY KEY (id) ) CREATE TABLE
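For intuition about what repartitionByCassandraReplica buys: it groups lookup keys by the node that owns them, so each executor then reads its keys from a local replica. A toy model of that grouping in plain Python, with the ring's token-to-replica mapping faked by a hash (purely illustrative):

```python
import zlib

def group_keys_by_replica(keys, num_replicas):
    """Toy stand-in for repartitionByCassandraReplica: bucket lookup keys by
    the replica that owns them (here a CRC32 hash; in Cassandra, the ring's
    token ranges), so each bucket becomes one co-located batched read."""
    buckets = {r: [] for r in range(num_replicas)}
    for key in keys:
        buckets[zlib.crc32(key.encode()) % num_replicas].append(key)
    return buckets
```

On the DataFrame side, I believe newer Spark Cassandra Connector releases add a "direct join" optimization that applies this strategy to DataFrame joins automatically; it is worth checking the connector documentation for the version in use.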