cassandra-2.0

How to ensure data consistency in Cassandra on different tables?

断了今生、忘了曾经 submitted on 2019-11-30 22:47:46
Question: I'm new to Cassandra and I've read that Cassandra encourages denormalization and duplication of data. This leaves me a little confused. Let us imagine the following scenario: I have a keyspace with four tables: A, B, C and D. CREATE TABLE A ( tableID int, column1 int, column2 varchar, column3 varchar, column4 varchar, column5 varchar, PRIMARY KEY (column1, tableID) ); Let us imagine that the other tables (B, C, D) have the same structure and the same data as table A, only with a different

new cassandra node can't gossip with seed

筅森魡賤 submitted on 2019-11-30 14:41:34
I am trying to spin up a new node using Cassandra 2.0.7. Both nodes are at Digital Ocean. The seed node is up and running, and I can telnet to port 7000 on that host from the node I'm trying to start. [root@cassandra02 apache-cassandra-2.0.7]# telnet 10.10.1.94 7000 Trying 10.10.1.94... Connected to 10.10.1.94. Escape character is '^]'. But when I start Cassandra on the new node I see the following exception: INFO 00:01:34,744 Handshaking version with /10.10.1.94 ERROR 00:02:05,733 Exception encountered during startup java.lang.RuntimeException: Unable to gossip with any seeds at org.apache

Export large amount of data from Cassandra to CSV

本秂侑毒 submitted on 2019-11-30 14:35:09
Question: I'm using Cassandra 2.0.9 to store quite large amounts of data, say 100 GB, in one column family. I would like to export this data to CSV quickly. I tried: sstable2json - it produces quite large JSON files that are hard to parse, because the tool puts the data in one row and uses a complicated schema (e.g. a 300 MB data file = ~2 GB of JSON); it takes a long time to dump, and Cassandra likes to change source file names according to its internal mechanism. COPY - causes timeouts on quite fast EC2 instances

Export large amount of data from Cassandra to CSV

只愿长相守 submitted on 2019-11-30 11:03:24
I'm using Cassandra 2.0.9 to store quite large amounts of data, say 100 GB, in one column family. I would like to export this data to CSV quickly. I tried: sstable2json - it produces quite large JSON files that are hard to parse, because the tool puts the data in one row and uses a complicated schema (e.g. a 300 MB data file = ~2 GB of JSON); it takes a long time to dump, and Cassandra likes to change source file names according to its internal mechanism. COPY - causes timeouts on quite fast EC2 instances for a large number of records. CAPTURE - like above, causes timeouts. Reads with pagination - I used
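For the "reads with pagination" approach mentioned in the question, the CSV side can be kept streaming so that the full result never sits in memory. A minimal sketch in Python, assuming the rows come from some driver result that is iterable and fetches pages lazily; the function name `export_rows` and the sample data are hypothetical, and no Cassandra connection is made here.

```python
import csv
import io


def export_rows(rows, out, header):
    """Stream an iterable of row tuples into a CSV file object.

    `rows` can be any iterable (for example a paged driver result);
    rows are written one at a time, so memory use stays flat.
    Returns the number of data rows written.
    """
    writer = csv.writer(out)
    writer.writerow(header)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in for a paged query result (hypothetical data).
    fake_result = iter([(1, "a"), (2, "b")])
    buffer = io.StringIO()
    written = export_rows(fake_result, buffer, ["id", "name"])
    print(written, "rows written")
```

In a real export, `out` would be a file opened with `newline=""`, as the `csv` module documentation recommends.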

Range query on secondary index in cassandra

北城余情 submitted on 2019-11-30 09:55:34
I am using Cassandra 2.1.10. First, to be clear: I know secondary indexes are an anti-pattern in Cassandra. But for testing purposes I was trying the following: CREATE TABLE test_topology1.tt ( a text PRIMARY KEY, b timestamp ) WITH bloom_filter_fp_chance = 0.01 AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' AND comment = '' AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'} AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND dclocal_read_repair_chance = 0.1 AND default_time_to_live = 0 AND gc_grace

How Cassandra select the node to send request?

梦想与她 submitted on 2019-11-30 09:47:06
Imagine a Cassandra cluster that needs to be accessed by a client application. In the Java API we create a Cluster instance and send read or write requests via a Session. If we use read/write consistency ONE, how does the API select the actual node (the coordinator node) to forward the request to? Is it a random selection? Please help me figure this out. Cassandra drivers use the "gossip" protocol (and a process called node discovery) to gain information about the cluster. If a node becomes unavailable, the client driver automatically tries other nodes and schedules reconnection times with the dead one
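On the "is it random?" part: in the DataStax drivers the choice of coordinator is made by a configurable load-balancing policy (round-robin and token-aware policies are among the ones shipped), not by a random pick. The snippet below is only a toy illustration of the round-robin idea with skipping of hosts marked down; the class `ToyRoundRobin` and its methods are hypothetical and are not the driver's API.

```python
from itertools import cycle


class ToyRoundRobin:
    """Toy model of round-robin coordinator selection (not a driver class)."""

    def __init__(self, hosts):
        self._hosts = list(hosts)
        self._down = set()
        self._ring = cycle(self._hosts)

    def mark_down(self, host):
        # Hosts marked down are skipped until marked up again.
        self._down.add(host)

    def mark_up(self, host):
        self._down.discard(host)

    def next_coordinator(self):
        # Try each host at most once per call, in ring order.
        for _ in range(len(self._hosts)):
            host = next(self._ring)
            if host not in self._down:
                return host
        raise RuntimeError("no live hosts available")


if __name__ == "__main__":
    policy = ToyRoundRobin(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    policy.mark_down("10.0.0.2")
    for _ in range(4):
        print(policy.next_coordinator())
```

A token-aware policy would instead prefer a replica that owns the partition key of the statement, which avoids an extra network hop from coordinator to replica.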

How to obtain number of rows in Cassandra table

倖福魔咒の submitted on 2019-11-30 02:43:21
This is a super basic question but it's actually been bugging me for days. Is there a good way to obtain the equivalent of a COUNT(*) of a given table in Cassandra? I will be moving several hundred million rows into C* for some load testing and I'd like to at least get a row count on some sample ETL jobs before I move massive amounts of data over the network. The best idea I have is to basically loop over each row with Python and increment a counter. Is there a better way to determine (or even estimate) the row count of a C* table? I've also poked around DataStax OpsCenter to see
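Looping over every row from a single client is slow at this scale. One commonly used alternative is to split the partitioner's token range into slices and count each slice with its own query, possibly in parallel. A minimal sketch, assuming the default Murmur3Partitioner (tokens from -2^63 to 2^63 - 1); the table name `ks.events` and the partition key `id` are hypothetical, and the code only builds the CQL strings, it does not execute them.

```python
MIN_TOKEN = -2**63       # lowest Murmur3Partitioner token
MAX_TOKEN = 2**63 - 1    # highest Murmur3Partitioner token


def split_token_range(parts):
    """Split the full token range into `parts` contiguous (start, end) slices."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    span = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 possible tokens
    bounds = [MIN_TOKEN + span * i // parts for i in range(parts)]
    bounds.append(MAX_TOKEN)
    return [(bounds[i], bounds[i + 1]) for i in range(parts)]


def count_queries(table, partition_key, parts):
    """Build one COUNT(*) statement per token slice.

    The first slice uses >= so the lowest token is included; later
    slices use > so that no row is counted twice.
    """
    queries = []
    for i, (start, end) in enumerate(split_token_range(parts)):
        lower = ">=" if i == 0 else ">"
        queries.append(
            f"SELECT COUNT(*) FROM {table} "
            f"WHERE token({partition_key}) {lower} {start} "
            f"AND token({partition_key}) <= {end}"
        )
    return queries


if __name__ == "__main__":
    for query in count_queries("ks.events", "id", 4):
        print(query)
```

Summing the per-slice results gives the total; smaller slices keep each query short enough to stay under the read timeout.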

How to reset a lost Cassandra admin user's password?

拟墨画扇 submitted on 2019-11-30 01:13:46
Question: I have full access to the Cassandra installation files and a PasswordAuthenticator configured in cassandra.yaml. What do I have to do to reset the admin user's password that has been lost, while keeping the existing databases intact? Answer 1: The hash has changed for Cassandra 2.1: Switch to authenticator: AllowAllAuthenticator. Restart Cassandra. UPDATE system_auth.credentials SET salted_hash = '$2a$10$H46haNkcbxlbamyj0OYZr.v4e5L08WTiQ1scrTs9Q3NYy.6B..x4O' WHERE username='cassandra'; Switch back to

new cassandra node can't gossip with seed

我们两清 submitted on 2019-11-29 20:32:50
Question: I am trying to spin up a new node using Cassandra 2.0.7. Both nodes are at Digital Ocean. The seed node is up and running, and I can telnet to port 7000 on that host from the node I'm trying to start. [root@cassandra02 apache-cassandra-2.0.7]# telnet 10.10.1.94 7000 Trying 10.10.1.94... Connected to 10.10.1.94. Escape character is '^]'. But when I start Cassandra on the new node I see the following exception: INFO 00:01:34,744 Handshaking version with /10.10.1.94 ERROR 00:02:05,733 Exception

How to get Last 6 Month data comparing with timestamp column using cassandra query?

限于喜欢 submitted on 2019-11-29 18:13:42
How can I get the last 6 months of data by comparing against a timestamp column in a Cassandra query? I need to get all account statements from the last 3/6 months by comparing updatedTime (a timestamp column) with the current time. In SQL, for example, we use the DateAdd() function for this. I don't know how to do this in Cassandra. If anyone knows, please reply. Thanks in advance. Cassandra 2.2 and later allows users to define functions (UDFs) that can be applied to data stored in a table as part of a query result. If you use Cassandra 2.2 or later, you can create your own UDF: CREATE FUNCTION monthadd
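On versions without UDFs (such as Cassandra 2.0), a common workaround is to compute the cutoff date in the client and bind it to a plain range condition on updatedTime. A minimal sketch in Python; the helper `months_ago`, the table `account_statement` and the column `account_id` are hypothetical, and the range condition assumes updatedTime is a clustering column (otherwise the query would need ALLOW FILTERING or a different table design).

```python
import calendar
from datetime import datetime


def months_ago(now, months):
    """Return `now` shifted back by `months` calendar months.

    The day is clamped to the last day of the target month,
    so 31 August minus 6 months gives 28 (or 29) February.
    """
    total = now.year * 12 + (now.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return now.replace(year=year, month=month, day=min(now.day, last_day))


# Hypothetical statement; the cutoff is bound as the second parameter.
QUERY = (
    "SELECT * FROM account_statement "
    "WHERE account_id = ? AND updatedTime >= ?"
)

if __name__ == "__main__":
    cutoff = months_ago(datetime(2019, 11, 29, 18, 13, 42), 6)
    print(QUERY)
    print("cutoff:", cutoff.isoformat())
```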