cql

GoCQL : Marshal string into timestamp

Submitted by 笑着哭i on 2019-12-06 12:08:01
I am developing a time series data model with a clustering column, i.e.:

    CREATE TABLE events (
        id text,
        time timestamp,
        type text,
        val double,
        PRIMARY KEY (id, time)
    ) WITH CLUSTERING ORDER BY (time DESC)

I wish to perform a select against the partition column 'id' and the clustering column 'time'. For example, id:='1', timestamp:='2017-10-09':

    query := "SELECT id, time, type, val FROM events WHERE id=? AND time>=?"
    iterable := Cassandra.Session.Query(query, id, timestamp).Consistency(gocql.One).Iter()
    for iterable.MapScan(m) {
        found = true
        event = Event{
            ID:   m["id"].(string),
            Time: m["time"].(time…
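As a point of reference (my own sketch, not part of the question), here is the same range query expressed directly in CQL against the events schema above; Cassandra parses ISO 8601 literals such as '2017-10-09' for timestamp columns, which is the value the bound Go parameter ultimately has to marshal into:

    -- Sketch against the events table above; a date-only literal is read
    -- as midnight on that day (server default time zone when none is given).
    SELECT id, time, type, val
    FROM events
    WHERE id = '1' AND time >= '2017-10-09';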

Cassandra tombstones count multiple queries vs single query

Submitted by 风流意气都作罢 on 2019-12-06 10:50:32
I have a Cassandra table defined as follows:

    CREATE TABLE mytable (
        colA text,
        colB text,
        timeCol timestamp,
        colC text,
        PRIMARY KEY ((colA, colB, timeCol), colC)
    ) WITH....

I want to know whether the number of tombstones would vary between the following types of queries:

1. delete from mytable where colA = '...' AND colB = '...' and timeCol = 111

   The above query affects multiple records (multiple values of colC).

2. delete from mytable where colA = '...' AND colB = '...' and timeCol = 111 AND colC = '...'

   However, the 2nd query needs to be executed for each value of the last column colC, while the 1st query takes care…
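For what it's worth, a hedged sketch of how I'd reason about the two forms, given that (colA, colB, timeCol) is the full partition key above: the first delete drops a whole partition and should produce a single partition-level tombstone covering every colC row, while the second produces one row-level tombstone per execution:

    -- Form 1: full partition key, no clustering column.
    -- One partition tombstone covers all colC values in that partition.
    DELETE FROM mytable WHERE colA = 'a' AND colB = 'b' AND timeCol = 111;

    -- Form 2: partition key plus the clustering column colC.
    -- One row tombstone per distinct colC deleted this way.
    DELETE FROM mytable WHERE colA = 'a' AND colB = 'b' AND timeCol = 111 AND colC = 'c1';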

Cassandra CQL range query rejected despite equality operator and secondary index

Submitted by 生来就可爱ヽ(ⅴ<●) on 2019-12-06 04:22:14
Question: From the table schema below, I am trying to select all pH readings that are below 5. I have followed these three pieces of advice:

1. Use ALLOW FILTERING
2. Include an equality comparison
3. Create a secondary index on the reading_value column.

Here is my query:

    select * from todmorden_numeric where sensor_name = 'pHradio' and reading_value < 5 allow filtering;

It is rejected with this message:

    Bad Request: No indexed columns present in by-columns clause with Equal operator

I tried adding a…
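A hedged sketch of the fix this error usually points to on older Cassandra versions: the planner wants at least one indexed column used with the Equal operator, so the secondary index belongs on the equality column (sensor_name), not only on the range column; ALLOW FILTERING then covers the reading_value predicate. The index name below is made up:

    -- Assumes sensor_name is a regular (non primary key) column.
    CREATE INDEX sensor_name_idx ON todmorden_numeric (sensor_name);

    SELECT * FROM todmorden_numeric
    WHERE sensor_name = 'pHradio' AND reading_value < 5
    ALLOW FILTERING;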

Importing cassandra table into spark via sparklyr - possible to select only some columns?

Submitted by 本小妞迷上赌 on 2019-12-06 01:34:14
I've been working with sparklyr to bring large Cassandra tables into Spark, register them with R, and conduct dplyr operations on them. I have been successfully importing Cassandra tables with code that looks like this:

    # import cassandra table into spark
    cass_df <- sparklyr:::spark_data_read_generic(
        sc, "org.apache.spark.sql.cassandra", "format",
        list(keyspace = "cass_keyspace", table = "cass_table")
    ) %>% invoke("load")

    # register table in R
    cass_tbl <- sparklyr:::spark_partition_register_df(
        sc, cass_df, name = "cass_table", repartition = 0, memory = TRUE
    )

Some of these Cassandra…

What causes “no viable alternative at input 'None'” error with Cassandra CQL

Submitted by 只愿长相守 on 2019-12-05 17:06:26
Question: I'm attempting to insert a modified document back into a Cassandra DB with a new key. I'm having a hard time figuring out what issue the error message is pointing at. When looking at others who have had similar problems, the answers seem to be related to the keys, whereas in my case None is just the value of a few of the keys. How do I solve this issue?

    keys = ','.join(current.keys())
    params = [':' + x for x in current.keys()]
    values = ','.join(params)
    query = "INSERT INTO wiki.pages (%s)…
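A hedged illustration of what the parser is objecting to: None is Python's null, not a CQL token, so once it is interpolated into the statement text the CQL grammar has "no viable alternative" for it, whereas CQL's own null keyword parses fine. The column names below are hypothetical:

    -- Hypothetical columns, for illustration only.
    -- Fails to parse: None is not a CQL literal.
    INSERT INTO wiki.pages (title, content) VALUES ('Home', None);
    -- Parses: null is the CQL way to write a missing value.
    INSERT INTO wiki.pages (title, content) VALUES ('Home', null);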

Event de-duplication using Cassandra

Submitted by 五迷三道 on 2019-12-05 16:20:57
I'm looking for the best way to de-duplicate events using Cassandra. I have many clients receiving event IDs (thousands per second). I need to ensure that each event ID is processed once and only once, with high reliability and high availability. So far I've tried two methods:

1. Use the event ID as a partition key, and do an "INSERT ... IF NOT EXISTS". If that fails, then the event is a duplicate and can be dropped. This is a nice clean approach, but the throughput is not great due to Paxos, especially with higher replication factors such as 3. It's also fragile, since IF NOT EXISTS always…
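A minimal CQL sketch of the lightweight-transaction approach described in method 1 (table and column names are my own, not from the question):

    -- Hypothetical de-duplication table keyed by event id.
    CREATE TABLE IF NOT EXISTS events_seen (
        event_id   text PRIMARY KEY,
        first_seen timestamp
    );

    -- Conditional insert: [applied] = false in the result means the id
    -- was already recorded and the event can be dropped as a duplicate.
    INSERT INTO events_seen (event_id, first_seen)
    VALUES ('evt-00123', toTimestamp(now()))
    IF NOT EXISTS;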

Apache Nifi/Cassandra - how to load CSV into Cassandra table

Submitted by 坚强是说给别人听的谎言 on 2019-12-05 14:44:29
I have various CSV files arriving several times per day, storing time-series data from sensors that are part of sensor stations. Each CSV is named after the sensor station and sensor ID it comes from, for instance "station1_sensor2.csv". At the moment, the data is stored like this:

    > cat station1_sensor2.csv
    2016-05-04 03:02:01.001000+0000;0;
    2016-05-04 03:02:01.002000+0000;0.1234;
    2016-05-04 03:02:01.003000+0000;0.2345;

I have created a Cassandra table to store them and to be able to query them for various identified tasks. The Cassandra table looks like this:

    cqlsh> CREATE…
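The question's CREATE TABLE is cut off above, so purely as an illustration of the data shape (table and column names are assumptions, not the asker's schema), one CSV line could map to an insert along these lines:

    -- Hypothetical target table; timestamps are kept at Cassandra's
    -- millisecond precision, so the CSV's microseconds are truncated.
    INSERT INTO sensor_data (station_id, sensor_id, event_time, value)
    VALUES ('station1', 'sensor2', '2016-05-04 03:02:01.002+0000', 0.1234);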

Order By any field in Cassandra

Submitted by 本秂侑毒 on 2019-12-05 13:33:53
I am researching Cassandra as a possible solution for my upcoming project. The more I research, the more I keep hearing that it is a bad idea to sort on fields that were not set up for sorting when the table was created. Is it possible to sort on any field? If there is a performance impact for sorting on fields not in the clustering columns, what is that impact? I need to sort roughly 2 million records in the table.

"I keep hearing that it is a bad idea to sort on fields that were not set up for sorting when the table was created."

It's not so much that it's a bad idea. It's just really not…
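A minimal sketch of the kind of model the answers usually recommend: sort order in Cassandra is fixed at table-creation time through clustering columns, so a field you want to ORDER BY has to be part of the primary key. Table and column names here are hypothetical:

    -- Hypothetical table: rows within each owner_id partition are stored
    -- (and therefore returned) ordered by created_at descending.
    CREATE TABLE records_by_owner (
        owner_id   text,
        created_at timestamp,
        record_id  uuid,
        payload    text,
        PRIMARY KEY ((owner_id), created_at, record_id)
    ) WITH CLUSTERING ORDER BY (created_at DESC, record_id ASC);

    -- ORDER BY is only allowed on clustering columns, within a partition:
    SELECT * FROM records_by_owner
    WHERE owner_id = 'user-42'
    ORDER BY created_at ASC;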

Cassandra Allow filtering

Submitted by 删除回忆录丶 on 2019-12-05 09:00:25
I have a table as below:

    CREATE TABLE test (
        day int,
        id varchar,
        start int,
        action varchar,
        PRIMARY KEY ((day), start, id)
    );

I want to run this query:

    SELECT * FROM test WHERE day=1 AND start > 1475485412 AND start < 1485785654 AND action='accept' ALLOW FILTERING;

Is this ALLOW FILTERING efficient? I am expecting that Cassandra will filter in this order:

1. By the partitioning column (day)
2. By the range column (start) on the result of 1
3. By the action column on the result of 2

So ALLOW FILTERING should not be a bad choice for this query. In the case of multiple filtering parameters in the WHERE clause and…
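As a point of comparison (my own sketch, not from the question), the usual alternative to filtering on action is to fold it into the primary key, so the same query needs no ALLOW FILTERING at all:

    -- Hypothetical variant of the table with action in the partition key.
    CREATE TABLE test_by_action (
        day    int,
        action varchar,
        start  int,
        id     varchar,
        PRIMARY KEY ((day, action), start, id)
    );

    SELECT * FROM test_by_action
    WHERE day = 1 AND action = 'accept'
      AND start > 1475485412 AND start < 1485785654;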

Cassandra: “Unable to complete the operation against any hosts” during session.execute()

Submitted by 萝らか妹 on 2019-12-05 08:45:16
Cassandra version: 1.2.2
Thrift API version: 19.35.0
CQL supported versions: 2.0.0, 3.0.1 (default: 3.0.1)
cassandra-driver for Python 3.4
Running cassandra/bin/cassandra with sudo

Code sample:

    from cassandra.cluster import Cluster
    cluster = Cluster()
    session = cluster.connect()    # 1
    session.execute("use test")    # 2
    cluster.shutdown()

Error message for # 2:

    session.execute("use test")
      File "cassandra/cluster.py", line 1581, in cassandra.cluster.Session.execute
      File "cassandra/cluster.py", line 3145, in cassandra.cluster.ResponseFuture.result
    cassandra.cluster.NoHostAvailable: ('Unable to…