Cassandra

Update Map-type columns in Cassandra with new key-value pairs without completely overwriting the map

有些话、适合烂在心里 · submitted on 2020-06-29 04:27:09
Question: Continuing the question at Insert Spark Dataset[(String, Map[String, String])] to Cassandra Table. I have a Spark Dataset of type Dataset[(String, Map[String, String])] that I have to insert into a Cassandra table. The key in the Dataset[(String, Map[String, String])] will become the primary key of the row in Cassandra, and the Map will go into a column ColumnNameValueMap of the same row. My Cassandra table structure is: CREATE TABLE
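The distinction the question hinges on: in CQL, `SET m = {...}` replaces the entire map, while `SET m = m + {...}` merges the new entries into the existing map. A minimal sketch of the two semantics, using plain Python dicts in place of a live cluster (the keyspace, table, and column names in the comments are hypothetical):

```python
# Hypothetical CQL for a table keyed by `key` with a map column `column_name_value_map`:
#   full overwrite: UPDATE ks.tbl SET column_name_value_map = {'b': '2'} WHERE key = 'k1'
#   append/merge:   UPDATE ks.tbl SET column_name_value_map = column_name_value_map + {'b': '2'} WHERE key = 'k1'

def overwrite(current, new_entries):
    """Semantics of SET m = {...}: the old map is discarded entirely."""
    return dict(new_entries)

def append(current, new_entries):
    """Semantics of SET m = m + {...}: existing keys kept, new entries merged in."""
    merged = dict(current)
    merged.update(new_entries)
    return merged

stored = {"a": "1"}
print(overwrite(stored, {"b": "2"}))  # {'b': '2'}
print(append(stored, {"b": "2"}))     # {'a': '1', 'b': '2'}
```

So for the Dataset use case, the writer would need to emit the `m + {...}` form of the UPDATE rather than a plain insert, which overwrites.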

Migrate DataStax Enterprise Cassandra to Apache Cassandra

流过昼夜 · submitted on 2020-06-27 10:24:09
Question: We are currently using DSE 4.8 and 5.12. We want to migrate to Apache Cassandra; since we don't use Spark or Search, we thought we would save some money by moving to Apache. Can this be achieved without downtime? I see that sstableloader works the other way around. Can anyone share the steps to follow to migrate from DSE to Apache Cassandra, something like this guide, but going from DSE to Apache: https://support.datastax.com/hc/en-us/articles/204226209-Clarification-for-the-use-of-SSTABLELOADER Answer 1: Figure out what version of Apache

Is it possible to read data only from a single node in a Cassandra cluster with a replication factor of 3?

六眼飞鱼酱① · submitted on 2020-06-27 08:58:09
Question: I know that Cassandra has different read consistency levels, but I haven't seen a consistency level that allows reading data by key from only one node. I mean, if we have a cluster with a replication factor of 3, then we will always ask all nodes when we read. Even if we choose a consistency level of ONE, we will ask all nodes but wait for the first response from any node. That is why a read loads not just one node but 3 (4 with the coordinator node). I think we can't really improve
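Contrary to the asker's assumption, at consistency level ONE the coordinator normally forwards the data request to a single replica (read repair and speculative retry can add digest queries, which this ignores). A toy model of how many replicas must respond per consistency level, given a replication factor; the function and its name are illustrative, not a driver API:

```python
def replicas_awaited(consistency_level, replication_factor):
    """Illustrative only: how many replica responses a coordinator must
    wait for at a given read consistency level (read repair and
    speculative retry are ignored in this simplified model)."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[consistency_level]

print(replicas_awaited("ONE", 3))     # 1
print(replicas_awaited("QUORUM", 3))  # 2
```

With a token-aware load-balancing policy in the driver, the coordinator is usually itself a replica, so a CL=ONE read can be served by one node.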

Cassandra database does not connect from R via RCassandra

醉酒当歌 · submitted on 2020-06-17 00:11:41
Question: When I connect to the Cassandra database using the RCassandra package, the connection is established, but when I try to use any keyspace, R stops responding. I used the following statements: library(RCassandra) rc <- RC.connect(host = "localhost", port = 9042) RC.use(rc, "db1", cache.def = TRUE) Any suggestions, please? Answer 1: Your problem is that you're specifying the port directly and using the native-protocol port, while RCassandra uses the Thrift protocol (which uses port 9160), so when it
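The hang comes down to pointing a Thrift client at the native-protocol port. The two client protocols listen on different default ports, summarized in this small sketch (the helper function is illustrative, not part of any driver):

```python
# Default Cassandra client ports (cassandra.yaml defaults):
DEFAULT_PORTS = {
    "native": 9042,   # CQL native protocol, used by modern drivers and cqlsh
    "thrift": 9160,   # Thrift RPC, used by legacy clients such as RCassandra
}

def port_for(client_protocol):
    """Look up the default port for a given client protocol."""
    return DEFAULT_PORTS[client_protocol]

# RCassandra speaks Thrift, so RC.connect should target 9160, not 9042:
print(port_for("thrift"))  # 9160
```

So the fix in the R session is `RC.connect(host = "localhost", port = 9160)` (and `start_rpc` must be enabled on the server for Thrift to be available).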

Cassandra java driver protocol version and connection limits don't match

陌路散爱 · submitted on 2020-06-12 18:56:23
Question: I am using Java driver version 2.1.4 and Cassandra version dsc-cassandra-2.1.10. Output from cqlsh gives the following: cqlsh 5.0.1 | Cassandra 2.1.10 | CQL spec 3.2.1 | Native protocol v3. So I am on protocol v3, but an exception is thrown when I try to set more than 128 requests per connection, which seems to be a restriction of v2. Explained below. The following code block: PoolingOptions poolingOptions = new PoolingOptions(); poolingOptions.setCoreConnectionsPerHost(HostDistance.LOCAL, 8);
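The limits themselves are fixed by the native protocol: v1 and v2 can multiplex at most 128 simultaneous requests per connection, while v3 and later raise that to 32768. One plausible cause of the exception is that the driver's pooling options were validated against a connection that negotiated v2. A hedged sketch of that driver-side check (the function is hypothetical; only the numeric limits come from the protocol spec):

```python
# Maximum simultaneous requests per connection allowed by the native protocol:
STREAM_LIMITS = {1: 128, 2: 128, 3: 32768, 4: 32768}

def validate_max_requests(protocol_version, requested):
    """Mimics the kind of check a driver performs: reject a pool setting
    that exceeds what the negotiated protocol version can multiplex."""
    limit = STREAM_LIMITS[protocol_version]
    if requested > limit:
        raise ValueError(
            f"protocol v{protocol_version} allows at most {limit} "
            f"requests per connection, got {requested}")
    return requested

validate_max_requests(3, 1024)    # fine under v3
# validate_max_requests(2, 1024)  # would raise: v2 caps at 128
```

If the exception mentions a 128 cap despite the server advertising v3, it is worth checking which protocol version the driver actually negotiated for the session.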

Cannot connect to Cassandra from Spark (Contact points contain multiple data centers)

≡放荡痞女 · submitted on 2020-06-12 07:32:37
Question: I am trying to run my first Spark job (a Scala job that accesses Cassandra), which fails with the following error: java.io.IOException: Failed to open native connection to Cassandra at {<ip>}:9042 at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:164) at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150) at com.datastax.spark.connector
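As the title suggests, the underlying complaint is that the configured contact points resolve to nodes in more than one data center, which the Spark Cassandra Connector rejects. A minimal sketch of that sanity check, with made-up host and DC names (the helper is hypothetical, not the connector's actual code):

```python
def check_contact_points(host_to_dc):
    """Mimics the connector's sanity check: all configured contact points
    must belong to a single data center. Returns that DC's name."""
    dcs = set(host_to_dc.values())
    if len(dcs) > 1:
        raise ValueError(
            f"Contact points contain multiple data centers: {sorted(dcs)}")
    return dcs.pop()

# The fix is to list hosts from only one DC in spark.cassandra.connection.host:
print(check_contact_points({"10.0.0.1": "DC1", "10.0.0.2": "DC1"}))  # DC1
```

In practice that means trimming `spark.cassandra.connection.host` to addresses from a single data center, typically the one local to the Spark workers.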

Does a window spec/function perform optimally, or should an alternative be preferred?

强颜欢笑 · submitted on 2020-06-04 08:27:15
Question: I am using spark-sql-2.4.1v. In my use case I use a window spec with the rank() function to find the latest records: I have to find the latest record for certain partitioning keys, ordered by insertion_date. It is extremely slow. Can this window-spec rank() be used in production-grade code, or is there a recommended alternative, specifically to improve performance? Please advise. I'm currently using the below code: Dataset<Row> data = sqlContext.read.format("org.apache.spark
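A common alternative to `rank().over(window)` for "latest record per key" is an aggregation (in Spark, roughly `groupBy(keys).agg(max(struct(insertion_date, ...)))`), which avoids the per-partition sort a window function needs. The core idea in plain Python, with made-up record fields standing in for the Dataset rows:

```python
# Toy records: (partition_key, insertion_date, payload); field names are hypothetical.
rows = [
    ("k1", "2020-01-01", "old"),
    ("k1", "2020-03-01", "new"),
    ("k2", "2020-02-01", "only"),
]

def latest_per_key(rows):
    """Reduce-style alternative to rank().over(window): keep, per key,
    the row with the greatest insertion_date. Single pass, no sort."""
    best = {}
    for key, date, payload in rows:
        if key not in best or date > best[key][0]:
            best[key] = (date, payload)
    return best

print(latest_per_key(rows))  # {'k1': ('2020-03-01', 'new'), 'k2': ('2020-02-01', 'only')}
```

Window functions are fine in production, but for a pure max-per-group query the aggregate form usually shuffles and sorts less; it is worth benchmarking both on the actual data.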