cassandra

Custom full-text index stored in Cassandra

Submitted by ╄→尐↘猪︶ㄣ on 2020-01-02 09:38:29

Question: I've got a situation where I'm using Cassandra for my DB and I need full-text search capability. Now, I'm aware of Apache Solr, Apache Cassandra, and DSE Search. However, I do not want to use costly and proprietary software (DSE Search). The reason I do not want to use Apache Solr is that I don't want to deal with HA, sharding, and redundancy for it. Cassandra is perfect for HA, sharding, and redundancy; I would like to store my full-text index in the existing Cassandra DB. So what I'm…
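A common way to get basic full-text lookup while staying entirely inside Cassandra is to maintain your own inverted index as an ordinary table keyed by term. The sketch below only illustrates that idea and is not code from the question; the table and column names (term_index, term, doc_id) and the tokenization step are assumptions.

-- Hypothetical inverted-index table: one partition per term,
-- one clustering row per document that contains the term.
CREATE TABLE term_index (
    term   text,
    doc_id uuid,
    PRIMARY KEY (term, doc_id)
);

-- Indexing: tokenize the document text in application code, then
-- write one row per (token, document) pair.
INSERT INTO term_index (term, doc_id)
VALUES ('cassandra', 3f1e2a8c-0000-4000-8000-000000000001);

-- Querying: a single-partition read returns every document for a term;
-- multi-term (AND) queries intersect the doc_id sets client-side.
SELECT doc_id FROM term_index WHERE term = 'cassandra';

This keeps HA, sharding, and redundancy identical to the rest of the data, at the cost of doing tokenization, ranking, and set intersection in the application.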

Cassandra slowed down with more nodes

Submitted by 孤街浪徒 on 2020-01-02 09:12:15

Question: I set up a Cassandra cluster on AWS. What I want to get is increased I/O throughput (number of reads/writes per second) as more nodes are added (as advertised). However, I got exactly the opposite: performance is reduced as new nodes are added. Do you know any typical issues that prevent it from scaling? Here are some details: I am adding a text file (15 MB) to the column family. Each line is a record. There are 150000 records. When there is 1 node, it takes about 90 seconds to write. But…

Atomic Batch in Cassandra

Submitted by 牧云@^-^@ on 2020-01-02 09:11:49

Question: Consider the following batch statement in Cassandra: BEGIN BATCH INSERT INTO users (userID, password, name) VALUES ('user2', 'ch@ngem3b', 'second user') UPDATE users SET password = 'ps22dhds' WHERE userID = 'user2' DELETE * FROM users WHERE userID = 'user2' INSERT INTO users (userID, password, name) VALUES ('user2', 'ch@ngem3c', 'Andrew') APPLY BATCH; Will the above statements in a Cassandra batch ensure row-level isolation (userID is the row key), given that the row key is the same? Answer 1: One important…
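For readability, here is the same batch with one statement per line (a sketch; note that CQL has no DELETE * form, so the whole-row delete is written as a plain DELETE FROM below):

BEGIN BATCH
  INSERT INTO users (userID, password, name) VALUES ('user2', 'ch@ngem3b', 'second user');
  UPDATE users SET password = 'ps22dhds' WHERE userID = 'user2';
  DELETE FROM users WHERE userID = 'user2';
  INSERT INTO users (userID, password, name) VALUES ('user2', 'ch@ngem3c', 'Andrew');
APPLY BATCH;

Because every statement targets the same partition (userID = 'user2'), the batch is applied with row-level isolation. Also worth noting: unless timestamps are set explicitly, all statements in a batch share one timestamp, so they are not applied in the order written.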

RDD not serializable Cassandra/Spark connector java API

Submitted by 别说谁变了你拦得住时间么 on 2020-01-02 07:31:34

Question: So I previously had some questions on how to query Cassandra using Spark in a Java Maven project, here: Querying Data in Cassandra via Spark in a Java Maven Project. Well, my question was answered and it worked; however, I've run into an issue (possibly an issue). I'm now trying to use the DataStax Java API. Here is my code: package com.angel.testspark.test2; import org.apache.commons.lang3.StringUtils; import org.apache.spark.SparkConf; import org.apache.spark.api.java.JavaRDD; import org.apache…

Cassandra sharding and replication

Submitted by 懵懂的女人 on 2020-01-02 05:41:11

Question: I am new to Cassandra and was going through this article explaining sharding and replication, and I am stuck at one point: I have a cluster with 6 Cassandra nodes configured on my local machine. I create a new keyspace "TestKeySpace" with a replication factor of 6, and a table "employee" in that keyspace whose primary key is an auto-incremented number named RID. I am not able to understand how this data will be partitioned and replicated. What I want to know is, since I am keeping my replication factor to…
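To make the setup concrete, here is a minimal sketch of the keyspace and table described in the question (assumptions: SimpleStrategy as the replication class, and a uuid standing in for the auto-incremented RID, since Cassandra has no auto-increment type):

-- Replication factor 6 on a 6-node cluster means every node
-- stores a full copy of every partition.
CREATE KEYSPACE "TestKeySpace"
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 6};

-- The partitioner still hashes the primary key to a token that "owns"
-- the row, but with RF equal to the node count the row is replicated to all nodes.
CREATE TABLE "TestKeySpace".employee (
    rid  uuid PRIMARY KEY,   -- stand-in for the auto-increment RID
    name text                -- illustrative column, not from the question
);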

Storing time ranges in Cassandra

Submitted by 大城市里の小女人 on 2020-01-02 04:01:05

Question: I'm looking for a good way to store data associated with a time range, in order to be able to efficiently retrieve it later. Each entry of data can be simplified as (start time, end time, value). I will later need to retrieve all the entries that fall inside an (x, y) range. In SQL, the query would be something like SELECT value FROM data WHERE starttime <= x AND endtime >= y. Can you suggest a structure for the data in Cassandra that would allow me to perform such queries efficiently? Answer 1:…
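One commonly suggested direction (a sketch only; the table name, the day bucket, and the literals below are assumptions, not taken from the question or an answer) is to bucket entries by a coarse time unit and cluster them by start time, then apply the end-time condition client-side:

-- Hypothetical layout: one partition per day, rows ordered by start time.
CREATE TABLE ranges_by_day (
    day       date,
    starttime timestamp,
    endtime   timestamp,
    value     text,
    PRIMARY KEY (day, starttime, endtime)
);

-- Fetch the candidate rows of one bucket whose start time is <= x;
-- the endtime >= y part of the predicate is then filtered in the client.
SELECT value, endtime
  FROM ranges_by_day
 WHERE day = '2014-05-01' AND starttime <= '2014-05-01 12:00:00';

Whether this works well depends on how wide the ranges can be: an entry spanning many buckets either has to be written into each bucket it overlaps or handled with a different model.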

Spark Connector error: WARN NettyUtil: Found Netty's native epoll transport, but not running on linux-based operating system. Using NIO instead

Submitted by 心不动则不痛 on 2020-01-02 03:31:13

Question: Here are my specs: Cassandra version: 3.0.0; operating system: Mac OS X Yosemite 10.10.5; Spark version: 1.4.1. Context: I have created a keyspace "movies" and a table "movieinfo" in Cassandra. I have installed and assembled a jar file following the guidance from this post. I have written a small script (below) to test my connection: scala> sc.stop scala> import com.datastax.spark.connector._ import com.datastax.spark.connector._ scala> import org.apache.spark.SparkConf import org.apache.spark…

Kafka Spark Streaming data not getting written into Cassandra: zero rows inserted

Submitted by 帅比萌擦擦* on 2020-01-01 20:36:31

Question: While writing data to Cassandra from Spark, the data is not getting written. The background is: I am doing a Kafka to Spark Streaming to Cassandra integration. I am reading Kafka messages and trying to put them into a Cassandra table CREATE TABLE TEST_TABLE(key INT PRIMARY KEY, value TEXT). Kafka to Spark Streaming is running fine, but from Spark to Cassandra there is some issue: data is not getting written to the table. I am able to create a connection with Cassandra, but the data is not getting inserted into the…

Cassandra CQL3 composite key not written by Hadoop reducer

Submitted by 我是研究僧i on 2020-01-01 19:50:52

Question: I'm using Cassandra 1.2.8 and have several Hadoop MapReduce jobs that read rows from some CQL3 tables and write results back to other CQL3 tables. If an output CQL3 table contains a composite key, the values of the composite key fields are not written by the reducer; instead I see empty values for those fields when performing a SELECT query in cqlsh. If the primary key is not composite, everything works correctly. Example of the output CQL3 table with composite key: CREATE TABLE events_by_type_with…
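The CREATE TABLE in the question is cut off, so purely as a point of reference, a CQL3 composite (compound) primary key looks like the hypothetical table below; every key component, partition key and clustering columns alike, must be bound by whatever writes the row, which in this setup is the Hadoop reducer:

-- Hypothetical compound-key table, for illustration only.
CREATE TABLE events_by_type (
    event_type text,        -- partition key
    event_time timestamp,   -- clustering column
    event_id   uuid,        -- clustering column
    payload    text,
    PRIMARY KEY (event_type, event_time, event_id)
);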
