datastax-enterprise

How to keep two Cassandra tables within the same partition

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-03 13:47:18
I tried reading up on the DataStax blogs and documentation but could not find anything specific on this. Is there a way to make two tables in Cassandra belong to the same partition? For example:

```sql
CREATE TYPE addr (
    street_address1 text,
    city text,
    state text,
    country text,
    zip_code text
);

CREATE TABLE foo (
    account_id timeuuid,
    data text,
    site_id int,
    PRIMARY KEY (account_id)
);

CREATE TABLE bar (
    account_id timeuuid,
    address_id int,
    address frozen<addr>,
    PRIMARY KEY (account_id, address_id)
);
```

Here I need to ensure that both of these tables/CFs live in the same partition for the same account_id
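A note on what is achievable here: partitions are per-table in Cassandra, so two tables can never literally share a partition. However, rows of foo and bar with the same account_id hash to the same token and therefore land on the same replica nodes. If the data truly must live in one partition, the usual workaround is a single table; the table name and the sentinel convention below are illustrative assumptions, not part of the original question:

```sql
-- Sketch: fold both tables into one so the account row and its
-- addresses share a single partition keyed by account_id.
CREATE TABLE account_with_addresses (
    account_id timeuuid,
    address_id int,            -- clustering column; e.g. -1 for the account row itself
    data text,
    site_id int,
    address frozen<addr>,
    PRIMARY KEY (account_id, address_id)
);
```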

What does rows_merged mean in compactionhistory?

自古美人都是妖i submitted on 2019-12-03 07:03:54
When I issue `$ nodetool compactionhistory` I get

```
... compacted_at   bytes_in  bytes_out  rows_merged ...
    1404936947592  8096      7211       {1:3, 3:1}
```

What does {1:3, 3:1} mean? The only documentation I can find is this, which states "the number of partitions merged" but does not explain why there are multiple values or what the colon means.

Basically it means {sstables: rows}. For example, {1:3, 3:1} means 3 rows were taken from one sstable each (1:3) and 1 row was merged from 3 sstables (3:1), all to make the one sstable produced by that compaction operation. I tried it out myself, so here's an example, I hope this helps: create
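As a sanity check on that reading, here is a small sketch (mine, not from the answer) that parses a rows_merged string and totals the partitions written and the sstable reads they required:

```python
import re

def parse_rows_merged(s):
    """Parse a rows_merged map like '{1:3, 3:1}' into
    {sstables_a_partition_appeared_in: partition_count}."""
    return {int(k): int(v) for k, v in re.findall(r"(\d+)\s*:\s*(\d+)", s)}

merged = parse_rows_merged("{1:3, 3:1}")
# 3 partitions each came from a single sstable; 1 partition was merged from 3
total_partitions = sum(merged.values())              # 3 + 1 = 4
total_reads = sum(k * v for k, v in merged.items())  # 1*3 + 3*1 = 6 sstable reads
```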

Cassandra CQL shell window disappears after installation on Windows

北城余情 submitted on 2019-12-03 05:49:06
The Cassandra CQL shell window disappears right after installation on Windows. It was installed using the MSI installer available from Planet Cassandra. Why does this happen? Please help me. Thanks in advance.

I had the same issue with DataStax 3.9. This is how I sorted it:

Step 1: Open the file DataStax-DDC\apache-cassandra\conf\cassandra.yaml
Step 2: Uncomment cdc_raw_directory and set its value to a Windows path: cdc_raw_directory: "C:/Program Files/DataStax-DDC/data/cdc_raw"
Step 3: Go to Windows Services and start the DataStax DDC Server 3.9.0 service

I had the same problem with DataStax Community 3
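For reference, the edit in Step 2 looks like this in cassandra.yaml (the path shown is the default MSI install location and may differ on your machine):

```yaml
# DataStax-DDC\apache-cassandra\conf\cassandra.yaml
# Uncomment cdc_raw_directory and give it a Windows-style path:
cdc_raw_directory: "C:/Program Files/DataStax-DDC/data/cdc_raw"
```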

When to use Cassandra vs. Solr in DSE?

丶灬走出姿态 submitted on 2019-12-03 03:10:10
I'm using DSE for Cassandra/Solr integration, so that data are stored in Cassandra and indexed in Solr. It's natural to use Cassandra for CRUD operations and Solr for full-text search respectively, and DSE really simplifies data synchronization between Cassandra and Solr. When it comes to querying, however, there are actually two ways to go: Cassandra secondary/manually configured indexes vs. Solr. I want to know when to use which method and what the performance difference is in general, especially under a DSE setup. Here is one example use case from my project. I have a Cassandra table
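To make the two query paths concrete, here is a sketch; the keyspace, table, and field names are made up for illustration. DSE Search exposes Solr through the solr_query pseudo-column in CQL:

```sql
-- Native Cassandra secondary index: suited to equality lookups
-- on low-cardinality columns
CREATE INDEX ON ks.accounts (site_id);
SELECT * FROM ks.accounts WHERE site_id = 42;

-- DSE Search (Solr): suited to full-text, multi-field, or wildcard queries
SELECT * FROM ks.accounts WHERE solr_query = 'data:foo*';
```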

How can I improve the reduceByKey part of my Spark app?

随声附和 submitted on 2019-12-02 14:11:24
I have 64 Spark cores. I have over 80 million rows of data, amounting to 4.2 GB, in my Cassandra cluster. I currently need 82 seconds to process this data, and I want that reduced to 8 seconds. Any thoughts on this? Is this even possible? Thanks. This is the part of my Spark app I want to improve:

```python
axes = sqlContext.read.format("org.apache.spark.sql.cassandra")\
    .options(table="axes", keyspace=source, numPartitions="192").load()\
    .repartition(64*3)\
    .reduceByKey(lambda x,y:x+y,52)\
    .map(lambda x:(x.article,[Row(article=x.article,at=x.at,comments=x.comments,likes=x.likes,reads=x.reads,shares=x.shares)]))
```
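For context, reduceByKey is an RDD operation (it is not defined on DataFrames), and what it does per key can be modeled in plain Python without Spark. A minimal sketch, with illustrative data:

```python
def reduce_by_key(pairs, fn):
    """Plain-Python model of Spark's RDD.reduceByKey:
    fold all values sharing a key together with fn."""
    acc = {}
    for key, value in pairs:
        acc[key] = fn(acc[key], value) if key in acc else value
    return acc

pairs = [("a", [1]), ("b", [2]), ("a", [3])]
result = reduce_by_key(pairs, lambda x, y: x + y)
# {"a": [1, 3], "b": [2]}
```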

Spark best approach Look-up Dataframe to improve performance

不打扰是莪最后的温柔 submitted on 2019-12-02 09:04:48
Question: DataFrame A (millions of records) has, among other columns, create_date and modified_date. DataFrame B (500 records) has start_date and end_date. Current approach:

```sql
SELECT a.*, b.* FROM a JOIN b ON a.create_date BETWEEN b.start_date AND b.end_date
```

The above job takes half an hour or more to run. How can I improve the performance?

Answer 1: DataFrames currently don't have an approach for direct joins like that; Spark will fully read both tables before performing the join. https://issues.apache.org/jira/browse/SPARK-16614
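Since DataFrame B is tiny (500 rows), a common pattern is to broadcast it and turn the range join into a per-row lookup against the sorted ranges. A plain-Python sketch of the lookup side, assuming the ranges are non-overlapping and sorted by start_date (data and names are illustrative):

```python
import bisect

def build_lookup(ranges):
    """ranges: sorted, non-overlapping (start, end, label) tuples."""
    starts = [r[0] for r in ranges]
    return starts, ranges

def find_range(starts, ranges, value):
    """Return the label whose [start, end] interval contains value, else None."""
    i = bisect.bisect_right(starts, value) - 1
    if i >= 0 and ranges[i][0] <= value <= ranges[i][1]:
        return ranges[i][2]
    return None

ranges = [(1, 10, "Q1"), (11, 20, "Q2"), (21, 30, "Q3")]
starts, rs = build_lookup(ranges)
find_range(starts, rs, 15)   # "Q2"
find_range(starts, rs, 35)   # None
```

In Spark this lookup would run inside a UDF or mapPartitions over DataFrame A, with the 500 ranges shipped to every executor as a broadcast variable, avoiding the full cross-read of both tables.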

DataStax Enterprise crashes with "Unable to gossip with any seeds" error

喜夏-厌秋 submitted on 2019-12-02 08:38:03
I am trying to stand up a DataStax Enterprise Cassandra cluster in AWS. I am not able to bring up the first node (the seed node) due to the error: Unable to gossip with any seeds. I must say that the first time I installed DataStax Enterprise it worked for me; however, I then wanted to make it a multi-node cluster and changed the "seeds" parameter to the private IP instead of the default "127.0.0.1". Here are the details: DataStax Enterprise 4.x installed on CentOS 6.4, in a single-node setup. The following are the values I changed in the default cassandra.yaml: cluster_name: 'xxxCluster' num_tokens:
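A common cause of "Unable to gossip with any seeds" on a lone seed node is that the seeds list and listen_address no longer agree after the edit. A sketch of the relevant cassandra.yaml settings (the IP is a placeholder for the node's private IP, not from the original post):

```yaml
listen_address: 10.0.0.5        # this node's private IP (placeholder)
rpc_address: 10.0.0.5
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # The seed node must list its own address here,
      # exactly as given in listen_address.
      - seeds: "10.0.0.5"
```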

Prevent tombstones creation

谁都会走 submitted on 2019-12-02 04:02:14
I need to perform an insert into a Cassandra table without creating tombstones for any column. I am using a query similar to this:

```sql
INSERT INTO my_table (col1, col2, col3) VALUES (val1, val2, null)
```

where col1, col2 and col3 are all the attributes of my_table. Is there any other solution or workaround to prevent tombstone creation for, say, col3, apart from passing only non-null attributes in our query and letting Cassandra set the remaining attributes to null?

Don't include col3 in your insert and it just won't set anything:

```sql
INSERT INTO my_table (col1, col2) VALUES (val1, val2)
```

If curious about structure on

Why not enable virtual nodes on a Hadoop node?

折月煮酒 submitted on 2019-12-02 01:10:21
From http://www.datastax.com/docs/datastax_enterprise3.2/solutions/about_hadoop : "Before starting an analytics/Hadoop node on a production cluster or data center, it is important to disable the virtual node configuration." What will happen if I enable virtual nodes on an analytics/Hadoop node?

If you enable virtual nodes on a Hadoop node, it will lower the performance of small Hadoop jobs by raising the number of mappers to at least the number of virtual nodes. E.g., if you use the default setting of 256 vnodes per physical node, every Hadoop job will launch 257 mappers. Those mappers might have too
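Disabling vnodes on an analytics/Hadoop node, per the docs quoted above, comes down to two cassandra.yaml settings. The token value below is a placeholder; you would compute an initial_token per node for your own ring:

```yaml
# cassandra.yaml on the analytics/Hadoop node
num_tokens: 1                          # one token per node instead of 256 vnodes
initial_token: -9223372036854775808    # placeholder; compute per-node tokens
```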