datastax-enterprise

Unable to run spark master in dse 4.5 and slaves file is missing

烂漫一生 submitted on 2019-12-22 14:51:14

Question: I have a 5-node DSE 4.5 cluster that is up and running. One of the five nodes is Hadoop-enabled and Spark-enabled, but the Spark master is not running:

ERROR [Thread-709] 2014-07-02 11:35:48,519 ExternalLogger.java (line 73) SparkMaster: Exception in thread "main" org.jboss.netty.channel.ChannelException: Failed to bind to: /54.xxx.xxx.xxx:7077

Does anyone have any idea about this? I have also tried exporting SPARK_LOCAL_IP, but that did not work either. The DSE documentation wrongly mentions that spark-env.sh
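On EC2 a public 54.x address is not bound to any local network interface (it exists only at the NAT layer), so a bind to it fails. The question already points at SPARK_LOCAL_IP; a minimal sketch of setting it in spark-env.sh, where DSE's startup scripts pick it up, with 10.0.0.5 as a made-up private IP:

```sh
# Hypothetical spark-env.sh excerpt (e.g. /etc/dse/spark/spark-env.sh in a
# packaged DSE 4.5 install; adjust the path for your layout).
# Bind Spark to the private address actually assigned to the NIC.
export SPARK_LOCAL_IP=10.0.0.5
```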

Running Search workload and Cassandra workload on the same physical node

孤街浪徒 submitted on 2019-12-22 13:58:31

Question: I can't seem to find the answer to this obvious question. We currently have 6 servers running DSE, all configured with the "Search" workload. My question is: is it possible to run Search (Solr) and Cassandra on the same physical box? (Not) possible / (not) recommended? I'm confused by the fact that all of our nodes run as Solr nodes, yet I can still use them for Cassandra real-time queries, so technically they're both? The "Services / Best Practice" check tells me: "Please replace

Cannot record QUEUE latency of n minutes - DSE

冷暖自知 submitted on 2019-12-22 01:06:24

Question: One of the nodes in our 3-node cluster is down, and the log file shows the messages below:

INFO [keyspace.core Index WorkPool work thread-2] 2016-09-14 14:05:32,891 AbstractMetrics.java:114 - Cannot record QUEUE latency of 11 minutes because higher than 10 minutes.
INFO [keyspace.core Index WorkPool work thread-2] 2016-09-14 14:05:33,233 AbstractMetrics.java:114 - Cannot record QUEUE latency of 10 minutes because higher than 10 minutes.
WARN [keyspace.core Index WorkPool work
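These messages come from DSE Search's metrics code: index work items are waiting in the queue longer than the 10-minute histogram ceiling, which signals severe indexing backpressure rather than a metrics problem. A hedged dse.yaml sketch of the knobs usually examined in that situation; the option names are taken from the DSE Search documentation, so verify them against your DSE version:

```yaml
# Hypothetical dse.yaml excerpt for a DSE Search node.
# Queue latencies measured in minutes mean the node cannot keep up with indexing.
back_pressure_threshold_per_core: 1024   # queued index requests per core before writes are throttled
flush_max_time_per_core: 5               # max minutes allowed to flush the index work queue
```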

Why is my Spark streaming app so slow?

♀尐吖头ヾ submitted on 2019-12-22 01:06:01

Question: I have a cluster with 4 nodes: 3 Spark nodes and 1 Solr node. Each machine has an 8-core CPU, 32 GB of memory, and SSD disks. I use Cassandra as my database. My data volume is 22 GB after 6 hours, and I now have around 3.4 million rows, which should be read in under 5 minutes; but the job already can't complete in that amount of time. My future plan is to read 100 million rows in under 5 minutes. I am not sure what I can increase or do better to achieve this result now as well as to achieve my
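A common first step is checking how the Spark Cassandra connector splits and pages its reads, since too few, too large partitions serialize the scan. A minimal sketch, assuming connector 1.x property names (verify against the version bundled with your DSE release) and made-up keyspace/table/host names:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

// Sketch only: raise read parallelism so each Spark partition stays small.
// "data"/"events" and the contact point are hypothetical.
object ReadTuningSketch extends App {
  val conf = new SparkConf()
    .setAppName("read-tuning-sketch")
    .set("spark.cassandra.connection.host", "10.0.0.5")  // hypothetical contact point
    .set("spark.cassandra.input.split.size", "10000")    // C* partitions per Spark partition
    .set("spark.cassandra.input.page.row.size", "1000")  // rows fetched per round trip

  val sc = new SparkContext(conf)
  val rows = sc.cassandraTable("data", "events")
  println(rows.count()) // full-table scan; timing this isolates raw read throughput
}
```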

DataStax Enterprise: saveToCassandra generate a lot of hinted handoff

狂风中的少年 submitted on 2019-12-21 23:35:12

Question: I'm having trouble generating data from Spark into Cassandra using DSE 4.5.3. I have a cluster of 8 nodes (pretty powerful nodes) and I want to generate some test data from Spark. My Spark job reads 5M rows from a Cassandra table (representing one day of data), then caches them in memory (32 GB of memory per node, so no problem) and finally saves them n times into another Cassandra table to simulate more days of data. val table = sc.cassandraTable[RecordData]( "data", "one_day" )
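Hinted handoffs pile up when the coordinator accepts writes faster than the replicas can apply them, so the usual mitigation is throttling the connector's output path. A hedged sketch, assuming connector 1.x property names and a hypothetical target table "many_days":

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

// Sketch only: slow the write path so replicas keep up and hints stop accumulating.
object WriteThrottleSketch extends App {
  val conf = new SparkConf()
    .setAppName("write-throttle-sketch")
    .set("spark.cassandra.connection.host", "10.0.0.5")   // hypothetical contact point
    .set("spark.cassandra.output.concurrent.writes", "2") // parallel batches per task
    .set("spark.cassandra.output.batch.size.rows", "100") // rows per batch request

  val sc = new SparkContext(conf)
  val oneDay = sc.cassandraTable("data", "one_day").cache()
  oneDay.saveToCassandra("data", "many_days") // repeat with shifted keys per extra "day"
}
```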

TTL vs default_time_to_live which one is better and why?

心已入冬 submitted on 2019-12-21 22:14:41

Question: The requirement is simple: we have to create a table that will hold only 24 hours of data. We have two options:

1. Define a TTL with each insert.
2. Set the table property default_time_to_live to 24 hours.

I have a general idea of both, but internally which one is more helpful for dealing with tombstones? Or will both generate the same number of tombstones? Which one is better and why? Any reference link will be appreciated.

Answer 1: If a table has default_time_to_live on it, then rows that exceed this
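A minimal CQL sketch of the two options; the table and column names are made up:

```cql
-- Option 2: table-wide default; every row expires 24 hours (86400 s) after it is written.
CREATE TABLE sensor_readings (
    sensor_id    text,
    reading_time timestamp,
    value        double,
    PRIMARY KEY (sensor_id, reading_time)
) WITH default_time_to_live = 86400;

-- Option 1: per-insert TTL; overrides the table default for this row only.
INSERT INTO sensor_readings (sensor_id, reading_time, value)
VALUES ('s1', '2016-09-14 14:00:00', 21.5)
USING TTL 86400;
```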

How to Use Apache Drill with Cassandra

青春壹個敷衍的年華 submitted on 2019-12-21 04:23:25

Question: I am trying to query Cassandra using Apache Drill. The only connector I could find is here: http://www.confusedcoders.com/bigdata/apache-drill/sql-on-cassandra-querying-cassandra-via-apache-drill However, it does not build; it fails with an artifact-not-found error. I also had another developer, more versed in these tools, take a stab at it, but he had no luck either. I tried contacting the developer of the plugin I referenced, but the blog does not work and won't let me post comments.

What does rows_merged mean in compactionhistory?

和自甴很熟 submitted on 2019-12-21 01:08:09

Question: When I issue $ nodetool compactionhistory I get:

... compacted_at    bytes_in  bytes_out  rows_merged
... 1404936947592   8096      7211       {1:3, 3:1}

What does {1:3, 3:1} mean? The only documentation I can find is this, which states the number of partitions merged but does not explain why there are multiple values or what the colon means.

Answer 1: It basically means {sstables: rows}. For example, {1:3, 3:1} means 3 rows were taken from a single sstable (1:3) and 1 row was merged from 3 sstables (3:1), all to make the one

Cassandra Error - Clustering column cannot be restricted (preceding column is restricted by a non-EQ relation)

心不动则不痛 submitted on 2019-12-20 09:56:33

Question: We are using Cassandra as the data historian for our fleet management solution. We have a table in Cassandra which stores the details of journeys made by each vehicle. The table structure is given below:

CREATE TABLE journeydetails(
    bucketid text,
    vehicleid text,
    starttime timestamp,
    stoptime timestamp,
    travelduration bigint,
    PRIMARY KEY (bucketid, vehicleid, starttime, travelduration)
);

Where: bucketid is the partition key, a combination of month and year; vehicleid is the unique id of the
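A sketch of the kind of query that raises this error against the schema above; the literal values are made up. Cassandra only allows a clustering column to be restricted when every preceding clustering column is restricted by an equality (or IN) relation:

```cql
-- Fails: starttime is restricted by a range (non-EQ) relation, so the
-- following clustering column travelduration cannot be restricted at all.
SELECT * FROM journeydetails
WHERE bucketid = '2016-01' AND vehicleid = 'v1'
  AND starttime >= '2016-01-01' AND starttime < '2016-01-02'
  AND travelduration > 600;

-- Works: range only on the last restricted clustering column; filter
-- travelduration client-side, or reorder/denormalize the table so
-- travelduration precedes starttime in the clustering order.
SELECT * FROM journeydetails
WHERE bucketid = '2016-01' AND vehicleid = 'v1'
  AND starttime >= '2016-01-01' AND starttime < '2016-01-02';
```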

Datastax Enterprise is crashing with Unable to gossip with any seeds error

北慕城南 submitted on 2019-12-20 06:01:35

Question: I am trying to stand up a DataStax Enterprise Cassandra cluster in AWS. I am not able to bring up the first node (the seed node) due to the error: Unable to gossip with any seeds. I must say that the first time I installed DataStax Enterprise it worked for me; however, I wanted to make it a multi-node cluster, so I changed the "seeds" parameter to the private IP instead of the default "127.0.0.1". Here are the details: DataStax Enterprise 4.x installed on CentOS 6.4, in a single-node setup.
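On a seed node this error usually means the seeds list, listen_address, and the address the node actually binds do not agree, so the node cannot even gossip with itself. A hedged cassandra.yaml sketch, with 10.0.0.5 standing in for the node's private EC2 address:

```yaml
# Hypothetical cassandra.yaml excerpt for the first (seed) node.
listen_address: 10.0.0.5        # the private address assigned to the NIC
broadcast_address: 10.0.0.5     # optional on a flat network; required behind NAT
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.5"   # the seed node lists its own private address here
```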