datastax-enterprise

Unable to run Spark master in DSE 4.5 and slaves file is missing

Submitted by 大憨熊 on 2019-12-06 07:29:00
I have a 5-node DSE 4.5 cluster that is up and running. One of the 5 nodes is Hadoop-enabled and Spark-enabled, but the Spark master is not running:

    ERROR [Thread-709] 2014-07-02 11:35:48,519 ExternalLogger.java (line 73) SparkMaster: Exception in thread "main" org.jboss.netty.channel.ChannelException: Failed to bind to: /54.xxx.xxx.xxx:7077

Does anyone have any idea what is wrong? I have also tried exporting SPARK_LOCAL_IP, but that did not help. The DSE documentation incorrectly states that the spark-env.sh configuration file is resources/spark/conf/spark-env.sh; the actual configuration directory is /etc/dse.
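The bind failure on /54.xxx.xxx.xxx:7077 suggests the master is trying to bind to the node's public address, which on EC2-style networks is usually not bindable locally. A minimal sketch of what could go into spark-env.sh (typically /etc/dse/spark/spark-env.sh on a package install), assuming the node's private interface address is 10.0.0.5 (a placeholder) and that this DSE build honors the standard Spark variables:

    # Sketch only -- 10.0.0.5 is a placeholder for this node's private IP.
    # Bind the Spark master/worker to the private interface instead of the
    # public 54.x.x.x address.
    export SPARK_LOCAL_IP=10.0.0.5
    export SPARK_MASTER_IP=10.0.0.5
    export SPARK_MASTER_PORT=7077   # default master port

DSE would need to be restarted on that node for the change to take effect.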

DSE OpsCenter best practice fails when Cassandra PasswordAuthenticator is used

Submitted by 自作多情 on 2019-12-06 05:31:12
The following best practice checks fail when Cassandra's PasswordAuthenticator is enabled:
- Search nodes enabled with bad autocommit
- Search nodes enabled with query result cache
- Search nodes with bad filter cache
My values comply with the recommended values, and I have confirmed that the checks do pass when I disable authentication in Cassandra. What's strange is that there are 6 checks under the "Solr Advisor" category of the Best Practice Service, and only these 3 fail when authentication is enabled. Is this a known bug in OpsCenter? I'm using v5.0.1 but I've seen this…
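Since the failing checks have to query the Search nodes directly, it is worth confirming that OpsCenter has working credentials for the authenticated cluster before treating this as a bug. As a hedged sketch, assuming a package install where per-cluster settings live in /etc/opscenter/clusters/<cluster_name>.conf (the path and key names should be verified against the OpsCenter 5.0 documentation):

    [cassandra]
    # Credentials OpsCenter uses when PasswordAuthenticator is enabled.
    # "opscenter_user" / "opscenter_password" are placeholders, not values
    # taken from the question.
    username = opscenter_user
    password = opscenter_password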

Cqlsh with client to node SSL encryption

Submitted by 南笙酒味 on 2019-12-06 05:23:39
I am trying to enable client-to-node SSL encryption on my DSE server. My cqlshrc file looks like this:

    [connection]
    hostname = 127.0.0.1
    port = 9160
    factory = cqlshlib.ssl.ssl_transport_factory

    [ssl]
    certfile = /path/to/dse_node0.cer
    validate = true ;; Optional, true by default.

    [certfiles] ;; Optional section, overrides the default certfile in the [ssl] section.
    1.2.3.4 = /path/to/dse_node0.cer

When I try to log in to the cqlsh shell, I get the error below:

    Connection error: Could not connect to 127.0.0.1:9160

There are several possible causes; I hope one of these solutions is helpful. 1)…
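The cqlshrc settings above only cover the client side; the node itself also has to have client encryption switched on, otherwise connections to 9160 will simply fail. A minimal sketch of the corresponding cassandra.yaml section, with a hypothetical keystore path and password (key names should be checked against the cassandra.yaml shipped with your DSE version):

    client_encryption_options:
        enabled: true
        # Placeholder keystore containing this node's certificate/private key.
        keystore: /path/to/.keystore
        keystore_password: changeit
        # require_client_auth: false

The node needs a restart after this change, and the certificate that cqlsh's certfile/[certfiles] entries point at should be the one exported from that keystore.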

Performance of token range based queries on partition keys?

Submitted by 安稳与你 on 2019-12-06 03:47:55
I am selecting all records from the Cassandra nodes based on the token ranges of my partition key. Below is the code:

    public static synchronized List<Object[]> getTokenRanges(final Session session) {
        if (cluster == null) {
            cluster = session.getCluster();
        }
        Metadata metadata = cluster.getMetadata();
        return unwrapTokenRanges(metadata.getTokenRanges());
    }

    private static List<Object[]> unwrapTokenRanges(Set<TokenRange> wrappedRanges) {
        final int tokensSize = 2;
        List<Object[]> tokenRanges = new ArrayList<>();
        for (TokenRange tokenRange : wrappedRanges) {
            List<TokenRange> unwrappedTokenRangeList =
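For reference, a token-range scan normally issues one query per unwrapped range, restricting token() on the partition key so each query covers one contiguous slice of the ring. A sketch in Scala against the DataStax Java driver, assuming a driver version that exposes TokenRange and BoundStatement.setToken, and using a hypothetical keyspace ks, table tbl and partition key id:

    import com.datastax.driver.core.Session
    import scala.collection.JavaConverters._

    // Sketch, not the code from the question: scan a table one unwrapped
    // token range at a time. ks/tbl/id are placeholder names.
    def scanByTokenRanges(session: Session): Unit = {
      val metadata = session.getCluster.getMetadata
      val stmt = session.prepare(
        "SELECT * FROM ks.tbl WHERE token(id) > :start AND token(id) <= :end")
      for {
        range     <- metadata.getTokenRanges.asScala
        unwrapped <- range.unwrap().asScala   // split ranges that wrap around the ring
      } {
        val bound = stmt.bind()
          .setToken("start", unwrapped.getStart)
          .setToken("end", unwrapped.getEnd)
        session.execute(bound).asScala.foreach(_ => ()) // process each Row here
      }
    }

Each such query is served by the replicas owning that range, which is what keeps this pattern efficient compared with a single full-table scan.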

Cassandra Allow filtering

Submitted by 删除回忆录丶 on 2019-12-05 09:00:25
I have a table as below:

    CREATE TABLE test (
        day int,
        id varchar,
        start int,
        action varchar,
        PRIMARY KEY ((day), start, id)
    );

I want to run this query:

    SELECT * FROM test
    WHERE day = 1
      AND start > 1475485412
      AND start < 1485785654
      AND action = 'accept'
    ALLOW FILTERING;

Is this ALLOW FILTERING efficient? I am expecting that Cassandra will filter in this order:
1. by the partition column (day);
2. by the range column (start) on the result of 1;
3. by the action column on the result of 2.
So ALLOW FILTERING would not be a bad choice for this query. In the case of multiple filtering parameters in the WHERE clause and…
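If the residual filtering on action turns out to be too expensive, a common alternative is to denormalize into a second table whose partition key includes action, so the same query needs no ALLOW FILTERING at all. A sketch based on the schema above (the table name test_by_action is made up):

    -- Same data, keyed so that action is part of the partition key.
    CREATE TABLE test_by_action (
        day int,
        action varchar,
        start int,
        id varchar,
        PRIMARY KEY ((day, action), start, id)
    );

    -- Equivalent query, no ALLOW FILTERING needed:
    SELECT * FROM test_by_action
    WHERE day = 1 AND action = 'accept'
      AND start > 1475485412 AND start < 1485785654;

The trade-off is writing each row twice so that both tables stay in sync.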

How to perform nested aggregation on multiple fields in Solr?

Submitted by 血红的双手。 on 2019-12-05 01:35:40
I am trying to perform search-result aggregation (count and sum), grouping by several fields in a nested fashion. For example, with the schema shown at the end of this post, I'd like to get the sum of "size" grouped by "category" and sub-grouped further by "subcategory", producing something like this:

    <category name="X">
        <subcategory name="X_A">
            <size sum="..." />
        </subcategory>
        <subcategory name="X_B">
            <size sum="..." />
        </subcategory>
    </category>
    ....

I've been looking primarily at…
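If the Solr version behind DSE Search supports the JSON Facet API, the nesting described above maps fairly directly onto nested terms facets with a sum aggregation. A sketch using the field names from the question (whether this API is available depends on the Solr/DSE Search release, and the facet labels are made up):

    # Sketch of a json.facet request parameter; "categories", "subcategories"
    # and "size_sum" are placeholder labels, not part of the schema.
    json.facet={
      categories: {
        type: terms,
        field: category,
        facet: {
          subcategories: {
            type: terms,
            field: subcategory,
            facet: { size_sum: "sum(size)" }
          }
        }
      }
    }

On older releases, the usual workaround is pivot facets combined with the stats component, at the cost of a clumsier request and response.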

Cassandra compaction tasks stuck

Submitted by 元气小坏坏 on 2019-12-05 01:10:16
I'm running DataStax Enterprise in a cluster consisting of 3 nodes, all on the same hardware: 2-core Intel Xeon 2.2 GHz, 7 GB RAM, 4 TB RAID-0. This should be enough for running a cluster with a light load, storing less than 1 GB of data. Most of the time everything is fine, but it appears that the running tasks related to the Repair Service in OpsCenter sometimes get stuck; this causes instability on that node and an increase in load. However, if the node…
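When a Repair Service task looks stuck, it helps to check on the affected node whether compactions and repair streams are actually making progress before restarting anything. A sketch using standard nodetool commands (output formats differ between Cassandra versions):

    # Active/pending compactions and how far along they are
    nodetool compactionstats

    # Thread-pool backlog -- look at the CompactionExecutor and AntiEntropy stages
    nodetool tpstats

    # Streaming sessions in flight for repairs
    nodetool netstats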

Why is my Spark streaming app so slow?

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-04 21:30:50
I have a cluster with 4 nodes: 3 Spark nodes and 1 Solr node. My CPUs are 8-core, my memory is 32 GB, and my disks are SSD. I use Cassandra as my database. My data volume is 22 GB after 6 hours, and I now have around 3.4 million rows, which should be read in under 5 minutes; but it already cannot complete the task within that time. My future plan is to read 100 million rows in under 5 minutes. I am not sure what I can increase or do better to achieve this result now, as well as to reach my future goal. Is that even possible, or would it be better to use Spark for the real-time analysis and use…
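As a starting point for the read side, the Spark Cassandra connector exposes input-tuning settings that control how much parallelism a full-table read gets. A hedged sketch in Scala; the property names and defaults vary between connector versions, so they need to be checked against the connector bundled with your DSE release, and the values below are purely illustrative:

    import org.apache.spark.SparkConf

    // Sketch: read-side knobs of the Spark Cassandra connector.
    // Property names and values are assumptions to verify, not recommendations.
    val conf = new SparkConf()
      .setAppName("cassandra-read-tuning")
      // Smaller splits -> more Spark partitions -> more parallel readers
      .set("spark.cassandra.input.split.size_in_mb", "64")
      // Rows fetched per page while iterating a Cassandra partition
      .set("spark.cassandra.input.fetch.size_in_rows", "1000")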

DataStax Enterprise: saveToCassandra generates a lot of hinted handoffs

Submitted by 痴心易碎 on 2019-12-04 20:17:06
I'm having trouble with data generation from Spark to Cassandra using DSE 4.5.3. I have a cluster of 8 nodes (pretty powerful nodes) and I want to generate some test data from Spark. My Spark job reads 5M rows from a Cassandra table (representing one day of data), caches them in memory (32 GB of memory per node, so no problem there), and finally saves them n times into another Cassandra table, to simulate more days of data.

    val table = sc.cassandraTable[RecordData]("data", "one_day").cache
    val firstDate = table.first.gets_dt_tm
    val start = 1
    val end = 10
    for (i <- start to end) {
      table
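A burst of hinted handoffs while saveToCassandra runs usually means the job is writing faster than the replicas can absorb, so some nodes briefly look down or overloaded and their writes get hinted. The connector has write-throttling settings worth trying; a hedged sketch (property names should be verified against the connector version shipped with DSE 4.5.3, and the values are only illustrative):

    import org.apache.spark.SparkConf

    // Sketch: throttle the writes generated by saveToCassandra so the
    // cluster is not flooded. Values are placeholders, not recommendations.
    val conf = new SparkConf()
      .setAppName("test-data-generator")
      // Cap write throughput per core (MB/s)
      .set("spark.cassandra.output.throughput_mb_per_sec", "5")
      // Fewer concurrent batches in flight per task
      .set("spark.cassandra.output.concurrent.writes", "2")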

User Defined Type (UDT) behavior in Cassandra

Submitted by 一世执手 on 2019-12-04 20:13:56
If someone has experience using UDTs (User Defined Types), I would like to understand how backward compatibility works. Say I have the following UDT:

    CREATE TYPE addr (
        street1 text,
        zip text,
        state text
    );

If I modify the "addr" UDT to have a couple more attributes (say, for example, zip_code2 int and name text):

    CREATE TYPE addr (
        street1 text,
        zip text,
        state text,
        zip_code2 int,
        name text
    );

How do the older rows that do not have these attributes work? Is it even compatible? Thanks.

The new UDT definition would be compatible with the old definition. User-defined types can have…
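For completeness, the usual way to evolve a UDT in place is ALTER TYPE rather than re-creating it; rows written before the change simply read back null for the fields added later. A sketch based on the type in the question:

    -- Add the new fields to the existing type. Existing addr values keep
    -- working and return null for zip_code2 and name.
    ALTER TYPE addr ADD zip_code2 int;
    ALTER TYPE addr ADD name text;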