apache-storm

How to tune the parallelism hint in Storm

旧街凉风 posted on 2019-11-30 03:35:21
In Storm, the "parallelism hint" is used to parallelise a running topology. I know there are concepts such as worker processes, executors, and tasks. Would it make sense to make the parallelism hint as large as possible so that a topology is parallelised as much as possible? My question is how to find the right parallelism hint for my Storm topologies. Does it depend on the size of my Storm cluster, is it a topology/job-specific setting that varies from one topology to another, or does it depend on both? Adding to what @Chiron explained: "parallelism hint" is used in Storm to
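To make the relationship between hints, executors, and workers concrete, here is a minimal plain-Java sketch. It assumes that each component's parallelism hint becomes that many executors (threads) and that executors are spread round-robin over the worker processes. The class and method names are hypothetical, it does not use the Storm API, and it ignores system executors such as ackers.

```java
import java.util.Arrays;

public class ParallelismSketch {

    // Hypothetical helper: spreads totalExecutors over numWorkers round-robin.
    // This only approximates an even assignment; it is not Storm's scheduler.
    public static int[] executorsPerWorker(int totalExecutors, int numWorkers) {
        int[] perWorker = new int[numWorkers];
        for (int i = 0; i < totalExecutors; i++) {
            perWorker[i % numWorkers]++;
        }
        return perWorker;
    }

    public static void main(String[] args) {
        int spoutHint = 2;   // assumed parallelism hint for a spout
        int boltHint = 8;    // assumed parallelism hint for a bolt
        int numWorkers = 4;  // assumed number of worker processes

        // Each entry is the number of threads one worker JVM would have to run.
        int[] layout = executorsPerWorker(spoutHint + boltHint, numWorkers);
        System.out.println(Arrays.toString(layout));
    }
}
```

Under these assumptions, raising a hint without adding workers or cores only packs more threads into the same JVMs, which suggests the useful value depends on both the cluster and the topology.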

How to rapidly increment counters in Cassandra w/o staleness

这一生的挚爱 posted on 2019-11-30 03:28:23
I have a Cassandra question. Do you know how Cassandra handles updates/increments of counters? I want to use a Storm bolt (CassandraCounterBatchingBolt from the storm-contrib repo on GitHub) that writes to Cassandra. However, I'm not sure how parts of the incrementCounterColumn() implementation work, and there are also the limitations of Cassandra counters (from: http://wiki.apache.org/cassandra/Counters ), which IMHO make them useless for my scenario: If a write fails unexpectedly (timeout or loss of connection to the coordinator node) the client will not know if the operation

How to submit a topology to a Storm production cluster from an IDE

↘锁芯ラ posted on 2019-11-30 00:22:46
I get the error "Must submit topologies using the 'storm' client script so that StormSubmitter knows which jar to upload" when submitting a topology to a production cluster from an IDE, whereas submitting the same topology from the command line with the storm jar command works perfectly. I have seen examples of this from githublink . To submit the topology I am using these lines: conf.put(Config.NIMBUS_HOST, NIMBUS_NODE); conf.put(Config.NIMBUS_THRIFT_PORT,6627); conf.put(Config.STORM_ZOOKEEPER_PORT,2181); conf.put(Config.STORM_ZOOKEEPER_SERVERS,ZOOKEEPER_ID); conf.setNumWorkers(20);

Storm vs. Trident: When not to use Trident?

不羁岁月 posted on 2019-11-29 19:26:46
I'm working with Storm and it is fine for a lot of use cases. Recently I had a look at Trident, which is a high-level abstraction on top of Storm. It supports exactly-once processing and makes stateful processing easier. But now I'm wondering: why can't I always use Trident instead of plain Storm? What I have read so far: Trident processes messages in batches, so processing time per message could be longer. Trident is not yet able to process loops in topologies. Are there any other disadvantages to using Trident instead of Storm? Right now, I think the disadvantages I listed above are marginal. What use cases

Distributed caching in Storm

旧时模样 posted on 2019-11-29 17:37:43
How do I store temporary data in Apache Storm? In a Storm topology, a bolt needs to access previously processed data. For example: if the bolt processes variable1 with a result of 20 at 10:00 AM, and variable1 is then received as 50 at 10:15 AM, the result should be 30 (50-20); if variable1 is later received as 70 at 10:30, the result should be 20 (70-50). How can I achieve this? In short, you want to do micro-batching calculations within Storm's running tuples. First you need to define/find a key in the tuple set. Use fields grouping (not shuffle grouping) between bolts on that key
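The per-key difference described above can be sketched in plain Java. This is a minimal illustration that assumes fields grouping delivers every tuple for a given key to the same bolt task, so a local in-memory map is enough. `DeltaTracker` and `update` are hypothetical names and are not part of the Storm API; in a real bolt this logic would sit inside its execute method.

```java
import java.util.HashMap;
import java.util.Map;

public class DeltaTracker {

    // Last value seen for each key; local to one bolt task.
    private final Map<String, Integer> lastSeen = new HashMap<>();

    // Stores the new value and returns its difference from the previous one.
    // Assumption: the first value for a key is returned unchanged, matching
    // the "result as 20" case in the question.
    public int update(String key, int value) {
        Integer previous = lastSeen.put(key, value);
        return previous == null ? value : value - previous;
    }

    public static void main(String[] args) {
        DeltaTracker tracker = new DeltaTracker();
        System.out.println(tracker.update("variable1", 20)); // first value
        System.out.println(tracker.update("variable1", 50)); // 50 - 20
        System.out.println(tracker.update("variable1", 70)); // 70 - 50
    }
}
```

State held this way is lost if the worker process restarts, so it suits temporary data only.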

Storm fields grouping

孤者浪人 posted on 2019-11-29 15:01:27
I have the following situation: there are a number of bolts that calculate different values. These values are sent to a visualization bolt. The visualization bolt opens a web socket and sends the values to be visualized somehow. The thing is, the visualization bolt is always the same, but it sends a message with a different header for each type of bolt that can be its input. For example: BoltSum calculates the sum, BoltDif calculates the difference, and BoltMul calculates the product. All these bolts use VisualizationBolt for visualization. There are 3 instances of VisualizationBolt in this case. My question is, should I create

Found multiple defaults.yaml resources

断了今生、忘了曾经 posted on 2019-11-29 13:44:14
When I tried to submit the topology I got this exception: Exception in thread "main" java.lang.RuntimeException: Found multiple defaults.yaml resources. You're probably bundling the Storm jars with your topology jar. at backtype.storm.utils.Utils.findAndReadConfigFile(Utils.java:115) at backtype.storm.utils.Utils.readDefaultConfig(Utils.java:135) at backtype.storm.utils.Utils.readStormConfig(Utils.java:155) at backtype.storm.StormSubmitter.submitTopology(StormSubmitter.java:61) at backtype.storm.StormSubmitter.submitTopology(StormSubmitter.java:40) at trident.myproject.main(myproject.java:288) But this

How to call a particular method before killing a Storm topology

非 Y 不嫁゛ posted on 2019-11-29 11:03:40
How can I call a particular method before killing a Storm topology? I have created a topology in Storm and want to call a particular method just before the topology gets killed. Is there a predefined method, or one that can be overridden, to do this in the Storm framework? Thanks in advance :) There is no such thing. As a workaround, you can deactivate the topology before killing it. This ensures that Spout.deactivate() is called. If you need to call a method in the bolts, use Spout.deactivate() to send a "notification tuple" (one that does not contain data to be processed) through the whole topology. And in

How to debug Apache Storm in Eclipse?

两盒软妹~` posted on 2019-11-29 06:56:43
We can generate a Storm jar using particular parameters. But what if we need to debug this project (actually a jar) locally as well as remotely? If it were a simple jar, we could debug it. However, here we deploy the jar using the following command: storm jar project.jar main_class_name. I am not sure how to deploy the Storm topology so that we can run the Storm project in debugging mode. Please find the updated yaml file below: # Licensed to the Apache Software Foundation (ASF) under one # or more contributor license agreements. See the NOTICE file # distributed with this work for additional information #

How to rapidly increment counters in Cassandra w/o staleness

旧巷老猫 posted on 2019-11-29 01:06:09
Question: I have a Cassandra question. Do you know how Cassandra handles updates/increments of counters? I want to use a Storm bolt (CassandraCounterBatchingBolt from the storm-contrib repo on GitHub) that writes to Cassandra. However, I'm not sure how parts of the incrementCounterColumn() implementation work, and there are also the limitations of Cassandra counters (from: http://wiki.apache.org/cassandra/Counters), which IMHO make them useless for my scenario: If a write fails