apache-storm

Storm - Conditionally consuming stream from kafka spout?

谁说我不能喝 · Posted on 2019-12-07 16:56:25
I have a scenario where I am posting JSON to a Kafka instance. I am then using a KafkaSpout to emit the stream to a bolt. But now I would like to add an additional field (call it x) to my JSON message. If x is a, I would like it to be consumed by boltA; if x is b, I would like it to be consumed by boltB. Is there a way to direct the stream to the proper bolt depending on the stream's contents? Answer 1: The simplest way should be to add a SplitBolt that consumes from KafkaSpout, evaluates the field x, and forwards to different output streams: public class SplitBolt extends BaseRichBolt { OutputCollector
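A minimal sketch of the routing decision such a SplitBolt could make (the stream names and the helper class are assumptions for illustration, since the answer's code is cut off):

```java
// Hypothetical routing helper for a SplitBolt; the rule "x == a goes to
// boltA, x == b goes to boltB" follows the question. Stream names are made up.
public class StreamRouter {
    public static final String STREAM_A = "stream-a"; // subscribed to by boltA
    public static final String STREAM_B = "stream-b"; // subscribed to by boltB

    // Pick the output stream for a tuple based on its field x.
    public static String selectStream(String x) {
        if ("a".equals(x)) return STREAM_A;
        if ("b".equals(x)) return STREAM_B;
        throw new IllegalArgumentException("unexpected value for x: " + x);
    }
}
```

In the bolt itself one would declare both streams in declareOutputFields() and, in execute(), emit with something like collector.emit(StreamRouter.selectStream(tuple.getStringByField("x")), tuple, tuple.getValues()); boltA and boltB then each subscribe to their own stream in the topology builder.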

Storm latency caused by ack

寵の児 · Posted on 2019-12-07 11:11:05
Question: I was using kafka-storm to connect Kafka and Storm. I have 3 servers running ZooKeeper, Kafka, and Storm. There is a topic 'test' in Kafka that has 9 partitions. In the Storm topology, the number of KafkaSpout executors is 9 and, by default, the number of tasks should be 9 as well. The 'extract' bolt is the only bolt connected to the KafkaSpout, the 'log' spout. From the UI, there is a huge rate of failure in the spout. However, the number of executed messages in the bolt = the number of emitted

Storm KafkaSpout stopped consuming messages from Kafka topic

老子叫甜甜 · Posted on 2019-12-07 10:31:25
Question: My problem is that the Storm KafkaSpout stopped consuming messages from a Kafka topic after a period of time. When debug is enabled in Storm, I get a log file like this: 2016-07-05 03:58:26.097 o.a.s.d.task [INFO] Emitting: packet_spout __metrics [#object[org.apache.storm.metric.api.IMetricsConsumer$TaskInfo 0x2c35b34f "org.apache.storm.metric.api.IMetricsConsumer$TaskInfo@2c35b34f"] [#object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x798f1e35 "[__ack-count = {default=0}]"] #object

An Apache Storm bolt receiving multiple input tuples from different spouts/bolts

女生的网名这么多〃 · Posted on 2019-12-07 08:42:43
Question: Is it possible for a bolt to receive multiple input tuples from different spouts/bolts? For instance, Bolt C receives input tuples from Spout A and input tuples from Bolt B to be processed. How should I implement it? I mean writing the Java code for Bolt C and also its topology. Answer 1: The tutorial answers your question: https://storm.apache.org/documentation/Tutorial.html Here is the code for your goal (copied from the tutorial): builder.setBolt("exclaim2", new ExclamationBolt(), 5) .shuffleGrouping("words")
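As a toy illustration of Bolt C telling its two inputs apart (the component names "spoutA" and "boltB" and the helper class are assumptions based on the question):

```java
// Toy dispatch logic for a bolt with two input components. In a real bolt,
// tuple.getSourceComponent() would supply the first argument inside execute().
public class SourceDispatch {
    public static String handle(String sourceComponent, String value) {
        if ("spoutA".equals(sourceComponent)) return "from-spout:" + value;
        if ("boltB".equals(sourceComponent))  return "from-bolt:" + value;
        return "unknown:" + value;
    }
}
```

The topology side is just two grouping calls on the same bolt, e.g. builder.setBolt("boltC", new BoltC()).shuffleGrouping("spoutA").shuffleGrouping("boltB") — each grouping call adds one input subscription.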

How to use kafka and storm on cloudfoundry?

百般思念 · Posted on 2019-12-07 06:22:43
Question: I want to know if it is possible to run Kafka as a cloud-native application, and whether I can create a Kafka cluster as a service on Pivotal Web Services. I don't want only client integration; I want to run the Kafka cluster/service itself. Thanks, Anil Answer 1: I can point you at a few starting points; there would be some work involved to go from those starting points to something fully functional. One option is to deploy the Kafka cluster on Cloud Foundry (e.g. Pivotal Web Services) using docker

Making the Storm JARs compile-time only in a Gradle project

不问归期 · Posted on 2019-12-07 02:49:20
Question: I am trying to build a Gradle project which contains a Storm project. In order to run this project on Storm, I have to first create a JAR file and let Storm run my topology, e.g. storm jar myJarFile.jar com.mypackage.MyStormMainClass I am running into problems because Gradle, by default, includes the Storm dependencies both at compile time and at runtime. This causes the following exception: Exception in thread "main" java.lang.RuntimeException: Found multiple defaults.yaml resources. You're
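A sketch of the usual fix, assuming a Groovy build.gradle and the storm-core artifact (version number is illustrative): mark the Storm dependency as compile-only so it is not bundled into the JAR, since the cluster already provides its own copy of storm-core and its defaults.yaml.

```groovy
// build.gradle (sketch) -- keep Storm off the runtime classpath so the
// cluster's storm-core (and its defaults.yaml) is the only copy at runtime.
dependencies {
    // compileOnly: available for compilation, excluded from the built JAR.
    compileOnly 'org.apache.storm:storm-core:1.2.3'
    // ordinary application dependencies stay as usual, e.g.:
    // implementation 'com.google.guava:guava:31.1-jre'
}
```

Note that compileOnly was introduced in Gradle 2.12; on older Gradle versions the common workaround is a custom "provided" configuration that is added to the compile classpath but excluded from packaging.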

Logging from a storm bolt - where is it going?

喜夏-厌秋 · Posted on 2019-12-06 23:22:29
Question: I have several bolts deployed to a topology on a cluster. Each is configured to log via slf4j. On the test machine I get both the stdout and the file appenders working fine. When I deploy this to the cluster, the logging seems to have disappeared. I don't get anything in the Storm logs (on the supervisor machines), in /var/log/*, or anywhere else as far as I can tell. Should I be able to use a logging system inside a Storm worker? If so, is there a trick to getting the messages? Machines are

using Apache's AsyncHttpClient in a storm bolt

情到浓时终转凉″ · Posted on 2019-12-06 15:04:26
Question: I have a bolt that is making an API call (HTTP GET) for every tuple. To avoid the need to wait for the response, I was looking to use the Apache HttpAsyncClient. After instantiating the client in the bolt's prepare method, the execute method constructs the URL from the tuple and calls sendAsyncGetRequest(url): private void sendAsyncGetRequest(String url){ httpclient.execute(new HttpGet(url), new FutureCallback<HttpResponse>() { @Override public void completed(HttpResponse response) { LOG.info
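The shape of that callback pattern can be sketched with JDK-only classes (this stand-in uses CompletableFuture instead of the real HttpAsyncClient so it compiles without Storm or HttpComponents; class and method names are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

// Stand-in for the fire-and-forget GET: the request is started, the caller
// returns immediately, and the callback stores the result for later emission.
public class AsyncFetcher {
    // Results are queued rather than emitted directly: in Storm releases
    // before 2.0 the OutputCollector is generally not thread-safe, so the
    // usual pattern is to drain this queue from execute() on the bolt thread.
    private final ConcurrentLinkedQueue<String> results = new ConcurrentLinkedQueue<>();

    public CompletableFuture<Void> fetch(String url) {
        return CompletableFuture
                .supplyAsync(() -> "response-for:" + url) // stands in for the HTTP GET
                .thenAccept(results::add);                // stands in for FutureCallback.completed
    }

    public String poll() { return results.poll(); }
}
```

The same queue-and-drain idea applies with the real client: completed() runs on an I/O thread, so it should hand results back to the bolt thread instead of touching the collector itself.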

Storm topology failure while running on production

◇◆丶佛笑我妖孽 · Posted on 2019-12-06 14:33:47
Question: Hi, I'm having an issue with running a Storm cluster. It is similar to My topology is defined as: package com.abc.newsclassification; import StormBase.KnowledgeGraph.ClassifierBolt; import StormBase.KnowledgeGraph.ClientSpecificTwitterSpout; import StormBase.KnowledgeGraph.LiveTwitterSpout; import StormBase.KnowledgeGraph.NewsTwitterSpout; import StormBase.KnowledgeGraph.TwitterTrainingBolt; import StormBase.KnowledgeGraph.UrlExtractorBolt; import backtype.storm.Config; import backtype.storm

How does Storm handle fields grouping when you add more nodes?

99封情书 · Posted on 2019-12-06 11:45:36
Just reading more details on Storm, I came across its ability to do fields grouping. For example, if you were counting tweets per user and you had two tasks with a fields grouping on user-id, the same user-ids would get sent to the same task. So task 1 could have the following counts in memory: bob: 10, alice: 5. Task 2 could have the following counts in memory: jill: 10, joe: 4. If I added a new machine to the cluster to increase capacity and ran rebalance, what happens to my counts in memory? Will you start to get users with different counts? Using fields grouping we can guide a specific field
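Why rebalancing can move keys is easy to see with a toy version of the routing rule. Storm's fields grouping hashes the grouped field values and takes the result modulo the number of target tasks; the helper below is a simplification of that idea, not Storm's actual code.

```java
public class FieldsGroupingDemo {
    // Toy fields grouping: route a key to task (hash mod numTasks).
    // Changing numTasks (i.e. rebalancing to more tasks) can change
    // which task a given key maps to.
    public static int taskFor(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }
}
```

For instance, with 2 tasks the key "b" lands on task 0, but with 3 tasks it lands on task 2, so its old in-memory count is stranded on the original task. This is why in-memory counts are generally not preserved across a rebalance, and durable per-key state is usually kept in an external store instead.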