apache-storm

Storm - Conditionally consuming stream from kafka spout?

谁说我不能喝 · Posted on 2019-12-07 16:56:25
I have a scenario where I am posting JSON to a Kafka instance. I am then using a KafkaSpout to emit the stream to a bolt. But now I would like to add an additional field (call it x) to my JSON message. If x is a, I would like it to be consumed by boltA; if x is b, I would like it to be consumed by boltB. Is there a way to direct the stream to the proper bolt depending on the stream's contents? Answer 1: The simplest way should be to add a SplitBolt that consumes from KafkaSpout, evaluates the field x, and forwards to different output streams: public class SplitBolt extends BaseRichBolt { OutputCollector
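A minimal sketch of the routing decision such a SplitBolt could make (the stream names and the helper class are assumptions for illustration, since the answer's code is cut off):

```java
// Hypothetical routing helper for a SplitBolt; the rule "x == a goes to
// boltA, x == b goes to boltB" follows the question. Stream names are made up.
public class StreamRouter {
    public static final String STREAM_A = "stream-a"; // subscribed to by boltA
    public static final String STREAM_B = "stream-b"; // subscribed to by boltB

    // Pick the output stream for a tuple based on its field x.
    public static String selectStream(String x) {
        if ("a".equals(x)) return STREAM_A;
        if ("b".equals(x)) return STREAM_B;
        throw new IllegalArgumentException("unexpected value for x: " + x);
    }
}
```

In the bolt itself one would declare both streams in declareOutputFields() and, in execute(), emit with something like collector.emit(StreamRouter.selectStream(tuple.getStringByField("x")), tuple, tuple.getValues()); boltA and boltB then each subscribe to their own stream in the topology builder.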

Storm latency caused by ack

寵の児 · Posted on 2019-12-07 11:11:05
Question: I was using kafka-storm to connect Kafka and Storm. I have 3 servers running ZooKeeper, Kafka, and Storm. There is a topic 'test' in Kafka that has 9 partitions. In the Storm topology, the number of KafkaSpout executors is 9 and, by default, the number of tasks should be 9 as well. The 'extract' bolt is the only bolt connected to the KafkaSpout, the 'log' spout. From the UI, there is a huge rate of failure in the spout. However, the number of executed messages in the bolt = the number of emitted

Storm KafkaSpout stopped consuming messages from Kafka topic

老子叫甜甜 · Posted on 2019-12-07 10:31:25
Question: My problem is that the Storm KafkaSpout stopped consuming messages from a Kafka topic after a period of time. When debug is enabled in Storm, I get a log file like this: 2016-07-05 03:58:26.097 o.a.s.d.task [INFO] Emitting: packet_spout __metrics [#object[org.apache.storm.metric.api.IMetricsConsumer$TaskInfo 0x2c35b34f "org.apache.storm.metric.api.IMetricsConsumer$TaskInfo@2c35b34f"] [#object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x798f1e35 "[__ack-count = {default=0}]"] #object

An Apache Storm bolt receiving multiple input tuples from different spouts/bolts

女生的网名这么多〃 · Posted on 2019-12-07 08:42:43
Question: Is it possible for a bolt to receive multiple input tuples from different spouts/bolts? For instance, Bolt C receives input tuples from Spout A and input tuples from Bolt B to be processed. How should I implement it? I mean writing the Java code for Bolt C and also its topology. Answer 1: The tutorial answers your question: https://storm.apache.org/documentation/Tutorial.html Here is the code for your goal (copied from the tutorial): builder.setBolt("exclaim2", new ExclamationBolt(), 5) .shuffleGrouping("words")
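As a toy illustration of Bolt C telling its two inputs apart (the component names "spoutA" and "boltB" and the helper class are assumptions based on the question):

```java
// Toy dispatch logic for a bolt with two input components. In a real bolt,
// tuple.getSourceComponent() would supply the first argument inside execute().
public class SourceDispatch {
    public static String handle(String sourceComponent, String value) {
        if ("spoutA".equals(sourceComponent)) return "from-spout:" + value;
        if ("boltB".equals(sourceComponent))  return "from-bolt:" + value;
        return "unknown:" + value;
    }
}
```

The topology side is just two grouping calls on the same bolt, e.g. builder.setBolt("boltC", new BoltC()).shuffleGrouping("spoutA").shuffleGrouping("boltB") — each grouping call adds one input subscription.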

How to use kafka and storm on cloudfoundry?

百般思念 · Posted on 2019-12-07 06:22:43
Question: I want to know if it is possible to run Kafka as a cloud-native application, and whether I can create a Kafka cluster as a service on Pivotal Web Services. I don't want only client integration; I want to run the Kafka cluster/service itself. Thanks, Anil Answer 1: I can point you at a few starting points; there would be some work involved to go from those starting points to something fully functional. One option is to deploy the Kafka cluster on Cloud Foundry (e.g. Pivotal Web Services) using docker

Making the Storm JARs compile-time only in a Gradle project

不问归期 · Posted on 2019-12-07 02:49:20
Question: I am trying to build a Gradle project which contains a Storm project. In order to run this project on Storm, I have to first create a JAR file and let Storm run my topology, e.g. storm jar myJarFile.jar com.mypackage.MyStormMainClass I am running into problems because Gradle, by default, includes the Storm dependencies both at compile time and at runtime. This causes the following exception: Exception in thread "main" java.lang.RuntimeException: Found multiple defaults.yaml resources. You're
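A sketch of the usual fix, assuming a Groovy build.gradle and the storm-core artifact (version number is illustrative): mark the Storm dependency as compile-only so it is not bundled into the JAR, since the cluster already provides its own copy of storm-core and its defaults.yaml.

```groovy
// build.gradle (sketch) -- keep Storm off the runtime classpath so the
// cluster's storm-core (and its defaults.yaml) is the only copy at runtime.
dependencies {
    // compileOnly: available for compilation, excluded from the built JAR.
    compileOnly 'org.apache.storm:storm-core:1.2.3'
    // ordinary application dependencies stay as usual, e.g.:
    // implementation 'com.google.guava:guava:31.1-jre'
}
```

Note that compileOnly was introduced in Gradle 2.12; on older Gradle versions the common workaround is a custom "provided" configuration that is added to the compile classpath but excluded from packaging.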

Logging from a storm bolt - where is it going?

喜夏-厌秋 · Posted on 2019-12-06 23:22:29
Question: I have several bolts deployed to a topology on a cluster. Each is configured to log via slf4j. On the test machine I get both the stdout and the file appenders working fine. When I deploy this to the cluster, the logging seems to have disappeared. I don't get anything in the Storm logs (on the supervisor machines), in /var/log/*, or anywhere else as far as I can tell. Should I be able to use a logging system inside a Storm worker? If so, is there a trick to getting the messages? Machines are

using Apache's AsyncHttpClient in a storm bolt

情到浓时终转凉″ · Posted on 2019-12-06 15:04:26
Question: I have a bolt that is making an API call (HTTP GET) for every tuple. To avoid the need to wait for the response, I was looking to use the Apache HttpAsyncClient. After instantiating the client in the bolt's prepare method, the execute method constructs the URL from the tuple and calls sendAsyncGetRequest(url): private void sendAsyncGetRequest(String url){ httpclient.execute(new HttpGet(url), new FutureCallback<HttpResponse>() { @Override public void completed(HttpResponse response) { LOG.info
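The shape of that callback pattern can be sketched with JDK-only classes (this stand-in uses CompletableFuture instead of the real HttpAsyncClient so it compiles without Storm or HttpComponents; class and method names are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

// Stand-in for the fire-and-forget GET: the request is started, the caller
// returns immediately, and the callback stores the result for later emission.
public class AsyncFetcher {
    // Results are queued rather than emitted directly: in Storm releases
    // before 2.0 the OutputCollector is generally not thread-safe, so the
    // usual pattern is to drain this queue from execute() on the bolt thread.
    private final ConcurrentLinkedQueue<String> results = new ConcurrentLinkedQueue<>();

    public CompletableFuture<Void> fetch(String url) {
        return CompletableFuture
                .supplyAsync(() -> "response-for:" + url) // stands in for the HTTP GET
                .thenAccept(results::add);                // stands in for FutureCallback.completed
    }

    public String poll() { return results.poll(); }
}
```

The same queue-and-drain idea applies with the real client: completed() runs on an I/O thread, so it should hand results back to the bolt thread instead of touching the collector itself.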

Storm topology failure while running on production

◇◆丶佛笑我妖孽 · Posted on 2019-12-06 14:33:47
Question: Hi, I'm having an issue with running a Storm cluster. It is similar to My topology is defined as: package com.abc.newsclassification; import StormBase.KnowledgeGraph.ClassifierBolt; import StormBase.KnowledgeGraph.ClientSpecificTwitterSpout; import StormBase.KnowledgeGraph.LiveTwitterSpout; import StormBase.KnowledgeGraph.NewsTwitterSpout; import StormBase.KnowledgeGraph.TwitterTrainingBolt; import StormBase.KnowledgeGraph.UrlExtractorBolt; import backtype.storm.Config; import backtype.storm

How does Storm handle fields grouping when you add more nodes?

99封情书 · Posted on 2019-12-06 11:45:36
Just reading more details on Storm, I came across its ability to do fields grouping. For example, if you were counting tweets per user and you had two tasks with a fields grouping on user-id, the same user-ids would get sent to the same task. So task 1 could have the following counts in memory: bob: 10, alice: 5. Task 2 could have the following counts in memory: jill: 10, joe: 4. If I added a new machine to the cluster to increase capacity and ran rebalance, what happens to my counts in memory? Will you start to get users with different counts? Using fields grouping we can guide a specific field
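Why rebalancing can move keys is easy to see with a toy version of the routing rule. Storm's fields grouping hashes the grouped field values and takes the result modulo the number of target tasks; the helper below is a simplification of that idea, not Storm's actual code.

```java
public class FieldsGroupingDemo {
    // Toy fields grouping: route a key to task (hash mod numTasks).
    // Changing numTasks (i.e. rebalancing to more tasks) can change
    // which task a given key maps to.
    public static int taskFor(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }
}
```

For instance, with 2 tasks the key "b" lands on task 0, but with 3 tasks it lands on task 2, so its old in-memory count is stranded on the original task. This is why in-memory counts are generally not preserved across a rebalance, and durable per-key state is usually kept in an external store instead.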