apache-storm

Spout prematurely acks, even failed Bolt tuples

 ̄綄美尐妖づ submitted on 2019-12-06 10:26:24
I'm using the Python Storm library streamparse (which uses pystorm underneath). I've had problems calling a Spout's fail() method in the boilerplate wordcount project. According to the pystorm quickstart docs and numerous things I've read, calling fail(tuple) in a Bolt should elicit a failure in the originating Spout. However, even with the few modifications I've made, I always get a Spout ack() right when the tuple leaves the Spout. Is this the correct behavior, or do I need to change a setting/instance variable? I'm on streamparse 3.4.0 and Storm 1.0.2. My logs show the Spout ack() coming
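This may or may not be the cause here, but the underlying Storm reliability contract is that a spout tuple is only tracked by the acker, and can therefore only fail back to the spout, when the spout emits it with a message ID and downstream bolts anchor their emits before acking or failing. A minimal sketch of that contract in Storm's Java API (component logic is hypothetical; Storm 1.x package names):

```java
import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        String word = "hello";   // stand-in for a real source
        String msgId = word;     // emitting WITH a message ID is what enables ack/fail tracking
        collector.emit(new Values(word), msgId);
    }

    @Override
    public void fail(Object msgId) {
        // Only reached if a downstream bolt anchored the tuple and called fail() on it.
        System.err.println("failed: " + msgId);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}

class FailingBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Failing the input propagates to WordSpout.fail() because the spout tuple is tracked.
        collector.fail(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}
```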

Running Trident Topology in Storm TrackedTopology Unit Test

本小妞迷上赌 submitted on 2019-12-06 10:15:48
Question: How can I run a JUnit test of a Trident topology that lets tuples flow through the topology while the test verifies the output at each stage? I've tried running it within Storm's Testing framework, but it falls short of allowing verification and consistent execution for Trident. Here's an example topology with some in-line comments where I'm having the most issues. import static org.junit.Assert.assertEquals; import java.util.Arrays; import java.util.List; import org.junit.Test; import
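One workaround, instead of the TrackedTopology utilities, is to drive the Trident topology in a JUnit test through a LocalCluster and a LocalDRPC query stream, so the test can read the state back after the tuples have flowed through. A rough sketch under those assumptions (word-count topology and timings are illustrative; Storm 1.x packages):

```java
import static org.junit.Assert.assertTrue;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.LocalDRPC;
import org.apache.storm.trident.TridentState;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.builtin.Count;
import org.apache.storm.trident.operation.builtin.MapGet;
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.trident.testing.MemoryMapState;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.junit.Test;

public class TridentWordCountTest {

    @Test
    public void countsWordsEndToEnd() throws Exception {
        // Deterministic input: batches of at most 2 tuples containing "cat", "cat", "dog".
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("word"), 2,
                new Values("cat"), new Values("cat"), new Values("dog"));
        spout.setCycle(false);

        TridentTopology topology = new TridentTopology();
        TridentState counts = topology.newStream("words", spout)
                .groupBy(new Fields("word"))
                .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"));

        // Expose the state through a DRPC query stream so the test can read it back.
        LocalDRPC drpc = new LocalDRPC();
        topology.newDRPCStream("count-query", drpc)
                .groupBy(new Fields("args"))
                .stateQuery(counts, new Fields("args"), new MapGet(), new Fields("count"));

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("trident-test", new Config(), topology.build());
        Thread.sleep(5000);  // crude: wait for the fixed batches to be processed

        String result = drpc.execute("count-query", "cat");  // e.g. [["cat",2]]
        assertTrue(result.contains("2"));

        cluster.shutdown();
        drpc.shutdown();
    }
}
```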

Storm Cluster Duplicate Tuples

↘锁芯ラ submitted on 2019-12-06 10:09:14
Question: I am currently working on a project where I have set up a Storm cluster across four Unix hosts. The topology itself is as follows: a JMS Spout listens to an MQ for new messages; the JMS Spout parses each message and emits the result to an Esper Bolt; the Esper Bolt processes the event and emits a result to a JMS Bolt; the JMS Bolt publishes the message back onto the MQ on a different topic. I realize that Storm is an "at-least-once" framework. However, if I receive 5 events and pass these onto the
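Because of those at-least-once semantics, replays have to be tolerated by the consumers. One common pattern is to make the downstream bolt idempotent by remembering recently seen message IDs and silently acking duplicates. A rough sketch of that idea, assuming Storm 1.x packages and a hypothetical "jmsMessageId" field carrying the original JMS message ID:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class DeduplicatingEsperBolt extends BaseRichBolt {
    private OutputCollector collector;
    private Set<String> seen;  // bounded set of recently processed JMS message IDs

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.seen = Collections.newSetFromMap(new LinkedHashMap<String, Boolean>() {
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > 10000;  // cap memory; tune to your expected replay window
            }
        });
    }

    @Override
    public void execute(Tuple input) {
        String jmsMessageId = input.getStringByField("jmsMessageId");  // hypothetical field name
        if (seen.add(jmsMessageId)) {
            // First time we see this message: process and emit, anchored for reliability.
            collector.emit(input, new Values(process(input)));
        }
        // Ack replays too, so the spout stops re-sending them.
        collector.ack(input);
    }

    private Object process(Tuple input) {
        return input.getValueByField("body");  // placeholder for the real Esper processing
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("result"));
    }
}
```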

Storm topology not submitting

こ雲淡風輕ζ submitted on 2019-12-06 05:24:56
Question: I have configured my machine; ZooKeeper, Nimbus, and the Supervisor are all running properly, and my topology works in LocalCluster: LocalCluster cluster = new LocalCluster(); cluster.submitTopology("SendPost", conf, builder.createTopology()); Utils.sleep(10000000000l); cluster.killTopology("SendPost"); cluster.shutdown(); Now I want to try submitting my topology, but it is not working: /usr/local/storm/bin$ ./storm jar /home/winoria/Desktop/Storm/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies
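For a remote submission, the topology's main class has to call StormSubmitter instead of LocalCluster, and the jar is then launched with storm jar <jar> <main-class> <topology-name>. A sketch of the usual dual-mode main method, using the old backtype.storm package names that match the storm-starter version above (on Storm 1.x the packages are org.apache.storm; class and component wiring are placeholders):

```java
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.utils.Utils;

public class SendPostTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // builder.setSpout(...) and builder.setBolt(...) exactly as in the local version

        Config conf = new Config();

        if (args.length > 0) {
            // Cluster mode, e.g.:
            // storm jar storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar SendPostTopology SendPost
            conf.setNumWorkers(2);
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        } else {
            // Local mode, as before
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("SendPost", conf, builder.createTopology());
            Utils.sleep(60000);
            cluster.killTopology("SendPost");
            cluster.shutdown();
        }
    }
}
```

The class name passed on the command line must be the fully qualified main class inside the jar, and the command must be run from a machine whose storm.yaml points at the cluster's Nimbus.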

Monitoring Kafka Spout with KafkaOffsetMonitoring tool

♀尐吖头ヾ submitted on 2019-12-06 04:28:04
I am using the KafkaSpout that came with the storm-0.9.2 distribution for my project, and I want to monitor its throughput. I tried using the KafkaOffsetMonitor tool, but it does not show any consumers reading from my topic. I suspect this is because I have specified a root path in ZooKeeper for the spout to store its consumer offsets. How will the KafkaOffsetMonitor know where to look for data about my KafkaSpout instance? Can someone explain exactly where ZooKeeper stores data about Kafka topics and consumers? ZooKeeper is a filesystem, so how does it arrange data of
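For context on where the offsets live: the storm-kafka KafkaSpout commits them to ZooKeeper under the zkRoot and id given in its SpoutConfig, not under the /consumers/<group> path that regular Kafka consumers (and KafkaOffsetMonitor, by default) use. A sketch of the relevant configuration, with placeholder names and the storm.kafka package names that ship with storm-0.9.2:

```java
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;

public class MonitoredKafkaSpoutFactory {

    public static KafkaSpout build() {
        BrokerHosts hosts = new ZkHosts("localhost:2181");  // placeholder ZooKeeper connect string
        String topic = "my-topic";                          // placeholder topic
        String zkRoot = "/kafka-spout";                     // offsets are stored under this root ...
        String spoutId = "my-spout-id";                     // ... in a child node named after this id

        SpoutConfig config = new SpoutConfig(hosts, topic, zkRoot, spoutId);
        // Committed offsets end up under /kafka-spout/my-spout-id/... in ZooKeeper
        // (exact leaf naming varies by version), not under /consumers/<group>,
        // which is the path KafkaOffsetMonitor scans by default.
        return new KafkaSpout(config);
    }
}
```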

Field Grouping for a Kafka Spout

你。 submitted on 2019-12-06 02:52:18
Can field grouping be done on tuples emitted by a Kafka spout? If yes, how does Storm get to know the fields in a Kafka record? The Kafka spout declares its output fields like any other component. My explanation is based on the current implementation of KafkaSpout. In the KafkaSpout.java class we see the declareOutputFields method, which calls the getOutputFields() method of the KafkaConfig scheme. @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(_spoutConfig.scheme.getOutputFields()); } By default, KafkaConfig uses RawMultiScheme, which implements this method in this way.
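The practical consequence is that RawMultiScheme declares a single raw field (named "bytes"), so to field-group on something meaningful you plug in a scheme that declares named fields, such as StringScheme, which declares a field called "str". A sketch with hypothetical component names (Storm 1.x storm-kafka packages; older releases use storm.kafka and backtype.storm):

```java
import org.apache.storm.kafka.BrokerHosts;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;

public class KafkaFieldsGroupingTopology {

    public static void main(String[] args) {
        BrokerHosts hosts = new ZkHosts("localhost:2181");  // placeholder
        SpoutConfig spoutConfig = new SpoutConfig(hosts, "words", "/kafka-spout", "words-id");
        // StringScheme declares one output field, "str", instead of RawMultiScheme's "bytes".
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 2);
        // Fields grouping now works on the declared field name.
        builder.setBolt("printer", new PrinterBolt(), 4)
               .fieldsGrouping("kafka-spout", new Fields("str"));
        // Submit with LocalCluster or StormSubmitter as usual.
    }

    public static class PrinterBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            System.out.println(tuple.getStringByField("str"));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) { }
    }
}
```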

Failing to write offset data to zookeeper in kafka-storm

孤街浪徒 submitted on 2019-12-05 21:54:46
Question: I was setting up a Storm cluster to calculate real-time trending and other statistics, but I am having problems introducing the "recovery" feature into this project, i.e. remembering the offset that was last read by the kafka-spout (the source code for kafka-spout comes from https://github.com/apache/incubator-storm/tree/master/external/storm-kafka). I start my kafka-spout this way: BrokerHosts zkHost = new ZkHosts("localhost:2181"); SpoutConfig kafkaConfig = new
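The fields that usually matter for offset recovery are the SpoutConfig's zkRoot, id, and, especially when testing in local mode, zkServers and zkPort: without an explicit ZooKeeper target, a spout running in a LocalCluster tends to write its offsets to the in-process ZooKeeper, which disappears on restart. A hedged sketch of the configuration, with placeholder topic and id values and the storm.kafka/backtype.storm packages of that era:

```java
import java.util.Arrays;

import backtype.storm.spout.SchemeAsMultiScheme;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class RecoverableKafkaSpoutFactory {

    public static KafkaSpout build() {
        BrokerHosts zkHost = new ZkHosts("localhost:2181");
        // topic, zkRoot and id are placeholders; offsets are committed under <zkRoot>/<id>.
        SpoutConfig kafkaConfig = new SpoutConfig(zkHost, "my-topic", "/kafka-spout", "trending-id");
        kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        // Point offset storage at an explicit ZooKeeper ensemble; otherwise, in local mode,
        // offsets may go to the LocalCluster's in-memory ZooKeeper and be lost on restart.
        kafkaConfig.zkServers = Arrays.asList("localhost");
        kafkaConfig.zkPort = 2181;
        // How often the spout checkpoints its offset (milliseconds).
        kafkaConfig.stateUpdateIntervalMs = 2000;

        return new KafkaSpout(kafkaConfig);
    }
}
```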

Storm error: connection attempt 86 to Netty-Client

試著忘記壹切 submitted on 2019-12-05 20:22:01
I keep getting the following error: [ERROR] connection attempt 86 to Netty-Client-/202.162.247.10:6731 failed: java.net.ConnectException: Connection refused: /202.162.247.10:6731 Why is this happening? I have tried and googled multiple times but found no solution. Source: https://stackoverflow.com/questions/34648103/storm-error-connection-attempt-86-to-netty-client

Insert rows into HBase from a Storm bolt

空扰寡人 submitted on 2019-12-05 19:33:47
I would like to be able to write new entries into HBase from a distributed (not local) Storm topology. There are a few GitHub projects that provide either HBase mappers or pre-made Storm bolts to write tuples into HBase, and these projects provide instructions for executing their samples on the LocalCluster. The problem that I am running into with both of these projects, and with accessing the HBase API directly from the bolt, is that they all require the hbase-site.xml file to be included on the classpath. With the direct API approach, and perhaps with the GitHub ones as well, when you execute
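One way around shipping hbase-site.xml on every worker's classpath is to build the HBase Configuration programmatically inside the bolt's prepare() method, pulling the ZooKeeper quorum out of the Storm topology config. A sketch under those assumptions (config keys, table, and field names are made up for illustration; backtype.storm packages and the older HTable client API):

```java
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseWriterBolt extends BaseRichBolt {
    private OutputCollector collector;
    private HTable table;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        try {
            // Build the HBase configuration from values shipped in the Storm config,
            // instead of relying on an hbase-site.xml being present on the worker classpath.
            Configuration hbaseConf = HBaseConfiguration.create();
            hbaseConf.set("hbase.zookeeper.quorum", (String) stormConf.get("hbase.zookeeper.quorum"));
            hbaseConf.set("hbase.zookeeper.property.clientPort",
                          (String) stormConf.get("hbase.zookeeper.property.clientPort"));
            this.table = new HTable(hbaseConf, "my_table");  // table name is a placeholder
        } catch (Exception e) {
            throw new RuntimeException("Could not connect to HBase", e);
        }
    }

    @Override
    public void execute(Tuple input) {
        try {
            Put put = new Put(Bytes.toBytes(input.getStringByField("rowkey")));  // hypothetical fields
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("value"),
                    Bytes.toBytes(input.getStringByField("value")));
            table.put(put);
            collector.ack(input);
        } catch (Exception e) {
            collector.fail(input);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}
```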

An Apache Storm bolt receiving multiple input tuples from different spouts/bolts

旧时模样 submitted on 2019-12-05 16:49:42
Is it possible for a bolt to receive multiple input tuples from different spouts/bolts? For instance, Bolt C receives input tuples from Spout A and input tuples from Bolt B to be processed. How should I implement this? I mean writing the Java code for Bolt C and also its topology. The Tutorial answers your question: https://storm.apache.org/documentation/Tutorial.html Here is the code for your goal (copied from the tutorial): builder.setBolt("exclaim2", new ExclamationBolt(), 5) .shuffleGrouping("words") .shuffleGrouping("exclaim1"); exclaim2 will accept tuples from both words and exclaim1, both using shuffle
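If Bolt C needs to tell the two sources apart, it can inspect the emitting component inside execute() via getSourceComponent(); the component names below are placeholders for Spout A, Bolt B, and Bolt C (Storm 1.x packages):

```java
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class BoltC extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Storm tells you which upstream component emitted the tuple.
        if ("spoutA".equals(input.getSourceComponent())) {
            // handle tuples coming directly from Spout A
        } else if ("boltB".equals(input.getSourceComponent())) {
            // handle tuples coming from Bolt B
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}
```

The topology wiring is the same double-grouping pattern as in the tutorial snippet: builder.setBolt("boltC", new BoltC()).shuffleGrouping("spoutA").shuffleGrouping("boltB").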