apache-storm

Spout prematurely acks, even for failed Bolt tuples

Submitted by 一笑奈何 on 2019-12-10 10:57:09
Question: I'm using the Python Storm library streamparse (which uses pystorm underneath). I've had problems calling a Spout's fail() method in the boilerplate wordcount project. According to the pystorm quickstart docs and numerous things I've read, calling fail(tuple) in a Bolt should elicit a failure in the originating Spout. However, even with the few modifications I've made, I always get a Spout ack() right when the tuple leaves the Spout. Is this the correct behavior, or do I need to change a setting?

Execution flow of a storm program

Submitted by 百般思念 on 2019-12-10 05:17:07
Question: I am new to Storm and am trying to understand the flow of execution of the different methods from spout to bolt. A spout has methods like nextTuple(), open(), declareOutputFields(), activate(), and deactivate(), and a bolt has methods like prepare(), execute(), cleanup(), and declareOutputFields(). Can anyone tell me the sequence in which these methods execute? Answer 1: First, when your topology is started: the spouts and bolts are created, declareOutputFields is called, and the spouts/bolts are serialized and assigned to workers…
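The order described in the answer can be made concrete with a toy driver. This is an illustrative sketch, not Storm itself: the class and method names mirror the Storm API, and the "worker" below simply calls them in the documented order (declareOutputFields at topology-build time, open/prepare once per task on the worker, then activate and the event loop).

```python
# Toy sketch of the Storm spout/bolt lifecycle call order (not real Storm).

class ToySpout:
    def __init__(self, log):
        self.log = log
    def declareOutputFields(self):
        self.log.append("spout.declareOutputFields")
    def open(self):
        self.log.append("spout.open")
    def activate(self):
        self.log.append("spout.activate")
    def nextTuple(self):
        self.log.append("spout.nextTuple")

class ToyBolt:
    def __init__(self, log):
        self.log = log
    def declareOutputFields(self):
        self.log.append("bolt.declareOutputFields")
    def prepare(self):
        self.log.append("bolt.prepare")
    def execute(self):
        self.log.append("bolt.execute")

def run_once(spout, bolt):
    # 1. declareOutputFields runs while the topology is being built.
    spout.declareOutputFields()
    bolt.declareOutputFields()
    # 2. open/prepare run once per task after deserialization on the worker.
    spout.open()
    bolt.prepare()
    # 3. activate, then nextTuple/execute repeat in the event loop.
    spout.activate()
    spout.nextTuple()
    bolt.execute()
```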

How to access an object from the topology context in a bolt when using Storm?

Submitted by *爱你&永不变心* on 2019-12-09 04:44:20
Question: We need to pass an object when creating a topology so that each bolt can access it and do some further processing based on it. Is it possible to pass the object via TopologyContext, and if so, how? Or are there other ways to pass an object when submitting a topology, so that every bolt can get a handle on it? We need to pass the object via a context so that all bolts can access it and there is no need to force an implementation of a constructor in all the…
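A common pattern for this (sketched here with toy Python stand-ins, since the real API is Java's `Config.put(...)` plus the `stormConf` map handed to each bolt's `prepare()`) is to put the serializable object into the topology configuration at submit time and read it back in every bolt. The names `submit`, `ToyBolt`, and `"my.shared.setting"` are hypothetical illustrations, not Storm API.

```python
# Sketch: ship a shared, serializable object to all bolts via the topology
# configuration. Storm serializes the conf and delivers it to every worker;
# each bolt reads it back in prepare(). Toy stand-ins, not the real API.
import json

def submit(conf_obj):
    # Stand-in for topology submission: the conf is serialized and shipped.
    return json.dumps(conf_obj)

class ToyBolt:
    def prepare(self, serialized_conf):
        # Stand-in for prepare(stormConf, context, collector).
        self.settings = json.loads(serialized_conf)

wire = submit({"my.shared.setting": 42})
bolt = ToyBolt()
bolt.prepare(wire)
```

The design point: anything placed in the conf must be serializable, which is also why this avoids forcing a constructor on every bolt.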

Storm, huge discrepancy between bolt latency and total latency?

Submitted by 拈花ヽ惹草 on 2019-12-09 03:49:29
Question: Below is a screenshot of my topology's Storm UI, taken after the topology finished processing 10k messages. (The topology is configured with 4 workers and uses a KafkaSpout.) The sum of the "process latency" of my bolts is about 8100 ms, while the complete latency of the topology is a much longer 115881 ms. I'm aware that discrepancies of this sort can occur due to resource contention or Storm internals. I believe resource contention is not an issue here; the GC…
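The key to such discrepancies is that complete latency is measured from the spout's emit until the whole tuple tree is acked, so it includes time tuples spend waiting in buffers between components, which no bolt's process latency counts. The arithmetic below uses made-up per-bolt and queue numbers (only the 8100 ms and 115881 ms totals come from the question) purely to show how the two figures can coexist.

```python
# Hedged illustration: complete latency = execute time + time spent queued.
# The per-bolt splits below are invented; only the totals match the question.
process_latency_ms = [2000, 3000, 3100]    # per-bolt execute time (hypothetical)
queue_wait_ms = [40000, 35000, 32781]      # time spent in buffers (hypothetical)

bolt_sum = sum(process_latency_ms)         # what the UI's process latencies add to
complete = bolt_sum + sum(queue_wait_ms)   # what complete latency can look like
```

With these numbers, `bolt_sum` is 8100 ms while `complete` is 115881 ms, so a 100x gap needs no bolt to be slow at all, only deep queues (e.g. a spout outpacing the bolts).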

Distributed caching in storm

Submitted by ℡╲_俬逩灬. on 2019-12-08 22:36:51
Question: How can I store temporary data in Apache Storm? In a Storm topology, a bolt needs to access previously processed data. E.g.: if the bolt processes variable1 with a result of 20 at 10:00 AM, and variable1 is received again as 50 at 10:15 AM, then the result should be 30 (50 - 20); later, if variable1 is received as 70 at 10:30, the result should be 20 (70 - 50). How can I achieve this? Answer 1: In short, you want to do micro-batching calculations within Storm's running tuples. First you need…
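The "previous value" logic in the question can be sketched as a minimal in-memory cache: the bolt keeps the last reading per variable name and emits the delta. This is an illustration, not Storm API; in a real topology this state lives per bolt task (with a fields grouping on the variable name so the same variable always reaches the same task), or in an external store such as Redis if it must survive worker restarts.

```python
# Minimal sketch of per-variable delta computation with in-memory state.
class DeltaBolt:
    def __init__(self):
        self.last = {}  # variable name -> previous value

    def execute(self, name, value):
        prev = self.last.get(name)
        self.last[name] = value
        if prev is None:
            return None       # first reading: nothing to diff against
        return value - prev

bolt = DeltaBolt()
bolt.execute("variable1", 20)        # first value, no delta yet
bolt.execute("variable1", 50)        # -> 30 (50 - 20)
bolt.execute("variable1", 70)        # -> 20 (70 - 50)
```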

Azure Storm vs Azure Stream Analytics

Submitted by 我是研究僧i on 2019-12-08 18:15:59
Question: Looking to do real-time metric calculations on event streams, what is a good choice in Azure: Stream Analytics or Storm? I am comfortable with either SQL or Java, so I'm wondering what the other differences are. Answer 1: It depends on your needs and requirements; I'll try to lay out the strengths and benefits of both. In terms of setup, Stream Analytics has Storm beat. Stream Analytics is great if you need to ask a lot of different questions often. Stream Analytics can also only handle CSV or JSON…

Storm Spout not getting Ack

Submitted by 回眸只為那壹抹淺笑 on 2019-12-08 16:57:17
Question: I have started using Storm, so I created a simple topology using this tutorial. When I run my topology with LocalCluster everything seems fine, but my problem is that I'm not getting an ACK on the tuple, meaning my spout's ack is never called. My code is below; do you know why ack is not called? My topology looks like this: public StormTopology build() { TopologyBuilder builder = new TopologyBuilder(); builder.setSpout(HelloWorldSpout.class.getSimpleName(), helloWorldSpout, spoutParallelism);…
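One thing worth checking in cases like this: Storm only calls a spout's ack() for tuples that were emitted with a message ID, and only after every anchored tuple in the resulting tree has been acked by the bolts. The toy model below (not the real acker algorithm, just its observable contract) shows why an emit without a message ID never produces a spout-side ack callback.

```python
# Toy model of Storm's reliability contract (not the real acker):
# the spout's ack fires only for tuples emitted WITH a message id,
# and only once the bolts ack them.
class ToyTracker:
    def __init__(self):
        self.pending = {}     # message id -> outstanding tuple tree
        self.spout_acks = []  # ids for which the spout's ack() was called

    def spout_emit(self, msg_id=None):
        if msg_id is not None:        # reliable emit: tracked by the acker
            self.pending[msg_id] = True
        return msg_id                 # None means an unreliable emit

    def bolt_ack(self, msg_id):
        if msg_id in self.pending:    # completes the tuple tree
            del self.pending[msg_id]
            self.spout_acks.append(msg_id)

t = ToyTracker()
t.bolt_ack(t.spout_emit())       # no message id: spout ack never fires
t.bolt_ack(t.spout_emit("m1"))   # message id present: spout ack fires
```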

How to enable GC logging for Apache Storm workers, while preventing log file overwrites and capping disk space usage

Submitted by 微笑、不失礼 on 2019-12-08 12:51:00
Question: We recently decided to enable GC logging for Apache Storm workers on a number of clusters (the exact version varies) as an aid to investigating topology-related memory and garbage-collection problems. We want to do that for workers, but we also want to avoid two problems we know might happen: overwriting of the log file when a worker restarts for any reason, and the logs using too much disk space, leading to disks getting filled (if you keep the cluster running long enough, log files will fill up the disk…
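One way to address both concerns is through `worker.childopts` in storm.yaml. The fragment below is a hedged sketch: the paths, file counts, and sizes are examples, and the flags shown are the HotSpot JDK 8 GC-logging flags (JDK 9+ replaced them with unified `-Xlog:gc`). Storm substitutes `%ID%` in `worker.childopts` with the worker's port, giving each worker its own log; rotation then caps disk use at roughly `NumberOfGCLogFiles * GCLogFileSize` per worker.

```yaml
# storm.yaml sketch (HotSpot JDK 8 flags; paths and sizes are examples).
# %ID% is replaced by the worker port, so workers do not share one file.
worker.childopts: >
    -Xloggc:/var/log/storm/gc-worker-%ID%.log
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+UseGCLogFileRotation
    -XX:NumberOfGCLogFiles=5
    -XX:GCLogFileSize=10M
```

Note that rotation alone does not fully solve the restart-overwrite problem, since a restarted JVM begins writing to the first rotation file again; adding the JVM's `%p` (pid) or `%t` (startup timestamp) escapes to the `-Xloggc` path avoids that, at the cost of the size cap then applying per launch rather than in total.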

Apache Storm integration with Spring framework

Submitted by 心不动则不痛 on 2019-12-08 12:48:37
Question: I'm new to Apache Storm. Currently I'm working on a legacy project that involves some streaming processing using Apache Storm, and I want to integrate the current project with Spring. I found a couple of comments (Storm and Spring 4 integration, http://mail-archives.apache.org/mod_mbox/storm-user/201605.mbox/%3CCAMwbCdz7myeBs+Z2mZDxWgqBPfjcq-tynOz_+pmPrmY6umfUxA@mail.gmail.com%3E) saying that there are concerns about doing that. Can someone explain to me how to do such an integration, or why it is impossible? Answer 1: …

How to run periodic tasks in an Apache Storm topology?

Submitted by 纵饮孤独 on 2019-12-08 11:36:57
Question: I have an Apache Storm topology and would like to perform a certain action every once in a while. I'm not sure how to approach this in a way that is natural and elegant. Should it be a Bolt or a Spout using ScheduledExecutorService, or something else? Answer 1: Tick tuples are a decent option: https://kitmenke.com/blog/2014/08/04/tick-tuples-within-storm/ Edit: Here's the essential code for your bolt: @Override public Map<String, Object> getComponentConfiguration() { // configure how often a tick tuple will be sent to our bolt Config conf = new Config(); conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 300); return conf; }
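The tick-tuple pattern itself is language-neutral: the framework injects a special "tick" tuple every N seconds on the same execute() path as data tuples, so the bolt does its periodic work in-line with no extra threads or locking. A minimal sketch, with the tick check stood in by plain tuples (in Java this would compare the tuple's component and stream ids against Constants.SYSTEM_COMPONENT_ID and Constants.SYSTEM_TICK_STREAM_ID):

```python
# Sketch of tick-tuple handling: periodic work happens on the normal
# execute() path when a system "tick" arrives, otherwise data is buffered.
TICK = ("__system", "__tick")   # stand-in for Storm's system component/stream ids

class RollupBolt:
    def __init__(self):
        self.buffer = []        # data tuples seen since the last tick
        self.flushed = []       # batches emitted on each tick

    def execute(self, tup):
        if tup[:2] == TICK:                       # periodic branch
            self.flushed.append(list(self.buffer))
            self.buffer.clear()
        else:                                     # normal data branch
            self.buffer.append(tup)
```

The design appeal over a ScheduledExecutorService inside a bolt: ticks arrive on the executor's own thread, so the bolt's state needs no synchronization.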