apache-storm | 易学教程

How to use apache storm tuple


I just began with Apache Storm. I read the tutorial and had a look at the examples. My problem is that all the examples work with very simple tuples (often one field with a string). The tuples are created inline (using new Values(...)). In my case I have tuples with many fields (5 to 100). So my question is: how do I implement such a tuple with a name and a type (all primitive) for each field? Are there any examples? (I think directly implementing "Tuple" isn't a good idea.) Thanks.

An alternative to creating the tuple with all of the fields as values is to create a bean and pass that inside the tuple. Given
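The bean approach mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's code: the `SensorReading` class and its fields are hypothetical, and a plain `List<Object>` stands in for Storm's `Values` (which is itself a list of objects) so that the sketch runs without Storm on the classpath. The bean implements `Serializable` on the assumption that it has to be serialized when it travels between workers.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class BeanTupleSketch {
    // Stand-in for `new Values(reading)`: a one-element list holding the bean.
    static List<Object> toValues(SensorReading reading) {
        List<Object> values = new ArrayList<>();
        values.add(reading);
        return values;
    }

    // Stand-in for what a downstream bolt would do with `tuple.getValue(0)`.
    static SensorReading fromValues(List<Object> values) {
        return (SensorReading) values.get(0);
    }

    public static void main(String[] args) {
        List<Object> values = toValues(new SensorReading(7, 1_000L, 21.5, true));
        SensorReading received = fromValues(values);
        System.out.println(received.sensorId + " " + received.temperature);
    }
}

// Hypothetical bean: one named, typed field per former tuple field.
class SensorReading implements Serializable {
    private static final long serialVersionUID = 1L;

    final int sensorId;
    final long timestampMillis;
    final double temperature;
    final boolean valid;

    SensorReading(int sensorId, long timestampMillis, double temperature, boolean valid) {
        this.sensorId = sensorId;
        this.timestampMillis = timestampMillis;
        this.temperature = temperature;
        this.valid = valid;
    }
}
```

With this shape the tuple itself declares a single field, and the names and types of the 5 to 100 values live in the bean, where the compiler can check them.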

How to submit a topology in storm production cluster using IDE


Question: I get the error "Must submit topologies using the 'storm' client script so that StormSubmitter knows which jar to upload" when submitting a topology to a production cluster from an IDE, while the same submission from the command line with the storm jar command works perfectly. I have seen examples of this on GitHub. To submit the topology I am using these lines: conf.put(Config.NIMBUS_HOST, NIMBUS_NODE); conf.put(Config.NIMBUS_THRIFT_PORT, 6627); conf.put(Config

Setting up a docker / fig Mesos environment


I'm trying to set up a docker / fig Mesos cluster. I'm new to fig and Docker. Docker has plenty of documentation, but I find myself struggling to understand how to work with fig. Here's my fig.yaml at the moment:

    zookeeper:
      image: jplock/zookeeper
      ports:
        - "49181:2181"
    mesosMaster:
      image: mesosphere/mesos:0.19.1
      ports:
        - "15050:5050"
      links:
        - zookeeper:zk
      command: mesos-master --zk=zk --work_dir=/var/log --quorum=1
    mesosSlave:
      image: mesosphere/mesos:0.19.1
      links:
        - zookeeper:zk
      command: mesos-slave --master=zk

Thanks!

Edit: Thanks to Mark O'Connor's help, I've created a working docker-based

What is PATH on a Mac (UNIX) system


I'm trying to set up a project, Storm, from Git: https://github.com/nathanmarz/storm/wiki/Setting-up-development-environment

The instructions say: "Download a Storm release, unpack it, and put the unpacked bin/ directory on your PATH."

My question is: what does PATH mean, and what exactly do they want me to do? Sometimes I see things like /bin/path, $PATH, or echo PATH. Can someone explain the concept of PATH, so I can set everything up easily in the future without just blindly following the instructions? This is definitely a technical question, maybe trivial to professionals, but entry-level people like me really need some guidance. I don't
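As background for the question above: PATH is an environment variable holding a list of directories (separated by `:` on UNIX systems), and the shell searches those directories in order when a command such as `storm` is typed without a full path. The following is a minimal sketch of that lookup, written in Java only for illustration; the `PathLookup` class and its method names are hypothetical, and real shells handle details (such as empty entries) that are skipped here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class PathLookup {
    // Split a PATH-style value into its directories, in search order.
    // Empty entries are skipped here for simplicity.
    static List<String> directories(String pathValue) {
        List<String> dirs = new ArrayList<>();
        if (pathValue == null) {
            return dirs;
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (!dir.isEmpty()) {
                dirs.add(dir);
            }
        }
        return dirs;
    }

    // Return the first executable file named `command` found in those directories, or null.
    static File find(String pathValue, String command) {
        for (String dir : directories(pathValue)) {
            File candidate = new File(dir, command);
            if (candidate.isFile() && candidate.canExecute()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        System.out.println("PATH directories: " + directories(path));
        System.out.println("storm resolves to: " + find(path, "storm"));
    }
}
```

"Putting the unpacked bin/ directory on your PATH" then means appending that directory to the list, so that a lookup like the one above finds the `storm` script; until that is done, the last line of the sketch would be expected to print `null`.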

Storm vs. Trident: When not to use Trident?


Question: I'm working with Storm and it is fine for a lot of use cases. Recently I had a look at Trident, which is a high-level abstraction on top of Storm. It supports exactly-once processing and makes stateful processing easier. But now I'm wondering: why can't I always use Trident instead of Storm? What I have read so far: Trident processes messages in batches, so latency could be higher. Trident is not yet able to process loops in topologies. Are there any other disadvantages when using Trident

Why should I not loop or block in Spout.nextTuple()


I saw many code snippets in which a loop was used inside Spout.nextTuple() (for example, to read a whole file and emit a tuple for each line):

    public void nextTuple() {
        // do other stuff here
        // reader might be a BufferedReader that is initialized in open()
        String str;
        while ((str = reader.readLine()) != null) {
            _collector.emit(new Values(str));
        }
        // do some more stuff here
    }

This code seems to be straightforward; however, I was told that one should not loop inside nextTuple(). The question is: why?

When a Spout is executed it runs in a single thread. This thread loops "forever" and has multiple
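A non-looping variant of the snippet above can be sketched as follows. This is a simplified stand-in rather than real Storm code: the `OneLinePerCallSpout` class is hypothetical, a plain list replaces `_collector`, and the `for` loop in `main` plays the role of the single spout thread. The point it illustrates is that each `nextTuple()` call emits at most one line and returns, so the calling thread is free again to do its other work (in Storm, the same thread also delivers `ack()` and `fail()` callbacks to the spout).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class OneLinePerCallSpout {
    private final BufferedReader reader;
    // Stand-in for _collector: records what would have been emitted.
    private final List<String> emitted = new ArrayList<>();

    OneLinePerCallSpout(BufferedReader reader) {
        this.reader = reader;
    }

    // Emits at most one line, then returns control to the caller.
    void nextTuple() {
        try {
            String line = reader.readLine();
            if (line != null) {
                emitted.add(line); // stand-in for _collector.emit(new Values(line))
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        OneLinePerCallSpout spout =
                new OneLinePerCallSpout(new BufferedReader(new StringReader("a\nb\nc")));
        // Stand-in for the spout thread, which calls nextTuple() again and again.
        for (int call = 1; call <= 5; call++) {
            spout.nextTuple();
            System.out.println("after call " + call + ": " + spout.emitted());
        }
    }
}
```

Once the input is exhausted, further calls simply return without emitting anything, which is why the last two iterations in `main` leave the list unchanged.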

Found multiple defaults.yaml resources


Question: When I tried to submit the topology I got this exception:

    Exception in thread "main" java.lang.RuntimeException: Found multiple defaults.yaml resources. You're probably bundling the Storm jars with your topology jar.
        at backtype.storm.utils.Utils.findAndReadConfigFile(Utils.java:115)
        at backtype.storm.utils.Utils.readDefaultConfig(Utils.java:135)
        at backtype.storm.utils.Utils.readStormConfig(Utils.java:155)
        at backtype.storm.StormSubmitter.submitTopology(StormSubmitter.java:61)
        at backtype.storm

How to call a particular method before killing a storm topology


Question: How can I call a particular method before killing a Storm topology? I have created a topology in Storm and want to call a particular method just before the topology gets killed. Is there any predefined method that can be overridden to do this in the Storm framework? Thanks in advance :)

Answer 1: There is no such thing. As a workaround, you can deactivate the topology before killing it. This ensures that Spout.deactivate() is called. If you need to call a method at bolts, use Spout.deactivate() to

What is the “task” in Storm parallelism


I'm trying to learn Twitter Storm by following the great article "Understanding the parallelism of a Storm topology". However, I'm a bit confused by the concept of a "task". Is a task a running instance of the component (spout or bolt)? Does an executor having multiple tasks mean that the same component is executed multiple times by the executor? Moreover, in a general parallelism sense, Storm will spawn a dedicated thread (executor) for a spout or bolt, but what does an executor (thread) having multiple tasks contribute to the parallelism? I think having multiple tasks in
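The relationship asked about above can be illustrated with a small simulation. This is not Storm code: `ExecutorSketch`, `Task`, and `runExecutor` are hypothetical names, and routing by hash is just one arbitrary way to pick a task. It models the behaviour described in the article the question refers to, namely that an executor is one thread and serves all of its tasks serially, so several tasks in one executor are several component instances but not additional parallel threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExecutorSketch {
    // Hypothetical stand-in for a task: one instance of a spout or bolt.
    static class Task {
        final int id;
        // Name of the thread that ran each execute() call on this task.
        final List<String> threadsSeen = new ArrayList<>();

        Task(int id) {
            this.id = id;
        }

        void execute(String tuple) {
            threadsSeen.add(Thread.currentThread().getName());
        }
    }

    // One executor = one thread that serves all of its tasks, one tuple at a time.
    static List<Task> runExecutor(String threadName, int taskCount, List<String> tuples) {
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(new Task(i));
        }
        Thread executor = new Thread(() -> {
            for (String tuple : tuples) {
                Task target = tasks.get(Math.floorMod(tuple.hashCode(), taskCount));
                target.execute(tuple);
            }
        }, threadName);
        executor.start();
        try {
            executor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<Task> tasks = runExecutor("executor-1", 2, Arrays.asList("a", "b", "c", "d"));
        for (Task task : tasks) {
            System.out.println("task " + task.id + " ran on " + task.threadsSeen);
        }
    }
}
```

Every entry recorded by every task carries the same thread name, which is the sense in which extra tasks per executor do not by themselves add parallelism.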

What is/are the main difference(s) between Flink and Storm?


Flink has been compared to Spark, which, as I see it, is the wrong comparison because it compares a windowed event processing system against micro-batching. Similarly, it does not make much sense to me to compare Flink to Samza. In both cases the comparison is between a real-time and a batched event processing strategy, even if at a smaller "scale" in the case of Samza. But I would like to know how Flink compares to Storm, which seems conceptually much more similar to it. I have found this (Slide #4) documenting the main difference as "adjustable latency" for Flink. Another hint seems to be an article