apache-storm

How to change SpoutConfig from the default settings?

Submitted by 吃可爱长大的小学妹 on 2019-12-08 11:29:55
Question: I'm trying to get Facebook page data using the Graph API. Each post is larger than 1 MB, while Kafka's default fetch message size is 1 MB. I changed the Kafka limits from 1 MB to 3 MB by adding the lines below to Kafka's consumer.properties and server.properties files:
fetch.message.max.bytes=3048576 (consumer.properties)
message.max.bytes=3048576 (server.properties)
replica.fetch.max.bytes=3048576 (server.properties)
Now, after adding the above lines in Kafka, 3 MB of message data is going into…
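A quick sanity check on the numbers: 3 MB in the mebibyte sense is 3 × 1024 × 1024 = 3145728 bytes, so the value 3048576 used above falls about 93 KiB short of a true 3 MiB. A minimal check is below; the commented lines are an assumption that the old storm-kafka spout is in use, whose SpoutConfig exposes fetchSizeBytes/bufferSizeBytes fields that would need the same raise on the Storm side.

```java
public class FetchSize {
    public static void main(String[] args) {
        // 3 MiB in bytes; compare with the 3048576 used in the question
        int threeMiB = 3 * 1024 * 1024;
        System.out.println(threeMiB);
        // On the Storm side (assumption: storm-kafka's SpoutConfig is in use),
        // the consumer fetch limits would be raised like this:
        // spoutConfig.fetchSizeBytes  = threeMiB;
        // spoutConfig.bufferSizeBytes = threeMiB;
    }
}
```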

Apache Storm: storm-core build failure

Submitted by ぃ、小莉子 on 2019-12-08 10:07:00
Question: When trying to build Apache Storm downloaded from the Git repository, I get a storm-core (or maybe storm-hive?) build error:
[INFO] Reactor Summary:
[INFO]
[INFO] Storm .............................................. SUCCESS [ 1.366 s]
[INFO] multilang-javascript ............................... SUCCESS [ 0.801 s]
[INFO] multilang-python ................................... SUCCESS [ 0.138 s]
[INFO] multilang-ruby ..................................... SUCCESS [ 0.111 s]
[INFO] maven-shade-clojure…

Storm: Is it possible to limit the number of replays on fail (Anchoring)?

Submitted by 匆匆过客 on 2019-12-08 10:05:35
Question: Is there an option to limit the number of replays when using anchoring? I have a tuple that should parse a JSON object; if it hits an exception, I would like it to be replayed at most two more times. I tried to track the number of times Storm replays the tuple with print statements, but each time I entered a non-parseable value the counter showed a different result. catch { collector.fail(tuple) }
Answer 1: The fail method in the BaseRichSpout class is empty, meaning you are supposed to override that method to handle the…
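The usual pattern for capping replays is to keep a per-message failure count inside the spout and stop re-emitting once the budget is spent. The sketch below shows only that bookkeeping in plain Java (the Storm spout classes and collector calls are omitted; an overridden fail() would consult shouldReplay() and either re-emit the tuple or drop it; maxRetries = 2 matches the "two more times" above):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the retry bookkeeping a spout could keep: fail() would call
// shouldReplay(msgId) and re-emit the tuple only while it returns true.
public class RetryTracker {
    private final Map<Object, Integer> failCounts = new HashMap<>();
    private final int maxRetries;

    public RetryTracker(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** Returns true while the message is still under its replay budget. */
    public boolean shouldReplay(Object msgId) {
        int failures = failCounts.merge(msgId, 1, Integer::sum);
        if (failures <= maxRetries) {
            return true;
        }
        failCounts.remove(msgId); // budget spent: drop the tuple and forget it
        return false;
    }

    public static void main(String[] args) {
        RetryTracker tracker = new RetryTracker(2); // replay at most twice
        System.out.println(tracker.shouldReplay("m1")); // 1st failure
        System.out.println(tracker.shouldReplay("m1")); // 2nd failure
        System.out.println(tracker.shouldReplay("m1")); // over budget
    }
}
```

Note this keeps retry state only in memory; if the worker restarts, the counts are lost, which may explain counters that differ between runs.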

Run StormCrawler in local mode or install Apache Storm?

Submitted by 最后都变了- on 2019-12-08 09:13:06
Question: So I'm trying to figure out how to install and set up Storm/StormCrawler with ES and Kibana as described here. I never installed Storm on my local machine, because I've worked with Nutch before and never had to install Hadoop locally... I thought it might be the same with Storm (maybe not?). I'd like to start crawling with StormCrawler instead of Nutch now. It seems that if I just download a release and add its /bin to my PATH, I can only talk to a remote cluster. It seems like I need to set up a…

Error with ZooKeeper and Storm

Submitted by 一个人想着一个人 on 2019-12-08 09:08:47
Question: I am developing code for Storm, based on one of the developers' examples. My problem is that when I run this code from the Eclipse IDE, the connection between Storm and ZooKeeper is not established. ZooKeeper is running on port 2181, which is also set in storm.yaml. My exception is:
72992 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN o.a.s.s.o.a.z.s.NIOServerCnxn - caught end of stream exception
org.apache.storm.shade.org.apache.zookeeper.server.ServerCnxn$EndOfStreamException: Unable to read…
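For reference, a storm.yaml pointing Storm at a ZooKeeper on port 2181 would look roughly like the fragment below (the host name is an assumption). Note that the exception above mentions port 2000, not 2181, so it is worth checking which port the code in Eclipse actually connects to:

```yaml
# Sketch of the relevant storm.yaml keys (host is an assumption)
storm.zookeeper.servers:
  - "localhost"
storm.zookeeper.port: 2181
```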

My Storm topology is neither working (not generating output) nor failing (not generating errors or exceptions)

Submitted by 陌路散爱 on 2019-12-08 06:22:31
Question: I have a topology in which I am trying to count word occurrences generated by a SimulatorSpout (not a real stream) and then write them to a MySQL database table. The table schema is very simple:
Field | Type        | ...
ID    | int(11)     | Auto_incr
word  | varchar(50) |
count | int(11)     |
But I am facing a weird problem (as I mentioned above): I successfully submitted the topology to my Storm cluster, which consists of 4 supervisors, and I can see the flow of the topology in the Storm Web UI (no…
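The counting step itself is straightforward; below is a stripped-down sketch of what the counting bolt would accumulate before flushing rows into the (word, count) columns. The Storm and JDBC plumbing is omitted, so this only illustrates the aggregation logic:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the per-word counting a bolt could do before writing
// (word, count) rows to MySQL; the tuple handling and JDBC code are omitted.
public class WordCount {
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum); // increment, starting at 1
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count(new String[] {"storm", "kafka", "storm"});
        System.out.println(c.get("storm"));
        System.out.println(c.get("kafka"));
    }
}
```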

Storm [ERROR] Async loop died

Submitted by 好久不见. on 2019-12-08 05:50:19
Question: I am using Storm 0.9.3. I am running a topology in Python, in which spouts read a URL off the Kafka queue and then pass it to the next bolt, which fetches that page using the Python requests module. Here is my topology definition in Java:
public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("URLSpout", new URLSpout(), 1);
    builder.setBolt("ScrapeBolt", new ScrapeBolt(), 30).shuffleGrouping("URLSpout");
    Config conf = new Config();
…

apache storm/spark and data visualisation tool(s)

Submitted by 女生的网名这么多〃 on 2019-12-08 05:26:47
Question: I have been searching for hours but I did not find a clear answer. I would like to know the most suitable data visualization tool(s) to use with Apache Storm/Spark. I know there are Tableau and Jaspersoft, but they are not free. Furthermore, there is the option of Elasticsearch and Kibana, but I would like to find/try something else. So, do you have any ideas? Thanks a lot for your attention.
Answer 1: You are not giving much info here. Storm is a stream processing engine, Spark…

Storm - Conditionally consuming stream from kafka spout?

Submitted by *爱你&永不变心* on 2019-12-08 05:15:42
Question: I have a scenario where I am posting JSON to a Kafka instance. I am then using a Kafka spout to emit the stream to a bolt. Now I would like to add an additional field (call it x) to my JSON message. If x is a, I would like it to be consumed by boltA; if x is b, I would like it to be consumed by boltB. Is there a way to direct the stream to the proper bolt depending on the stream's contents?
Answer 1: The simplest way would be to add a SplitBolt that consumes from the KafkaSpout, evaluates the field x…
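The decision inside such a SplitBolt reduces to mapping the value of x to a declared stream name; boltA and boltB would then each subscribe to their stream. The plain-Java sketch below shows only that mapping (the stream names streamA/streamB and the default fallback are assumptions, and the Storm emit/declareStream calls are omitted):

```java
// Sketch of the routing decision a SplitBolt could make on the field "x";
// the bolt would emit the tuple on the stream this method returns.
public class Router {
    static String streamFor(String x) {
        switch (x) {
            case "a": return "streamA"; // subscribed to by boltA
            case "b": return "streamB"; // subscribed to by boltB
            default:  return "default"; // fallback for unexpected values
        }
    }

    public static void main(String[] args) {
        System.out.println(streamFor("a"));
        System.out.println(streamFor("b"));
    }
}
```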

Apache Storm Remote Topology Submission

Submitted by 天大地大妈咪最大 on 2019-12-08 03:34:52
Question: I have been testing remote submission of Storm topologies using an IDE (Eclipse). I succeeded in uploading a simple Storm topology to a remote Storm cluster, but the weird thing is that when I checked the Storm UI to make sure the remotely submitted topology was working without problems, I saw only the _acker bolt in the UI; the other bolts and the spout were not there. After that I submitted the topology manually from the command line and checked the Storm UI again, and it was working as it is supposed to…