apache-storm

What's causing these ParseError exceptions when reading off an AWS SQS queue in my Storm cluster

核能气质少年 submitted on 2019-11-27 19:45:12
I'm using Storm 0.8.1 to read incoming messages off an Amazon SQS queue and am getting consistent exceptions when doing so:

2013-12-02 02:21:38 executor [ERROR] java.lang.RuntimeException: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1] Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
    at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:219)
    at REDACTED.spouts.SqsQueueSpout.nextTuple(SqsQueueSpout.java:88)
    at backtype.storm.daemon
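The JAXP00010001 message refers to a JDK-wide cap (64000 entity expansions per XML document) that the AWS SDK's response parser can hit. One hedged fix, assuming the worker JVM flags are managed through storm.yaml, is to raise or disable the cap via worker.childopts:

```yaml
# storm.yaml — a sketch, assuming worker JVM options are set here.
# jdk.xml.entityExpansionLimit=0 disables the JDK's 64000-expansion cap
# (older pre-7u45 JDKs use the -DentityExpansionLimit property instead).
worker.childopts: "-Djdk.xml.entityExpansionLimit=0"
```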

How can I serialize a numpy array while preserving matrix dimensions?

Deadly submitted on 2019-11-27 18:08:49
numpy.array.tostring doesn't seem to preserve information about matrix dimensions (see this question), requiring the user to issue a call to numpy.array.reshape. Is there a way to serialize a numpy array to JSON format while preserving this information? Note: the arrays may contain ints, floats or bools. It's reasonable to expect a transposed array. Note 2: this is being done with the intent of passing the numpy array through a Storm topology using streamparse, in case such information ends up being relevant.

user2357112: pickle.dumps or numpy.save encode all the information needed to
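One straightforward approach, sketched below, is to serialize the dtype and shape alongside the flattened data, so the receiving side can rebuild the exact matrix (including a transposed one) without a manual reshape:

```python
import json
import numpy as np

def array_to_json(arr):
    # Keep dtype and shape next to the flattened data so the receiver
    # can rebuild the exact matrix without guessing dimensions.
    return json.dumps({
        "dtype": str(arr.dtype),
        "shape": arr.shape,
        "data": arr.ravel().tolist(),
    })

def json_to_array(text):
    spec = json.loads(text)
    return np.asarray(spec["data"], dtype=spec["dtype"]).reshape(spec["shape"])
```

ravel() copies non-contiguous (e.g. transposed) arrays into C order, which matches the C-order reshape on the way back, so the round trip is exact for int, float and bool dtypes.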

How to programmatically kill a storm topology?

廉价感情. submitted on 2019-11-27 16:33:50
Question: I am using a Java class to submit a topology to the Storm cluster, and I also plan to use a Java class to kill the topology. But per the Storm documentation, the following command is used to kill a topology, and there is no Java method for it (for valid reasons):

storm kill {stormname}

So is it fine to call a shell script from the Java class to kill the topology? What are the other ways to kill a topology? Also, how do I get the status of the running topologies in a Storm cluster?

Answer 1: For killing topology you
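One pragmatic route, assuming the `storm` CLI is on the PATH of the machine running your code, is to shell out to `storm kill`. A hedged Python sketch (the function names are mine):

```python
import subprocess

def build_kill_command(topology_name, wait_secs=None, storm_bin="storm"):
    # Mirrors the documented CLI: storm kill <name> [-w <wait-seconds>]
    cmd = [storm_bin, "kill", topology_name]
    if wait_secs is not None:
        cmd += ["-w", str(wait_secs)]
    return cmd

def kill_topology(topology_name, wait_secs=None):
    # Raises CalledProcessError if the CLI exits non-zero.
    subprocess.check_call(build_kill_command(topology_name, wait_secs))
```

The alternative is the Thrift level: Nimbus exposes killTopology, and getClusterInfo answers the "status of running topologies" part, but shelling out keeps version coupling with the cluster minimal.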

How to use apache storm tuple

こ雲淡風輕ζ submitted on 2019-11-27 14:16:51
Question: I just began with Apache Storm. I read the tutorial and had a look at the examples. My problem is that all the examples work with very simple tuples (often one field holding a string), created inline using new Values(...). In my case I have tuples with many fields (5–100). So my question is: how do I implement such a tuple, with a name and a (primitive) type for each field? Are there any examples? (I think directly implementing Tuple isn't a good idea.) Thanks.

Answer 1: An alternative to
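Since Storm tuples are positional, one way to keep tens of fields manageable is a single shared record definition whose field order doubles as the declared output-field order. A sketch in Python (the field names here are invented for illustration):

```python
from collections import namedtuple

# One shared definition keeps emit order and declared field order in sync
# between the emitting and the consuming components.
SensorReading = namedtuple(
    "SensorReading",
    ["sensor_id", "timestamp", "temperature", "humidity", "status"],
)

OUTPUT_FIELDS = list(SensorReading._fields)  # feed this to the fields declaration

def to_values(reading):
    return list(reading)           # positional values, in declared order

def from_values(values):
    return SensorReading(*values)  # named access on the consuming side
```

In the Java API the same idea is a plain value class plus a `new Fields(...)` declaration built from one shared list of names, so producer and consumer can never disagree on field order.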

Setting up a docker / fig Mesos environment

[亡魂溺海] submitted on 2019-11-27 11:34:29
Question: I'm trying to set up a docker / fig Mesos cluster. I'm new to fig and Docker. Docker has plenty of documentation, but I find myself struggling to understand how to work with fig. Here's my fig.yaml at the moment:

zookeeper:
  image: jplock/zookeeper
  ports:
    - "49181:2181"
mesosMaster:
  image: mesosphere/mesos:0.19.1
  ports:
    - "15050:5050"
  links:
    - zookeeper:zk
  command: mesos-master --zk=zk --work_dir=/var/log --quorum=1
mesosSlave:
  image: mesosphere/mesos:0.19.1
  links:
    - zookeeper:zk
  command:

java.lang.NoSuchFieldError: INSTANCE

|▌冷眼眸甩不掉的悲伤 submitted on 2019-11-27 05:23:49
When trying to submit my topology through StormSubmitter, I am getting -

Caused by: java.lang.NoSuchFieldError: INSTANCE
    at org.apache.http.impl.io.DefaultHttpRequestWriterFactory.<init>(DefaultHttpRequestWriterFactory.java:52)

I am using Spring. I am not initializing HttpClient in the Spout/Bolt constructor. Instead it is initialized in the constructor of a class fetched from the Spring context in the prepare() method of the bolt. The code is structured as follows -

SomeBolt.java

@Component
public class SomeBolt extends BaseRichBolt {

    private OutputCollector _collector;
    private SomeClient someClient;
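NoSuchFieldError: INSTANCE from httpclient internals typically means two incompatible httpclient/httpcore versions ended up on the worker classpath (for example, one bundled in the topology jar and an older one on the cluster). A hedged sketch of one common fix in a Maven POM — the artifact coordinates below are purely illustrative:

```xml
<!-- Align on a single httpclient/httpcore version: exclude the transitive
     copy pulled in by another dependency so only one remains on the classpath. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>some-library</artifactId>
  <version>1.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpcore</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Running `mvn dependency:tree` first shows which dependencies drag in which httpcore/httpclient versions, so you exclude the right one.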

What is/are the main difference(s) between Flink and Storm?

我只是一个虾纸丫 submitted on 2019-11-27 04:56:56
Question: Flink has been compared to Spark, which, as I see it, is the wrong comparison because it pits a windowed event-processing system against micro-batching. Similarly, it does not make that much sense to me to compare Flink to Samza: in both cases a real-time event-processing strategy is compared to a batched one, even if at a smaller "scale" in the case of Samza. But I would like to know how Flink compares to Storm, which seems conceptually much more similar to it. I have found this (Slide #4)

How would I split a stream in Apache Storm?

混江龙づ霸主 submitted on 2019-11-27 00:54:25
Question: I am not understanding how I would split a stream in Apache Storm. For example, I have bolt A that, after some computation, has somevalue1, somevalue2, and somevalue3. It wants to send somevalue1 to bolt B, somevalue2 to bolt C, and somevalue1 and somevalue2 to bolt D. How would I do this in Storm? What grouping would I use, and what would my topology look like? Thank you in advance for your help.

Answer 1: You can use different streams if your case needs that; it is not really splitting, but you will
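The named-streams idea can be sketched outside Storm: bolt A publishes each value on a dedicated stream id, and B, C, D each subscribe only to the stream carrying the fields they need (in the Java API this is declareStream plus the stream-id overload of emit). A minimal Python illustration with invented stream names:

```python
def route(somevalue1, somevalue2, somevalue3):
    # Each key plays the role of a declared stream id; downstream bolts
    # subscribe only to the stream(s) they care about. somevalue3 stays
    # local to bolt A, matching the question's wiring.
    return {
        "stream-b": [somevalue1],
        "stream-c": [somevalue2],
        "stream-d": [somevalue1, somevalue2],
    }
```

With this wiring, any grouping (shuffle, fields, ...) is chosen per subscription, per stream, when the topology is built.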

What is the “task” in Storm parallelism

我的未来我决定 submitted on 2019-11-26 23:57:27
Question: I'm trying to learn Twitter Storm by following the great article "Understanding the parallelism of a Storm topology". However, I'm a bit confused by the concept of a "task". Is a task a running instance of a component (spout or bolt)? Does an executor having multiple tasks mean the same component is executed multiple times by that executor? Moreover, in a general parallelism sense, Storm will spawn a dedicated thread (executor) for a spout or bolt, but what is
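A quick way to see the distinction: an executor is a thread, a task is an instance of the spout/bolt code, and when tasks outnumber executors each thread runs several instances serially. An illustrative sketch of the distribution (not Storm's actual scheduler):

```python
def assign_tasks(num_tasks, num_executors):
    # Round-robin task ids over executor threads; with 4 tasks and
    # 2 executors, each thread serially drives 2 component instances.
    executors = [[] for _ in range(num_executors)]
    for task_id in range(num_tasks):
        executors[task_id % num_executors].append(task_id)
    return executors
```

So yes: with more tasks than executors, the same component logic runs multiple times inside one thread, which is what makes it possible to rebalance tasks onto more executors later without changing the task count.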

exception after submitting topology

放肆的年华 submitted on 2019-11-26 23:40:27
Question: I'm new to Storm and trying to submit a topology. I found this in the supervisor and in the log files of the workers:

[ERROR] Async loop died! java.lang.RuntimeException: org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused
    at backtype.storm.drpc.DRPCInvocationsClient.<init>(DRPCInvocationsClient.java:23)
    at backtype.storm.drpc.DRPCSpout.open(DRPCSpout.java:69)
    at storm.trident.spout.RichSpoutBatchTriggerer.open(RichSpoutBatchTriggerer.java:41)
    at
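A Connection refused from DRPCInvocationsClient usually means the DRPC spout has no reachable DRPC server: either none is running, or drpc.servers is unset or points at the wrong host. A minimal storm.yaml sketch (the hostname is a placeholder), paired with starting the daemon via `storm drpc` on that host:

```yaml
# storm.yaml — placeholder hostname; every worker must be able to
# resolve and reach this host on the DRPC invocations port.
drpc.servers:
  - "drpc1.example.com"
```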