apache-storm

Communication Between Several Storm Topologies

陌路散爱 posted on 2019-12-11 02:27:09
Question: I am trying to deploy several Storm topologies in production. I checked the documentation but couldn't find any reference to whether topologies can communicate via native methods. Does anyone have suggestions on how this could be implemented? In short, I am interested to see if it's possible for tuples to be sent across topologies. Thanks for your help! Answer 1: Theoretically, you could probably make it happen. Practically, no. If you want to communicate via tuples,

Apache Storm: Nimbus not starting on Port 6627

隐身守侯 posted on 2019-12-11 02:18:50
Question: I can't see anything listening on port 6627 after starting Nimbus, and I am getting a Connection Refused error. The following errors appear in the Nimbus log: 6899 [main] ERROR com.smarterme.intake.EmbeddedTopologyRunner - Toplogy submitting failed.....org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused java.lang.RuntimeException: org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused at backtype.storm.utils.NimbusClient

Total number of non repeated words in each tweet

跟風遠走 posted on 2019-12-11 01:29:01
Question: I'm new to Java and Trident. I imported a project for fetching tweets, but I don't understand how this code handles more than one tweet: as I read it, tuple.getValue(0); means the first tweet only. My problem is collecting all the tweets in a HashSet or HashMap to get the total number of distinct words in each tweet. public void execute(TridentTuple tuple, TridentCollector collector) { this method is used to execute equations on tweet public Values getValues(Tweet tweet, String[] words){ }
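The counting step the question asks about can be sketched in plain Java. In Trident, a function's execute method is called once per tuple, and tuple.getValue(0) returns the first field of that tuple, so a per-tweet count can be computed inside execute from that tuple's text. The class and method names below are hypothetical and not part of the asker's project or of Storm; the sketch also assumes the tweet text is available as a plain String and that words are separated by whitespace.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper; not part of the asker's project or of Storm. */
final class DistinctWordCounter {

    /** Returns the number of distinct words in one tweet's text, ignoring case. */
    static int countDistinctWords(String tweetText) {
        if (tweetText == null || tweetText.trim().isEmpty()) {
            return 0;
        }
        // Split on runs of whitespace; punctuation handling is deliberately left out.
        String[] words = tweetText.trim().toLowerCase(Locale.ROOT).split("\\s+");
        // A HashSet keeps one copy of each word, so its size is the distinct count.
        Set<String> distinct = new HashSet<>(Arrays.asList(words));
        return distinct.size();
    }

    public static void main(String[] args) {
        // "storm" and "is" each appear twice, so this prints 5.
        System.out.println(countDistinctWords("storm is fast and Storm is fun"));
    }
}
```

Inside a Trident function, the result of countDistinctWords could be emitted as a new field next to the tweet, so that each output tuple carries its own count.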

How can I profile Apache Storm topologies without using the web dashboard?

你。 posted on 2019-12-10 22:09:35
Question: The title pretty much says it all. I have some Storm topologies and I'd like to measure their latencies, that is, the amount of time between a message coming in from Kafka and the last bit of related execution in the final bolt. Bonus points if I can drill into the results to see the latency across each bolt. Can this be done by simply tweaking the Storm configuration? If not, is http://storm.incubator.apache.org/apidocs/backtype/storm/hooks/info/SpoutAckInfo.html backtype.storm.hooks.info

NotSerializableException org.neo4j.kernel.EmbeddedGraphDatabase

泄露秘密 posted on 2019-12-10 20:36:15
Question: I am using neo4j to create a graph, taking data from MongoDB as documents. The standalone code works fine without Storm, but when I integrate it with Storm I get a java.io.NotSerializableException: org.neo4j.kernel.EmbeddedGraphDatabase exception. I don't know the exact reason for this. If anybody has faced this issue, please let me know how to resolve it. Answer 1: Because you are trying to pass an object to the serializer that does not implement the Serializable interface. Answer 2:
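A common way around this kind of exception is sketched below, without claiming it is what the truncated answers go on to recommend. Storm serializes spout and bolt instances when a topology is submitted, so a field holding a non-serializable resource can be marked transient and opened later, in the bolt's prepare method on the worker. The sketch uses plain Java serialization and hypothetical stand-in classes (FakeGraphDb, GraphWriter) rather than the real neo4j or Storm types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical stand-in for a non-serializable resource such as a database handle. */
final class FakeGraphDb {
    final String path;
    FakeGraphDb(String path) { this.path = path; }
}

/** Hypothetical stand-in for a bolt: serialized at submit time, initialized on the worker. */
final class GraphWriter implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String dbPath;       // plain configuration, safe to serialize
    private transient FakeGraphDb db;  // transient fields are skipped by Java serialization

    GraphWriter(String dbPath) { this.dbPath = dbPath; }

    /** Plays the role of a bolt's prepare(): open the resource after deserialization. */
    void prepare() { db = new FakeGraphDb(dbPath); }

    boolean isOpen() { return db != null; }

    String openPath() { return db == null ? null : db.path; }

    /** Serializes and deserializes the writer, roughly what happens on submission. */
    static GraphWriter roundTrip(GraphWriter writer) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(writer);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (GraphWriter) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```

Without the transient keyword, writeObject would fail with a NotSerializableException naming FakeGraphDb, which mirrors the error in the question. After the round trip the db field is null until prepare() is called again.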

How to reset Kafka offsets to match tail position?

风流意气都作罢 posted on 2019-12-10 17:54:10
Question: We're using Storm with Kafka and ZooKeeper. We had a situation where we had to delete some topics and recreate them with different names. Our Kafka spouts stayed the same, aside from now reading from the new topic names. However, the spouts are now using the offsets from the old topic partitions when trying to read from the new topics. So the tail position of my-topic-name partition 0 will be 500 but the offset will be something like 10000. Is there a way to reset the offset position so it

Using tick tuples with trident in storm

痞子三分冷 posted on 2019-12-10 17:09:33
Question: I am able to use a standard spout and bolt combination to do streaming aggregation, and it works very well in the happy case, using tick tuples to persist data at some interval so that writes can be batched. Right now I am doing some failure management myself (tracking tuples that were not saved, etc.) rather than relying on what Storm offers out of the box. But I have read that Trident gives you a higher-level abstraction and better failure management. What I don't understand is whether there is tick tuple support in Trident. Basically I would like to

Delayed Queue implementation in Storm – Kafka, Cassandra, Redis or Beanstalk?

时光怂恿深爱的人放手 posted on 2019-12-10 15:15:29
Question: I have a Storm topology that processes messages from Kafka and makes an HTTP call or saves to Cassandra, depending on the task at hand. I process the messages as soon as they arrive. However, a few messages are not processed completely because of the response from external sources such as an HTTP server. I would like to implement an exponential backoff mechanism for retries in case the HTTP server does not respond or returns an error message asking to retry after some time. I could think of a few ideas for achieving this. I
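Whichever store ends up holding the delayed messages, the exponential backoff schedule itself is small. The sketch below doubles the delay on each failed attempt and caps it at a maximum. The class name, method name, and the example base and cap values are hypothetical and not taken from the question.

```java
/** Hypothetical helper that computes capped exponential backoff delays. */
final class Backoff {

    /**
     * Returns the delay before retry number `attempt` (0 for the first retry):
     * baseMillis doubled `attempt` times, never exceeding maxMillis.
     */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 0 || baseMillis <= 0 || maxMillis < baseMillis) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            // Jump to the cap instead of doubling past it, which also avoids overflow.
            delay = (delay > maxMillis / 2) ? maxMillis : delay * 2;
        }
        return Math.min(delay, maxMillis);
    }

    public static void main(String[] args) {
        // With a 1 s base and a 60 s cap: 1000, 2000, 4000, 8000, 16000, 32000, 60000.
        for (int attempt = 0; attempt <= 6; attempt++) {
            System.out.println(delayMillis(attempt, 1_000L, 60_000L));
        }
    }
}
```

A message that failed could then be stored together with its attempt count and a due time of now plus delayMillis(attempt, ...), and re-emitted once that time has passed.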

Is there an alternative to Twitter Storm that is written in Python? [closed]

梦想的初衷 posted on 2019-12-10 14:16:11
Question: As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 6 years ago. After various searches, I couldn't find much on alternatives to Twitter Storm. Specifically a streaming big data processing library

Processing records in order in Storm

时光毁灭记忆、已成空白 posted on 2019-12-10 12:22:58
Question: I'm new to Storm and I'm having trouble figuring out how to process records in order. I have a dataset which contains records with the following fields: user_id, location_id, time_of_checking. I would like to identify users who have completed the path I specified (for example, users who went from location A to location B to location C). I'm using a Kafka producer and reading these records from a file to simulate live data. The data is sorted by date. So, to check if my pattern is fulfilled
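The path check described in the question can be sketched as a small per-user state machine, assuming the check-ins for a given user arrive in time order at a single place. In a topology, that would typically mean a fields grouping on user_id so that one bolt task sees all of a user's records. The PathTracker class below is hypothetical, and it assumes that visits to other locations between path steps are ignored rather than resetting progress.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tracker that reports when a user completes a path such as A, B, C. */
final class PathTracker {
    private final List<String> path;
    // For each user, the index of the next location on the path they must visit.
    private final Map<String, Integer> progress = new HashMap<>();

    PathTracker(List<String> path) {
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException("path must not be empty");
        }
        this.path = new ArrayList<>(path);
    }

    /**
     * Feeds one check-in; check-ins must be supplied in time order per user.
     * Returns true when this check-in completes the path for that user.
     */
    boolean onCheckIn(String userId, String locationId) {
        int next = progress.getOrDefault(userId, 0);
        if (!path.get(next).equals(locationId)) {
            return false; // assumption: unrelated locations are ignored, progress is kept
        }
        next++;
        if (next == path.size()) {
            progress.remove(userId); // path completed; start over for this user
            return true;
        }
        progress.put(userId, next);
        return false;
    }
}
```

Note that this only works if per-user ordering survives the trip through Kafka and Storm; Kafka preserves order within a partition, not across partitions.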