apache-storm

Is it possible to add tasks dynamically at runtime in Apache Storm, not just rebalance executors?

Submitted by 随声附和 on 2019-12-12 02:09:42
Question: I need functionality in Storm that I know (based on the docs) has not yet been implemented. I need to add more tasks at runtime without having to start with a large initial number of tasks, because that might cause performance issues: running more than one task per executor does not increase the level of parallelism -- an executor always has one thread that it uses for all of its tasks, which means that tasks run serially on an executor. I know that the rebalance command can be used to add
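The question turns on the fact that the task count is fixed at submission time while the executor count is not. A minimal sketch (MyBolt and the component names are placeholders) of over-provisioning tasks so that a later rebalance can raise parallelism up to that ceiling:

```java
import org.apache.storm.topology.TopologyBuilder;

public class TaskHeadroomExample {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // Start with 2 executors but 8 tasks: a later
        //   storm rebalance my-topology -e my-bolt=8
        // can grow the executors to 8, but no rebalance can ever
        // exceed the task count fixed here at submission time.
        builder.setBolt("my-bolt", new MyBolt(), 2).setNumTasks(8);
    }
}
```

The rebalance command only redistributes the existing tasks across executors and workers; it cannot create new tasks, which is exactly the limitation the question runs into.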

Java client library for reading from a Kestrel server queue from within a Storm spout

Submitted by 生来就可爱ヽ(ⅴ<●) on 2019-12-11 19:45:20
Question: I've set up a Kestrel server and am able to set up and use queues via the Python pykestrel library. We have a scenario where a Python client writes to Kestrel queue(s) and a Storm spout needs to read from the queue(s). I've tried using the storm-kestrel library but am running into issues. Googling suggests it doesn't support the memcache port (22133). I've added the Maven bindings as provided here. I didn't use the KestrelThriftSpout spout, using Kestrel.Client instead. Compilation is fine but I get
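For reference, a hedged sketch of wiring storm-kestrel's KestrelThriftSpout against Kestrel's thrift port (commonly 2229) rather than the memcache port (22133) that pykestrel uses; the constructor shape, host, port, and queue name here are assumptions worth verifying against the library version in use:

```java
import backtype.storm.spout.KestrelThriftSpout;
import backtype.storm.spout.RawScheme;
import backtype.storm.topology.TopologyBuilder;

public class KestrelSpoutWiring {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // storm-kestrel speaks thrift, not the memcache protocol: point the
        // spout at the thrift port (2229 by convention), not 22133.
        builder.setSpout("kestrel-spout",
                new KestrelThriftSpout("localhost", 2229, "my_queue", new RawScheme()),
                1);
    }
}
```

The Python producer can keep writing through the memcache port; only the Storm side needs the thrift endpoint.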

Kafka consumer is very slow to consume data and consumes only the first 500 records

Submitted by 寵の児 on 2019-12-11 17:56:13
Question: I am trying to integrate MongoDB and Storm-Kafka. The Kafka producer produces data from MongoDB, but the consumer side fails to fetch all records: it consumes only 500-600 records out of 1 million. There are no errors in the log file and the topology is still alive, but it does not process further records. Kafka version: 0.10.*, Storm version: 1.2.1. Do I need to add any configs for the consumer? conf.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false); conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 2048); conf.put
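Beyond the topology-level settings quoted above, the older storm-kafka spout has its own fetch knobs. A hedged sketch (assuming the storm-kafka SpoutConfig API; topic, ZooKeeper path, and group id are placeholders) of raising the fetch and buffer sizes, a common first step when consumption stalls after an initial burst:

```java
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.spout.SchemeAsMultiScheme;

public class KafkaSpoutTuning {
    public static KafkaSpout buildSpout() {
        ZkHosts hosts = new ZkHosts("localhost:2181");
        SpoutConfig cfg = new SpoutConfig(hosts, "my-topic", "/my-topic", "my-consumer-group");
        cfg.scheme = new SchemeAsMultiScheme(new StringScheme());
        // Defaults are 1 MB; larger fetches help when the spout lags behind a big backlog.
        cfg.fetchSizeBytes = 4 * 1024 * 1024;
        cfg.bufferSizeBytes = 4 * 1024 * 1024;
        return new KafkaSpout(cfg);
    }
}
```

If the topology uses storm-kafka-client instead, the analogous knobs are plain Kafka consumer properties such as max.partition.fetch.bytes.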

How to E2E test the functionality of a Storm topology by programmatically inserting messages

Submitted by 非 Y 不嫁゛ on 2019-12-11 17:23:10
Question: Our Apache Storm topology listens for messages from Kafka using KafkaSpout and, after a lot of mapping/reducing/enrichment/aggregation etc., finally inserts data into Cassandra. There is another Kafka input on which we receive user queries for data; if the topology finds a response, it sends it to a third Kafka topic. Now we want to write an E2E test using JUnit in which we can programmatically insert data directly into the topology and then, by inserting a user query message, assert on the third
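Storm ships test helpers that do roughly this: replace the KafkaSpout's output with fixed tuples, run the whole topology on a simulated local cluster, and read back what each component emitted. A minimal sketch, assuming placeholder component ids ("kafka-spout", "output-bolt") and a buildTopology() supplied by the project:

```java
import java.util.Map;
import org.apache.storm.Config;
import org.apache.storm.ILocalCluster;
import org.apache.storm.Testing;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.testing.CompleteTopologyParam;
import org.apache.storm.testing.MkClusterParam;
import org.apache.storm.testing.MockedSources;
import org.apache.storm.testing.TestJob;
import org.apache.storm.tuple.Values;

public class TopologyE2ETest {

    public void runsEndToEnd() {
        MkClusterParam clusterParam = new MkClusterParam();
        clusterParam.setSupervisors(1);

        Testing.withSimulatedTimeLocalCluster(clusterParam, new TestJob() {
            @Override
            public void run(ILocalCluster cluster) throws Exception {
                // Stand in for the real KafkaSpout with fixed test messages.
                MockedSources mocked = new MockedSources();
                mocked.addMockData("kafka-spout", new Values("test-message-1"));

                CompleteTopologyParam param = new CompleteTopologyParam();
                param.setMockedSources(mocked);
                param.setStormConf(new Config());

                Map result = Testing.completeTopology(cluster, buildTopology(), param);
                // Assert against whatever the terminal bolt emitted.
                System.out.println(Testing.readTuples(result, "output-bolt"));
            }
        });
    }

    private static StormTopology buildTopology() {
        // Placeholder: return the production topology under test.
        throw new UnsupportedOperationException("wire in the production topology here");
    }
}
```

Testing.completeTopology only terminates cleanly when every spout is mocked or completable, which is one reason to replace the KafkaSpout here rather than run a real broker in the test.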

Streaming data processing and nanosecond time resolution

Submitted by ﹥>﹥吖頭↗ on 2019-12-11 16:58:17
Question: I'm just getting started with real-time stream processing frameworks, and I have a question to which I have not yet found a conclusive answer: do the usual suspects (Apache Spark, Kafka, Storm, Flink, etc.) support processing data with an event-time resolution of nanoseconds (or even picoseconds)? Most people and documentation talk about millisecond or microsecond resolution, but I was unable to find a definite answer on whether finer resolution is possible or a problem.
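One concrete reason the answer usually comes back "milliseconds": these engines' event-time APIs generally take epoch timestamps as a Java long in milliseconds. An illustrative sketch (not any framework's API) of the common workaround, carrying full nanosecond precision in the record itself and handing the engine a truncated value:

```java
public final class TimedEvent {
    public final long epochNanos;  // full-resolution event time, kept in the payload
    public final byte[] body;

    public TimedEvent(long epochNanos, byte[] body) {
        this.epochNanos = epochNanos;
        this.body = body;
    }

    /** What a millisecond-resolution engine sees; ordering below 1 ms is lost here. */
    public long epochMillis() {
        return epochNanos / 1_000_000L;
    }
}
```

Sub-millisecond ordering then has to be reconstructed inside user code (e.g. a secondary sort on epochNanos within each window), since the framework itself never sees it.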

Storm 1.2.2 supervisor also takes localhost as nimbus and can't connect to it, although nimbus is on another server and is already specified in storm.yaml

Submitted by 前提是你 on 2019-12-11 15:16:50
Question: The problem is just as titled; no errors are printed in the supervisor logs after the cluster is started. Whenever a jar is submitted, an error is reported in the supervisors' logs saying they fail to connect to a localhost nimbus. 1. Here is my yaml:

    storm.zookeeper.servers:
      - "beta-hbase02"
      - "beta-hbase03"
      - "beta-hbase04"
    storm.zookeeper.root: "/storm"
    nibus.seeds: ["beta-hbase01"]
    storm.local.dir: "/var/lib/hadoop-hdfs/apache-storm/storm/data"
    supervisor.slots.ports:
      - 6800
      - 6801
      - 6802
      - 6803
    ui.port: 8686
    storm.log.dir

What can be used as a test stub for CassandraWriterBolt?

Submitted by 。_饼干妹妹 on 2019-12-11 15:09:59
Question: I read JSON from Kafka; FieldExtractionBolt reads that JSON, extracts the data into tuple values and passes them to CassandraWriterBolt, which in turn writes a record to Cassandra, putting each of those tuple values into a separate column. JSON message on Kafka: {"pair":"GBPJPY","bid":134.4563,"ask":134.4354} FieldExtractionBolt: String message = tuple.getStringByField("message"); Map values = new Gson().fromJson(message, Map.class); basicOutputCollector.emit(new Values(values.get("pair"),
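One straightforward stub, sketched below with illustrative names: a terminal bolt that records received tuples in memory instead of writing to Cassandra, so a test topology can swap it in for CassandraWriterBolt and assert on what FieldExtractionBolt emitted:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

public class RecordingWriterStub extends BaseBasicBolt {
    // Static so the test can read it after the local cluster shuts down;
    // synchronized because bolt executors run on their own threads.
    public static final List<List<Object>> received =
            Collections.synchronizedList(new ArrayList<List<Object>>());

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        received.add(tuple.getValues()); // capture instead of writing to Cassandra
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt: declares no output stream.
    }
}
```

A test can then feed the sample JSON through the topology on a LocalCluster and assert that received contains ("GBPJPY", 134.4563, 134.4354).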

Storm Kafka: the first message is skipped on restart, and on first start as well

Submitted by 孤人 on 2019-12-11 14:21:45
Question: I have written a sample topology that consumes messages from Kafka and logs them; please find the code snippet below. The end-to-end topology is fine: when I post a message via the Kafka producer, it is consumed properly. I simply get the message and log it in MessagePrinter. The issue is described below. Use case 1: I brought down the topology and sent messages 1-10; when I bring the topology up, messages 2-10 are logged properly by the topology but the first message alone is not logged. Use case 2: same
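Where the spout starts reading is a common culprit for an apparently skipped first message. A hedged sketch (assuming the storm-kafka-client 1.x spout; broker, topic, and group id are placeholders) of pinning the first-poll offset strategy explicitly:

```java
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy;

public class PrinterSpoutFactory {
    public static KafkaSpout<String, String> buildSpout() {
        KafkaSpoutConfig<String, String> cfg =
                KafkaSpoutConfig.<String, String>builder("localhost:9092", "my-topic")
                        .setProp("group.id", "printer-group")
                        // UNCOMMITTED_EARLIEST resumes from the earliest uncommitted
                        // offset; LATEST would silently drop anything produced while
                        // the topology was down, including that first message.
                        .setFirstPollOffsetStrategy(FirstPollOffsetStrategy.UNCOMMITTED_EARLIEST)
                        .build();
        return new KafkaSpout<>(cfg);
    }
}
```

With the older storm-kafka spout the analogous setting is the startOffsetTime / ignoreZkOffsets pair on SpoutConfig.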

How to execute one bolt after the other when each bolt takes data from the same spout?

Submitted by 守給你的承諾、 on 2019-12-11 12:55:53
Question: I'm taking data from a spout. Each bolt inserts mapped fields into a different table in my database, but my database tables have constraints. In my test setup I have two tables, user-details and my-details, whose constraints require the users table to be filled first; only after that can a row be inserted into the my-details table. When I run the topology, only the users table gets inserted, because when the bolts perform their insert queries against the database it is allowing only psqlbolt to
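One way to enforce this ordering in Storm is to chain the bolts instead of having both subscribe to the spout: the second insert then only ever runs on tuples the first bolt has already written and re-emitted. A minimal sketch with illustrative class and component names (the bolt for user-details must call collector.emit after its insert for this to work):

```java
import org.apache.storm.topology.TopologyBuilder;

public class OrderedInsertTopology {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new MySpout(), 1);
        // First bolt inserts into user-details, then re-emits the tuple downstream.
        builder.setBolt("user-details-bolt", new UserDetailsBolt(), 1)
               .shuffleGrouping("spout");
        // Second bolt subscribes to the first, so its insert can never run
        // before the parent row exists.
        builder.setBolt("my-details-bolt", new MyDetailsBolt(), 1)
               .shuffleGrouping("user-details-bolt");
    }
}
```

The ordering guarantee here is structural (dataflow dependency), so it holds regardless of executor counts or scheduling.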

Storm WordCount error: Pipe to subprocess seems to be broken, no output read

Submitted by 风流意气都作罢 on 2019-12-11 12:46:23
Question: Storm 0.10.0. My previous question (Apache storm : Could not load main class org.apache.storm.starter.ExclamationTopology) was solved. Hello, I have a single-node cluster up and running on my machine; the Storm config file (storm.yaml) is as follows:

    storm.zookeeper.servers:
      # - "server1"
      # - "server2"
      - "localhost"
    storm.zookeeper.port: 2181
    nimbus.host: "localhost"
    storm.local.dir: "/var/stormtmp"
    java.library.path: "/usr/local"
    supervisor.slots.ports:
      - 6700
      - 6701
      - 6702
      - 6703
    worker
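The "pipe to subprocess seems to be broken" error comes from Storm's multilang ShellBolt when its child process dies. For orientation, this is how storm-starter's WordCount declares its Python bolt (backtype.* packages, matching Storm 0.10.0); a broken pipe here usually means splitsentence.py crashed on startup or python / the resources directory isn't where the worker expects:

```java
import java.util.Map;
import backtype.storm.task.ShellBolt;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;

public class SplitSentence extends ShellBolt implements IRichBolt {
    public SplitSentence() {
        // Launches "python splitsentence.py" from the topology jar's
        // resources/ directory; if that process exits, the worker logs
        // "Pipe to subprocess seems to be broken".
        super("python", "splitsentence.py");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
```

Running the same python command by hand against the extracted resources directory is a quick way to surface the subprocess's real stack trace.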