Storm topology: one to many (random)

Posted by 最后都变了 on 2019-12-23 03:16:12

Question


I'm using the KafkaSpout to read from all 6 partitions of a Kafka topic. The first bolt in the topology has to convert the byte stream into a struct (via an IDL definition), look up a value in a database, and pass these values to a second bolt, which writes everything into Cassandra.

There are several issues occurring:

  1. Many failed tuples (fail() calls) reported by the Kafka spout.
  2. The first bolt reports a "capacity" of > 2.0 in the Storm UI.
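For context on the second symptom: the Storm UI describes capacity as (number of tuples executed × average execute latency) / measurement window, and notes that a value around 1.0 means the bolt is running as fast as it can. The sketch below only works through that arithmetic. The class name, method name, and the sample numbers in the note are illustrative assumptions, not values taken from this topology.

```java
// Arithmetic sketch of the "capacity" figure as described in the Storm UI:
// capacity = (tuples executed * average execute latency) / measurement window.
// BoltCapacity and its parameters are hypothetical names used for illustration.
final class BoltCapacity {

    // executed:            tuples executed by the bolt within the window
    // avgExecuteLatencyMs: average time spent in execute() per tuple, in milliseconds
    // windowMs:            length of the measurement window, in milliseconds
    static double capacity(long executed, double avgExecuteLatencyMs, long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        return executed * avgExecuteLatencyMs / windowMs;
    }
}
```

With made-up numbers, `BoltCapacity.capacity(120_000, 10.0, 600_000)` evaluates to 2.0: 120,000 tuples at 10 ms each amount to 1,200 seconds of work inside a 600-second window.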

I've tried increasing the parallelism, but it appears that Storm will only distribute 1:1 from the KafkaSpout to the first bolt. I'm guessing that issue 1 is a result of tuples timing out in the first bolt.

What I want to do: have the KafkaSpouts (limited to one per Kafka partition) send their tuples to a randomly chosen instance of the first bolt, so that I can run many more first-bolt instances than spouts. The first and second bolts would be 1:1, but spout to first bolt should be 1:many.

Currently I'm using LocalOrShuffleGrouping for both connections: spout -> first bolt and first bolt -> second bolt.
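For reference, the Storm docs describe local-or-shuffle grouping as follows: if the target bolt has one or more tasks in the same worker process as the sender, tuples are shuffled among just those in-process tasks; otherwise it behaves like a normal shuffle grouping. The plain-Java simulation below (no Storm dependency) contrasts the two candidate sets. It is a simplified sketch, not Storm's implementation: the class and method names are hypothetical, the worker layout is made up rather than taken from this topology, and each tuple simply picks a uniformly random candidate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Simulation of which downstream tasks each grouping may choose.
// boltTaskWorker[i] is the (hypothetical) worker that hosts bolt task i.
final class GroupingSketch {

    // Shuffle grouping: every task of the downstream bolt is a candidate.
    static List<Integer> shuffleCandidates(int[] boltTaskWorker) {
        List<Integer> all = new ArrayList<>();
        for (int task = 0; task < boltTaskWorker.length; task++) {
            all.add(task);
        }
        return all;
    }

    // Local-or-shuffle grouping: only bolt tasks in the sender's worker are
    // candidates if any exist; otherwise fall back to all tasks.
    static List<Integer> localOrShuffleCandidates(int[] boltTaskWorker, int senderWorker) {
        List<Integer> local = new ArrayList<>();
        for (int task = 0; task < boltTaskWorker.length; task++) {
            if (boltTaskWorker[task] == senderWorker) {
                local.add(task);
            }
        }
        return local.isEmpty() ? shuffleCandidates(boltTaskWorker) : local;
    }

    // Sends `tuples` tuples to uniformly random candidates (a simplification)
    // and returns the set of bolt tasks that received at least one tuple.
    static Set<Integer> tasksHit(List<Integer> candidates, int tuples, long seed) {
        Random rng = new Random(seed);
        Set<Integer> hit = new TreeSet<>();
        for (int i = 0; i < tuples; i++) {
            hit.add(candidates.get(rng.nextInt(candidates.size())));
        }
        return hit;
    }
}
```

As a usage example, take a made-up layout `int[] layout = {0, 1, 2, 0, 1, 2, 0, 1, 2}` (nine bolt tasks over three workers) and a sender in worker 0: `localOrShuffleCandidates(layout, 0)` yields only tasks 0, 3 and 6, whereas `shuffleCandidates(layout)` yields all nine. Whether this is what happens in the topology described here depends on how its spout and bolt tasks are actually placed across workers.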


Edit:

(Re)reading the Storm docs, I see this passage:

Shuffle grouping: Tuples are randomly distributed across the bolt's tasks in a way such that each bolt is guaranteed to get an equal number of tuples.

Yet when I look at the load on the executors for my first bolt, I see everything concentrated on 6 of them, seemingly ignoring the other 24.

I'm missing some large clue here.

Source: https://stackoverflow.com/questions/30831202/storm-topology-one-to-many-random
