Others have provided answers based on Kafka's documentation, but product documentation should sometimes be taken with a grain of salt rather than treated as an absolute technical reference. For example:
- Numerous push-based messaging systems support consumption at different rates, usually through their session management primitives. You establish or resume an active application-layer session when you want to consume, and suspend the session when you want to stop or pause (e.g. with an explicit message, or simply by not responding for longer than the in-flight window but shorter than the keepalive window). MQTT and AMQP, for example, both provide this capability (in MQTT's case, since the late 1990s). Given that no action is required to pause consumption (by definition), and less traffic is required in steady state (no requests), it is difficult to see how Kafka's pull-based model is more efficient.
- One critical advantage push messaging has over pull messaging is that there is no request traffic to scale as the number of potentially active topics increases. With pull, if you have a million potentially active topics, you have to issue queries for all of those topics, whether or not they have anything new. This concern becomes especially relevant at scale.
- The critical advantage pull messaging has over push messaging is replayability. This factors a great deal into whether downstream systems can offer guarantees around processing (e.g. a consumer might fail before processing a message and have to restart, or might fail to write messages recoverably).
- Another critical advantage of pull messaging over push messaging is buffer allocation. A consuming process can explicitly request as much data as it can accommodate in a pre-allocated buffer, rather than having to allocate buffers over and over again. This gains back some of the goodput lost to query scaling relative to push messaging (but not much). The impact is measurable, however, if your message sizes vary wildly (e.g. from a few KB to a few hundred MB).
- It is a fallacy to suggest that pull messaging has structural scalability advantages over push messaging. Partitioning is what is usually used to provide scale in messaging applications, regardless of the consumption model. There are push messaging systems operating well in excess of 300M msgs/sec on hardwired local clusters...125K msgs/sec doesn't even buy admission to the show. In fact, pull messaging has inferior goodput by definition, and systems like Kafka usually end up needing more hardware to reach the same performance level. The benefits noted above may often make it worth the cost. I am unaware of anyone using Kafka for messaging in high-frequency trading, for example, where microseconds matter.
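To put rough numbers on the request-scaling point above, here is a back-of-the-envelope sketch. The function name, the poll interval, and the `topics_per_request` batching factor are all hypothetical values chosen for illustration; they do not describe any particular broker's fetch protocol, and a real push system would still carry some per-session keepalive traffic.

```python
def idle_requests_per_second(topics: int,
                             poll_interval_s: float,
                             topics_per_request: int = 1) -> float:
    """Poll requests per second a pull consumer issues even when nothing arrives.

    Assumes (hypothetically) that every potentially active topic must be
    covered by some request once per poll interval.
    """
    # Ceiling division: number of requests needed to cover all topics once.
    requests_per_poll = -(-topics // topics_per_request)
    return requests_per_poll / poll_interval_s


# One million potentially active topics, polled once per second.
pull_idle = idle_requests_per_second(1_000_000, poll_interval_s=1.0)

# Batching many topics into one request shrinks the count,
# but it still grows linearly with the number of topics.
pull_idle_batched = idle_requests_per_second(
    1_000_000, poll_interval_s=1.0, topics_per_request=1_000
)

# In the push model there is no per-topic request traffic while idle.
push_idle = 0.0

print(pull_idle, pull_idle_batched, push_idle)
```

Batching reduces the constant but not the shape of the curve: idle request traffic in the pull model is still proportional to the number of topics being watched.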
It may be interesting to note that various push-pull messaging systems were developed in the late 1990s as a way to optimize the goodput. The results were never staggering and the system complexity and other factors often outweigh this kind of optimization. I believe this is Jay's point overall about practical performance over real data center networks, not to mention things like the open Internet.
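The buffer-allocation advantage mentioned in the list above can be illustrated with a minimal sketch. An in-memory `io.BytesIO` stands in for the broker's log here, and `fetch_into` is a hypothetical helper, not an API from Kafka or any other messaging client; the point is only that the consumer decides how much to take per fetch and reuses one buffer throughout.

```python
import io


def fetch_into(log: io.BufferedIOBase, buf: bytearray) -> int:
    """Pull at most len(buf) bytes into an existing buffer; return bytes read."""
    return log.readinto(buf)


# Stand-in for a broker log holding messages of very different sizes.
log = io.BytesIO(b"a" * 300 + b"b" * 5000 + b"c" * 20)

# Allocated once, sized to what the consumer can accommodate.
buf = bytearray(4096)

total = 0
fetches = 0
while (n := fetch_into(log, buf)) > 0:
    # Only buf[:n] is valid for this fetch; a real consumer would
    # process it here before the next fetch overwrites the buffer.
    total += n
    fetches += 1

print(total, fetches)
```

A push consumer, by contrast, generally has to accept whatever size arrives, which is where the repeated allocations come from when message sizes vary widely.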