We’re using Storm with the Kafka Spout. When we fail messages, we’d like to replay them, but in some cases bad data or code errors will cause a message to fail a Bolt every time, no matter how often it is replayed.
We simply had our bolt emit the bad tuple on an error stream and ack it. Another bolt handled the error by writing it back to a Kafka topic specifically for errors. This allows us to easily direct normal vs. error data flow through the topology.
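As a rough illustration (not our exact code), the bolt side looks something like this. It assumes the Storm 2.x `org.apache.storm` packages; the class name, the `"error"` stream id, and the field names are placeholders:

```java
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class ParseBolt extends BaseRichBolt {
    public static final String ERROR_STREAM = "error";
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String raw = input.getString(0); // raw message coming from the Kafka spout
        try {
            // Normal path: emit the result on the default stream, anchored to the input.
            collector.emit(input, new Values(parse(raw)));
        } catch (Exception e) {
            // Bad data / code error: divert the raw message to the error stream
            // instead of failing, so the spout never replays something that can
            // never succeed. (Retriable failures are handled separately, see below.)
            collector.emit(ERROR_STREAM, input, new Values(raw, e.toString()));
        }
        // Ack either way; the tuple has been handled, successfully or as an error.
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("parsed"));                           // default stream
        declarer.declareStream(ERROR_STREAM, new Fields("raw", "reason")); // error stream
    }

    private String parse(String raw) {
        return raw.trim(); // placeholder for the real parsing/business logic
    }
}
```

On the topology side the error stream gets its own consumer, e.g. `builder.setBolt("error-writer", errorKafkaBolt).shuffleGrouping("parser", ParseBolt.ERROR_STREAM);`, where `errorKafkaBolt` is whatever bolt you use to produce to the dedicated error topic (a `KafkaBolt` works) and `"parser"`/`"error-writer"` are placeholder component ids.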
The only case where we fail a tuple is when some required resource is offline, such as a network connection, a DB, etc. These are retriable errors. Anything else is directed to the error stream to be fixed or handled as appropriate.
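To make that distinction concrete, the `execute()` above grows a second branch, roughly as below. The specific exception types (`SQLTransientException`, `ConnectException`) and the `writeToDb` helper are placeholders for whatever resource calls your bolt actually makes:

```java
// extra imports needed for this version:
//   java.io.IOException, java.net.ConnectException,
//   java.sql.SQLException, java.sql.SQLTransientException

@Override
public void execute(Tuple input) {
    String raw = input.getString(0);     // raw message from the Kafka spout
    try {
        String parsed = parse(raw);      // bad data blows up here
        writeToDb(parsed);               // an offline DB/network blows up here
        collector.emit(input, new Values(parsed));
        collector.ack(input);
    } catch (SQLTransientException | ConnectException e) {
        // Retriable: a required resource (DB, network connection, ...) is down.
        // Fail the tuple so the Kafka spout replays it once the resource is back.
        collector.fail(input);
    } catch (Exception e) {
        // Not retriable (bad data, code bug): replaying would fail forever,
        // so ship it to the error stream/topic and ack the original tuple.
        collector.emit(ERROR_STREAM, input, new Values(raw, e.toString()));
        collector.ack(input);
    }
}

// Placeholder for the real resource call; only its checked exceptions matter here.
private void writeToDb(String parsed) throws SQLException, IOException {
}
```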
This all assumes, of course, that you don't want to incur any data loss. If you only want to make a best-effort attempt and ignore failures after a few retries, then I would look at other options.