Automatic rate adjustment in Reactor

Submitted by 你说的曾经没有我的故事 on 2021-01-28 03:24:20

Question


TL;DR

Is there a way to automatically adjust delay between elements in Project Reactor based on downstream health?

More details

I have an application that reads records from Kafka topic, sends an HTTP request for each one of them and writes the result to another Kafka topic. Reading and writing from/to Kafka is fast and easy, but the third party HTTP service is easily overwhelmed, so I use delayElements() with a value from a property file, which means that this value does not change during application runtime. Here's a code sample:

kafkaReceiver.receiveAutoAck()
            .concatMap(identity())
            .delayElements(ofMillis(delayElement))
            .flatMap(message -> recordProcessingFunction.process(message.value()), messageRate)
            .onErrorContinue(handleError())
            .map(this::getSenderRecord)
            .flatMap(kafkaSender::send)

However, the third party service's performance might vary over time, and I'd like to be able to adjust this delay accordingly. Say, if more than 5% of requests fail over a 10-second period, I would increase the delay; if the failure rate stays below 5% for more than 10 seconds, I would reduce the delay again.
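The adjustment rule described above (sample the failure rate per window, then raise or lower the delay) could be sketched as a small controller like the one below. This is only an illustration of the logic, not anything provided by Reactor; all class and method names here are made up.

```java
/**
 * Sketch of the rule from the question: count outcomes over a fixed
 * window (e.g. 10 s), then double the delay if more than 5% failed,
 * otherwise halve it. Illustrative only; not part of Reactor.
 */
class AdaptiveDelay {
    private long delayMillis;
    private long total;
    private long failed;

    AdaptiveDelay(long initialDelayMillis) {
        this.delayMillis = initialDelayMillis;
    }

    /** Record the outcome of one HTTP request. */
    synchronized void record(boolean success) {
        total++;
        if (!success) failed++;
    }

    /** Close the current window (call e.g. every 10 s) and adapt the delay. */
    synchronized void endWindow() {
        if (total > 0) {
            double failureRate = (double) failed / total;
            if (failureRate > 0.05) {
                delayMillis *= 2;                           // too many failures: back off
            } else {
                delayMillis = Math.max(1, delayMillis / 2); // healthy again: speed up
            }
        }
        total = 0;
        failed = 0;
    }

    synchronized long currentDelayMillis() {
        return delayMillis;
    }
}
```

Note that `delayElements(Duration)` captures its duration once at assembly time, so a changing delay would have to be read per element instead, for example with `delayUntil(m -> Mono.delay(Duration.ofMillis(adaptive.currentDelayMillis())))`.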

Is there an existing mechanism for that in Reactor? I can think of some creative solutions of my own, but was wondering whether the Reactor team (or someone else) has already implemented this.


Answer 1:


I don't think any HTTP client, including Netty's, provides backpressure. One option is to switch to RSocket, but if you are calling a third-party service, that may not be possible. You could tune a rate that works during most of the day and send errored-out messages to another topic using doOnError or similar. Another receiver can process those messages with even higher delays, and put a message back on the same topic with a retry count if it errors out again, so that you can eventually stop processing it.
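The retry-topic flow described above could be sketched as follows. The `Message` type, `MAX_RETRIES` value, and the idea of carrying the count in the message are all hypothetical; in a real pipeline the retry count would typically travel as a Kafka record header.

```java
import java.util.Optional;

/**
 * Sketch of the retry-topic routing: a failed message goes back onto
 * the topic with an incremented retry count, until the budget is
 * exhausted and it is dropped (e.g. sent to a dead-letter topic).
 * All names here are illustrative, not from any library.
 */
class RetryRouter {
    static final int MAX_RETRIES = 3;

    record Message(String payload, int retryCount) {}

    /** Decide where a failed message goes next. */
    static Optional<Message> nextAttempt(Message failed) {
        if (failed.retryCount() >= MAX_RETRIES) {
            return Optional.empty(); // budget spent: stop processing this message
        }
        return Optional.of(new Message(failed.payload(), failed.retryCount() + 1));
    }
}
```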




Answer 2:


If you are looking to delay elements depending on how fast they are processed, you could use delayUntil.

  import java.time.Duration;
  import reactor.core.publisher.Flux;
  import reactor.core.publisher.Mono;

  Flux.range(1, 100)
      .doOnNext(i -> System.out.println("Kafka Receive :: " + i))
      .delayUntil(i -> Mono.fromSupplier(() -> i)
                           .map(k -> k * 2) // msg processing
                           .delayElement(Duration.ofSeconds(1)) // msg processing simulation
                           .doOnNext(k -> System.out.println("Kafka send :: " + k)))
      .subscribe();



Answer 3:


You can add a retry with exponential backoff. Something like this:

influx()
    .flatMap(x -> Mono.just(x)
        .map(data -> apiCall(data))
        .retryWhen(
            Retry.backoff(Integer.MAX_VALUE, Duration.ofSeconds(30))
                .filter(err -> err instanceof RuntimeException)
                .doBeforeRetry(
                    s -> log.warn("Retrying for err {}", s.failure().getMessage()))
                .onRetryExhaustedThrow((spec, sig) -> new RuntimeException("ex")))
        .onErrorResume(err -> Mono.empty()),
        concurrency_val,
        prefetch_val)

This will retry a failed request up to Integer.MAX_VALUE times, with a minimum of 30 seconds between retries. Subsequent retries are offset by a configurable jitter factor (default value = 0.5), causing the duration to increase between successive retries.

The documentation on Retry.backoff says that:

A RetryBackoffSpec preconfigured for exponential backoff strategy with jitter, given a maximum number of retry attempts and a minimum Duration for the backoff.
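The growth described above can be illustrated with a little arithmetic. This is a simplified sketch in the spirit of Retry.backoff, not Reactor's exact algorithm: for attempt n the base delay is minBackoff * 2^n, and a jitter factor j is assumed here to widen it into the range [base * (1 - j), base * (1 + j)].

```java
import java.time.Duration;

/**
 * Simplified model of exponential backoff with jitter. Not Reactor's
 * implementation; the jitter window below is an assumption for
 * illustration.
 */
class BackoffSketch {
    /** Base delay for the given attempt: minBackoff * 2^attempt. */
    static Duration baseDelay(Duration minBackoff, int attempt) {
        return minBackoff.multipliedBy(1L << attempt);
    }

    /**
     * Apply jitter: random01 in [0, 1) picks a point inside the
     * assumed window [base * (1 - j), base * (1 + j)].
     */
    static Duration withJitter(Duration base, double jitterFactor, double random01) {
        double low = 1.0 - jitterFactor;
        double span = 2.0 * jitterFactor;
        long millis = (long) (base.toMillis() * (low + span * random01));
        return Duration.ofMillis(millis);
    }
}
```

With minBackoff = 30 s, the base delays for attempts 0, 1, 2 are 30 s, 60 s, 120 s, and a jitter factor of 0.5 lets the first retry land anywhere between 15 s and 45 s.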

Also, since the whole operation is mapped in flatMap, you can vary the default concurrency and prefetch values for it in order to account for the maximum number of requests that can fail at any given time while the whole pipeline waits for the RetryBackOffSpec to complete successfully.

Worst case scenario, concurrency_val requests have failed and are waiting 30+ seconds for the retry to happen. The whole operation might stall (still waiting for success from downstream), which may not be desirable if the downstream system doesn't recover in time. It is better to replace the Integer.MAX_VALUE backoff limit with something manageable, beyond which you would just log the error and proceed with the next event.



Source: https://stackoverflow.com/questions/62965034/automatic-rate-adjustment-in-reactor
