Combining groupBy and flatMap(maxConcurrent, …) in RxJava/RxScala

问题

I have incoming processing requests, of which I want do not want too many processing concurrently due to depleting shared resources. I also would prefer requests which share some unique key to not be executed concurrently:

def process(request: Request): Observable[Answer] = ???

requestsStream
  .groupBy(request => request.key)
  .flatMap(maxConcurrentProcessing, { case (key, requestsForKey) => 
      requestsForKey
         .flatMap(1, process)
  })

However, the above doesn't work because the observable per key never completes. What is the correct way to achieve this?

What doesn't work:

  .flatMap(maxConcurrentProcessing, { case (key, requestsForKey) => 
      // Take(1) unsubscribes after the first, causing groupBy to create a new observable, causing the next request to execute concurrently
      requestsForKey.take(1)
         .flatMap(1, process)
  })

 .flatMap(maxConcurrentProcessing, { case (key, requestsForKey) =>
      // The idea was to unsubscribe after 100 milliseconds to "free up" maxConcurrentProcessing
      // This discards all requests after the first if processing takes more than 100 milliseconds
      requestsForKey.timeout(100.millis, Observable.empty)
         .flatMap(1, process)
  })

回答1:

Here's how I managed to achieve this. For each unique key I am assigning dedicated single thread scheduler (so that messages with the same key are processed in order):

@Test
public void groupBy() throws InterruptedException {
    final int NUM_GROUPS = 10;
    Observable.interval(1, TimeUnit.MILLISECONDS)
            .map(v -> {
                logger.info("received {}", v);
                return v;
            })
            .groupBy(v -> v % NUM_GROUPS)
            .flatMap(grouped -> {
                long key = grouped.getKey();
                logger.info("selecting scheduler for key {}", key);
                return grouped
                        .observeOn(assignScheduler(key))
                        .map(v -> {
                            String threadName = Thread.currentThread().getName();
                            Assert.assertEquals("proc-" + key, threadName);
                            logger.info("processing {} on {}", v, threadName);
                            return v;
                        })
                        .observeOn(Schedulers.single()); // re-schedule
            })
            .subscribe(v -> logger.info("got {}", v));

    Thread.sleep(1000);
}

In my case the number of keys (NUM_GROUPS) is small so I create dedicated scheduler for each key:

Scheduler assignScheduler(long key) {
    return Schedulers.from(Executors.newSingleThreadExecutor(
        r -> new Thread(r, "proc-" + key)));
}

In case the number of keys is infinite or too large to dedicate a thread for each one, you can create a pool of schedulers and reuse them like this:

Scheduler assignScheduler(long key) {
    // assign randomly
    return poolOfSchedulers[random.nextInt(SIZE_OF_POOL)];
}

来源：https://stackoverflow.com/questions/41693044/combining-groupby-and-flatmapmaxconcurrent-in-rxjava-rxscala

标签

scala

rx-java

reactive-programming

rx-scala