问题
Spring's reactor has an interesting feature : Hedging . It means spawning many requests and get the first returned result , and automatically clean other contexts. Josh Long recently has been actively promoting this feature. Googling Spring reactor hedging shows relative results. If anybody is curious , here is the sample code . In short , Flux.first()
simplifies all the underlaying hassles , which is very impressive.
I wonder how this can be achieved with Kotlin's coroutine and multithread , (and maybe with Flow
or Channel
) . I thought of a simple scenario : One service accepts longUrl and spawns the longUrl to many URL shorten service ( such as IsGd , TinyUrl ...) , and returns the first returned URL ... (and terminates / cleans other thread / coroutine resources)
There is an interface UrlShorter
that defines this work :
interface UrlShorter {
fun getShortUrl(longUrl: String): String?
}
And there are three implementations , one for is.gd , another for tinyUrl , and the third is a Dumb implementation that blocks 10 seconds and return null :
class IsgdImpl : UrlShorter {
override fun getShortUrl(longUrl: String): String? {
logger.info("running : {}", Thread.currentThread().name)
// isGd api url blocked by SO , it sucks . see the underlaying gist for full code
val url = "https://is.gd/_create.php?format=simple&url=%s".format(URLEncoder.encode(longUrl, "UTF-8"))
return Request.Get(url).execute().returnContent().asString().also {
logger.info("returning {}", it)
}
}
}
class TinyImpl : UrlShorter {
override fun getShortUrl(longUrl: String): String? {
logger.info("running : {}", Thread.currentThread().name)
val url = "http://tinyurl.com/_api-create.php?url=$longUrl" // sorry the URL is blocked by stackoverflow , see the underlaying gist for full code
return Request.Get(url).execute().returnContent().asString().also {
logger.info("returning {}", it)
}
}
}
class DumbImpl : UrlShorter {
override fun getShortUrl(longUrl: String): String? {
logger.info("running : {}", Thread.currentThread().name)
TimeUnit.SECONDS.sleep(10)
return null
}
}
And there is a UrlShorterService
that takes all the UrlShorter
implementations , and try to spawn coroutines and get the first result .
Here is what I've thought of :
@ExperimentalCoroutinesApi
@FlowPreview
class UrlShorterService(private val impls: List<UrlShorter>) {
private val es: ExecutorService = Executors.newFixedThreadPool(impls.size)
private val esDispatcher = es.asCoroutineDispatcher()
suspend fun getShortUrl(longUrl: String): String {
return method1(longUrl) // there are other methods , with different ways...
}
private inline fun <T, R : Any> Iterable<T>.firstNotNullResult(transform: (T) -> R?): R? {
for (element in this) {
val result = transform(element)
if (result != null) return result
}
return null
}
The client side is simple too :
@ExperimentalCoroutinesApi
@FlowPreview
class UrlShorterServiceTest {
@Test
fun testHedging() {
val impls = listOf(DumbImpl(), IsgdImpl(), TinyImpl()) // Dumb first
val service = UrlShorterService(impls)
runBlocking {
service.getShortUrl("https://www.google.com").also {
logger.info("result = {}", it)
}
}
}
}
Notice I put the DumbImpl
first , because I hope it may spawn first and blocking in its thread. And other two implementations can get result.
OK , here is the problem , how to achieve hedging in kotlin ? I try the following methods :
private suspend fun method1(longUrl: String): String {
return impls.asSequence().asFlow().flatMapMerge(impls.size) { impl ->
flow {
impl.getShortUrl(longUrl)?.also {
emit(it)
}
}.flowOn(esDispatcher)
}.first()
.also { esDispatcher.cancelChildren() } // doesn't impact the result
}
I hope method1
should work , but it totally executes 10 seconds :
00:56:09,253 INFO TinyImpl - running : pool-1-thread-3
00:56:09,254 INFO DumbImpl - running : pool-1-thread-1
00:56:09,253 INFO IsgdImpl - running : pool-1-thread-2
00:56:11,150 INFO TinyImpl - returning // tiny url blocked by SO , it sucks
00:56:13,604 INFO IsgdImpl - returning // idGd url blocked by SO , it sucks
00:56:19,261 INFO UrlShorterServiceTest$testHedging$1 - result = // tiny url blocked by SO , it sucks
Then , I thought other method2 , method3 , method4 , method5 ... but all not work :
/**
* 00:54:29,035 INFO IsgdImpl - running : pool-1-thread-3
* 00:54:29,036 INFO DumbImpl - running : pool-1-thread-2
* 00:54:29,035 INFO TinyImpl - running : pool-1-thread-1
* 00:54:30,228 INFO TinyImpl - returning // tiny url blocked by SO , it sucks
* 00:54:30,797 INFO IsgdImpl - returning // idGd url blocked by SO , it sucks
* 00:54:39,046 INFO UrlShorterServiceTest$testHedging$1 - result = // idGd url blocked by SO , it sucks
*/
private suspend fun method2(longUrl: String): String {
return withContext(esDispatcher) {
impls.map { impl ->
async(esDispatcher) {
impl.getShortUrl(longUrl)
}
}.firstNotNullResult { it.await() } ?: longUrl
}
}
/**
* 00:52:30,681 INFO IsgdImpl - running : pool-1-thread-2
* 00:52:30,682 INFO DumbImpl - running : pool-1-thread-1
* 00:52:30,681 INFO TinyImpl - running : pool-1-thread-3
* 00:52:31,838 INFO TinyImpl - returning // tiny url blocked by SO , it sucks
* 00:52:33,721 INFO IsgdImpl - returning // idGd url blocked by SO , it sucks
* 00:52:40,691 INFO UrlShorterServiceTest$testHedging$1 - result = // idGd url blocked by SO , it sucks
*/
private suspend fun method3(longUrl: String): String {
return coroutineScope {
impls.map { impl ->
async(esDispatcher) {
impl.getShortUrl(longUrl)
}
}.firstNotNullResult { it.await() } ?: longUrl
}
}
/**
* 01:58:56,930 INFO TinyImpl - running : pool-1-thread-1
* 01:58:56,933 INFO DumbImpl - running : pool-1-thread-2
* 01:58:56,930 INFO IsgdImpl - running : pool-1-thread-3
* 01:58:58,411 INFO TinyImpl - returning // tiny url blocked by SO , it sucks
* 01:58:59,026 INFO IsgdImpl - returning // idGd url blocked by SO , it sucks
* 01:59:06,942 INFO UrlShorterServiceTest$testHedging$1 - result = // idGd url blocked by SO , it sucks
*/
private suspend fun method4(longUrl: String): String {
return withContext(esDispatcher) {
impls.map { impl ->
async {
impl.getShortUrl(longUrl)
}
}.firstNotNullResult { it.await() } ?: longUrl
}
}
I am not familiar with Channel
, sorry for the exception ↓
/**
* 01:29:44,460 INFO UrlShorterService$method5$2 - channel closed
* 01:29:44,461 INFO DumbImpl - running : pool-1-thread-2
* 01:29:44,460 INFO IsgdImpl - running : pool-1-thread-3
* 01:29:44,466 INFO TinyImpl - running : pool-1-thread-1
* 01:29:45,765 INFO TinyImpl - returning // tiny url blocked by SO , it sucks
* 01:29:46,339 INFO IsgdImpl - returning // idGd url blocked by SO , it sucks
*
* kotlinx.coroutines.channels.ClosedSendChannelException: Channel was closed
*
*/
private suspend fun method5(longUrl: String): String {
val channel = Channel<String>()
withContext(esDispatcher) {
impls.forEach { impl ->
launch {
impl.getShortUrl(longUrl)?.also {
channel.send(it)
}
}
}
channel.close()
logger.info("channel closed")
}
return channel.consumeAsFlow().first()
}
OK , I don't know if there are any other ways ... but all above are not working... All blocks at least 10 seconds ( blocked by DumbImpl
) .
The whole source code can be found on github gist .
How can hedging be achieved in kotlin ? By Deferred
or Flow
or Channel
or any other better ideas ? Thank you.
After submitting the question , I found all tinyurl , isGd url are blocked by SO . It really sucks !
回答1:
If the actual work you want to do in parallel consists of network fetches, you should choose an async networking library so you can properly use non-blocking coroutines with it. For example, as of version 11 the JDK provides an async HTTP client which you can use as follows:
val httpClient: HttpClient = HttpClient.newHttpClient()
suspend fun httpGet(url: String): String = httpClient
.sendAsync(
HttpRequest.newBuilder().uri(URI.create(url)).build(),
BodyHandlers.ofString())
.await()
.body()
Here's a function that accomplishes request hedging given a suspendable implementation like above:
class UrlShortenerService(
private val impls: List<UrlShortener>
) {
suspend fun getShortUrl(longUrl: String): String? = impls
.asFlow()
.flatMapMerge(impls.size) { impl ->
flow<String?> {
try {
impl.getShortUrl(longUrl)?.also { emit(it) }
}
catch (e: Exception) {
// maybe log it, but don't let it propagate
}
}
}
.onCompletion { emit(null) }
.first()
}
Note the absence of any custom dispatchers, you don't need them for suspendable work. Any dispatcher will do, and all the work can run in a single thread.
The onCompletion
parts steps into action when your all URL shorteners fail. In that case the flatMapMerge
stage doesn't emit anything and first()
would deadlock without the extra null
injected into the flow.
To test it I used the following code:
class Shortener(
private val delay: Long
) : UrlShortener {
override suspend fun getShortUrl(longUrl: String): String? {
delay(delay * 1000)
println("Shortener $delay completing")
if (delay == 1L) {
throw Exception("failed service")
}
if (delay == 2L) {
return null
}
return "shortened after $delay seconds"
}
}
suspend fun main() {
val shorteners = listOf(
Shortener(4),
Shortener(3),
Shortener(2),
Shortener(1)
)
measureTimeMillis {
UrlShortenerService(shorteners).getShortUrl("bla").also {
println(it)
}
}.also {
println("Took $it ms")
}
}
This exercises the various failure cases like returning null or failing with an exception. For this code I get the following output:
Shortener 1 completing
Shortener 2 completing
Shortener 3 completing
shortened after 3 seconds
Took 3080 ms
We can see that the shorteners 1 and 2 completed but with a failure, shortener 3 returned a valid response, and shortener 4 was cancelled before completing. I think this matches the requirements.
If you can't move away from blocking requests, your implementation will have to start num_impls * num_concurrent_requests
threads, which is not great. However, if that's the best you can have, here's an implementation that hedges blocking requests but awaits on them suspendably and cancellably. It will send an interrupt signal to the worker threads running the requests, but if your library's IO code is non-interruptible, these threads will hang waiting for their requests to complete or time out.
val es = Executors.newCachedThreadPool()
interface UrlShortener {
fun getShortUrl(longUrl: String): String? // not suspendable!
}
class UrlShortenerService(
private val impls: List<UrlShortener>
) {
suspend fun getShortUrl(longUrl: String): String {
val chan = Channel<String?>()
val futures = impls.map { impl -> es.submit {
try {
impl.getShortUrl(longUrl)
} catch (e: Exception) {
null
}.also { runBlocking { chan.send(it) } }
} }
try {
(1..impls.size).forEach { _ ->
chan.receive()?.also { return it }
}
throw Exception("All services failed")
} finally {
chan.close()
futures.forEach { it.cancel(true) }
}
}
}
回答2:
This is essentially what the select
APi was designed to do:
coroutineScope {
select {
impls.forEach { impl ->
async {
impl.getShortUrl(longUrl)
}.onAwait { it }
}
}
coroutineContext[Job].cancelChildren() // Cancel any requests that are still going.
}
Note that this will not handle exceptions thrown by the service implementations, you will need to use a supervisorScope
with a custom exception handler and a filtering select loop if you want to actually handle those.
来源:https://stackoverflow.com/questions/58736419/kotlin-to-achieve-multithread-request-hedging