Can Haskell's Control.Concurrent.Async.mapConcurrently have a limit?

问题

I'm attempting to run multiple downloads in parallel in Haskell, which I would normally just use the Control.Concurrent.Async.mapConcurrently function for. However, doing so opens ~3000 connections, which causes the web server to reject them all. Is it possible to accomplish the same task as mapConcurrently, but only have a limited number of connections open at a time (i.e. only 2 or 4 at a time)?

回答1:

A quick solution would be to use a semaphore to restrict the number of concurrent actions. It's not optimal (all threads are created at once and then wait), but works:

import Control.Concurrent.MSem
import Control.Concurrent.Async
import Control.Concurrent (threadDelay)
import qualified Data.Traversable as T

mapPool :: T.Traversable t => Int -> (a -> IO b) -> t a -> IO (t b)
mapPool max f xs = do
    sem <- new max
    mapConcurrently (with sem . f) xs

-- A little test:
main = mapPool 10 (\x -> threadDelay 1000000 >> print x) [1..100]

回答2:

You may also try the pooled-io package where you can write:

import qualified Control.Concurrent.PooledIO.Final as Pool
import Control.DeepSeq (NFData)
import Data.Traversable (Traversable, traverse)

mapPool ::
   (Traversable t, NFData b) =>
   Int -> (a -> IO b) -> t a -> IO (t b)
mapPool n f = Pool.runLimited n . traverse (Pool.fork . f)

回答3:

This is really easy to do using the Control.Concurrent.Spawn library:

import Control.Concurrent.Spawn

type URL      = String
type Response = String    

numMaxConcurrentThreads = 4

getURLs :: [URL] -> IO [Response]
getURLs urlList = do
   wrap <- pool numMaxConcurrentThreads
   parMapIO (wrap . fetchURL) urlList

fetchURL :: URL -> IO Response

回答4:

Chunking the threads may be inefficient if a few of them last significantly longer than the others. Here is a smoother, yet more complex, solution:

{-# LANGUAGE TupleSections #-}
import Control.Concurrent.Async (async, waitAny)
import Data.List                (delete, sortBy)
import Data.Ord                 (comparing)

concurrentlyLimited :: Int -> [IO a] -> IO [a]
concurrentlyLimited n tasks = concurrentlyLimited' n (zip [0..] tasks) [] []

concurrentlyLimited' _ [] [] results = return . map snd $ sortBy (comparing fst) results
concurrentlyLimited' 0 todo ongoing results = do
    (task, newResult) <- waitAny ongoing
    concurrentlyLimited' 1 todo (delete task ongoing) (newResult:results)
concurrentlyLimited' n [] ongoing results = concurrentlyLimited' 0 [] ongoing results
concurrentlyLimited' n ((i, task):otherTasks) ongoing results = do
    t <- async $ (i,) <$> task
    concurrentlyLimited' (n-1) otherTasks (t:ongoing) results

Note : the above code could be made more generic using an instance of MonadBaseControl IO in place of IO, thanks to lifted-async.

回答5:

If you have actions in a list, this one has less dependencies

import Control.Concurrent.Async (mapConcurrently)
import Data.List.Split (chunksOf)

mapConcurrentChunks :: Int -> (a -> IO b) -> [a] -> IO [b]
mapConcurrentChunks n ioa xs = concat <$> mapM (mapConcurrently ioa) (chunksOf n xs)

Edit: Just shortened a bit

来源：https://stackoverflow.com/questions/18896103/can-haskells-control-concurrent-async-mapconcurrently-have-a-limit

标签

haskell

asynchronous

concurrency