Is this scala parallel array code threadsafe?

Posted by 烂漫一生 on 2019-12-06 03:59:47

Question


I want to use parallel arrays for a task, and before I start coding, I'd like to know whether this small snippet is thread-safe:

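// Note: on Scala 2.13+, .par needs the scala-parallel-collections module and
// import scala.collection.parallel.CollectionConverters._
import scala.collection.mutable.ListBuffer

val listBuffer = ListBuffer[String]("one","two","three","four","five","six","seven","eight","nine")
val jSyncList  = java.util.Collections.synchronizedList(new java.util.ArrayList[String]())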
listBuffer.par.foreach { e =>
    println("processed :"+e)
    // using sleep here to simulate a random delay
    Thread.sleep((scala.math.random * 1000).toLong)
    jSyncList.add(e)
}
jSyncList.toArray.foreach(println)

Are there better ways of processing something with parallel collections and accumulating the results elsewhere?


Answer 1:


The code you posted is perfectly safe. I'm not sure about the premise, though: why do you need to accumulate the results of a parallel collection in a non-parallel one? One of the main points of the parallel collections is that they look like other collections.

Parallel collections also provide a seq method that switches back to a sequential collection, so you should probably use that!
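A minimal sketch of that round trip, assuming Scala 2.13+ with the scala-parallel-collections module on the classpath (on 2.12 and earlier, .par ships with the standard library and the import below must be dropped). SeqExample and processed are hypothetical names used only for illustration:

```scala
// Assumption: Scala 2.13+ with the scala-parallel-collections module available.
import scala.collection.parallel.CollectionConverters._

object SeqExample {
  // Hypothetical helper: transforms the elements in parallel,
  // then converts the parallel result back to an ordinary Vector with .seq.
  def processed(input: Vector[String]): Vector[String] =
    input.par.map(e => "processed: " + e).seq
}
```

Because map keeps the element order of a sequence, the returned Vector lines up with the input even though the work ran in parallel.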




Answer 2:


For this pattern to be safe:

listBuffer.par.foreach { e => f(e) }

f has to be able to run concurrently in a safe way. I think the same rules that you need for safe multi-threading apply: access to shared state needs to be thread-safe, the order of the f calls for different e won't be deterministic, and you may run into deadlocks as you start synchronizing your statements in f.

Additionally, I'm not clear what guarantees the parallel collections give you about the underlying collection being modified while it is being processed, so a mutable list buffer, which can have elements added or removed, is possibly a poor choice. You never know when the next coder will call something like foo(listBuffer) before your foreach and pass that reference to another thread, which may mutate the list while it's being processed.

Other than that, I think this is a fine pattern for any f that takes a long time, can be called concurrently, and where e can be processed out of order.

immutCol.par.foreach { e => threadSafeOutOfOrderProcessingOf(e) }
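As a rough illustration of the "shared state must be thread-safe" rule, here is a sketch that counts elements from inside a parallel foreach. It assumes Scala 2.13+ with the scala-parallel-collections module (drop the import on 2.12 and earlier); SharedStateExample and countSafely are hypothetical names:

```scala
import java.util.concurrent.atomic.AtomicInteger
// Assumption: Scala 2.13+ with the scala-parallel-collections module available.
import scala.collection.parallel.CollectionConverters._

object SharedStateExample {
  // Hypothetical helper: counts elements from inside a parallel foreach.
  // AtomicInteger makes each increment atomic, so no update is lost.
  // A plain `var n = 0` with `n += 1` in the closure would be a data race
  // and could return less than items.size.
  def countSafely(items: Vector[String]): Int = {
    val counter = new AtomicInteger(0)
    items.par.foreach { _ => counter.incrementAndGet() }
    counter.get()
  }
}
```

The input here is an immutable Vector, so nothing else can change the collection while the foreach is running.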

Disclaimer: I have not tried parallel collections myself, but I'm looking forward to having SO questions/answers show us what works well.




Answer 3:


The synchronizedList should be safe, though the println may give unexpected results: you have no guarantee about the order in which items will be printed, or even that output from different printlns won't be interleaved partway through a line.

A synchronized list is also unlikely to be the fastest way to do this. A safer solution is to map over an immutable collection (Vector is probably your best bet here), then print all the lines (in order) afterwards:

val input = Vector("one","two","three","four","five","six","seven","eight","nine")
val output  = input.par.map { e =>
  val msg = "processed :" + e
  // using sleep here to simulate a random delay
  Thread.sleep((math.random * 1000).toLong)
  msg
}
println(output mkString "\n")

You'll also note that this code has about as much practical usefulness as your example :)




Answer 4:


This code is plain weird -- why add stuff in parallel to something that needs to be synchronized? You'll add contention and gain absolutely nothing in return.

The underlying goal, accumulating results from parallel processing, is better achieved with operations like fold, reduce or aggregate.
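A small sketch of what that can look like, assuming Scala 2.13+ with the scala-parallel-collections module (drop the import on 2.12 and earlier). AggregateExample, totalLength and longest are hypothetical names, not part of the question's code:

```scala
// Assumption: Scala 2.13+ with the scala-parallel-collections module available.
import scala.collection.parallel.CollectionConverters._

object AggregateExample {
  // aggregate: each chunk is folded with the first function starting from 0,
  // then the per-chunk totals are merged with the second function.
  def totalLength(words: Vector[String]): Int =
    words.par.aggregate(0)((acc, w) => acc + w.length, _ + _)

  // reduce: the operator must be associative, because chunks are combined
  // in an unspecified grouping. It throws on an empty collection.
  def longest(words: Vector[String]): String =
    words.par.reduce((a, b) => if (a.length >= b.length) a else b)
}
```

No shared mutable state is involved, so there is nothing to synchronize and no contention on a lock.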




Answer 5:


The code you've posted is safe - there will be no errors due to inconsistent state of your array list, because access to it is synchronized.

However, parallel collections process items concurrently (at the same time) AND out of order. Out of order means that the 54th element may be processed before the 2nd element, so your synchronized array list will contain the items in no predictable order.

In general it's better to use map, filter and other functional combinators to transform a collection into another collection - these will ensure that the ordering guarantees are preserved if a collection has some (like Seqs do). For example:

import scala.collection.parallel.mutable.ParArray
ParArray(1, 2, 3, 4).map(_ + 1)

always returns ParArray(2, 3, 4, 5).

However, if you need a specific thread-safe collection type such as a ConcurrentSkipListMap or a synchronized collection to be passed to some method in some API, modifying it from a parallel foreach is safe.
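A hedged sketch of that last case, assuming Scala 2.13+ with the scala-parallel-collections module (drop the import on 2.12 and earlier). ConcurrentMapExample and lengths are hypothetical names standing in for whatever the API in question expects:

```scala
import java.util.concurrent.ConcurrentSkipListMap
// Assumption: Scala 2.13+ with the scala-parallel-collections module available.
import scala.collection.parallel.CollectionConverters._

object ConcurrentMapExample {
  // Hypothetical helper: fills a thread-safe sorted map from a parallel foreach.
  // ConcurrentSkipListMap.put is safe to call from several threads at once,
  // and the map orders its keys itself, so processing order does not matter.
  def lengths(words: Vector[String]): ConcurrentSkipListMap[String, Integer] = {
    val out = new ConcurrentSkipListMap[String, Integer]()
    words.par.foreach { w => out.put(w, Integer.valueOf(w.length)) }
    out
  }
}
```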

Finally, a note: parallel collections provide parallel bulk operations on data. Mutable parallel collections are not thread-safe in the sense of letting you add elements to them from different threads. Mutating operations like inserting into a map or appending to a buffer still have to be synchronized.



Source: https://stackoverflow.com/questions/5920837/is-this-scala-parallel-array-code-threadsafe
