Convert a List of tuples to a Map (and deal with duplicate keys?)

[愿得一人] 2020-12-12 14:32

I was thinking about a nice way to convert a List of tuples with duplicate keys, List(("a","b"),("c","d"),("a","f")), into a map: Map("a" -> List("b", "f"), "c" -> List("d")).

8 Answers
  • 2020-12-12 15:06

    Group and then project:

    scala> val x = List("a" -> "b", "c" -> "d", "a" -> "f")
    //x: List[(java.lang.String, java.lang.String)] = List((a,b), (c,d), (a,f))
    scala> x.groupBy(_._1).map { case (k,v) => (k,v.map(_._2))}
    //res1: scala.collection.immutable.Map[java.lang.String,List[java.lang.String]] = Map(c -> List(d), a -> List(b, f))
    

    A more Scala-idiomatic way is to use a fold, building the map directly and skipping the intermediate map step.
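    A minimal sketch of that fold-based approach (identifier names here are illustrative, not from the original answer):

    ```scala
    // Build the grouped map in a single pass with foldLeft.
    // Values are prepended, then reversed to preserve input order.
    val pairs = List("a" -> "b", "c" -> "d", "a" -> "f")

    val grouped: Map[String, List[String]] = pairs
      .foldLeft(Map.empty[String, List[String]]) { case (acc, (k, v)) =>
        acc.updated(k, v :: acc.getOrElse(k, Nil))
      }
      .map { case (k, vs) => k -> vs.reverse }

    // grouped: Map(a -> List(b, f), c -> List(d))
    ```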

  • 2020-12-12 15:08

    Below you can find a few solutions. (GroupBy, FoldLeft, Aggregate, Spark)

    val list: List[(String, String)] = List(("a","b"),("c","d"),("a","f"))
    

    GroupBy variation

    list.groupBy(_._1).map(v => (v._1, v._2.map(_._2)))
    

    FoldLeft variation

    list.foldLeft[Map[String, List[String]]](Map())((acc, value) => {
      acc.get(value._1).fold(acc ++ Map(value._1 -> List(value._2))){ v =>
        acc ++ Map(value._1 -> (value._2 :: v))
      }
    })
    

    Aggregate variation (similar to foldLeft)

    list.aggregate[Map[String, List[String]]](Map())(
      (acc, value) => acc.get(value._1).fold(acc ++ Map(value._1 -> 
        List(value._2))){ v =>
         acc ++ Map(value._1 -> (value._2 :: v))
      },
      (l, r) => l ++ r
    )
    

    Spark variation, for big data sets (converts the list to an RDD, then back to a plain Map)

    import org.apache.spark.rdd._
    import org.apache.spark.{SparkContext, SparkConf}
    
    val conf: SparkConf = new SparkConf().setAppName("Spark").setMaster("local")
    val sc: SparkContext = new SparkContext (conf)
    
    // This gives you a rdd of the same result
    val rdd: RDD[(String, List[String])] = sc.parallelize(list).combineByKey(
       (value: String) => List(value),
       (acc: List[String], value) => value :: acc,
       (accLeft: List[String], accRight: List[String]) => accLeft ::: accRight
    )
    
    // To convert this RDD back to a Map[String, List[String]] you can do the following
    rdd.collect().toMap
    
  • 2020-12-12 15:11

    You can try this:

    scala> val b = Array(1, 2, 3)
    // b: Array[Int] = Array(1, 2, 3)
    scala> val c = b.map(x => (x -> x * 2))
    // c: Array[(Int, Int)] = Array((1,2), (2,4), (3,6))
    scala> val d = Map(c : _*)
    // d: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2, 2 -> 4, 3 -> 6)
    
  • 2020-12-12 15:15

    Starting with Scala 2.13, most collections provide the groupMap method, which (as its name suggests) is an equivalent, more efficient version of a groupBy followed by mapValues:

    List("a" -> "b", "c" -> "d", "a" -> "f").groupMap(_._1)(_._2)
    // Map[String,List[String]] = Map(a -> List(b, f), c -> List(d))
    

    This:

    • groups elements based on the first part of tuples (group part of groupMap)

    • maps grouped values by taking their second tuple part (map part of groupMap)

    This is an equivalent of list.groupBy(_._1).mapValues(_.map(_._2)) but performed in one pass through the List.
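    A quick check of that equivalence (assuming Scala 2.13+, where mapValues is spelled .view.mapValues):

    ```scala
    val pairs = List("a" -> "b", "c" -> "d", "a" -> "f")

    // One pass through the list:
    val viaGroupMap = pairs.groupMap(_._1)(_._2)
    // Two passes (group, then project the values):
    val viaGroupBy = pairs.groupBy(_._1).view.mapValues(_.map(_._2)).toMap

    // Both: Map(a -> List(b, f), c -> List(d))
    ```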

  • 2020-12-12 15:22

    Here's another alternative:

    x.groupBy(_._1).mapValues(_.map(_._2))
    
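    One caveat worth noting: on a Map, mapValues is deprecated since Scala 2.13 in favor of .view.mapValues, which is lazy, so if you need a strict result you can materialize it with toMap:

    ```scala
    val x = List("a" -> "b", "c" -> "d", "a" -> "f")

    // .view.mapValues returns a lazy view; .toMap forces a strict Map.
    val strict = x.groupBy(_._1).view.mapValues(_.map(_._2)).toMap
    // strict: Map(a -> List(b, f), c -> List(d))
    ```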
  • 2020-12-12 15:29

    For Googlers who don't expect duplicates or are fine with the default duplicate-handling policy:

    List("a" -> 1, "b" -> 2).toMap
    // Result: Map(a -> 1, b -> 2)
    

    As of 2.12, the default policy reads:

    Duplicate keys will be overwritten by later keys: if this is an unordered collection, which key is in the resulting map is undefined.
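    For example, on an ordered collection such as a List, the last pair for a given key wins:

    ```scala
    // Later pairs overwrite earlier ones with the same key.
    val deduped = List("a" -> 1, "a" -> 2, "b" -> 3).toMap
    // deduped: Map(a -> 2, b -> 3)
    ```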
