Distributed Map in Scala Spark

臣服心动 2020-12-17 16:12

Does Spark support a distributed Map collection type?

So if I have a HashMap[String, String] of key/value pairs, can it be converted to a distributed Map collection?

2 Answers
  •  误落风尘
    2020-12-17 16:37

    The quick answer: Partially.

    You can transform a Map[A,B] into an RDD[(A,B)] by first forcing the map into a sequence of (k,v) pairs, but by doing so you lose the constraint that the keys of a map must form a set, i.e. you lose the semantics of the Map structure.
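    To see what is lost without involving Spark at all, here is a plain-Scala sketch: merging two Maps deduplicates keys, while concatenating their pair sequences (which is what a union of pair RDDs amounts to) keeps every pair.

```scala
// A Map keeps at most one value per key...
val m1 = Map(1 -> "one")
val m2 = Map(1 -> "uno")
val merged = m1 ++ m2            // later value wins: Map(1 -> "uno")

// ...but a flattened sequence of pairs keeps duplicates, which is
// exactly what remains once the Map becomes an RDD[(A, B)].
val pairs = m1.toSeq ++ m2.toSeq // Seq((1, "one"), (1, "uno"))
```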

    From a practical perspective, you can still resolve an element into its corresponding value using kvRdd.lookup(element), but the result will be a sequence, since, as explained above, there is no guarantee that a single value exists for the key.
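    The behaviour of lookup on a pair RDD can be mimicked in plain Scala (a sketch; the helper name here is made up, not a Spark API): filter the pairs by key and collect the values, yielding a Seq that may hold zero, one, or many entries.

```scala
// Hypothetical plain-Scala analogue of a pair RDD's lookup:
// returns every value stored under the given key.
def lookup[A, B](pairs: Seq[(A, B)], key: A): Seq[B] =
  pairs.collect { case (k, v) if k == key => v }

val bilingue = Seq(1 -> "one", 1 -> "uno", 2 -> "two")
val ones = lookup(bilingue, 1) // two values under the same key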

    A spark-shell example to make things clear:

    val englishNumbers = Map(1 -> "one", 2 -> "two", 3 -> "three")
    val englishNumbersRdd = sc.parallelize(englishNumbers.toSeq)
    
    // lookup returns a Seq of values, not a single value
    englishNumbersRdd.lookup(1)
    res: Seq[String] = WrappedArray(one)
    
    val spanishNumbers = Map(1 -> "uno", 2 -> "dos", 3 -> "tres")
    val spanishNumbersRdd = sc.parallelize(spanishNumbers.toList)
    
    // the union now holds two pairs under key 1 -- something a Map could never do
    val bilingueNumbersRdd = englishNumbersRdd union spanishNumbersRdd
    
    bilingueNumbersRdd.lookup(1)
    res: Seq[String] = WrappedArray(one, uno)
    

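    If you need Map semantics back on the driver, a pair RDD offers collectAsMap, which, much like Scala's own toMap, keeps only one value per duplicated key (with no guarantee about which duplicate survives). A plain-Scala sketch of that collapse:

```scala
val pairs = Seq(1 -> "one", 1 -> "uno", 2 -> "two")

// toMap keeps a single value per key (for the standard immutable Map
// the last occurrence wins), mirroring how duplicates silently
// disappear when a pair RDD is collected back into a Map.
val asMap = pairs.toMap
```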