Reduce a key-value pair into a key-list pair with Apache Spark

生来不讨喜 2020-11-27 14:21

I am writing a Spark application and want to combine a set of Key-Value pairs (K, V1), (K, V2), ..., (K, Vn) into one Key-Multivalue pair (K, [V1, V2, ...

9 answers
  •  Happy的楠姐
    2020-11-27 14:55

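    The error message stems from the type of 'a' in your closure: 'a' is an element of the RDD, not a list, so it has no append method. Also note that list.append returns None rather than the list, so the lambda could not build up a list even if 'a' were one.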

     My_KMV = My_KV.reduce(lambda a, b: a.append([b]))
    

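    Make every value a list explicitly before reducing, so that both a and b are always lists, and combine them with +, which returns a new list. For instance,

    My_KMV = My_KV.mapValues(lambda v: [v]).reduceByKey(lambda a, b: a + b)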
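
    The idea can be sketched more fully as below. This is only an illustration, not necessarily how the asker's job is set up: the helper names (to_singleton, concat_lists, group_values, main), the sample data, and the local SparkContext settings are all assumptions made for the example. Each value is wrapped in a one-element list first, and the lists are then merged with +, which returns a new list (append and extend return None).

```python
def to_singleton(value):
    """Wrap one value in a list so every reduce operand is a list."""
    return [value]


def concat_lists(a, b):
    """Return a new list; unlike append/extend, + does not return None."""
    return a + b


def group_values(rdd):
    """Turn (K, V) pairs into (K, [V1, V2, ...]) pairs on a pair RDD."""
    return rdd.mapValues(to_singleton).reduceByKey(concat_lists)


def main():
    # Imported here so the helpers above can be tried without Spark installed.
    from pyspark import SparkContext

    # Hypothetical local setup, for illustration only.
    sc = SparkContext("local[1]", "kv-to-kmv")
    try:
        my_kv = sc.parallelize([("k1", 1), ("k1", 2), ("k2", 3)])
        for key, values in group_values(my_kv).collect():
            # The order of values inside each list is not guaranteed.
            print(key, sorted(values))
    finally:
        sc.stop()
```

    With pyspark installed, calling main() prints each key next to its collected values. The same group_values helper should work on any pair RDD, since it relies only on mapValues and reduceByKey.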
    

    In many cases, reduceByKey is preferable to groupByKey; see: http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
