PySpark converting a column of type 'map' to multiple columns in a dataframe

走了就别回头了 2020-11-29 09:17

Input

I have a column Parameters of type map of the form:

>>> from pyspark.sql import SQLContext
>>> s         


        
2 answers
  •  长情又很酷
    2020-11-29 10:06

    Since the keys of a MapType column are not part of the schema, you'll have to collect them first, for example like this:

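    from pyspark.sql.functions import explode
    
    # explode() on a map column yields one (key, value) row per map entry;
    # keep the distinct keys and collect them to the driver as a plain list.
    keys = (df
        .select(explode("Parameters"))
        .select("key")
        .distinct()
        .rdd.flatMap(lambda x: x)
        .collect())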
    

    Once you have the keys, all that is left is a simple select:

    from pyspark.sql.functions import col
    
    # Build one column per key: look the key up in the map and name the column after it.
    exprs = [col("Parameters").getItem(k).alias(k) for k in keys]
    df.select(*exprs)
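
To see what the two steps compute without starting a Spark session, here is a minimal plain-Python model of the same idea. It is only a sketch: the sample rows and the helper names `collect_keys` and `to_columns` are hypothetical (the question's input is truncated above), and each dict stands in for one value of the Parameters column. Absent keys become None, which mirrors the null that `getItem` returns for a missing map key under default Spark settings.

```python
from typing import Any, Dict, List, Optional

# Hypothetical sample data: each dict models one row's Parameters map.
rows: List[Dict[str, Any]] = [
    {"foo": 1, "bar": 2},
    {"foo": 3},
    {"baz": 4},
]


def collect_keys(maps: List[Dict[str, Any]]) -> List[str]:
    """Step 1: gather the distinct keys seen across all rows.

    Sorting is an addition for a stable column order; it is not part of
    the original answer.
    """
    return sorted({key for m in maps for key in m})


def to_columns(
    maps: List[Dict[str, Any]], keys: List[str]
) -> List[Dict[str, Optional[Any]]]:
    """Step 2: give every key its own column; absent keys become None."""
    return [{k: m.get(k) for k in keys} for m in maps]


keys = collect_keys(rows)
wide = to_columns(rows, keys)

print(keys)
for row in wide:
    print(row)
```

The sort in step 1 is a design choice worth carrying over to the Spark version: `distinct()` followed by `collect()` does not guarantee an order, so wrapping the collected list in `sorted(...)` keeps the resulting column order the same from run to run.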
    
