How to split a list into multiple columns in PySpark?

余生分开走 2020-11-27 19:21

I have:

key   value
a    [1,2,3]
b    [2,3,4]

I want:

key value1 value2 value3
a     1      2      3
b     2      3      4
         


        
3 answers
  •  忘掉有多难
    2020-11-27 20:05

    It depends on the type of your "list":

    • If it is of type ArrayType():

      # hc is a HiveContext (or SQLContext) and sc is the SparkContext;
      # on Spark 2.0+ a SparkSession's createDataFrame can be used instead.
      df = hc.createDataFrame(sc.parallelize([['a', [1,2,3]], ['b', [2,3,4]]]), ["key", "value"])
      df.printSchema()
      df.show()
      root
       |-- key: string (nullable = true)
       |-- value: array (nullable = true)
       |    |-- element: long (containsNull = true)

      +---+-------+
      |key|  value|
      +---+-------+
      |  a|[1,2,3]|
      |  b|[2,3,4]|
      +---+-------+

      You can access the values with [], as you would in Python:

      df.select("key", df.value[0], df.value[1], df.value[2]).show()
      +---+--------+--------+--------+
      |key|value[0]|value[1]|value[2]|
      +---+--------+--------+--------+
      |  a|       1|       2|       3|
      |  b|       2|       3|       4|
      +---+--------+--------+--------+
      
    • If it is of type StructType() (for example, because you built your DataFrame by reading JSON):

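      import pyspark.sql.functions as psf

      # Build a struct column from the array column to illustrate this case
      df2 = df.select("key", psf.struct(
              df.value[0].alias("value1"),
              df.value[1].alias("value2"),
              df.value[2].alias("value3")
          ).alias("value"))
      df2.printSchema()
      df2.show()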
      root
       |-- key: string (nullable = true)
       |-- value: struct (nullable = false)
       |    |-- value1: long (nullable = true)
       |    |-- value2: long (nullable = true)
       |    |-- value3: long (nullable = true)
      
      +---+-------+
      |key|  value|
      +---+-------+
      |  a|[1,2,3]|
      |  b|[2,3,4]|
      +---+-------+
      

      You can directly "split" the struct column into its fields using *:

      df2.select('key', 'value.*').show()
      +---+------+------+------+
      |key|value1|value2|value3|
      +---+------+------+------+
      |  a|     1|     2|     3|
      |  b|     2|     3|     4|
      +---+------+------+------+
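
In the ArrayType case above, the resulting columns are named value[0], value[1], value[2] rather than value1, value2, value3 as the question asks. One way to get those names is to alias each element while selecting it. The following is a minimal sketch, not the answer author's code: the helper names `indexed_column_exprs` and `split_array_column` are hypothetical, only `selectExpr` is an existing DataFrame method. It assumes the array length is known in advance and that the column names are plain identifiers that need no backtick quoting. How an index past the end of a shorter array behaves depends on the Spark version and its ANSI settings.

```python
def indexed_column_exprs(array_col, size, prefix=None):
    """Build Spark SQL expressions such as 'value[0] AS value1'.

    Hypothetical helper: array_col is the name of the array column,
    size is the number of elements to extract, and prefix is the base
    name for the new columns (defaults to the array column's name).
    """
    prefix = prefix or array_col
    # Array indexes are 0-based, the requested column names are 1-based.
    return [f"{array_col}[{i}] AS {prefix}{i + 1}" for i in range(size)]


def split_array_column(df, array_col, size, keep=("key",)):
    """Return df with array_col expanded into `size` numbered columns.

    Hypothetical helper: `keep` lists the columns to carry over unchanged.
    """
    return df.selectExpr(*keep, *indexed_column_exprs(array_col, size))


# Usage with the DataFrame `df` built in the answer above:
# split_array_column(df, "value", 3).show()
```

The expressions are built as strings so that the same list works with `selectExpr` for any number of elements; the equivalent with column objects is `df.value[i].alias(...)`, as used in the StructType example.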
      
