I have:
key value
a [1,2,3]
b [2,3,4]
I want:
key value1 value2 value3
a 1 2 3
b 2 3 4
I'd like to add the case of larger lists (arrays) to pault's answer.
When the column contains medium-sized or large arrays, it is still possible to split them into columns.
from pyspark.sql.types import StructType, StructField, StringType, ArrayType, IntegerType  # Needed to define the DataFrame schema.
from pyspark.sql.functions import expr

# Assumes an active SparkSession named `spark`, as in the pyspark shell.
# Define a schema to create a DataFrame with an array-typed column.
mySchema = StructType([StructField("V1", StringType(), True),
                       StructField("V2", ArrayType(IntegerType(), True))])
df = spark.createDataFrame([['A', [1, 2, 3, 4, 5, 6, 7]],
                            ['B', [8, 7, 6, 5, 4, 3, 2]]], schema=mySchema)

# Split the list into columns using 'expr()' in a list comprehension.
arr_size = 7
df = df.select(['V1', 'V2'] + [expr('V2[' + str(x) + ']') for x in range(0, arr_size)])

# It is possible to define new column names.
new_colnames = ['V1', 'V2'] + ['val_' + str(i) for i in range(0, arr_size)]
df = df.toDF(*new_colnames)
The result is:
df.show(truncate=False)
+---+---------------------+-----+-----+-----+-----+-----+-----+-----+
|V1 |V2                   |val_0|val_1|val_2|val_3|val_4|val_5|val_6|
+---+---------------------+-----+-----+-----+-----+-----+-----+-----+
|A  |[1, 2, 3, 4, 5, 6, 7]|1    |2    |3    |4    |5    |6    |7    |
|B  |[8, 7, 6, 5, 4, 3, 2]|8    |7    |6    |5    |4    |3    |2    |
+---+---------------------+-----+-----+-----+-----+-----+-----+-----+
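As a possible variation, the indexing and the renaming could be combined by building SQL expression strings with aliases and passing them to selectExpr. This is only a sketch: the helper name split_array_exprs and its prefix parameter are hypothetical, and it assumes the same V1/V2 DataFrame as above. The string-building part runs without Spark; the selectExpr call is left as a comment.

```python
def split_array_exprs(array_col, size, prefix="val_"):
    """Build Spark SQL expressions such as 'V2[0] AS val_0' (hypothetical helper)."""
    return [f"{array_col}[{i}] AS {prefix}{i}" for i in range(size)]

# One aliased expression per array position.
exprs = split_array_exprs("V2", 7)
print(exprs[:2])  # ['V2[0] AS val_0', 'V2[1] AS val_1']

# With a DataFrame like the one above, the expressions could be used as:
# df = df.selectExpr("V1", "V2", *exprs)
```

This keeps the array size in one place and avoids the separate toDF() step, at the cost of writing the column names inside SQL strings.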