How can I add multiple columns to a Spark DataFrame efficiently?

感情败类 2020-12-06 15:39

I have a set of column names and need to add those columns to an existing DataFrame, which is also very large. I need to add all the columns from the set to the DataFrame with Str…

2 Answers
  •  伪装坚强ぢ
    2020-12-06 16:08

    So this is in PySpark.

    from pyspark.sql import functions as F

    # The column names come from an unordered set, hence the scrambled column order below
    cols_to_add = {f'col{n}' for n in range(1, 23)}

    df.select(
        '*',
        *[F.lit(None).alias(c) for c in cols_to_add]
    ).show()
    
    +---+----------+----+----+----+-----+----+-----+----+-----+-----+-----+-----+----+-----+----+----+-----+-----+-----+-----+-----+----+-----+
    |Id |Name      |col7|col8|col3|col17|col6|col20|col2|col14|col16|col21|col15|col9|col10|col5|col1|col13|col19|col11|col22|col18|col4|col12|
    +---+----------+----+----+----+-----+----+-----+----+-----+-----+-----+-----+----+-----+----+----+-----+-----+-----+-----+-----+----+-----+
    |1  |James     |null|null|null|null |null|null |null|null |null |null |null |null|null |null|null|null |null |null |null |null |null|null |
    |2  |Michael   |null|null|null|null |null|null |null|null |null |null |null |null|null |null|null|null |null |null |null |null |null|null |
    |3  |Robert    |null|null|null|null |null|null |null|null |null |null |null |null|null |null|null|null |null |null |null |null |null|null |
    |4  |Washington|null|null|null|null |null|null |null|null |null |null |null |null|null |null|null|null |null |null |null |null |null|null |
    |5  |Jefferson |null|null|null|null |null|null |null|null |null |null |null |null|null |null|null|null |null |null |null |null |null|null |
    +---+----------+----+----+----+-----+----+-----+----+-----+-----+-----+-----+----+-----+----+----+-----+-----+-----+-----+-----+----+-----+
    

    The same logic translates to Spark in Scala if you replace the Python list comprehension with a map over the column names, as sketched below.
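
    A minimal Scala sketch of that translation, assuming the new column names sit in a collection called colsToAdd and the DataFrame is df (both names are placeholders, not anything from the original post):

    import org.apache.spark.sql.functions.{col, lit}

    // colsToAdd stands in for your set of new column names
    val colsToAdd = (1 to 22).map(n => s"col$n")

    // Build every null column up front and add them all in one select
    val newCols = colsToAdd.map(c => lit(null).alias(c))
    df.select((col("*") +: newCols): _*).show()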

    This is faster because all 22 columns are built and executed in a single select, rather than being added one at a time the way a foldLeft over withColumn does.
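
    For comparison, a rough sketch of the iterative foldLeft version (same placeholder names as above); every withColumn call wraps the logical plan in another projection, which is what makes it slow when many columns are added:

    import org.apache.spark.sql.functions.lit

    // One withColumn per name: the plan grows and is re-analyzed at each step
    val dfIterative = colsToAdd.foldLeft(df)((acc, c) => acc.withColumn(c, lit(null)))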
