Spark 2.0.x: dump a CSV file from a DataFrame containing one array of type string

难免孤独 2020-11-29 07:07

I have a DataFrame df that contains one column of type array of strings.

df.show() looks like:

|ID|ArrayOfString|Age|Gender|
+--+-------         


        
6 Answers
  •  时光说笑
    2020-11-29 07:54

    PySpark implementation.

    In this example, convert the array column column_as_array into a string column column_as_str before saving, since the CSV writer cannot store array columns.

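    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType
    
    def array_to_string(my_list):
        # A null array arrives as None; keep it null instead of raising.
        if my_list is None:
            return None
        return '[' + ','.join(str(elem) for elem in my_list) + ']'
    
    # Wrap the Python function as a Spark UDF that returns a string column.
    array_to_string_udf = udf(array_to_string, StringType())
    
    # Add the string version of the array column alongside the original.
    df = df.withColumn('column_as_str', array_to_string_udf(df["column_as_array"]))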
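
To see what the helper produces before wiring it into Spark, it can be run as plain Python. The sketch below is only an illustration: the sample inputs are made up, and the None guard is an assumption added here so that null arrays stay null instead of raising a TypeError.

```python
def array_to_string(my_list):
    """Render a list as a bracketed, comma-separated string."""
    # Assumption: a null array cell reaches the function as None.
    if my_list is None:
        return None
    return '[' + ','.join(str(elem) for elem in my_list) + ']'

# Hypothetical sample values, not taken from the question's data.
samples = [["a", "b", "c"], [], None, ["x", None]]
results = [array_to_string(s) for s in samples]

for sample, result in zip(samples, results):
    print(repr(sample), "->", repr(result))
```

Note that a null element inside an array is rendered as the text None, which may or may not be what you want in the output file. If the elements are already strings, Spark's built-in concat_ws function can also join an array with a separator without a Python UDF, which may be worth trying.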
    

    Then you can drop the old column (array type) before saving.

    df.drop("column_as_array").write.csv(...)
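
Because the generated string contains commas, it has to be quoted in the CSV output to stay in one field. The sketch below uses Python's standard csv module purely as a stand-in to show that effect and a possible way to parse the value back; the rows and the string_to_array helper are hypothetical, and Spark's own CSV writer has separate quote and escape options that are not exercised here.

```python
import csv
import io

# Hypothetical rows as they might look once the array column is a string.
rows = [
    ["1", "[a,b,c]", "30", "M"],
    ["2", "[]", "25", "F"],
]

buf = io.StringIO()
# The default quoting mode only quotes fields that need it.
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["ID", "column_as_str", "Age", "Gender"])
writer.writerows(rows)
text = buf.getvalue()
print(text)

def string_to_array(s):
    """Hypothetical inverse of array_to_string for comma-free elements."""
    inner = s[1:-1]  # strip the surrounding brackets
    return inner.split(",") if inner else []

# Read the CSV back and rebuild the lists from the second column.
parsed = list(csv.reader(io.StringIO(text)))
restored = [string_to_array(row[1]) for row in parsed[1:]]
print(restored)
```

This round trip only works when no element itself contains a comma; for such data a different separator or a JSON encoding of the array would be a safer choice.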
    
