Spark 2.0.x: dump a CSV file from a DataFrame containing one array of type string

难免孤独 2020-11-29 07:07

I have a DataFrame df that contains one column of type array of strings.

df.show() looks like:

|ID|ArrayOfString|Age|Gender|
+--+-------         


        
6 Answers
  •  一整个雨季
    2020-11-29 07:59

    To answer DreamerP's question (from one of the comments):

    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    def array_to_string(my_list):
        # Render an array value as "[a,b,c]"; pass nulls through unchanged.
        if my_list is None:
            return None
        return '[' + ','.join([str(elem) for elem in my_list]) + ']'

    array_to_string_udf = udf(array_to_string, StringType())

    # Add string versions of the array columns, then drop the originals,
    # because the CSV writer cannot serialize array columns.
    df = df.withColumn('Antecedent_as_str', array_to_string_udf(df["Antecedent"]))
    df = df.withColumn('Consequent_as_str', array_to_string_udf(df["Consequent"]))
    df = df.drop("Consequent")
    df = df.drop("Antecedent")
    df.write.csv("foldername")
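
To see what the formatted column looks like once it lands in a CSV file, here is a minimal plain-Python sketch that needs no Spark session. It reuses the same array_to_string logic and writes rows with the standard csv module, which stands in for Spark's writer here. The sample rows and the string_to_array helper are hypothetical, added only for illustration, and the helper assumes no array element itself contains a comma.

```python
import csv
import io

def array_to_string(my_list):
    # Same formatting as the UDF above: ["a", "b"] -> "[a,b]"
    return '[' + ','.join([str(elem) for elem in my_list]) + ']'

def string_to_array(text):
    # Hypothetical inverse helper (not part of the original answer).
    # Assumes the "[a,b,c]" format and that no element contains a comma.
    inner = text[1:-1]
    return inner.split(',') if inner else []

# Made-up sample rows shaped like the question's columns:
# ID, ArrayOfString, Age, Gender
rows = [
    (1, ["a", "b", "c"], 30, "M"),
    (2, [], 25, "F"),
]

# Write the rows to an in-memory CSV, flattening the array column first.
buffer = io.StringIO()
writer = csv.writer(buffer)
for row_id, items, age, gender in rows:
    writer.writerow([row_id, array_to_string(items), age, gender])
csv_text = buffer.getvalue()

# Read the CSV back and rebuild the lists from the flattened column.
parsed = list(csv.reader(io.StringIO(csv_text)))
restored = [string_to_array(record[1]) for record in parsed]

print(csv_text)
print(restored)
```

Because the flattened value contains commas, the csv module wraps it in quotes so it still occupies a single field. If your array elements can contain commas themselves, a different separator inside the brackets would probably be safer.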
    
