Spark 2.0.x: dump a CSV file from a DataFrame containing one column of type array of strings

难免孤独 2020-11-29 07:07

I have a DataFrame df that contains one column of type array of strings.

df.show() looks like:

|ID|ArrayOfString|Age|Gender|
+--+-------         


        
6 Answers
  •  没有蜡笔的小新
    2020-11-29 08:01

    You are getting this error because the CSV file format doesn't support array types; you need to convert the column to a string before you can save it.

    Try the following:

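    import org.apache.spark.sql.functions._
    
    // UDF that renders the array as one string such as "[a,b,c]"; a null array stays null.
    val stringify = udf((vs: Seq[String]) => vs match {
      case null => null
      case _    => s"""[${vs.mkString(",")}]"""
    })
    
    // The $"..." column syntax needs spark.implicits._ (imported automatically in spark-shell).
    df.withColumn("ArrayOfString", stringify($"ArrayOfString")).write.csv(...)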
    

    or

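    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.{concat, concat_ws, lit}
    
    // Built-in functions only, no UDF. Note: concat_ws skips nulls, so a null array becomes "[]".
    def stringify(c: Column) = concat(lit("["), concat_ws(",", c), lit("]"))
    
    df.withColumn("ArrayOfString", stringify($"ArrayOfString")).write.csv(...)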
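To see what ends up in the CSV cell, the body of the `stringify` UDF above can be tried as a plain Scala function, without Spark. This is only a sketch: the object name `StringifyDemo` and the sample values are made up for illustration and are not from the question.

```scala
// Plain-Scala sketch of the UDF body; no Spark needed to try it.
// StringifyDemo and the sample inputs are hypothetical, for illustration only.
object StringifyDemo {
  // Same logic as the UDF: null stays null, otherwise the elements
  // are joined with commas and wrapped in square brackets.
  def stringify(vs: Seq[String]): String = vs match {
    case null => null
    case _    => s"""[${vs.mkString(",")}]"""
  }

  def main(args: Array[String]): Unit = {
    println(stringify(Seq("a", "b", "c"))) // prints [a,b,c]
    println(stringify(Seq.empty))          // prints []
    println(stringify(null))               // prints null
    // Caveat: an element that itself contains a comma cannot be told
    // apart from two separate elements once it is stringified.
    println(stringify(Seq("a,b", "c")))    // prints [a,b,c]
  }
}
```

If the array elements may contain commas, it is probably worth picking a separator that cannot occur in the data, since the bracketed string is not unambiguous otherwise.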
    
