How to cast all columns of a DataFrame to string

没有蜡笔的小新 2020-12-31 13:18

I have a mixed-type DataFrame. I am reading it from a Hive table using the spark.sql('select a,b,c from table') command.

Some columns are int, and I would like to cast all columns to string.

3 Answers
  • 2020-12-31 13:54

    Just:

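    from pyspark.sql.functions import col
    
    # "table" is a placeholder table name, as in the question
    table = spark.sql("select a, b, c from table")
    
    # cast every column to string; select returns a new DataFrame
    string_table = table.select([col(c).cast("string") for c in table.columns])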
    
  • 2020-12-31 13:54

    For Scala, with Spark version > 2.0:

    case class Row(id: Int, value: Double)
    
    import spark.implicits._
    
    import org.apache.spark.sql.functions._
    
    val r1 = Seq(Row(1, 1.0), Row(2, 2.0), Row(3, 3.0)).toDF()
    
    r1.show
    // +---+-----+
    // | id|value|
    // +---+-----+
    // |  1|  1.0|
    // |  2|  2.0|
    // |  3|  3.0|
    // +---+-----+
    
    val castedDF = r1.columns.foldLeft(r1)((current, c) => current.withColumn(c, col(c).cast("string")))
    
    castedDF.printSchema
    // root
    //  |-- id: string (nullable = false)
    //  |-- value: string (nullable = false)
    
  • 2020-12-31 14:00

    Here's a one-line solution in Scala:

    df.select(df.columns.map(c => col(c).cast(StringType)) : _*)
    

    Here is a complete example:

    import org.apache.spark.sql._
    import org.apache.spark.sql.types._
    import org.apache.spark.sql.functions._
    val data = Seq(
      Row(1, "a"),
      Row(5, "z")
    )
    
    val schema = StructType(
      List(
        StructField("num", IntegerType, true),
        StructField("letter", StringType, true)
      )
    )
    
    val df = spark.createDataFrame(
      spark.sparkContext.parallelize(data),
      schema
    )
    
    df.printSchema
    //root
    //|-- num: integer (nullable = true)
    //|-- letter: string (nullable = true)
    
    val newDf = df.select(df.columns.map(c => col(c).cast(StringType)) : _*)
    
    newDf.printSchema
    //root
    //|-- num: string (nullable = true)
    //|-- letter: string (nullable = true)
    

    I hope this helps.
