Updating a DataFrame column in Spark

庸人自扰 · 2020-11-28 02:55

Looking at the new Spark DataFrame API, it is unclear whether it is possible to modify DataFrame columns.

How would I go about changing a value in row x, column y of a DataFrame?

5 Answers
  •  青春惊慌失措
    2020-11-28 03:32

    Just as maasg says, you can create a new DataFrame from the result of a map applied to the old DataFrame. An example for a given DataFrame df with two columns:

    val newDf = sqlContext.createDataFrame(df.map(row =>
      Row(row.getInt(0) + SOMETHING, applySomeDef(row.getAs[Double]("y")))
    ), df.schema)
    

    Note that if the types of the columns change, you need to supply a correct schema instead of df.schema. Check out the API of org.apache.spark.sql.Row for the available methods: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Row.html
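
    To make the schema point concrete, here is a minimal sketch of the changed-type case. It assumes a DataFrame df with an Int column "x" and a Double column "y", where the map converts "y" to a String; the column names are illustrative, not from the original answer:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StructType, StructField, IntegerType, StringType}
    
    // "y" becomes a String, so df.schema can no longer be reused;
    // build the target schema explicitly instead
    val newSchema = StructType(Seq(
      StructField("x", IntegerType, nullable = false),
      StructField("y", StringType, nullable = true)
    ))
    
    val converted = sqlContext.createDataFrame(
      df.map(row => Row(row.getInt(0), row.getAs[Double]("y").toString)),
      newSchema
    )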

    [Update] Or using UDFs in Scala:

    import org.apache.spark.sql.functions._
    
    val toLong = udf[Long, String](_.toLong)
    
    val modifiedDf = df.withColumn("modifiedColumnName", toLong(df("columnName"))).drop("columnName")
    

    and if the column name needs to stay the same, you can rename it back:

    modifiedDf.withColumnRenamed("modifiedColumnName", "columnName")
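
    As a side note, when the transformation is a plain type change, the drop/rename dance can be avoided: passing an existing column name to withColumn overwrites that column in place. A sketch, assuming the column "columnName" holds string-encoded numbers:

    import org.apache.spark.sql.functions.col
    
    // Overwrite "columnName" in place by casting; no drop or rename needed
    val modifiedDf = df.withColumn("columnName", col("columnName").cast("long"))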
    
