Updating a DataFrame column in Spark

庸人自扰 · 2020-11-28 02:55

Looking at the new Spark DataFrame API, it is unclear whether it is possible to modify DataFrame columns.

How would I go about changing a value in row x, column y of a DataFrame?

5 Answers
  •  青春惊慌失措
    2020-11-28 03:32

    Just as maasg says, you can create a new DataFrame from the result of a map applied to the old DataFrame. An example for a given DataFrame df with two columns:

    val newDf = sqlContext.createDataFrame(df.map(row =>
      Row(row.getInt(0) + SOMETHING, applySomeDef(row.getAs[Double]("y")))
    ), df.schema)
    

    Note that if the types of the columns change, you need to supply a correct schema instead of df.schema. Check out the API of org.apache.spark.sql.Row for the available methods: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Row.html
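
    To make the schema point concrete, here is a minimal sketch of the changed-type case. It assumes a DataFrame df with an Int column "x" and a Double column "y", where the map converts "y" to a String; the column names are illustrative, not from the original answer:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StructType, StructField, IntegerType, StringType}
    
    // "y" becomes a String, so df.schema can no longer be reused;
    // build the target schema explicitly instead
    val newSchema = StructType(Seq(
      StructField("x", IntegerType, nullable = false),
      StructField("y", StringType, nullable = true)
    ))
    
    val converted = sqlContext.createDataFrame(
      df.map(row => Row(row.getInt(0), row.getAs[Double]("y").toString)),
      newSchema
    )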

    [Update] Or using UDFs in Scala:

    import org.apache.spark.sql.functions._
    
    val toLong = udf[Long, String](_.toLong)
    
    val modifiedDf = df.withColumn("modifiedColumnName", toLong(df("columnName"))).drop("columnName")
    

    and if the column name needs to stay the same, you can rename it back:

    modifiedDf.withColumnRenamed("modifiedColumnName", "columnName")
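
    As a side note, when the transformation is a plain type change, the drop/rename dance can be avoided: passing an existing column name to withColumn overwrites that column in place. A sketch, assuming the column "columnName" holds string-encoded numbers:

    import org.apache.spark.sql.functions.col
    
    // Overwrite "columnName" in place by casting; no drop or rename needed
    val modifiedDf = df.withColumn("columnName", col("columnName").cast("long"))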
    
