Applying function to Spark Dataframe Column

Submitted by 删除回忆录丶 on 2019-12-18 11:56:36

Question


Coming from R, I am used to easily doing operations on columns. Is there any easy way to take this function that I've written in Scala

def round_tenths_place(un_rounded: Double): Double =
  BigDecimal(un_rounded).setScale(1, BigDecimal.RoundingMode.HALF_UP).toDouble
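For reference, `HALF_UP` rounds values exactly halfway between two tenths away from zero. A minimal standalone sketch of the behavior (the sample inputs below are my own, not from the question):

```scala
// HALF_UP rounding to one decimal place, as in the question's function.
def round_tenths_place(un_rounded: Double): Double =
  BigDecimal(un_rounded).setScale(1, BigDecimal.RoundingMode.HALF_UP).toDouble

// Halfway cases round away from zero under HALF_UP.
println(round_tenths_place(2.25))  // 2.3
println(round_tenths_place(2.24))  // 2.2
println(round_tenths_place(-2.25)) // -2.3
```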

And apply it to one column of a dataframe - kind of what I hoped this would do:

 bid_results.withColumn("bid_price_bucket", round_tenths_place(bid_results("bid_price")) )

I haven't found an easy way and am struggling to figure out how to do this. There has to be an easier way than converting the dataframe to an RDD, selecting the right field from each row, and mapping the function across all of the values, yeah? And also something more succinct than creating a SQL table and then doing this with a Spark SQL UDF?


Answer 1:


You can define a UDF as follows:

import org.apache.spark.sql.functions.udf

val round_tenths_place_udf = udf(round_tenths_place _)
bid_results.withColumn(
  "bid_price_bucket", round_tenths_place_udf($"bid_price"))

although the built-in round expression uses exactly the same logic as your function and should be more than enough, not to mention being much more efficient:

import org.apache.spark.sql.functions.round

bid_results.withColumn("bid_price_bucket", round($"bid_price", 1))

See also:

  • Updating a dataframe column in spark
  • How to apply a function to a column of a Spark DataFrame?


Source: https://stackoverflow.com/questions/35227568/applying-function-to-spark-dataframe-column
