I found from JIRA that the 1.6 release of SparkR has implemented window functions, including lag and rank, but the over function is not implemented yet. How can I use a window function like lag without over in SparkR (not the SparkSQL way)? Can someone provide an example?
Question:
Answer 1:
Spark 2.0.0+
SparkR provides DSL wrappers with over, window.partitionBy / partitionBy, window.orderBy / orderBy and rowsBetween / rangeBetween functions.
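With those wrappers the raw-SQL workaround below is no longer necessary. A minimal sketch, assuming a running Spark session and a SparkDataFrame sdf with columns x, y and z (note that in released SparkR 2.x the window constructors are spelled windowPartitionBy / windowOrderBy rather than window.partitionBy / window.orderBy):

```r
library(SparkR)

# Build a window spec: partition rows by y, order within each partition by x
w <- orderBy(windowPartitionBy("y"), "x")

# lag(z) evaluated over that window, the DSL analogue of
# LAG(z) OVER (PARTITION BY y ORDER BY x) in SQL
head(select(sdf, sdf$x, sdf$y, sdf$z, over(lag(sdf$z), w)))
```

This requires a live Spark backend, so it is shown here only as a sketch of the API shape.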
Spark <= 1.6.0
Unfortunately it is not possible in 1.6.0. While some window functions, including lag, have been implemented, SparkR doesn't support window definitions yet, which renders these completely useless.
As long as SPARK-11395 is not resolved, the only option is to use raw SQL:
set.seed(1)

hc <- sparkRHive.init(sc)
sdf <- createDataFrame(hc, data.frame(x = 1:12, y = 1:3, z = rnorm(12)))
registerTempTable(sdf, "sdf")

sql(hc, "SELECT x, y, z, LAG(z) OVER (PARTITION BY y ORDER BY x) FROM sdf") %>%
  head()

##    x y          z        _c3
## 1  1 1 -0.6264538         NA
## 2  4 1  1.5952808 -0.6264538
## 3  7 1  0.4874291  1.5952808
## 4 10 1 -0.3053884  0.4874291
## 5  2 2  0.1836433         NA
## 6  5 2  0.3295078  0.1836433

Assuming that the corresponding PR will be merged without significant changes, the window definition and example query should look as follows:
w <- window.partitionBy(sdf$y) %>% orderBy("x")
select(sdf, over(lag(sdf$z), w))

Source: SparkR window function
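For readers without a cluster at hand, the semantics of LAG(z) OVER (PARTITION BY y ORDER BY x) can be reproduced in plain base R on the same data, which also cross-checks the output table above (the lag_z column name and the use of ave are just for illustration):

```r
# Same data as the Spark example: seed 1, y recycled over 1:3 by data.frame
set.seed(1)
df <- data.frame(x = 1:12, y = 1:3, z = rnorm(12))

# Order by y then x, then shift z down one row within each y group;
# the first row of every group gets NA, exactly like SQL's LAG
df <- df[order(df$y, df$x), ]
df$lag_z <- ave(df$z, df$y, FUN = function(v) c(NA, v[-length(v)]))

head(df)  # matches the x, y, z, _c3 columns produced by the SQL query
```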