SparkR window function

没有蜡笔的小新 2020-12-06 15:53

I found from JIRA that the 1.6 release of SparkR has implemented window functions including lag and rank, but the over function i…

1 Answer
  • 2020-12-06 16:06

    Spark 2.0.0+

    SparkR provides DSL wrappers for over, windowPartitionBy / partitionBy, windowOrderBy / orderBy and rowsBetween / rangeBetween functions.
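
    For example, a window specification and an over call in the 2.0.0+ DSL could look roughly like this (a minimal sketch: it assumes a SparkR 2.x session started with sparkR.session() and a SparkDataFrame sdf like the one created in the 1.6 example below):

    # Partition by y and order by x within each partition
    w <- orderBy(windowPartitionBy("y"), "x")
    # Previous value of z within the window, analogous to the LAG(z) OVER (...) query below
    head(select(sdf, sdf$x, sdf$y, sdf$z, over(lag(sdf$z), w)))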

    Spark <= 1.6

    Unfortunately it is not possible in 1.6.0. While some window functions, including lag, have been implemented, SparkR doesn't yet support window definitions, which renders them useless.

    As long as SPARK-11395 is not resolved, the only option is to use raw SQL:

    library(SparkR)
    library(magrittr)  # provides the %>% pipe used below

    set.seed(1)

    # sc is an existing SparkContext created with sparkR.init()
    hc <- sparkRHive.init(sc)
    sdf <- createDataFrame(hc, data.frame(x=1:12, y=1:3, z=rnorm(12)))
    registerTempTable(sdf, "sdf")

    sql(hc, "SELECT x, y, z, LAG(z) OVER (PARTITION BY y ORDER BY x) FROM sdf") %>%
      head()
    
    ##    x y          z        _c3
    ## 1  1 1 -0.6264538         NA
    ## 2  4 1  1.5952808 -0.6264538
    ## 3  7 1  0.4874291  1.5952808
    ## 4 10 1 -0.3053884  0.4874291
    ## 5  2 2  0.1836433         NA
    ## 6  5 2  0.3295078  0.1836433
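
    The same raw-SQL route covers the other window functions mentioned in the question; for instance, a hedged sketch of rank over the same partitions (assuming the sdf temporary table registered above):

    sql(hc, "SELECT x, y, z, RANK() OVER (PARTITION BY y ORDER BY x) AS rnk FROM sdf") %>%
      head()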
    

    Assuming that the corresponding PR is merged without significant changes, the window definition and example query should look as follows:

    # Anticipated API from the open PR; SparkR 2.0.0 ultimately exposed this as
    # windowPartitionBy("y") rather than Window.partitionBy("y")
    w <- Window.partitionBy("y") %>% orderBy("x")
    select(sdf, over(lag(sdf$z), w))
    