What is the difference between rowsBetween and rangeBetween?

逝去的感伤 2020-12-13 10:04

From the PySpark docs for rangeBetween:

rangeBetween(start, end)

Defines the frame boundaries, from start (inclusive) to end (inc

4 answers
  •  半阙折子戏
    2020-12-13 10:10

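    rowsBetween: With rowsBetween, the frame is defined by physical row positions relative to the current row, so every row gets its own frame even when neighbouring rows share the same orderBy value.

    The frame in rowsBetween does not depend on the values of the orderBy column; orderBy only fixes the order in which rows are counted. Rows that tie on the orderBy column are counted in whatever order Spark happens to place them.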

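    from pyspark.sql import Window
    from pyspark.sql import functions as F
    df = spark.read.csv(r'C:\Users\akashSaini\Desktop\TT.csv', inferSchema=True, header=True).na.drop()
    # Frame: from the first row of the partition up to and including the current row
    w = Window.partitionBy('DEPARTMENT').orderBy('SALARY').rowsBetween(Window.unboundedPreceding, Window.currentRow)
    df.withColumn('RowsBetween', F.sum(df.SALARY).over(w)).show()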
    
    
    first_name|Department|Salary|RowsBetween|
         Sofia|     Sales| 20000|      20000|
        Gordon|     Sales| 25000|      45000|
        Gracie|     Sales| 25000|      70000|
        Cellie|     Sales| 25000|      95000|
        Jervis|     Sales| 30000|     125000|
         Akash|  Analysis| 30000|      30000|
       Richard|   Account| 12000|      12000|
        Joelly|   Account| 15000|      27000|
       Carmiae|   Account| 15000|      42000|
           Bob|   Account| 20000|      62000|
         Gally|   Account| 28000|      90000|
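
The running totals for the Sales partition can be reproduced without Spark. The following is a plain-Python sketch of a row-based frame, not PySpark's implementation; the helper name `rows_between_sum` and the use of `None` for an unbounded boundary are assumptions made for illustration.

```python
def rows_between_sum(values, start, end):
    """Sum over a row-based frame for each position of an already sorted list.

    start and end are row offsets relative to the current row
    (0 is the current row, -1 the row before it).
    None is a hypothetical stand-in for an unbounded boundary.
    """
    n = len(values)
    result = []
    for i in range(n):
        # Clamp the frame to the partition; only positions matter, not values.
        lo = 0 if start is None else max(0, i + start)
        hi = n - 1 if end is None else min(n - 1, i + end)
        result.append(sum(values[lo:hi + 1]))
    return result


# Salaries of the Sales partition, sorted ascending as orderBy('SALARY') would do.
sales = [20000, 25000, 25000, 25000, 30000]

# Analogue of rowsBetween(Window.unboundedPreceding, Window.currentRow)
running = rows_between_sum(sales, None, 0)
print(running)  # [20000, 45000, 70000, 95000, 125000]

# Analogue of rowsBetween(-1, Window.currentRow): previous row plus current row
previous_and_current = rows_between_sum(sales, -1, 0)
print(previous_and_current)  # [20000, 45000, 50000, 50000, 55000]
```

The three rows with salary 25000 each get a different running total because the frame grows by exactly one row per step, whether or not the salary repeats.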
    

    rangeBetween: With rangeBetween, the frame is defined by the values of the orderBy column rather than by row positions, so the number of rows in the frame can change from row to row.

    The frame in rangeBetween depends on the orderBy values. With currentRow as the upper bound, the frame includes every row whose orderBy value equals the current row's value. Gordon, Gracie and Cellie have the same salary, so all three fall into each other's frame and get the same sum.

    For a better understanding, see the example below:

    df = spark.read.csv(r'C:\Users\asaini28.EAD\Desktop\TT.csv', inferSchema=True, header=True).na.drop()
    # Frame: every row whose SALARY is less than or equal to the current row's SALARY
    w = Window.partitionBy('DEPARTMENT').orderBy('SALARY').rangeBetween(Window.unboundedPreceding, Window.currentRow)
    df.withColumn('RangeBetween', F.sum(df.SALARY).over(w)).select('first_name', 'Department', 'Salary', 'RangeBetween').show()
    
    first_name|Department|Salary|RangeBetween|
         Sofia|     Sales| 20000|       20000|
        Gordon|     Sales| 25000|       95000|
        Gracie|     Sales| 25000|       95000|
        Cellie|     Sales| 25000|       95000|
        Jervis|     Sales| 30000|      125000|
         Akash|  Analysis| 30000|       30000|
       Richard|   Account| 12000|       12000|
        Joelly|   Account| 15000|       42000|
       Carmiae|   Account| 15000|       42000|
           Bob|   Account| 20000|       62000|
         Gally|   Account| 28000|       90000|
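
The value-based behaviour can also be sketched in plain Python. This is an illustration under stated assumptions, not PySpark's implementation: the helper name `range_between_sum` and the use of `None` for an unbounded boundary are hypothetical, and the aggregated column is assumed to be the same as the orderBy column, as in the SALARY example.

```python
def range_between_sum(values, start, end):
    """Sum over a value-based frame for each element of an ascending list.

    start and end are offsets added to the current row's ordering value;
    the frame holds every row whose value lies in [v + start, v + end].
    None is a hypothetical stand-in for an unbounded boundary.
    """
    result = []
    for v in values:
        lo = float("-inf") if start is None else v + start
        hi = float("inf") if end is None else v + end
        # Membership is decided by value, so tied rows always share a frame.
        result.append(sum(x for x in values if lo <= x <= hi))
    return result


# Salaries of the Sales partition, sorted ascending as orderBy('SALARY') would do.
sales = [20000, 25000, 25000, 25000, 30000]

# Analogue of rangeBetween(Window.unboundedPreceding, Window.currentRow)
range_running = range_between_sum(sales, None, 0)
print(range_running)  # [20000, 95000, 95000, 95000, 125000]

# Analogue of rangeBetween(-5000, Window.currentRow):
# rows whose salary is at most 5000 below the current salary
within_5000_below = range_between_sum(sales, -5000, 0)
print(within_5000_below)  # [20000, 95000, 95000, 95000, 105000]
```

With numeric offsets the difference is clearer still: an offset of -5000 means "salary up to 5000 lower", not "5000 rows back", which is why the last row sums only the 25000 and 30000 salaries.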
    
