From the PySpark docs for rangeBetween:

rangeBetween(start, end): Defines the frame boundaries, from start (inclusive) to end (inclusive).
rowsBetween: With rowsBetween you define the frame as a fixed number of physical rows around the current row, and that frame is calculated for each row independently.
The frame in rowsBetween does not depend on the values of the orderBy column, only on row position.
from pyspark.sql import functions as F
from pyspark.sql.window import Window

df = spark.read.csv(r'C:\Users\akashSaini\Desktop\TT.csv', inferSchema=True, header=True).na.drop()
# Frame: all preceding rows in the partition plus the current row
w = Window.partitionBy('DEPARTMENT').orderBy('SALARY').rowsBetween(Window.unboundedPreceding, Window.currentRow)
df.withColumn('RowsBetween', F.sum(df.SALARY).over(w)).show()
+----------+----------+------+-----------+
|first_name|Department|Salary|RowsBetween|
+----------+----------+------+-----------+
|     Sofia|     Sales| 20000|      20000|
|    Gordon|     Sales| 25000|      45000|
|    Gracie|     Sales| 25000|      70000|
|    Cellie|     Sales| 25000|      95000|
|    Jervis|     Sales| 30000|     125000|
|     Akash|  Analysis| 30000|      30000|
|   Richard|   Account| 12000|      12000|
|    Joelly|   Account| 15000|      27000|
|   Carmiae|   Account| 15000|      42000|
|       Bob|   Account| 20000|      62000|
|     Gally|   Account| 28000|      90000|
+----------+----------+------+-----------+
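If you want to try this without the CSV file, here is a minimal self-contained sketch that builds the same data inline (the SparkSession setup and the spark.createDataFrame call are my own scaffolding, not part of the original example) and reproduces the rowsBetween result:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Same rows as in the output above, built inline instead of read from TT.csv
data = [('Sofia', 'Sales', 20000), ('Gordon', 'Sales', 25000),
        ('Gracie', 'Sales', 25000), ('Cellie', 'Sales', 25000),
        ('Jervis', 'Sales', 30000), ('Akash', 'Analysis', 30000),
        ('Richard', 'Account', 12000), ('Joelly', 'Account', 15000),
        ('Carmiae', 'Account', 15000), ('Bob', 'Account', 20000),
        ('Gally', 'Account', 28000)]
df = spark.createDataFrame(data, ['first_name', 'Department', 'Salary'])

# Physical-row frame: every preceding row in the partition plus the current row
w = Window.partitionBy('Department').orderBy('Salary').rowsBetween(Window.unboundedPreceding, Window.currentRow)
df.withColumn('RowsBetween', F.sum('Salary').over(w)).show()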
rangeBetween: With rangeBetween, you define the frame by the values of the orderBy column rather than by row position, so the frame can grow or shrink from row to row.
The frame in rangeBetween depends on the orderBy clause: it includes every row whose orderBy value falls within the boundaries, so rows that share the same value all land in the same frame. For example, Gordon, Gracie and Cellie have the same salary, so all three are included in the current frame together.
See the example below:
df = spark.read.csv(r'C:\Users\asaini28.EAD\Desktop\TT.csv', inferSchema=True, header=True).na.drop()
# Same running total, but the frame is defined by salary values instead of row positions
w = Window.partitionBy('DEPARTMENT').orderBy('SALARY').rangeBetween(Window.unboundedPreceding, Window.currentRow)
df.withColumn('RangeBetween', F.sum(df.SALARY).over(w)).select('first_name', 'Department', 'Salary', 'RangeBetween').show()
+----------+----------+------+------------+
|first_name|Department|Salary|RangeBetween|
+----------+----------+------+------------+
|     Sofia|     Sales| 20000|       20000|
|    Gordon|     Sales| 25000|       95000|
|    Gracie|     Sales| 25000|       95000|
|    Cellie|     Sales| 25000|       95000|
|    Jervis|     Sales| 30000|      125000|
|     Akash|  Analysis| 30000|       30000|
|   Richard|   Account| 12000|       12000|
|    Joelly|   Account| 15000|       42000|
|   Carmiae|   Account| 15000|       42000|
|       Bob|   Account| 20000|       62000|
|     Gally|   Account| 28000|       90000|
+----------+----------+------+------------+
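To see the difference in one place, you can compute both frames over the same data (continuing with the inline df from the sketch above) and compare them where the orderBy values are tied:

# rowsBetween counts physical rows; rangeBetween includes every row with the same Salary value
w_rows = Window.partitionBy('Department').orderBy('Salary').rowsBetween(Window.unboundedPreceding, Window.currentRow)
w_range = Window.partitionBy('Department').orderBy('Salary').rangeBetween(Window.unboundedPreceding, Window.currentRow)

(df.withColumn('RowsBetween', F.sum('Salary').over(w_rows))
   .withColumn('RangeBetween', F.sum('Salary').over(w_range))
   .filter(F.col('Department') == 'Sales')
   .show())

For the three Sales rows with a salary of 25000, RowsBetween grows row by row (45000, 70000, 95000) while RangeBetween is 95000 for all three, because rangeBetween treats the tied rows as one group.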