Spark - Group by HAVING with dataframe syntax?

Asked 2020-12-18 19:56 by 抹茶落季 · 2 answers · 596 views

What's the syntax for using GROUP BY ... HAVING in Spark without an SQLContext/HiveContext? I know I can do

DataFrame df = some_df
df.registerTempTable("df");
df1 = sqlContext.sql("SELECT col1, count(*) FROM df GROUP BY col1 HAVING count(*) > 1")


        
2 Answers
  • 2020-12-18 20:07

    Say, for example, I want to find the product categories whose fee is less than 3200 and whose count is at least 10:

    • SQL query:

    sqlContext.sql("""select Category, count(*) as count
                      from hadoopexam
                      where HadoopExamFee < 3200
                      group by Category
                      having count > 10""")
    
    • DataFrame API:

    from pyspark.sql.functions import col, count

    (df.filter(df.HadoopExamFee < 3200)          # WHERE fee < 3200
       .groupBy('Category')                      # GROUP BY Category
       .agg(count('Category').alias('count'))    # COUNT per group
       .filter(col('count') > 10))               # HAVING count > 10
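
    For completeness, here is a self-contained sketch of the DataFrame route above. The sample rows and the lowered threshold are made up for illustration; only the Category and HadoopExamFee column names come from the example:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, count

    spark = SparkSession.builder.appName("having-demo").getOrCreate()

    # Hypothetical fee records: (Category, HadoopExamFee)
    df = spark.createDataFrame(
        [("Spark", 3000), ("Spark", 3100), ("Hadoop", 3500)],
        ["Category", "HadoopExamFee"])

    (df.filter(col("HadoopExamFee") < 3200)      # WHERE
       .groupBy("Category")                      # GROUP BY
       .agg(count("Category").alias("count"))    # aggregate
       .filter(col("count") > 1)                 # HAVING (threshold lowered for the tiny sample)
       .show())
    # +--------+-----+
    # |Category|count|
    # +--------+-----+
    # |   Spark|    2|
    # +--------+-----+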
    
  • 2020-12-18 20:18

    A dedicated HAVING clause doesn't exist in the DataFrame API. You express the same logic with agg followed by where:

    df.groupBy(someExpr).agg(someAgg).where(somePredicate)
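
    As a concrete illustration of that pattern in PySpark (the sales data and column names below are hypothetical):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, sum as sum_

    spark = SparkSession.builder.getOrCreate()

    sales = spark.createDataFrame(
        [("books", 10.0), ("books", 20.0), ("toys", 5.0)],
        ["dept", "amount"])

    # Equivalent of: SELECT dept, sum(amount) AS total
    #                FROM sales GROUP BY dept HAVING sum(amount) > 15
    (sales.groupBy("dept")
          .agg(sum_("amount").alias("total"))
          .where(col("total") > 15)
          .show())
    # +-----+-----+
    # | dept|total|
    # +-----+-----+
    # |books| 30.0|
    # +-----+-----+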
    