How to send each group to the Spark executors one at a time?

温柔的废话 2021-01-17 04:07

I'm unable to send each group of the dataframe to the executor one at a time.

I have data in a dataframe called company_model_vals_df, and I want to process each group of it separately.

2 Answers
悲哀的现实  2021-01-17 04:49

    If I understand your question correctly, you want to manipulate the data separately for each combination of "model_id", "fiscal_quarter" and "fiscal_year".

    If that's correct, you would do it with a groupBy(), for example:

    company_model_vals_df.groupBy("model_id","fiscal_quarter","fiscal_year").agg(avg($"col1") as "average")
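
    For reference, here's a self-contained sketch of that aggregation. The SparkSession setup and the input path (path/to/input) are placeholders I'm assuming, not from your post; col1 just stands in for whichever numeric column you want to average:

        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.functions.avg

        val spark = SparkSession.builder().appName("group-example").getOrCreate()
        import spark.implicits._   // needed for the $"col" syntax

        // assumed input location; replace with however you actually load the dataframe
        val company_model_vals_df = spark.read.parquet("path/to/input")

        // one output row per (model_id, fiscal_quarter, fiscal_year) group
        val grouped = company_model_vals_df
          .groupBy("model_id", "fiscal_quarter", "fiscal_year")
          .agg(avg($"col1") as "average")

        grouped.show()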
    

    If what you're looking for is to write each logical group into a separate folder, you can do that by writing:

    company_model_vals_df.write.partitionBy("model_id","fiscal_quarter","fiscal_year").parquet("path/to/save")
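
    Spelled out as a sketch, each distinct (model_id, fiscal_quarter, fiscal_year) combination ends up in its own sub-directory under path/to/save; the save mode below is just an assumption, pick whichever you need:

        import org.apache.spark.sql.SaveMode

        // produces directories like path/to/save/model_id=.../fiscal_quarter=.../fiscal_year=...
        company_model_vals_df.write
          .mode(SaveMode.Overwrite)
          .partitionBy("model_id", "fiscal_quarter", "fiscal_year")
          .parquet("path/to/save")

    You can then read the data back and filter on those columns; Spark prunes the partition directories, so each group can be loaded and processed on its own.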
    
