SPARK DataFrame: How to efficiently split dataframe for each group based on same column values

花落未央 2020-12-31 10:41

I have a DataFrame generated as follows:

df.groupBy($"Hour", $"Category")
  .agg(sum($"value").alias("TotalValue"))
  .sort($"Hour".asc,$"TotalVal


        
3 Answers
  •  半阙折子戏
    2020-12-31 11:30

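    // To divide a dataset into n roughly equal datasets, pass n equal weights.
    double[] arraySplit = {1, 1, 1 /* ..., n weights in total */}; // Change the numbers to split by a different ratio.
    
    // The second argument is the random seed. Row is assumed as the element type here.
    List<Dataset<Row>> datasetList = dataset.randomSplitAsList(arraySplit, 1);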
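
    To see what the weights passed to randomSplitAsList mean: Spark documents that they are normalized if they do not sum to 1, so {1,1,1} and {2,2,2} describe the same split. The plain-Java sketch below only illustrates that normalization and uses no Spark classes. SplitWeights and cumulativeBounds are hypothetical names made up for this example. Keep in mind that the split is random: each row is assigned to a split by sampling, so the resulting sizes are only approximately proportional to the weights, and rows are not grouped by the values of any column.

```java
import java.util.Arrays;

// Plain-Java illustration only; no Spark dependency.
// SplitWeights and cumulativeBounds are hypothetical names, not Spark API.
class SplitWeights {

    // Normalize the weights by their sum and return the cumulative boundaries
    // [0, w1/sum, (w1+w2)/sum, ..., 1]. Split i targets the fraction of rows
    // between bounds[i] and bounds[i + 1].
    static double[] cumulativeBounds(double[] weights) {
        double sum = Arrays.stream(weights).sum();
        if (sum <= 0) {
            throw new IllegalArgumentException("weights must have a positive sum");
        }
        double[] bounds = new double[weights.length + 1];
        for (int i = 0; i < weights.length; i++) {
            bounds[i + 1] = bounds[i] + weights[i] / sum;
        }
        return bounds;
    }

    public static void main(String[] args) {
        // Three equal weights: each split targets about one third of the rows.
        System.out.println(Arrays.toString(cumulativeBounds(new double[] {1, 1, 1})));
        // Weights 1:1:2 target about 25%, 25%, and 50% of the rows.
        System.out.println(Arrays.toString(cumulativeBounds(new double[] {1, 1, 2})));
    }
}
```

    If the goal is one dataset per distinct value of a column, as the question title suggests, a random split will not produce that. Filtering on each distinct value is one alternative to consider.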
    
