How to ensure an output DataFrame is dynamically partitioned such that each partition is around 128 MB?

悲哀的现实 2020-12-28 15:40

In Spark, I have a few jobs chained together (i.e. the output of one is the input to the next). The issue I am facing is this: say my input dataset to the first job is 10 GB today and I repartition it to a fixed number of partitions; if tomorrow the input is a different size, that fixed count no longer yields partitions of around 128 MB. How can I compute the partition count dynamically from the data's size so each partition stays close to 128 MB?
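One common approach is to estimate the DataFrame's size from Catalyst's optimized-plan statistics and derive the partition count from that. Below is a minimal Scala sketch; `repartitionToTargetSize` is a hypothetical helper (not a built-in Spark API), the input/output paths are placeholders, and the size estimate can be rough when no table statistics have been computed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DynamicRepartition {
  // Target roughly 128 MB per partition.
  val TargetBytes: Long = 128L * 1024 * 1024

  // Hypothetical helper: estimate the DataFrame's size from the optimized
  // logical plan's statistics (available in Spark 2.3+) and repartition so
  // each partition holds roughly `targetBytes`. The estimate is approximate,
  // so treat this as a sketch rather than a guarantee of on-disk file size.
  def repartitionToTargetSize(df: DataFrame,
                              targetBytes: Long = TargetBytes): DataFrame = {
    val estimatedBytes: BigInt = df.queryExecution.optimizedPlan.stats.sizeInBytes
    // Ceiling division, clamped to at least one partition.
    val numPartitions = ((estimatedBytes + targetBytes - 1) / targetBytes).max(1).toInt
    df.repartition(numPartitions)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dynamic-repartition").getOrCreate()
    // Placeholder paths standing in for one job in the chain.
    val input = spark.read.parquet("/data/job1/input")
    val sized = repartitionToTargetSize(input)
    sized.write.parquet("/data/job1/output")
    spark.stop()
  }
}
```

Two config-based alternatives may also help: on the read side, `spark.sql.files.maxPartitionBytes` (128 MB by default) controls how large each input split is; and in Spark 3.x, enabling adaptive query execution (`spark.sql.adaptive.enabled=true`) and setting `spark.sql.adaptive.advisoryPartitionSizeInBytes` to 128m asks Spark to coalesce shuffle partitions toward that size, though this applies only after a shuffle.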
