Why does my Spark job fail with “too many open files”?

花落未央 · 2020-12-05 13:19

I get \"too many open files\" during the shuffle phase of my Spark job. Why is my job opening so many files? What steps can I take to try to make my job succeed.

3 Answers
  •  难免孤独
    2020-12-05 14:15

    Another solution for this error is reducing the number of partitions. During a shuffle, the number of intermediate files grows with the partition count, so a DataFrame with many partitions can quickly exhaust the open-file limit.

    Check how many partitions you have with:

    from pyspark.sql.functions import count

    someBigSDF.rdd.getNumPartitions()
    Out[]: 200

    # If you need to persist the repartition, reassign the DataFrame
    someBigSDF = someBigSDF.repartition(20)

    # If you just need it for one transformation/action,
    # you can do the repartition inline like this
    someBigSDF.repartition(20).groupBy("SomeDt").agg(count("SomeQty")).orderBy("SomeDt").show()


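    If the goal is only to lower the partition count, a shuffle-free alternative is coalesce. This is a sketch, not part of the original answer, reusing the same hypothetical someBigSDF DataFrame:

    # Sketch: coalesce() merges existing partitions without a full shuffle,
    # so it is usually cheaper than repartition() when only reducing the count.
    someBigSDF = someBigSDF.coalesce(20)
    someBigSDF.rdd.getNumPartitions()
    Out[]: 20
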