I get "too many open files" during the shuffle phase of my Spark job. Why is my job opening so many files, and what steps can I take to make it succeed?
Another solution for this error is to reduce the number of partitions. During a shuffle, the number of files an executor has open tends to grow with the partition count, so fewer partitions means fewer open file handles.
First, check whether you've got a lot of partitions:
someBigSDF.rdd.getNumPartitions()
Out[]: 200
from pyspark.sql.functions import count

# if you need to persist the repartition, assign the result back to the DataFrame
someBigSDF = someBigSDF.repartition(20)

# if you just need it for one transformation/action,
# you can do the repartition inline like this
someBigSDF.repartition(20).groupBy("SomeDt").agg(count("SomeQty")).orderBy("SomeDt").show()
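If the partition count is coming from shuffles themselves (200 is Spark's default for spark.sql.shuffle.partitions, which matches the output above), you can also lower that setting for the whole session instead of repartitioning each DataFrame. Here's a minimal sketch, assuming the usual spark SparkSession variable; coalesce() is shown as a cheaper option when you only need to shrink an existing DataFrame:

# sketch: lower the default number of shuffle partitions for every shuffle in the session
# (200 is Spark's default, which is consistent with getNumPartitions() returning 200 above)
spark.conf.set("spark.sql.shuffle.partitions", "20")

# coalesce() also reduces the partition count but avoids a full shuffle,
# so it is usually cheaper than repartition() when only shrinking
someBigSDF = someBigSDF.coalesce(20)

Either approach cuts the number of shuffle files each executor has to keep open at once.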