Reading Millions of Small JSON Files from S3 Bucket in PySpark Very Slow

执笔经年 2020-12-04 14:47

I have a folder (path = mnt/data/*.json) in S3 with millions of JSON files (each file is less than 10 KB). I run the following code:

df = (spark.read
      .json("mnt/data/*.json"))  # path as given above
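For reference, one pattern that is often suggested for many small JSON files is to pass an explicit schema to the reader so Spark can skip the schema-inference pass over the input. A minimal sketch of that approach is below; the field names and types are placeholders, not taken from the actual data:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.getOrCreate()

# Placeholder schema -- the real field names and types depend on the JSON files.
schema = StructType([
    StructField("id", LongType(), True),
    StructField("payload", StringType(), True),
])

# Supplying the schema up front lets Spark skip the inference pass that
# would otherwise sample the millions of small input files.
df = (spark.read
      .schema(schema)
      .json("mnt/data/*.json"))

Whether this helps much depends on where the time is actually spent: listing millions of small objects in S3 is itself expensive, so compacting the inputs into fewer, larger files is often the more effective fix.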