Reading Millions of Small JSON Files from S3 Bucket in PySpark Very Slow

执笔经年 2020-12-04 14:47

I have a folder (path = mnt/data/*.json) in S3 with millions of JSON files (each file is less than 10 KB). I run the following code:

df = (spark.read
      .json("mnt/data/*.json"))  # path as given above
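For reference, one pattern that is often suggested for many small JSON files is to pass an explicit schema to the reader so Spark can skip the schema-inference pass over the input. A minimal sketch of that approach is below; the field names and types are placeholders, not taken from the actual data:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.getOrCreate()

# Placeholder schema -- the real field names and types depend on the JSON files.
schema = StructType([
    StructField("id", LongType(), True),
    StructField("payload", StringType(), True),
])

# Supplying the schema up front lets Spark skip the inference pass that
# would otherwise sample the millions of small input files.
df = (spark.read
      .schema(schema)
      .json("mnt/data/*.json"))

Whether this helps much depends on where the time is actually spent: listing millions of small objects in S3 is itself expensive, so compacting the inputs into fewer, larger files is often the more effective fix.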