How to read a .gz compressed file with PySpark

名媛妹妹 2020-12-19 02:04

I have line-oriented data in .gz compressed format and need to read it in PySpark. Here is the code snippet:

rdd = sc.textFile("data/label.gz").map(func)
        
3 answers
  •  太阳男子
    2020-12-19 02:45

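    You didn't post the error message you got, so this is a guess. sc.textFile can read a .gz file directly, but gzipped files are not splittable, so the whole file is loaded into a single partition and processed by one task. If that is what is going wrong for you, either repartition the RDD after reading it or use a splittable compression codec such as bzip2.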
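To illustrate the repartition option, here is a minimal sketch. It assumes `sc` is an already-created SparkContext, as in the question; the helper name `read_gz_rdd` and the default of 8 partitions are made up for this example and should be tuned to your cluster.

```python
def read_gz_rdd(sc, path, num_partitions=8):
    """Read a gzipped text file and spread its lines over several partitions.

    `sc` is assumed to be an existing SparkContext.
    `num_partitions` is an illustrative default, not a recommendation.
    """
    # textFile picks the decompression codec from the .gz extension.
    # Because gzip is not splittable, each .gz file arrives as one partition.
    rdd = sc.textFile(path)

    # repartition() shuffles the lines so later stages can run in parallel.
    return rdd.repartition(num_partitions)
```

With that helper, the snippet from the question would become `rdd = read_gz_rdd(sc, "data/label.gz").map(func)`. The initial decompression still happens in a single task; only the work after the shuffle is parallelized.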
