I have line-oriented data in gzip-compressed (.gz) format and need to read it in PySpark. This is the code snippet:
rdd = sc.textFile("data/label.gz").map(func)
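You didn't post the error message you got, but the likely cause is that gzipped files are not splittable. Spark can still read a .gz file with sc.textFile, but the whole file is decompressed by a single task and ends up in one partition, so the map runs without any parallelism and a large file can overwhelm that one task. Use a splittable compression codec such as bzip2, or repartition the RDD right after reading it.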
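If switching codecs is an option, one way to do it is to recompress the file once, outside Spark. The following is a minimal sketch using only the Python standard library; the helper name recompress_gz_to_bz2, the output path data/label.bz2, and the chunk size are assumptions made for illustration, not part of the original question.

```python
import bz2
import gzip
import shutil


def recompress_gz_to_bz2(src_path, dst_path, chunk_size=1024 * 1024):
    """Stream a gzip file into a bzip2 file without loading it all into memory.

    The name, paths, and chunk size are illustrative assumptions.
    """
    with gzip.open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        # Copy decompressed bytes in fixed-size chunks to keep memory use flat.
        shutil.copyfileobj(src, dst, chunk_size)
    return dst_path


# Hypothetical usage, assuming the file from the question exists:
# recompress_gz_to_bz2("data/label.gz", "data/label.bz2")
# rdd = sc.textFile("data/label.bz2").map(func)
```

If the file has to stay gzipped, reading it as before and then calling repartition() on the resulting RDD before the map should at least spread the later work across tasks, although the initial decompression still happens in a single task.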