How to read a zip containing multiple files in Apache Spark

星月不相逢 2020-12-06 18:41

I have a zipped file containing multiple text files. I want to read each file and build a List of RDDs, one holding the content of each file.

val         


        
5 answers
  •  难免孤独
    2020-12-06 19:15

    If you are reading binary files, use sc.binaryFiles. It returns an RDD of tuples, each containing the file name and a PortableDataStream. You can feed the latter into a ZipInputStream.
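
A minimal sketch of that approach, not a definitive implementation. The path `path/to/archive.zip`, the object name `ReadZip`, the helper `readEntries`, and the UTF-8 encoding are assumptions made for illustration. Because a zip archive is not splittable, each archive is unpacked inside a single task, so this assumes the contents fit in one executor's memory.

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.util.zip.ZipInputStream

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.input.PortableDataStream
import org.apache.spark.rdd.RDD

object ReadZip {

  // Hypothetical helper: reads every non-directory entry of one archive
  // into a (entryName, lines) pair. Assumes the text files are UTF-8.
  def readEntries(archive: PortableDataStream): List[(String, List[String])] = {
    val zis = new ZipInputStream(archive.open())
    try {
      Iterator.continually(zis.getNextEntry)
        .takeWhile(_ != null)
        .filterNot(_.isDirectory)
        .map { entry =>
          // A fresh reader per entry; it is deliberately not closed here,
          // because closing it would also close the underlying zip stream.
          val reader = new BufferedReader(
            new InputStreamReader(zis, StandardCharsets.UTF_8))
          val lines = Iterator.continually(reader.readLine())
            .takeWhile(_ != null)
            .toList
          (entry.getName, lines)
        }
        .toList // force evaluation while the stream is still open
    } finally {
      zis.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ReadZip").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // One (entryName, lines) record per file inside the archive.
    val entries: RDD[(String, List[String])] =
      sc.binaryFiles("path/to/archive.zip") // placeholder path
        .flatMap { case (_, stream) => readEntries(stream) }
        .cache()

    // If a List of RDDs is really needed: one RDD of lines per entry name.
    val names: List[String] = entries.keys.collect().toList
    val perFile: List[RDD[String]] =
      names.map(name => entries.filter(_._1 == name).flatMap(_._2))

    perFile.zip(names).foreach { case (rdd, name) =>
      println(s"$name: ${rdd.count()} lines")
    }
    sc.stop()
  }
}
```

Keeping a single RDD keyed by entry name is usually simpler than a List of RDDs. The per-file list above is derived from it only because the question asks for that shape, and each filter rescans the cached RDD.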
