How to load directory of JSON files into Apache Spark in Python

时光取名叫无心 2021-01-02 13:46

I'm relatively new to Apache Spark, and I want to create a single RDD in Python from lists of dictionaries that are saved in multiple JSON files (each is gzipped and contai

4 answers
  •  灰色年华
    2021-01-02 14:31

    You can load a directory of files into a single RDD with textFile, which also supports wildcards. That won't give you the file names, but you don't seem to need them.
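
A minimal sketch of that approach, under a few assumptions that are not in the question: sc is an already created SparkContext, the glob passed as pattern (for example "data/*.json.gz") is a hypothetical path, and every line of every file holds one complete JSON document that is either a list of dictionaries or a single dictionary. textFile decompresses files ending in .gz on its own, so no extra step is needed for the gzip part. The helper names parse_line and load_json_dir are made up for illustration.

```python
import json


def parse_line(line):
    """Return the dictionaries encoded on one line of text as a list."""
    line = line.strip()
    if not line:
        # Blank lines contribute no records.
        return []
    value = json.loads(line)
    # Assumption: a line is either a JSON list of dicts or a single dict.
    return value if isinstance(value, list) else [value]


def load_json_dir(sc, pattern):
    """Build one RDD of dictionaries from every file matching pattern.

    sc is an existing SparkContext; pattern is a path or glob such as
    "data/*.json.gz" (hypothetical). flatMap flattens the per-line lists
    so the resulting RDD contains the individual dictionaries.
    """
    return sc.textFile(pattern).flatMap(parse_line)
```

Calling load_json_dir(sc, "data/*.json.gz") would then give a single RDD whose elements are plain Python dictionaries. If a JSON document spans several lines in your files, this line-based parsing will not work as written.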

    You can also combine Spark SQL with basic transformations such as map and filter, because a SchemaRDD is itself an RDD (in Python as well as in Scala).
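
SchemaRDD was renamed DataFrame in Spark 1.3, so on a current installation the same idea might look like the sketch below. It assumes Spark 2.0 or later, an existing SparkSession passed in as spark, a hypothetical glob as pattern, and files with one JSON object per line. The function names load_as_dataframe and to_dict_rdd are illustrative only.

```python
def load_as_dataframe(spark, pattern):
    """Read every file matching pattern into one DataFrame.

    spark is an existing SparkSession; pattern is a path or glob such as
    "data/*.json.gz" (hypothetical). Assumes one JSON object per line.
    """
    return spark.read.json(pattern)


def to_dict_rdd(df):
    """Drop down to the RDD API: one plain dict per DataFrame row."""
    # DataFrame.rdd is an RDD of Row objects; Row.asDict() converts each
    # Row into an ordinary Python dictionary.
    return df.rdd.map(lambda row: row.asDict())
```

After to_dict_rdd(load_as_dataframe(spark, "data/*.json.gz")) you can keep using map, filter and the other RDD transformations, while the DataFrame itself stays available for SQL queries.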
