How to load directory of JSON files into Apache Spark in Python

时光取名叫无心 2021-01-02 13:46

I'm relatively new to Apache Spark, and I want to create a single RDD in Python from lists of dictionaries that are saved in multiple JSON files (each is gzipped and contai

4 answers
  •  旧巷少年郎
    2021-01-02 14:45

    To load a list of JSON objects from a file as an RDD:

    import json
    # wholeTextFiles yields (path, content) pairs, so x[1] is the whole file's text
    def flat_map_json(x): return json.loads(x[1])
    rdd = sc.wholeTextFiles('example.json').flatMap(flat_map_json)
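
For the directory case in the question, the same idea might look like the sketch below. It assumes each file holds one JSON array of dictionaries; `parse_json_list`, `load_json_dir`, `sample_pairs` and the paths are hypothetical names for illustration only. The sample pairs stand in for what `sc.wholeTextFiles` would yield, so the parsing step can be tried without a cluster. Whether gzipped files are decompressed transparently depends on your Spark/Hadoop setup and is not verified here.

```python
import json
from itertools import chain


def parse_json_list(pair):
    """Turn one (path, content) pair into the list of dicts stored in that file."""
    _path, content = pair
    return json.loads(content)


# Stand-in for the (path, content) pairs that sc.wholeTextFiles would yield
# (hypothetical paths and data).
sample_pairs = [
    ("file:/data/a.json", '[{"id": 1}, {"id": 2}]'),
    ("file:/data/b.json", '[{"id": 3}]'),
]

# Plain-Python equivalent of flatMap: parse each file, then flatten the lists.
records = list(chain.from_iterable(parse_json_list(p) for p in sample_pairs))


def load_json_dir(sc, path):
    """Sketch: build one RDD of dicts from every file under `path`.

    `sc` is an existing SparkContext; `path` may be a directory or a glob.
    """
    return sc.wholeTextFiles(path).flatMap(parse_json_list)
```

With a live SparkContext, something like `rdd = load_json_dir(sc, 'some/dir')` would then give a single RDD of dictionaries. Note that `wholeTextFiles` reads each file fully into memory, so this approach suits many small files better than a few very large ones.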
    
