I'm relatively new to Apache Spark, and I want to create a single RDD in Python from lists of dictionaries that are saved in multiple JSON files (each is gzipped and contai
To load a list of JSON objects from a file as an RDD:
import json

def flat_map_json(x):
    # x is a (path, content) pair; content is the file's text, a JSON list
    return [each for each in json.loads(x[1])]

rdd = sc.wholeTextFiles('example.json').flatMap(flat_map_json)
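For the multi-file, gzipped case described in the question, one possible approach is to read the raw bytes with `sc.binaryFiles` and decompress them explicitly, so the result does not depend on how a given Spark version handles `.gz` input. This is a minimal sketch, not a definitive implementation: the glob `data/*.json.gz`, the helper names `parse_gzipped_json` and `load_all`, and the assumption that every file holds a UTF-8 JSON list of dictionaries are all hypothetical, and `sc` is assumed to be an existing SparkContext.

```python
import gzip
import json


def parse_gzipped_json(pair):
    """Turn one (path, raw_bytes) pair into a list of dicts."""
    _path, raw = pair
    # Assumption: each file is a gzip-compressed, UTF-8 encoded JSON list.
    return json.loads(gzip.decompress(raw).decode("utf-8"))


def load_all(sc, pattern="data/*.json.gz"):
    # `pattern` is a hypothetical glob. binaryFiles yields one
    # (path, bytes) pair per matching file, and flatMap merges the
    # per-file lists into a single RDD of dictionaries.
    return sc.binaryFiles(pattern).flatMap(parse_gzipped_json)
```

Calling `rdd = load_all(sc)` would then give one RDD covering all matching files. Like `wholeTextFiles`, this reads each file into memory as a single record, so it suits many small files better than a few very large ones.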