PySpark OutOfMemoryErrors when performing many dataframe joins
There are many posts about this issue, but none of them answer my question. I'm running into OutOfMemoryErrors in PySpark while attempting to join many different DataFrames together. My local machine has 16GB of memory, and I've set my Spark configuration like this:

    class SparkRawConsumer:

        def __init__(self, filename, reference_date, FILM_DATA):
            self.sparkContext = SparkContext(master='local[*]', appName='my_app')
            SparkContext.setSystemProperty('spark.executor.memory', '3g')
            SparkContext
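For what it's worth, my understanding is that in local[*] mode the executors run inside the driver JVM, so spark.driver.memory is the setting that actually bounds the available heap, and it generally has to be in place before the JVM starts. Below is a minimal sketch of how the same kind of configuration could be built with a SparkConf passed at context creation; the 6g driver value is just an illustrative assumption, not a setting from my script above.

    from pyspark import SparkConf, SparkContext

    # Sketch only: the memory values below are illustrative assumptions.
    # In local[*] mode everything runs in a single JVM, so the driver
    # memory setting is what effectively limits the heap.
    conf = (
        SparkConf()
        .setMaster('local[*]')
        .setAppName('my_app')
        .set('spark.executor.memory', '3g')
        .set('spark.driver.memory', '6g')  # hypothetical value, not from the original setup
    )
    sc = SparkContext(conf=conf)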