Spark: 'Requested array size exceeds VM limit' when writing dataframe
I am running into an "OutOfMemoryError: Requested array size exceeds VM limit" error when running my Scala Spark job. I'm running this job on an AWS EMR cluster with the following makeup:

Master: 1 m4.4xlarge, 32 vCore, 64 GiB memory
Core: 1 r3.4xlarge, 32 vCore, 122 GiB memory

I'm using Spark 2.2.1 on EMR release label 5.11.0, and I'm running my job in a spark-shell with the following configuration:

spark-shell --conf spark.driver.memory=40G --conf spark.driver.maxResultSize=25G
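For context, a minimal Scala sketch of the setup described above, expressing the same settings through a SparkSession and a hypothetical DataFrame write of the kind the title refers to (the question does not show the actual write, so the app name, input table, and output path below are all assumptions):

```scala
import org.apache.spark.sql.SparkSession

// Roughly equivalent to the spark-shell invocation above. Note that
// spark.driver.memory must still be set at launch time (spark-shell /
// spark-submit); it is shown here only for reference.
val spark = SparkSession.builder()
  .appName("DataFrameWriteJob")                        // hypothetical app name
  .config("spark.driver.maxResultSize", "25g")
  .getOrCreate()

// Hypothetical read and write; the real source and destination are not
// shown in the question.
val df = spark.read.parquet("s3://some-bucket/input/") // assumed input
df.write.mode("overwrite").parquet("s3://some-bucket/output/")
```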