I am generating a hierarchy for a table, determining the parent-child relationships.
Below is the configuration used; even with it, I still get the "Too large frame" error.
I was experiencing the same issue while working on a ~700 GB dataset. Decreasing spark.maxRemoteBlockSizeFetchToMem didn't help in my case, and I wasn't able to increase the number of partitions either.
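For reference, this is roughly how those first attempts look when building the session; the values here are illustrative assumptions, not a recommendation:

```scala
import org.apache.spark.sql.SparkSession

// What I tried first (illustrative values) -- it did NOT fix "Too large frame" for me,
// but this is where these settings would go:
val spark = SparkSession.builder()
  .appName("hierarchy-build")
  // stream remote shuffle blocks bigger than this to disk instead of memory
  .config("spark.maxRemoteBlockSizeFetchToMem", "200m")
  // more shuffle partitions -> smaller individual shuffle blocks
  .config("spark.sql.shuffle.partitions", "2000")
  .getOrCreate()
```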
Doing the following worked for me (see the sketch after this list for how to set them):
- Setting spark.network.timeout=600s (the default is 120s in Spark 2.3). This timeout also drives the following settings, which default to its value:
  - spark.core.connection.ack.wait.timeout
  - spark.storage.blockManagerSlaveTimeoutMs
  - spark.shuffle.io.connectionTimeout
  - spark.rpc.askTimeout
  - spark.rpc.lookupTimeout
- Setting spark.io.compression.lz4.blockSize=512k (the default is 32k in Spark 2.3)
- Setting spark.shuffle.file.buffer=1024k (the default is 32k in Spark 2.3)
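For completeness, a minimal sketch of how these three settings can be applied when the session is created (assuming you are not already pinning them via spark-submit --conf):

```scala
import org.apache.spark.sql.SparkSession

// Settings that worked for me on Spark 2.3; tune the values to your cluster
val spark = SparkSession.builder()
  .appName("hierarchy-build")
  // longer network timeout (also raises the dependent timeouts listed above)
  .config("spark.network.timeout", "600s")
  // larger LZ4 block size used when compressing shuffle data
  .config("spark.io.compression.lz4.blockSize", "512k")
  // larger per-writer buffer for shuffle output files
  .config("spark.shuffle.file.buffer", "1024k")
  .getOrCreate()
```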