Spark Failure: Caused by: org.apache.spark.shuffle.FetchFailedException: Too large frame: 5454002341

遥遥无期 2021-01-06 07:51

I am generating a hierarchy for a table by determining the parent-child relationships.

Below is the configuration used; even with it, I am still getting the error about the frame being too large.

5 Answers
  •  情歌与酒
    2021-01-06 08:16

    I was experiencing the same issue while working on a ~700 GB dataset. Decreasing spark.maxRemoteBlockSizeFetchToMem didn't help in my case, and I wasn't able to increase the number of partitions either.
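
    For reference, a minimal sketch of what those first attempts could look like (they did not help in this case); the app name and the concrete values below are illustrative, not the poster's actual settings:

    ```scala
    import org.apache.spark.sql.SparkSession

    // Attempts described above: fetch large remote shuffle blocks to disk instead
    // of memory, and spread the shuffle across more partitions.
    val spark = SparkSession.builder()
      .appName("hierarchy-build")                             // hypothetical app name
      .config("spark.maxRemoteBlockSizeFetchToMem", "200m")   // illustrative threshold
      .config("spark.sql.shuffle.partitions", "2000")         // illustrative partition count
      .getOrCreate()
    ```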

    Doing the following worked for me:

    1. Setting spark.network.timeout=600s (default is 120s in Spark 2.3). Raising it also raises the following timeouts, which fall back to its value when they are not set explicitly:
       spark.core.connection.ack.wait.timeout
       spark.storage.blockManagerSlaveTimeoutMs
       spark.shuffle.io.connectionTimeout
       spark.rpc.askTimeout
       spark.rpc.lookupTimeout

    2. Setting spark.io.compression.lz4.blockSize=512k (default is 32k in Spark 2.3)

    3. Setting spark.shuffle.file.buffer=1024k (default is 32k in Spark 2.3)
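
    Taken together, a sketch of how these three settings could be applied when building the session; only the three config keys and values come from the steps above, and the app name is a placeholder:

    ```scala
    import org.apache.spark.sql.SparkSession

    // Apply the three settings listed above (Spark 2.3 defaults noted in comments).
    val spark = SparkSession.builder()
      .appName("hierarchy-build")                            // placeholder app name
      .config("spark.network.timeout", "600s")               // default 120s
      .config("spark.io.compression.lz4.blockSize", "512k")  // default 32k
      .config("spark.shuffle.file.buffer", "1024k")          // default 32k
      .getOrCreate()
    ```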
