“Container killed by YARN for exceeding memory limits. 10.4 GB of 10.4 GB physical memory used” on an EMR cluster with 75GB of memory

梦谈多话 2020-12-04 05:19

I'm running a 5-node Spark cluster on AWS EMR, each node an m3.xlarge (1 master, 4 slaves). I successfully ran through a 146 MB bzip2-compressed CSV file and ended up with a per…

5 Answers
  •  没有蜡笔的小新  2020-12-04 06:03

    I had the same issue on a small cluster running a relatively small job on Spark 2.3.1. The job reads a Parquet file, removes duplicates using groupBy/agg/first, then sorts and writes a new Parquet file. It processed 51 GB of Parquet files on 4 nodes (4 vcores, 32 GB RAM).
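
    As a rough illustration only, a minimal PySpark sketch of that kind of pipeline might look like the following; the column names ("id", "ts", "payload") and the S3 paths are hypothetical, since the original answer does not show its code.

        # Read Parquet, drop duplicates via groupBy/agg/first, sort, write Parquet.
        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("dedup-job").getOrCreate()

        df = spark.read.parquet("s3://bucket/input/")        # hypothetical input path

        deduped = (
            df.groupBy("id")                                  # one row per key
              .agg(F.first("ts").alias("ts"),                 # keep the first value per column
                   F.first("payload").alias("payload"))
        )

        (deduped
            .orderBy("ts")                                    # sort before writing
            .write.mode("overwrite")
            .parquet("s3://bucket/output/"))                  # hypothetical output path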

    The job was constantly failing on the aggregation stage. I wrote a bash script to watch executor memory usage and found that, in the middle of the stage, one random executor starts taking roughly double the memory for a few seconds. When I correlated that moment with the GC logs, it matched a full GC that frees a large amount of memory.
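
    The original watcher was a bash script; purely as an illustration, a Python sketch of the same idea (polling the RSS of each executor JVM once a second with the third-party psutil library) could look like this:

        import time
        import psutil

        while True:
            for proc in psutil.process_iter(["pid", "cmdline", "memory_info"]):
                cmdline = " ".join(proc.info["cmdline"] or [])
                mem = proc.info["memory_info"]
                # Spark executors run as CoarseGrainedExecutorBackend JVMs
                if mem and "CoarseGrainedExecutorBackend" in cmdline:
                    rss_gb = mem.rss / 1024 ** 3
                    print(f"{time.strftime('%H:%M:%S')} "
                          f"pid={proc.info['pid']} rss={rss_gb:.2f} GB")
            time.sleep(1)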

    In the end I understood that the problem is somehow related to GC. ParallelGC and G1 cause this issue consistently, but ConcMarkSweepGC improves the situation. The issue appears only with a small number of partitions. I ran the job on EMR, where OpenJDK 64-Bit (build 25.171-b10) was installed. I don't know the root cause of the issue; it could be related to the JVM or to the operating system. But it is definitely not related to heap or off-heap usage in my case.
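
    For reference, switching the executor GC to CMS is typically done by passing -XX:+UseConcMarkSweepGC through spark.executor.extraJavaOptions, usually via spark-submit --conf or spark-defaults.conf. A minimal sketch setting it on the session builder (which only takes effect if set before the executors start) might look like:

        from pyspark.sql import SparkSession

        # Illustrative only: ask executor JVMs to use the CMS collector.
        spark = (
            SparkSession.builder
                .appName("dedup-job-cms")
                .config("spark.executor.extraJavaOptions", "-XX:+UseConcMarkSweepGC")
                .getOrCreate()
        )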

    UPDATE1

    Tried Oracle HotSpot; the issue is reproduced there as well.
