Apache Pig: java.lang.OutOfMemoryError: Requested array size exceeds VM limit

Submitted by 蓝咒 on 2019-12-11 08:17:31

Question


I'm running Pig 15 and trying to group data. The job fails with a "Requested array size exceeds VM limit" error. The input is pretty small and needs just 10 mappers of 2.5 GB each to run with no memory errors.

Below is a snippet of what I'm doing:

sample_set = LOAD 's3n://<bucket>/<dev_dir>/000*-part.gz' USING PigStorage(',') AS (col1:chararray,col2:chararray..col23:chararray);
sample_set_group_by_col1 = GROUP sample_set BY col1;
sample_set_group_by_col1_10 = LIMIT sample_set_group_by_col1 10;
DUMP sample_set_group_by_col1_10;

This job fails with the following error:

2016-08-08 14:28:59,622 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:401)
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:580)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462)
at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650)
at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:641)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:474)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462)
at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:470)
at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:40)
at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:198)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spillSingleRecord(MapTask.java:1696)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1180)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:135)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:281)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:274)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)

Has anybody come across this error before? If so, what is the solution?


Answer 1:


From the error, it looks like the output of the GROUP BY statement is huge, and the resulting record doesn't fit into the Java heap space currently available. By default, Hadoop map and reduce tasks are normally allocated about 1 GB of memory. Try giving the tasks more memory when running this Pig script by setting the following parameters:

SET mapreduce.map.memory.mb 4096;
SET mapreduce.reduce.memory.mb 6144;
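
The two properties above size the YARN container for each task. The JVM heap inside that container is controlled separately, through mapreduce.map.java.opts and mapreduce.reduce.java.opts, so it may be worth setting those as well. The sketch below assumes a heap of roughly 80% of each container; the -Xmx values are illustrative assumptions, not tuned numbers, and should be adjusted to your cluster.

```pig
-- Container sizes, as in the answer above.
SET mapreduce.map.memory.mb 4096;
SET mapreduce.reduce.memory.mb 6144;

-- JVM heap for each task (assumed here: about 80% of the container).
-- The heap must stay below the container size, or YARN may kill the task.
SET mapreduce.map.java.opts '-Xmx3276m';
SET mapreduce.reduce.java.opts '-Xmx4915m';
```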

If it still doesn't work, try increasing the values of the parameters above further.
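
If more memory does not help, it may be worth checking how large each group actually is. This particular message usually means that a single array request went past the JVM's maximum array length, so one very large record (for example, the bag for a heavily repeated or null col1 value) could trigger it regardless of heap size. The sketch below is one way to look at that, assuming sample_set has been loaded as in the question; the relation names keys, grouped, key_counts, ordered and top_keys are made up for illustration.

```pig
-- Assumes sample_set has already been loaded as in the question.
-- Keep only the grouping column so each record stays small.
keys = FOREACH sample_set GENERATE col1;

-- Count rows per key. COUNT_STAR also counts rows whose col1 is null,
-- and, being algebraic, it should let Pig use the combiner instead of
-- building the full bag for each key.
grouped = GROUP keys BY col1;
key_counts = FOREACH grouped GENERATE group AS col1, COUNT_STAR(keys) AS n;

-- Show the ten most frequent keys.
ordered = ORDER key_counts BY n DESC;
top_keys = LIMIT ordered 10;
DUMP top_keys;
```

If one or two keys account for most of the rows, filtering or handling those keys separately before the GROUP is likely to help more than raising memory further.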



Source: https://stackoverflow.com/questions/38852893/apache-pig-java-lang-outofmemoryerror-requested-array-size-exceeds-vm-limit
