Facing OutOfMemoryException while exporting Bigtable tables to Google Cloud Storage

Submitted by 半世苍凉 on 2021-01-29 11:27:05

Question


I am exporting a table from Cloud Bigtable to Cloud Storage by following this guide: https://cloud.google.com/bigtable/docs/exporting-sequence-files#exporting_sequence_files_2

The Bigtable table is ~300 GB, and the Dataflow pipeline fails with this error:

An OutOfMemoryException occurred. Consider specifying higher memory instances in PipelineOptions.
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)...

The error suggests increasing the memory of the instance type used for the Dataflow job. I also received a warning saying:

Worker machine type has insufficient disk (25 GB) to support this type of Dataflow job. Please increase the disk size given by the diskSizeGb/disk_size_gb execution parameter.

I re-checked the command to run the pipeline here (https://github.com/googleapis/cloud-bigtable-client/tree/master/bigtable-dataflow-parent/bigtable-beam-import) and looked for a command-line option that would let me set a custom instance type or persistent disk (PD) size for the workers, but couldn't find any.

By default, the instance type is n1-standard-1 and the PD size is 25 GB.

Are there any parameters I can pass during job creation that would help me avoid this error? If yes, what are they?


Answer 1:


I found the parameters for selecting a custom PD size and instance type. They are:

--diskSizeGb=[Disk_size_in_GBs] --workerMachineType=[GCP_VM_machine_type]

In my case I used:

--diskSizeGb=100 --workerMachineType=n1-highmem-4
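
For context, here is roughly how these flags fit into the export command documented at the page linked in the question. This is only a sketch: the jar version, project, instance, table, bucket, and zone values are placeholders you would substitute with your own.

java -jar bigtable-beam-import-[VERSION]-shaded.jar export \
    --runner=DataflowRunner \
    --project=[PROJECT_ID] \
    --bigtableInstanceId=[INSTANCE_ID] \
    --bigtableTableId=[TABLE_ID] \
    --destinationPath=gs://[BUCKET]/[EXPORT_PATH] \
    --tempLocation=gs://[BUCKET]/[TEMP_PATH] \
    --maxNumWorkers=30 \
    --zone=[ZONE] \
    --diskSizeGb=100 \
    --workerMachineType=n1-highmem-4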

These parameters are part of the Dataflow pipeline options (the DataflowPipelineWorkerPoolOptions interface) used for defining execution-time parameters. You can find more parameters here: https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.html

However, since I had set --maxNumWorkers to 30 for autoscaling, I ran into some quota issues; these prevent the job from autoscaling all the way up and slow it down, but they don't cause errors.
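
If you hit similar quota warnings, one way to inspect the CPU and persistent-disk quotas for the region your Dataflow job runs in is the command below (the region name and project ID are just example placeholders):

gcloud compute regions describe us-central1 --project=[PROJECT_ID]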



Source: https://stackoverflow.com/questions/56467089/facing-outofmemoryexception-while-exporting-bigtable-tables-to-google-cloud-stor
