Talend 10 GB input and lookup out of memory error


Question


I'm new to Talend and am trying to migrate a simple process from an existing ETL tool into Talend. The process itself is:

Input file --> tMap (a few string manipulations and a lookup) --> write output

The lookup file has 3 columns (a long, a 1-character string, and a 2-character string); the long value is the key. The input and lookup files are each around 10 GB. The server is 16 cores (2.9 GHz) with 64 GB RAM and 8 GB swap, running Linux.

I executed the job with Xmx/Xms values of 30g, 45g, and 50g, but each run failed with either "GC overhead limit exceeded" or "Java heap space". I also tried setting "Store temp data" to "true" and increasing the buffer size in tMap, but that didn't help either.

Has anyone faced similar issues with large lookups in Talend?

Thanks


Answer 1:


As @Th_talend says, try filtering the columns.

You can also try this (if it is an option):

  • Load both files into temporary database tables, then perform the join directly in SQL in a single database input component, so that the work is done by the DBMS rather than by Talend's tMap (a minimal sketch follows below).
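A minimal sketch of that approach in plain JDBC, assuming the two files have already been bulk-loaded into hypothetical staging tables input_stage and lookup_stage (the connection URL, credentials, and column names are illustrative, not from the original post):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DbSideJoin {
    public static void main(String[] args) throws Exception {
        // Assumes both 10 GB files were bulk-loaded into staging tables
        // beforehand (e.g. with COPY in PostgreSQL); all names are illustrative.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/etl", "etl_user", "secret");
             Statement stmt = conn.createStatement()) {
            conn.setAutoCommit(false); // PostgreSQL streams results only with autocommit off
            stmt.setFetchSize(10_000); // stream rows instead of buffering the whole result
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT i.*, l.code1, l.code2 "
                  + "FROM input_stage i "
                  + "JOIN lookup_stage l ON l.key_id = i.key_id")) {
                while (rs.next()) {
                    // Write each joined row to the output file here.
                }
            }
        }
    }
}
```

In Talend terms, this corresponds to a single database input component (e.g. tDBInput) whose query performs the join, replacing the two file inputs and the tMap.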



Answer 2:


Bring only the required columns from the lookup flow into tMap, by using a tFilterColumns component.

Also, did you set a directory for the temp data storage (the "Store temp data" option in tMap)?




Answer 3:


You can split the lookup file into manageable chunks (maybe 500 MB each), then perform the join in multiple stages, one per chunk. This is sure to work, and you need neither custom code nor an external tool, but performance may be poor; a sketch of the idea follows.
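To illustrate the multi-pass idea outside of Talend, here is a minimal Java sketch. The tab-delimited layout, the assumption that the join key is the first column of both files, and the chunk size are all assumptions, not details from the question:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class ChunkedJoin {
    // Number of lookup rows held in memory per pass; tune to fit the heap.
    private static final int CHUNK_ROWS = 5_000_000;

    public static void main(String[] args) throws IOException {
        Path lookup = Path.of("lookup.tsv");
        Path input = Path.of("input.tsv");
        Path output = Path.of("joined.tsv");

        try (BufferedReader lookupReader = Files.newBufferedReader(lookup);
             BufferedWriter out = Files.newBufferedWriter(output)) {
            Map<Long, String> chunk = new HashMap<>(CHUNK_ROWS);
            String row;
            while ((row = lookupReader.readLine()) != null) {
                String[] cols = row.split("\t", 2); // key, then payload columns
                chunk.put(Long.parseLong(cols[0]), cols[1]);
                if (chunk.size() == CHUNK_ROWS) {
                    joinPass(input, chunk, out);
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                joinPass(input, chunk, out); // last partial chunk
            }
        }
    }

    // One full scan of the input file against the in-memory lookup chunk.
    private static void joinPass(Path input, Map<Long, String> chunk,
                                 BufferedWriter out) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(input)) {
            String row;
            while ((row = in.readLine()) != null) {
                long key = Long.parseLong(row.split("\t", 2)[0]);
                String match = chunk.get(key);
                if (match != null) {
                    out.write(row + "\t" + match);
                    out.newLine();
                }
            }
        }
    }
}
```

Note that the input file is re-scanned once per lookup chunk, which is exactly why this approach runs within a bounded heap but can be slow.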




Answer 4:


Don't use 30g, 45g, or 50g for Xms and Xmx.

Try one of the combinations below instead:

Xms    Xmx
4 GB   8 GB
5 GB   10 GB
8 GB   16 GB
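For reference, Xms and Xmx are the standard JVM flags for the initial and maximum heap size (e.g. -Xms4g -Xmx8g for the first combination). In Talend these are typically set as JVM arguments in the job's Run view under Advanced settings, or in the launcher script of an exported job.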

Also, don't you have Hadoop available?



Source: https://stackoverflow.com/questions/42536042/talend-10-gb-input-and-lookup-out-of-memory-error
