Question
I'm new to Talend and trying to migrate a simple process from an existing ETL tool into Talend. The process itself is:
Input file --> tMap (a few string manipulations and a lookup) --> write output
The lookup file has 3 columns (a long, a 1-character string, and a 2-character string); the long value is the key. The input and lookup files are each around 10 GB. The server has 16 cores (2.9 GHz), 64 GB RAM, and 8 GB swap, running Linux.
I executed the job with Xmx/Xms values of 30g, 45g, and 50g, but each run failed with either a "GC overhead limit exceeded" or a "Java heap space" OutOfMemoryError. I tried setting "Store temp data" to "true" and increasing the buffer size in tMap to a larger number. That didn't help either.
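For a sense of scale, here is a rough back-of-envelope estimate of what a 10 GB lookup can cost once tMap materializes every row as Java objects on the heap. The row size and per-object overheads below are assumptions, not measurements:

    // Rough estimate: why a ~10 GB lookup file may not fit even a 50 GB heap
    // once every row becomes Java objects inside tMap.
    // Row size and per-object overheads are assumptions, not measurements.
    public class LookupHeapEstimate {
        public static void main(String[] args) {
            long fileSize = 10L * 1024 * 1024 * 1024;   // ~10 GB lookup file on disk
            long bytesPerLine = 20;                     // long key + 1-char + 2-char fields + delimiters
            long rows = fileSize / bytesPerLine;        // ~500 million rows

            long perEntryOverhead = 48;  // map entry + boxed Long key (typical 64-bit JVM)
            long perRowStrings    = 96;  // two small String objects plus their char arrays
            long heapNeeded = rows * (perEntryOverhead + perRowStrings);

            System.out.printf("Estimated rows: %,d%n", rows);
            System.out.printf("Estimated heap: ~%d GB%n", heapNeeded / (1L << 30));
        }
    }

Whatever the exact overheads, the in-memory footprint ends up several times the file size, which would explain why the 30g-50g heaps fail.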
Has anyone faced similar issues with large lookups in Talend?
Thanks
Answer 1:
As @Th_talend says, try filtering the columns.
You can also try this (if it's an option):
- Load your files into a temporary table and then do the join directly in SQL, in a single input, so that the work is done by the DBMS rather than by Talend's tMap (see the sketch below).
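A minimal JDBC sketch of that idea, assuming PostgreSQL and placeholder staging tables/columns (input_stage, lookup_stage, id, code1, code2); in a Talend job the same SELECT would typically go into a database input component such as tPostgresqlInput instead:

    // Sketch only: both files are assumed to be already bulk-loaded into
    // staging tables "input_stage" and "lookup_stage" (names are placeholders).
    // The join runs inside the database, so the client only streams the result.
    import java.sql.*;

    public class DbSideJoin {
        public static void main(String[] args) throws SQLException {
            String url = "jdbc:postgresql://localhost:5432/etl";   // placeholder connection
            try (Connection con = DriverManager.getConnection(url, "etl_user", "etl_pass")) {
                con.setAutoCommit(false);                 // needed for cursor-based fetch on PostgreSQL
                try (Statement st = con.createStatement()) {
                    st.setFetchSize(10_000);              // stream rows instead of loading them all
                    String sql = "SELECT i.*, l.code1, l.code2 "
                               + "FROM input_stage i "
                               + "JOIN lookup_stage l ON l.id = i.id";
                    try (ResultSet rs = st.executeQuery(sql)) {
                        while (rs.next()) {
                            // write each joined row to the output file here
                        }
                    }
                }
            }
        }
    }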
Answer 2:
Bring only the required columns into tMap from the lookup by using a tFilterColumns component.
Also, when you set "Store temp data" to true, did you give it a temp data directory?
Answer 3:
You can split the lookup file into manageable chunks (maybe 500 MB each), then perform the join in several stages, one stage per chunk. This will certainly work, and you need neither custom code nor an external tool, but performance may be poor. A rough sketch of the splitting step is below.
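A sketch of the splitting step, assuming a plain-text lookup file named lookup.csv and a ~500 MB chunk size (both are placeholders); each chunk then serves as the lookup in one pass of the job:

    // Split the big lookup file into ~500 MB chunk files, one line at a time.
    // File names and the chunk size are placeholders for illustration.
    import java.io.*;
    import java.nio.file.*;

    public class SplitLookup {
        public static void main(String[] args) throws IOException {
            final long maxChunkBytes = 500L * 1024 * 1024;   // ~500 MB per chunk
            int chunkNo = 0;
            long written = 0;

            try (BufferedReader in = Files.newBufferedReader(Paths.get("lookup.csv"))) {
                BufferedWriter out = newChunk(chunkNo);
                String line;
                while ((line = in.readLine()) != null) {
                    if (written >= maxChunkBytes) {          // rotate to the next chunk file
                        out.close();
                        out = newChunk(++chunkNo);
                        written = 0;
                    }
                    out.write(line);
                    out.newLine();
                    written += line.length() + 1;            // rough byte count
                }
                out.close();
            }
        }

        private static BufferedWriter newChunk(int n) throws IOException {
            return Files.newBufferedWriter(Paths.get("lookup_chunk_" + n + ".csv"));
        }
    }

Since each pass only matches the keys present in that chunk, the per-chunk outputs (or rejects) need to be merged afterwards.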
Answer 4:
Don't use 30g, 45g, or 50g for Xms and Xmx.
Try one of the combinations below (e.g. -Xms4g -Xmx8g in the job's JVM arguments):

    Xms     Xmx
    4 GB    8 GB
    5 GB   10 GB
    8 GB   16 GB
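To confirm the heap the job actually receives (the -Xms/-Xmx values end up in the generated .sh/.bat launcher), a quick check like the following, run for example from a tJava component, prints the effective limits:

    // Prints the heap limits the JVM actually received, to confirm
    // that the -Xms/-Xmx settings took effect for the Talend job.
    public class HeapCheck {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            System.out.printf("Max heap (~ -Xmx): %,d MB%n", rt.maxMemory() / (1024 * 1024));
            System.out.printf("Committed heap:    %,d MB%n", rt.totalMemory() / (1024 * 1024));
        }
    }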
Don't you have Hadoop?
Source: https://stackoverflow.com/questions/42536042/talend-10-gb-input-and-lookup-out-of-memory-error