How to push a big data file in Talend?

Submitted by 夏天 on 2019-12-25 05:33:38

Question


I have created a table, and I have a text input file that is 7.5 GB in size and contains 65 million records. I now want to push that data into an Amazon Redshift table.

But after processing 5.6 million records, the job no longer makes progress.

What could be the issue? Is there any limitation with tFileOutputDelimited? The job has been running for 3 hours.

Below is the job I created to push the data into the Redshift table.

tFileInputDelimited(.text)---tMap--->tFileOutputDelimited(csv)
                                              |
                                              |
tS3Put(copy output file to S3) ------> tRedShiftRow(createTempTable) --> tRedShiftRow(COPY to Temp)
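For reference, here is a minimal Python sketch of what the tS3Put and the two tRedShiftRow steps amount to outside of Talend, assuming boto3 and psycopg2 are available; the bucket, key, cluster, table names, credentials, and IAM role ARN are all placeholders, not values from the original job.

import boto3
import psycopg2

# Upload the delimited file produced by tFileOutputDelimited to S3
# (bucket and key are placeholders).
s3 = boto3.client("s3")
s3.upload_file("out/big_file.csv", "my-bucket", "staging/big_file.csv")

# Connect to the Redshift cluster (connection details are placeholders).
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="REPLACE_ME",
)
with conn, conn.cursor() as cur:
    # Equivalent of tRedShiftRow(createTempTable)
    cur.execute("CREATE TEMP TABLE stage_table (LIKE target_table);")
    # Equivalent of tRedShiftRow(COPY to Temp)
    cur.execute("""
        COPY stage_table
        FROM 's3://my-bucket/staging/big_file.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        CSV;
    """)
conn.close()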


Answer 1:


The limitation comes from the tMap component; it is not a good choice for dealing with large amounts of data. In your case, you have to enable the "Store temp data" option to overcome tMap's memory consumption limitation. It is well described in the Talend Help Center.




Answer 2:


It looks like tFileOutputDelimited(csv) is creating the problem; a single file can't be handled beyond a certain amount of data. Not sure, though. Try to find a way to load only a portion of the parent input file and commit it in Redshift, then repeat the process until the parent input file is completely processed. A rough sketch of that chunked approach is shown below.
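The following Python sketch illustrates the idea of loading the file portion by portion outside of Talend, assuming boto3 and psycopg2; the tMap transformation is omitted, and the chunk size, file, bucket, table, credential, and IAM role names are placeholders, not values from the original job.

import boto3
import psycopg2

CHUNK_LINES = 5_000_000            # records per portion -- placeholder value
s3 = boto3.client("s3")
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="REPLACE_ME",
)

def flush(chunk, part):
    """Write one portion to disk, upload it to S3, and COPY it into Redshift."""
    path = f"chunk_{part}.csv"
    with open(path, "w", encoding="utf-8") as out:
        out.writelines(chunk)
    s3.upload_file(path, "my-bucket", f"staging/{path}")
    with conn, conn.cursor() as cur:   # one committed transaction per portion
        cur.execute(f"""
            COPY stage_table
            FROM 's3://my-bucket/staging/{path}'
            IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
            CSV;
        """)

chunk, part = [], 0
with open("big_input.txt", encoding="utf-8") as src:
    for line in src:
        chunk.append(line)
        if len(chunk) >= CHUNK_LINES:
            part += 1
            flush(chunk, part)
            chunk = []
if chunk:                              # last partial portion
    part += 1
    flush(chunk, part)
conn.close()

Committing each portion separately also makes it easier to see where the job stalls, since every completed chunk is already in Redshift.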



Source: https://stackoverflow.com/questions/30236819/how-to-push-a-big-file-data-in-talend
