How to extract only the files from the last 2 days from tFTPFileList, based on modified time, without storing them in a tBufferOutput component (Talend job)

Submitted by 我与影子孤独终老i on 2020-12-15 04:36:52

Question


At the moment I iterate through all 5k files in the folder, store them in a tBufferOutput, read them back with a tBufferInput, sort them by mtime (the modified time on the FTP site) in descending order, and extract only the top 10 files.

Because the job iterates through all 5k files at once, it is time-consuming and causes unnecessary latency on the remote FTP site.

I was wondering whether there is a simpler way, without iterating, to get just the latest 10 files from the FTP site directly, sorted by mtime in descending order, and then perform operations on them.

My Talend job flow looks like this at the moment. Please advise on any other methods that could improve the performance of the job.

Basically, I don't want to iterate through all the files on the FTP site. Instead, I want to get the top 10 directly from the remote FTP (tFTPFileList), check them against the DB, and download them later.

In short: is there any way, without iterating, to get just the latest 10 files using only the modified timestamp in descending order? Alternatively, I want to extract the files from the last 3 days from the remote FTP site.

The file names follow this format: A_B_C_D_E_20200926053617.csv
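The trailing digits in that file name look like a yyyyMMddHHmmss timestamp (an assumption based on the single example above). Under that assumption, and assuming the embedded date is a good stand-in for the modified time, a minimal plain-Java sketch of reading the date out of the name could look like this. The class and method names (FileNameTimestamp, parse, isRecent) are hypothetical and are not part of Talend.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Talend class.
class FileNameTimestamp {
    // Assumed layout: anything, then "_", 14 digits, then ".csv" at the end of the name.
    private static final Pattern STAMP = Pattern.compile("_(\\d{14})\\.csv$");
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Returns the timestamp embedded in the file name, or null if the name does not match. */
    static LocalDateTime parse(String fileName) {
        Matcher m = STAMP.matcher(fileName);
        if (!m.find()) {
            return null;
        }
        return LocalDateTime.parse(m.group(1), FORMAT);
    }

    /** True if the embedded date is `today` or one of the `daysBack` days before it. */
    static boolean isRecent(String fileName, LocalDate today, int daysBack) {
        LocalDateTime stamp = parse(fileName);
        if (stamp == null) {
            return false;
        }
        LocalDate day = stamp.toLocalDate();
        return !day.isAfter(today) && !day.isBefore(today.minusDays(daysBack));
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2020, 9, 27); // fixed date, for a repeatable demo
        System.out.println(parse("A_B_C_D_E_20200926053617.csv"));
        System.out.println(isRecent("A_B_C_D_E_20200926053617.csv", today, 2));
    }
}
```

This only helps if the date in the name tracks the modified time closely enough. It still inspects every name, so it does not by itself reduce the listing work on the FTP side.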

Approach B (with Java): I tried the following code in a tJava component for flow B:

// Parse the FTP modified time, which arrives as a string
Date lastModifiedDate = TalendDate.parseDate("EEE MMM dd HH:mm:ss zzz yyyy", row2.mtime_string);
Date current_date = TalendDate.getCurrentDate();

// Debug output: both dates and the file currently being iterated
System.out.println(lastModifiedDate);
System.out.println(current_date);
System.out.println(((String)globalMap.get("tFTPFileList_1_CURRENT_FILE")));

// Pass the path on only if the file was modified within the last day
if (TalendDate.diffDate(current_date, lastModifiedDate, "dd") <= 1) {
    output_row.abs_path = input_row.abs_path;
    System.out.println(output_row.abs_path);
}

Now tLogRow_3 prints NULL values everywhere. Please suggest a fix.


Answer 1:


Define 3 context variables (mask1, mask2 and mask3):

In a tJava, compute the file mask (with wildcards) for each of the 3 days, starting at the current date:

Date currentDate = TalendDate.getCurrentDate();
Date currentDateMinus1 = TalendDate.addDate(currentDate, -1, "dd");
Date currentDateMinus2 = TalendDate.addDate(currentDate, -2, "dd");

context.mask1 ="*" + TalendDate.formatDate("yyyyMMdd", currentDate) + "*.csv";
context.mask2 ="*" + TalendDate.formatDate("yyyyMMdd", currentDateMinus1) + "*.csv";
context.mask3 ="*" + TalendDate.formatDate("yyyyMMdd", currentDateMinus2) + "*.csv";

Then, in the tFTPFileList, use the 3 context variables as the file masks, so that only the files from today and the 2 previous days are retrieved.
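The same mask computation can be sketched in plain Java with java.time, outside of Talend, which makes it easy to check the generated patterns for a known date. The class name DateMasks and the method build are hypothetical. The sketch assumes, as the answer does, that the yyyyMMdd date in the file name is the date to filter on.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Talend class.
class DateMasks {
    /** Builds one mask per day: `today` first, then each of the `daysBack` previous days. */
    static List<String> build(LocalDate today, int daysBack) {
        List<String> masks = new ArrayList<>();
        for (int i = 0; i <= daysBack; i++) {
            // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd, e.g. 20200926
            String day = today.minusDays(i).format(DateTimeFormatter.BASIC_ISO_DATE);
            masks.add("*" + day + "*.csv");
        }
        return masks;
    }

    public static void main(String[] args) {
        // A fixed date keeps the demo repeatable; a real job would use LocalDate.now().
        for (String mask : build(LocalDate.of(2020, 9, 27), 2)) {
            System.out.println(mask);
        }
    }
}
```

Each resulting string would then be assigned to one of the context variables. Because the mask matches on the date in the file name rather than on mtime, a file modified recently but named with an older date would not be picked up.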



Source: https://stackoverflow.com/questions/64258690/how-to-just-extract-the-last-2-days-recent-files-from-tftpfilelist-based-on-modi
