talend

Talend performance

我怕爱的太早我们不能终老 submitted on 2019-12-11 06:28:28
Question: We need to read data from three different files and join them on different columns within the same job. Each file is around 25-30 GB, and our system has only 16 GB of RAM. We do the joins with tMap. Talend keeps all the reference data in physical memory, and I cannot provide that much memory, so the job fails with an out-of-memory error. If I use the tMap join with the temp disk option, the job is extremely slow. Please help me with these questions. How Talend

Pivot data in Talend

孤人 submitted on 2019-12-11 05:18:38
Question: I have some data that I need to pivot in Talend. This is a sample:

brandname,metric,value
A,xyz,2
B,xyz,2
A,abc,3
C,def,1
C,ghi,6
A,ghi,1

Now I need this data to be pivoted on the metric column like this:

brandname,abc,def,ghi,xyz
A,3,null,1,2
B,null,null,null,2
C,null,1,6,null

Currently I am using tPivotToColumnsDelimited to pivot the data to a file and then reading it back from that file. However, storing the data in an external file and reading it back is messy and adds unnecessary overhead. Is there
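The pivot described in the question can also be done in memory, for instance from a custom routine, which avoids the intermediate file. The following is only a minimal sketch in plain Java, not a Talend component or API: the class name `PivotSketch` and the method `pivot` are hypothetical, and it assumes that the whole dataset fits in memory and that each (brandname, metric) pair occurs at most once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Talend: pivots (brandname, metric, value) rows.
final class PivotSketch {

    /** Returns CSV lines: a header row, then one row per brand with one column per metric. */
    static List<String> pivot(List<String[]> rows) {
        // brand -> (metric -> value); TreeMap keeps brands in sorted order
        Map<String, Map<String, String>> byBrand = new TreeMap<>();
        // Every metric seen in the input, sorted, used as the output column order
        TreeSet<String> metrics = new TreeSet<>();

        for (String[] row : rows) {
            byBrand.computeIfAbsent(row[0], k -> new TreeMap<>()).put(row[1], row[2]);
            metrics.add(row[1]);
        }

        List<String> out = new ArrayList<>();
        out.add("brandname," + String.join(",", metrics));
        for (Map.Entry<String, Map<String, String>> brand : byBrand.entrySet()) {
            StringBuilder line = new StringBuilder(brand.getKey());
            for (String metric : metrics) {
                String value = brand.getValue().get(metric);
                // A missing (brand, metric) combination is written as the literal "null"
                line.append(',').append(value == null ? "null" : value);
            }
            out.add(line.toString());
        }
        return out;
    }
}
```

Calling `PivotSketch.pivot` with the six sample rows yields the header `brandname,abc,def,ghi,xyz` followed by the three brand rows shown in the question. Because the column set is only known after all rows are read, the output schema is dynamic, which is the part that a fixed Talend schema makes awkward.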

I have to perform more stuff after the parallelization work using Talend Studio. How do I place a connecting OnSubJobOk?

怎甘沉沦 submitted on 2019-12-11 05:14:05
Question: I am trying to implement parallelization within Talend. I have it working, but now I don't know how to connect the parallelized work to the next part of the job. Usually, you would click on the previous block and select OnSubjobOk, but that option doesn't appear. Is there another component that I need to add that I don't know about? Answer 1: Under the basic settings of tParallelize you will find the option Wait For. This has two options - end of first subjob: sequence the relevant subjob to be executed at

OnComponentOrder flow and tMap connections in Talend

[亡魂溺海] submitted on 2019-12-11 04:48:35
Question: I have the following flow:

- 1 component that must run first to extract a certain timestamp from MySQL
- 3 MySQL inputs that need to use that timestamp
- 1 tMap that needs to receive the 3 MySQL inputs

However, I am not allowed to connect the 3 MySQL inputs to the single tMap, because they all depend on the first component (through OnComponentOk) but in a different order. How do I orchestrate this sort of situation?

Answer 1: You could execute a query and set a global variable using the tSetGlobalVar
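To illustrate the global-variable idea in the answer: Talend jobs expose a job-wide `globalMap` (a `Map<String, Object>`), so a value stored once can be read by the query expression of each of the three inputs. The sketch below is an assumption-laden illustration rather than generated job code: a plain `HashMap` stands in for `globalMap`, and the key `lastTimestamp`, the column `updated_at`, and the class `GlobalVarSketch` are all hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the pattern; in a real job, globalMap is provided by Talend.
final class GlobalVarSketch {

    // Plays the role of the job-wide globalMap
    static final Map<String, Object> globalMap = new HashMap<>();

    /** Builds the query text for one input, reading the shared timestamp from the map. */
    static String buildQuery(String table) {
        // "lastTimestamp" is a hypothetical key that an earlier subjob is assumed to have set
        String ts = (String) globalMap.get("lastTimestamp");
        // "updated_at" is a hypothetical column name used only for illustration
        return "SELECT * FROM " + table + " WHERE updated_at > '" + ts + "'";
    }
}
```

With this pattern the three inputs no longer need their own trigger from the first component: the timestamp is set in a preceding subjob, and the subjob holding the three inputs and the tMap starts once that value is available.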

Talend routine add maven dependency

雨燕双飞 submitted on 2019-12-11 04:16:09
Question: In Talend Data Integration Studio (7.0.1), I have a custom routine in which I want to use a Maven dependency (JavaFaker). In the project explorer window, I can add a Maven dependency to the CustomRoutine project. After refreshing, the JavaFaker dependency (and its sub-dependencies) appears to be loaded: it is displayed in the project explorer, and autocompletion works in the routine code. But when I try to run the job, I cannot make it work: the class is not found (java.lang.NoClassDefFoundError). Sometimes maven

Big query load fails with Bad Character (ASCII 0) while importing Datastore backup

一世执手 submitted on 2019-12-11 01:55:40
Question: This may look like an already discussed scenario. I am trying to load a Google App Engine Datastore backup into BQ using the Talend tBigQueryBulkExec component, which does the same as the BQ shell CLI. It connects to BQ, tries to read files from GCS, and moves them to the Dataset.Tablename defined in the component settings. Error Message: location":"File: 0 / Line:8 / Field:1","message":"Bad character (ASCII 0) encountered: field starts with: ","reason":"invalid"} Entire message: {"configuration":{"load":{

Strategy to load a set of files in Talend

风流意气都作罢 submitted on 2019-12-10 21:25:18
Question: I want to know the best strategy to approach the following problem in Talend:

- I need to load data from a set of delimited files that are stored in a directory with names like SAMPLE1.DAT, SAMPLE2.DAT, ... , SAMPLEX.DAT
- The target will be a table in a MySQL database
- I have to load all the data at once, because after this task I need to work with all the records in the same table

I'm a bit confused because I don't know if this is possible in Talend. I was looking at the tFileInputDelimited component but I

Outputting a single Excel file with multiple worksheets

寵の児 submitted on 2019-12-10 18:08:45
Question: Is there a component in Talend Open Studio for Data Integration that can output a single Excel file with 2 separate sheets in it? I want to put some columns from the original file into one sheet and another set of columns into the second sheet. Answer 1: You'll need to output your data into two separate tFileOutputExcel components, with the second one set to append the data to the file as a different sheet. A quick example has some name and age data held against a unique id that needs

Common Logging in Talend

大城市里の小女人 submitted on 2019-12-10 18:04:34
Question: I was trying to implement logging in Talend. I made a job using normal components and recorded the error, info, and debug messages through tWarn and tDie. Using tLogCatcher, I am segregating the log into two files, one for debug and one for error. This part is working fine. Now I have made two jobs. First: tRowGenerator generates lines, which are sent to a tMap, and from the tMap I send them to two tWarn components based on some condition. Second: a job which has tLogCatcher and tFilterRow and is segregating to

Talend - generating n multiple rows from 1 row

╄→гoц情女王★ submitted on 2019-12-10 09:33:11
Question: Background: I'm using Talend to do something that (I guess) is pretty common: generating multiple rows from one. For example:

ID | Name  | DateFrom   | DateTo
01 | Marco | 01/01/2014 | 04/01/2014

...could be split into:

new_ID | ID | Name  | DateFrom   | DateTo
01     | 01 | Marco | 01/01/2014 | 02/01/2014
02     | 01 | Marco | 02/01/2014 | 03/01/2014
03     | 01 | Marco | 03/01/2014 | 04/01/2014

The number of output rows is dynamic, depending on the date period in the original row. Question: how can I do
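The row expansion in the example can be sketched in plain Java, for instance as the body of a custom routine. This is only an illustrative sketch under stated assumptions, not a Talend API: the class `DateRangeSplitSketch` and the method `split` are hypothetical names, dates are assumed to be in dd/MM/yyyy format (which is what makes the example produce three one-day rows), and each output row is assumed to cover exactly one day.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Talend: expands one date-range row into one row per day.
final class DateRangeSplitSketch {

    // Assumed input and output date format (day/month/year)
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("dd/MM/uuuu");

    /** Returns one "new_ID | ID | Name | DateFrom | DateTo" line per day in [dateFrom, dateTo). */
    static List<String> split(String id, String name, String dateFrom, String dateTo) {
        LocalDate start = LocalDate.parse(dateFrom, FMT);
        LocalDate end = LocalDate.parse(dateTo, FMT);

        List<String> out = new ArrayList<>();
        int newId = 1;
        // Step one day at a time; the loop body runs once per day between start and end
        for (LocalDate day = start; day.isBefore(end); day = day.plusDays(1)) {
            out.add(String.format(Locale.ROOT, "%02d | %s | %s | %s | %s",
                    newId++, id, name, day.format(FMT), day.plusDays(1).format(FMT)));
        }
        return out;
    }
}
```

For the sample row, `DateRangeSplitSketch.split("01", "Marco", "01/01/2014", "04/01/2014")` returns three lines, the first being `01 | 01 | Marco | 01/01/2014 | 02/01/2014`. The number of output rows follows directly from the length of the date period, which matches the dynamic behaviour the question asks for.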