Parent-Child relationship in Talend

Posted by 大兔子大兔子 on 2020-06-29 07:19:14

Question


I'm stuck and out of ideas on how to implement a parent-child relationship in Talend.

Problem Statement:

I have a feed file with data in the format below:

MemberCode|LastName|FirstName
A|SHINE|MICHAEL 
B|SHINE|MICHELLE 
C|SHINE|ERIN 
A|RODRIGUEZ|DAMIAN 
A|PAVELSKY|STEPHEN        
B|PAVELSKY|TERESA

(There are many more columns and rows; these few are just for reference.) LastName and FirstName are self-explanatory. MemberCode denotes the relationship: A is the parent, and B or C is a child. The data for a given employee record is always sequential, meaning the complete parent-child group occupies contiguous rows.

Expected Result:

The above data needs to be output in the format below:

MemberCode|MemberLastName|MemberFirstName|DependentLastName|DependentFirstName
A         |SHINE         |MICHAEL        |                 |                  
B         |SHINE         |MICHAEL        |SHINE            |MICHELLE          
C         |SHINE         |MICHAEL        |SHINE            |ERIN              
A         |RODRIGUEZ     |DAMIAN         |                 |                  
A         |PAVELSKY      |STEPHEN        |                 |                  
B         |PAVELSKY      |STEPHEN        |PAVELSKY         |TERESA            

What I have tried so far:

The Talend job has these components: tFileInputDelimited -> tMap -> tLogRow. With its current logic, the tMap gives me the output below:

MemberCode|MemberLastName|MemberFirstName|DependentLastName|DependentFirstName
A         |SHINE         |MICHAEL        |                 |                  
B         |              |               |SHINE            |MICHELLE          
C         |              |               |SHINE            |ERIN              
A         |RODRIGUEZ     |DAMIAN         |                 |                  
A         |PAVELSKY      |STEPHEN        |                 |                  
B         |              |               |PAVELSKY         |TERESA

How can I replicate the MemberLastName and MemberFirstName values from the MemberCode A row onto the following rows that have MemberCode B or C? Thanks in advance.

Platform: Talend Open Studio for Data Integration, version 6.5.1


Answer 1:


Here's the solution I put together:

You need to split your rows into parents and children based on their MemberCode. You write the parents to file with DependentLastName and DependentFirstName being empty, while saving the parent info to global variables (ParentLastName and ParentFirstName) in a tSetGlobalVar.

When you move to the next row, which is a child row, your parent has already been saved as it's always the first in the group. So you can retrieve its first and last name using the global variables in the children output, and write this to the same physical file.

Both tFileOutputDelimited components have identical settings: they are in append mode, and they have the option "Custom the flush buffer size" set to 1 (this is important because it keeps the rows in the right order).
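The row-by-row logic described above can be sketched in plain Java. This is only a simulation of the idea, not code generated by Talend: the class name `GlobalVarSketch` and the `process` method are hypothetical, and the static `globalMap` here is a stand-in for the map that tSetGlobalVar writes to in a real job. It assumes, as the question states, that each parent (A) row comes before its children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the parent/child split; not Talend-generated code.
final class GlobalVarSketch {

    // Stand-in for the job-wide map that tSetGlobalVar writes to (assumption).
    static final Map<String, Object> globalMap = new HashMap<>();

    static List<String> process(List<String> dataLines) {
        List<String> out = new ArrayList<>();
        for (String line : dataLines) {
            String[] f = line.split("\\|");
            String code = f[0].trim();
            String last = f[1].trim();
            String first = f[2].trim();
            if ("A".equals(code)) {
                // Parent branch: save the parent, leave the dependent columns empty.
                globalMap.put("ParentLastName", last);
                globalMap.put("ParentFirstName", first);
                out.add(String.join("|", code, last, first, "", ""));
            } else {
                // Child branch: reuse the parent saved by the preceding A row.
                String parentLast = (String) globalMap.get("ParentLastName");
                String parentFirst = (String) globalMap.get("ParentFirstName");
                out.add(String.join("|", code, parentLast, parentFirst, last, first));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "A|SHINE|MICHAEL ", "B|SHINE|MICHELLE ", "C|SHINE|ERIN ",
                "A|RODRIGUEZ|DAMIAN ", "A|PAVELSKY|STEPHEN ", "B|PAVELSKY|TERESA");
        process(rows).forEach(System.out::println);
    }
}
```

Because both branches append to the same list in input order, the sketch sidesteps the ordering concern that the flush buffer setting addresses when two output components write to one file.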




Answer 2:


The solution provided by @iMezouar works just fine. Here is an alternative approach.

Job Layout:

The approach was to capture the previous row's values (LastName and FirstName), store them in variables inside the tMap, and then use them in the output row.
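A minimal sketch of that carry-forward idea in plain Java follows. The original tMap configuration is not reproduced here, so the names are assumptions: `CarryForwardSketch`, `map`, and the fields `memberLastName` and `memberFirstName` are hypothetical, with the two fields playing the role of tMap variables that keep their value from one row to the next.

```java
// Plain-Java illustration of carrying parent values forward; not Talend-generated code.
final class CarryForwardSketch {

    // These fields persist between calls, standing in for tMap variables (assumption).
    private String memberLastName = "";
    private String memberFirstName = "";

    // Called once per input row, in file order.
    String map(String memberCode, String lastName, String firstName) {
        boolean isParent = "A".equals(memberCode);

        // On a parent row, overwrite the stored names; on a child row, keep them.
        // A tMap variable expression might follow the same ternary pattern.
        memberLastName = isParent ? lastName : memberLastName;
        memberFirstName = isParent ? firstName : memberFirstName;

        // Dependent columns are filled only for child rows.
        String dependentLastName = isParent ? "" : lastName;
        String dependentFirstName = isParent ? "" : firstName;

        return String.join("|", memberCode, memberLastName, memberFirstName,
                dependentLastName, dependentFirstName);
    }

    public static void main(String[] args) {
        CarryForwardSketch m = new CarryForwardSketch();
        System.out.println(m.map("A", "SHINE", "MICHAEL"));
        System.out.println(m.map("B", "SHINE", "MICHELLE"));
        System.out.println(m.map("C", "SHINE", "ERIN"));
        System.out.println(m.map("A", "RODRIGUEZ", "DAMIAN"));
    }
}
```

This keeps everything in a single component and a single output flow, so row order is preserved without any flush buffer tuning. It relies on the same guarantee as the first answer: a parent row always precedes its children.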



Source: https://stackoverflow.com/questions/49927959/parent-child-relationship-in-talend
