Using Pentaho Kettle, how do I load multiple tables from a single table while keeping referential integrity?

rwilliams

I put together a sample transformation (right click and choose save link) based on what you provided. The only steps I feel a bit uncertain about are the final table inserts: I'm basically writing the join data to the table and letting the insert fail if a specific relationship already exists.

Note:

This solution doesn't really meet the "All approaches should include some form of validation and a rollback strategy should an insert fail, or fail to maintain referential integrity" criterion, though it probably won't fail. If you really want to set up something more complex we can, but this should definitely get you going with these transformations.
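If you do want a basic rollback strategy outside of Kettle, one option is to run all of the inserts in a single database transaction and roll everything back on any error. Here's a minimal sketch of that idea, assuming PostgreSQL and a psycopg2-style connection; the connection string, table names and values are placeholders, not part of the transformation:

```python
# Minimal sketch of a rollback strategy: run every insert in one database
# transaction and undo the whole batch if anything fails.
# Assumes PostgreSQL via psycopg2; connection string, table names and values
# are placeholders, not part of the transformation.
import psycopg2

conn = psycopg2.connect("dbname=hr user=etl")
try:
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO employee_title (employee_id, title_id) VALUES (%s, %s)",
            (42, 7),  # stand-in for one row coming off the data stream
        )
        # ... remaining inserts (employee_office, etc.) go here ...
    conn.commit()      # every insert succeeded: make the batch permanent
except psycopg2.Error:
    conn.rollback()    # any failure: nothing from this run is left behind
    raise
finally:
    conn.close()
```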

Dataflow by Step

1. We start by reading in your file. In my case I converted it to CSV, but tab-delimited works too.

2. Now we insert the employee names into the Employee table using a Combination lookup/update step. After the insert we append the generated employee id to our data stream as id and remove EmployeeName from the stream (a rough sketch of this lookup-or-insert logic follows the step list).

3. Here we're just using a Select Values step to rename the id field to employee_id.

4. Insert the job titles just like we did the employees, append the title id to our data stream, and remove JobLevelHistory from the stream.

5. Simple rename of the title id to title_id (see step 3).

6. Insert the offices, get their ids, and remove OfficeHistory from the stream.

7. Simple rename of the office id to office_id (see step 3).

8. Copy the data from the last step into two streams, one carrying employee_id and office_id and the other carrying employee_id and title_id.

9. Use a table insert to write the join data. I've set it to ignore insert errors, since there could be duplicates and the PK constraints will make some rows fail (a sketch of this duplicate-tolerant insert also follows below).
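For reference, the Combination lookup/update work in steps 2, 4 and 6 boils down to a lookup-or-insert against each dimension table. A minimal Python sketch of that logic, assuming PostgreSQL and a psycopg2-style cursor; the get_or_create helper and the table/column names are my own placeholders, not Kettle's:

```python
# Rough equivalent of what Combination lookup/update does for each dimension:
# look up the natural key and insert it only if it isn't there yet, returning
# the surrogate id either way. Table and column names are assumptions.
def get_or_create(cur, table, key_column, value, id_column="id"):
    cur.execute(f"SELECT {id_column} FROM {table} WHERE {key_column} = %s", (value,))
    row = cur.fetchone()
    if row:
        return row[0]              # key already present: reuse its id
    cur.execute(
        f"INSERT INTO {table} ({key_column}) VALUES (%s) RETURNING {id_column}",
        (value,),
    )
    return cur.fetchone()[0]       # newly generated surrogate id

# Usage, mirroring steps 2, 4 and 6 (field names taken from the answer):
# employee_id = get_or_create(cur, "employee", "name", row["EmployeeName"])
# title_id    = get_or_create(cur, "job_title", "title", row["JobLevelHistory"])
# office_id   = get_or_create(cur, "office", "location", row["OfficeHistory"])
```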
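The join inserts in steps 8 and 9 then come down to writing each id pair and letting the primary key constraint reject duplicates. A sketch of that behaviour, again assuming PostgreSQL (ON CONFLICT DO NOTHING is the Postgres spelling of "ignore insert errors"; other databases would need INSERT IGNORE or a try/except per row), with hypothetical table names:

```python
# Write (employee_id, office_id) / (employee_id, title_id) pairs, silently
# skipping any pair that already exists instead of failing the whole load.
def insert_join_rows(cur, table, columns, pairs):
    sql = (
        f"INSERT INTO {table} ({columns[0]}, {columns[1]}) "
        "VALUES (%s, %s) ON CONFLICT DO NOTHING"
    )
    for pair in pairs:
        cur.execute(sql, pair)

# insert_join_rows(cur, "employee_office", ("employee_id", "office_id"), office_pairs)
# insert_join_rows(cur, "employee_title", ("employee_id", "title_id"), title_pairs)
```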

Output Tables
