Using Pentaho Kettle, how do I load multiple tables from a single table while keeping referential integrity?

南旧 2020-12-05 07:40

I need to load data from a single file with 100,000+ records into multiple tables on MySQL while maintaining the relationships defined in the file/tables; meaning the relationship

1 Answer
  • 2020-12-05 08:21

    I put together a sample transformation (right click and choose save link) based on what you provided. The only part I feel a bit uncertain about is the final table inserts: I'm basically writing the join data to the table and letting the insert fail if a specific relationship already exists.

    Note:

    This solution doesn't really meet the "All approaches should include some form of validation and a rollback strategy should an insert fail, or fail to maintain referential integrity." criterion, though it probably won't fail. If you really want to set up something more complex we can, but this should definitely get you going with these transformations.
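For the rollback part of that criterion, one option outside Kettle is to wrap each batch in a database transaction so that a failed insert undoes the whole batch. The following is a minimal sketch only: it uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table names, column names, and the `open_db` / `load_batch` helpers are hypothetical, not taken from the sample transformation.

```python
import sqlite3

# Hypothetical schema: two parent tables and one join table with foreign keys.
SCHEMA = """
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE office   (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE employee_office (
    employee_id INTEGER NOT NULL REFERENCES employee(id),
    office_id   INTEGER NOT NULL REFERENCES office(id),
    PRIMARY KEY (employee_id, office_id)
);
"""


def open_db():
    """Create an in-memory database with foreign key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


def load_batch(conn, links):
    """Insert (employee_id, office_id) pairs; undo the whole batch on any failure."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.executemany(
                "INSERT INTO employee_office (employee_id, office_id) VALUES (?, ?)",
                links,
            )
        return True
    except sqlite3.IntegrityError:
        # A duplicate pair or a missing parent row aborts and reverts the batch.
        return False
```

With this shape, a batch that contains one bad row leaves the join table exactly as it was before the batch started, which is stricter than the "ignore insert errors" behaviour used in the transformation.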

    Dataflow by Step

    1. We start by reading in your file. In my case I converted it to CSV, but tab-delimited is fine too.

    2. Now we insert the employee names into the Employee table using a Combination lookup/update step. After the insert we append the employee_id to our data stream as id and remove EmployeeName from the data stream.

    3. Here we're just using a Select Values step to rename the id field to employee_id.

    4. Insert job titles just like we did employees and append the title id to our data stream, also removing JobLevelHistory from the data stream.

    5. Simple rename of the title id to title_id (see step 3).

    6. Insert offices, get their ids, and remove OfficeHistory from the data stream.

    7. Simple rename of the office id to office_id (see step 3).

    8. Copy the data from the last step into two streams, carrying the values employee_id, office_id and employee_id, title_id respectively.

    9. Use a table insert to write the join data. I've set it to ignore insert errors, since there could be duplicates and the PK constraints will make some rows fail.
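For readers who want to see the logic of steps 1 to 9 outside the Kettle UI, the same dataflow can be sketched roughly in Python. This is an illustration under assumptions, not the transformation itself: `sqlite3` stands in for MySQL, the lower-case table names, the `name` column, and the `get_or_create` / `load` helpers are hypothetical, and only the input field names (EmployeeName, JobLevelHistory, OfficeHistory) come from the steps above. `INSERT OR IGNORE` is SQLite syntax that plays the role of "ignore insert errors"; MySQL spells it `INSERT IGNORE`.

```python
import csv
import io
import sqlite3

# Hypothetical target schema: three lookup tables and two join tables.
SCHEMA = """
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE title    (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE office   (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE employee_title (
    employee_id INTEGER NOT NULL REFERENCES employee(id),
    title_id    INTEGER NOT NULL REFERENCES title(id),
    PRIMARY KEY (employee_id, title_id)
);
CREATE TABLE employee_office (
    employee_id INTEGER NOT NULL REFERENCES employee(id),
    office_id   INTEGER NOT NULL REFERENCES office(id),
    PRIMARY KEY (employee_id, office_id)
);
"""


def get_or_create(conn, table, name):
    """Return the id for `name`, inserting a new row first if none exists.

    Rough analogue of a lookup-then-insert step. `table` is only ever one of
    the fixed names used in load(), never user input.
    """
    row = conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,))
    return cur.lastrowid


def load(conn, csv_text):
    """Load CSV text into the lookup tables, then write the join rows."""
    for rec in csv.DictReader(io.StringIO(csv_text)):  # step 1: read the file
        # Steps 2-7: swap each natural key for its generated id.
        employee_id = get_or_create(conn, "employee", rec["EmployeeName"])
        title_id = get_or_create(conn, "title", rec["JobLevelHistory"])
        office_id = get_or_create(conn, "office", rec["OfficeHistory"])
        # Steps 8-9: write both join rows, skipping pairs that already exist.
        conn.execute(
            "INSERT OR IGNORE INTO employee_title (employee_id, title_id) VALUES (?, ?)",
            (employee_id, title_id),
        )
        conn.execute(
            "INSERT OR IGNORE INTO employee_office (employee_id, office_id) VALUES (?, ?)",
            (employee_id, office_id),
        )
    conn.commit()
```

Because the parent rows are always resolved before the join rows are written, every employee_id, title_id, and office_id in the join tables refers to an existing row, which is the same ordering the transformation relies on.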

    Output Tables

    [Images: screenshots of the output tables]
