How do I split flat file data and load into parent-child tables in database?

前端 未结 2 815
一个人的身影
一个人的身影 2020-12-06 06:54

I have denormalized data (coming from a file) that needs to be imported into parent-child tables. The source data is something like this:

Account#    Name           


        
相关标签:
2条回答
  • 2020-12-06 07:28

    If the data is sorted and Account# is an integer I would:

    Insert the emails into a table (add an auto increment column, it's a best practise).

    1  101    alpha@foo.com
    2  101    bravo@foo.com
    3  101    charlie@yay.com
    etc.
    

    Then I would insert the other records to a parent table.

    • using Account# as the primary key
    • omitting the email addresses
    • skipping duplicates (easy if the data is sorted).

    If you have a foreign key relationship setup, you will need to do the second step first (to avoid having any orphan records).

    My two cents: I don't know what your requirements are but it seems a bit over-normalized. If there is a small limit on the number of email addresses, I would consider adding several email columns to the main table...for speed and simplicity.

    0 讨论(0)
  • 2020-12-06 07:39

    Here is one possible option that you can consider in loading parent-child data. This option consists of two steps. In the first step, read the source file and write data to parent table. In the second step, read the source file again and use lookup transformation to fetch the parent info in order to write data to the child table. Following example uses the data provided in the question. This example was created using SSIS 2008 R2 and SQL Server 2008 database.

    Step-by-Step process:

    1. Create a sample flat file named Source.txt as shown in screenshot #1.

    2. In the SQL database, create two tables named dbo.Parent and dbo.Child using the scripts given under SQL Scripts section. Both the tables have an auto generated identity column.

    3. On the package, place an OLE DB connection to connect to the SQL Server and Flat File connection to read the source file as shown in screenshot #2. Configure the flat file connection as shown in screenshots #3 - #9.

    4. On the Control Flow tab, place two Data Flow Tasks as shown in screenshot #10.

    5. Inside the data flow task named Parent, place a Flat File source, Sort transformation and an OLE DB destination as shown in screenshot #11.

    6. Configure the flat file source as shown in screenshots #12 and #13. We need to read the flat file source.

    7. Configure the sort transformation as shown in screenshot #14. We need to eliminate the duplicate values so that only the unique records are inserted into the parent table dbo.Parent.

    8. Configure the ole db destination as shown in screenshots #15 and #16. We need to insert the data into the parent table dbo.Parent.

    9. Inside the data flow task named Child, place a Flat File source, Lookup transformation and an OLE DB destination as shown in screenshot #17.

    10. Configure the flat file source as shown in screenshots #12 and #13. This configuration is same as the flat file source in the previous data flow task.

    11. Configure the lookup transformation as shown in screenshots #18 and #20. We need to find the parent id from the table dbo.Parent using the other key columns present in the file. The key columns here are the Account, Name and Email. If the file happened to have a unique column, you could just use that column alone to fetch the parent id.

    12. Configure the ole db destination as shown in screenshots #21 and #22. We need to insert the Email column along with the Parent id into the table dbo.Child.

    13. Screenshot #23 shows data in the tables before the package execution.

    14. Screenshots #24 and #25 show sample package execution.

    15. Screenshot #26 shows data in the tables after the package execution.

    Hope that helps.

    SQL Scripts:

    CREATE TABLE [dbo].[Child](
        [ChildId] [int] IDENTITY(1,1) NOT NULL,
        [ParentId] [int] NULL,
        [Email] [varchar](21) NULL,
    CONSTRAINT [PK_Child] PRIMARY KEY CLUSTERED ([ChildId] ASC)) ON [PRIMARY]
    GO
    
    CREATE TABLE [dbo].[Parent](
        [ParentId] [int] IDENTITY(1,1) NOT NULL,
        [Account] [varchar](12) NULL,
        [Name] [varchar](12) NULL,
        [Membership] [varchar](14) NULL,
    CONSTRAINT [PK_Parent] PRIMARY KEY CLUSTERED ([ParentId] ASC)) ON [PRIMARY]
    GO
    

    Screenshot #1:

    1

    Screenshot #2:

    2

    Screenshot #3:

    3

    Screenshot #4:

    4

    Screenshot #5:

    5

    Screenshot #6:

    6

    Screenshot #7:

    7

    Screenshot #8:

    8

    Screenshot #9:

    9

    Screenshot #10:

    10

    Screenshot #11:

    11

    Screenshot #12:

    12

    Screenshot #13:

    13

    Screenshot #14:

    14

    Screenshot #15:

    15

    Screenshot #16:

    16

    Screenshot #17:

    17

    Screenshot #18:

    18

    Screenshot #19:

    19

    Screenshot #20:

    20

    Screenshot #21:

    21

    Screenshot #22:

    22

    Screenshot #23:

    23

    Screenshot #24:

    24

    Screenshot #25:

    25

    Screenshot #26:

    26

    0 讨论(0)
提交回复
热议问题