best way to “glue” columns together

前端 未结 2 1821
Happy的楠姐
Happy的楠姐 2020-12-12 05:09

I need to combine columns from about 15 tables in one large table. Something the following works. But it takes very long to run while the CPU usage spikes to 100%, which cau

2条回答
  •  一向
    一向 (楼主)
    2020-12-12 05:28

    Your question is not really clear and it would be helpful if you included expected result. Let me guess what you want...

    I'll give more meaningful names to the tables in your example and add few more rows to highlight the problem. In real life these tables would be real tables, of course, not variables, but I'll stick with variables to make this sample script easy to run and try. I'm using SQL Server 2008 for this example.

    declare @TMain table (empid int);
    declare @TPhones table (empid int, phone varchar(50));
    declare @TLicenses table (empid int, license varchar(50));
    declare @TEmails table (empid int, email varchar(100));
    
    insert into @TMain values (1);
    insert into @TMain values (2);
    insert into @TMain values (3);
    insert into @TMain values (4);
    
    insert into @TPhones values (1, '5551234');
    insert into @TPhones values (2, '5551111');
    insert into @TPhones values (2, '5552222');
    insert into @TPhones values (2, '5553333');
    insert into @TPhones values (2, '5554444');
    
    insert into @TLicenses values (2, 'L4455');
    insert into @TLicenses values (3, 'L7890');
    
    insert into @TEmails values (2, 'xxx@abc');
    insert into @TEmails values (2, 'yyy@abc');
    insert into @TEmails values (2, 'zzz@abc');
    

    Simple variant

    There is a fast, efficient and wrong naive approach:

    SELECT
        Main.empid
        ,Phones.phone
        ,Licenses.license
        ,Emails.email
    FROM
        @TMain AS Main
        LEFT JOIN @TPhones AS Phones ON Phones.empid = Main.empid
        LEFT JOIN @TLicenses AS Licenses ON Licenses.empid = Main.empid
        LEFT JOIN @TEmails AS Emails ON Emails.empid = Main.empid
    ORDER BY Main.empid, phone, license, email;
    

    It produces Cartesian product of all rows and duplicates rows. This is the result set of the query above. You can see that empid = 2 returned 12 rows, which is 4 phones multiplied by 3 emails and by 1 license. My guess is that you want to see only 4 rows for empid = 2. In other words, for each empid the result should have minimum possible number of rows (I'll show the correct result set in the end).

    empid   phone   license email
    1   5551234 NULL    NULL
    2   5551111 L4455   xxx@abc
    2   5551111 L4455   yyy@abc
    2   5551111 L4455   zzz@abc
    2   5552222 L4455   xxx@abc
    2   5552222 L4455   yyy@abc
    2   5552222 L4455   zzz@abc
    2   5553333 L4455   xxx@abc
    2   5553333 L4455   yyy@abc
    2   5553333 L4455   zzz@abc
    2   5554444 L4455   xxx@abc
    2   5554444 L4455   yyy@abc
    2   5554444 L4455   zzz@abc
    3   NULL    L7890   NULL
    4   NULL    NULL    NULL
    

    Long variant

    I'm not sure whether my proposed approach below is more efficient than yours. You'll have to try both and compare performance for your data.

    We'll need a table of numbers. SQL, Auxiliary table of numbers http://web.archive.org/web/20150411042510/http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html http://dataeducation.com/you-require-a-numbers-table/

    Again, in real life you'll have a proper table of numbers, but for this example I'll use the following:

    declare @TNumbers table (Number int);
    insert into @TNumbers values (1);
    insert into @TNumbers values (2);
    insert into @TNumbers values (3);
    insert into @TNumbers values (4);
    insert into @TNumbers values (5);
    

    The main idea behind my approach is to make a helper table that would contain correct number of rows for each empid at first and then use this table to get results efficiently.

    We'll start with counting number of phones, licenses, e-mails for each empid:

    WITH
    CTE_Rows
    AS
    (
        SELECT Phones.empid, COUNT(*) AS EmpRows
        FROM @TPhones AS Phones
        GROUP BY Phones.empid
    
        UNION ALL
    
        SELECT Licenses.empid, COUNT(*) AS EmpRows
        FROM @TLicenses AS Licenses
        GROUP BY Licenses.empid
    
        UNION ALL
    
        SELECT Emails.empid, COUNT(*) AS EmpRows
        FROM @TEmails AS Emails
        GROUP BY Emails.empid
    )
    

    Then we calculate the maximum number of rows for each empid:

    ,CTE_MaxRows
    AS
    (
        SELECT
            CTE_Rows.empid
            ,MAX(CTE_Rows.EmpRows) AS MaxEmpRows
        FROM CTE_Rows
        GROUP BY CTE_Rows.empid
    )
    

    The CTE above has one row for each empid: empid itself and a maximum number of phones, licenses, e-mails for this empid. Now we need to expand this table and generate the given number of rows for each empid. Here I'm using the Numbers table for it:

    ,CTE_RowNumbers
    AS
    (
    SELECT
        CTE_MaxRows.empid
        ,Numbers.Number AS rn
    FROM
        CTE_MaxRows
        CROSS JOIN @TNumbers AS Numbers
    WHERE
        Numbers.Number <= CTE_MaxRows.MaxEmpRows
    )
    

    Then we need to add row numbers to all tables with data, which we'll use for joining later:

    ,CTE_Phones
    AS
    (
        SELECT
            Phones.empid
            ,ROW_NUMBER() OVER (PARTITION BY Phones.empid ORDER BY phone) AS rn
            ,Phones.phone
        FROM @TPhones AS Phones
    )
    ,CTE_Licenses
    AS
    (
        SELECT
            Licenses.empid
            ,ROW_NUMBER() OVER (PARTITION BY Licenses.empid ORDER BY license) AS rn
            ,Licenses.license
        FROM @TLicenses AS Licenses
    )
    ,CTE_Emails
    AS
    (
        SELECT
            Emails.empid
            ,ROW_NUMBER() OVER (PARTITION BY Emails.empid ORDER BY email) AS rn
            ,Emails.email
        FROM @TEmails AS Emails
    )
    

    Now we are ready to join all this together. CTE_RowNumbers has exact number of rows that we need, so there is no need for complex FULL JOINs here, simple LEFT JOIN is enough:

    ,CTE_Data
    AS
    (
        SELECT
            CTE_RowNumbers.empid
            ,CTE_Phones.phone
            ,CTE_Licenses.license
            ,CTE_Emails.email
        FROM
            CTE_RowNumbers
            LEFT JOIN CTE_Phones ON CTE_Phones.empid = CTE_RowNumbers.empid AND CTE_Phones.rn = CTE_RowNumbers.rn
            LEFT JOIN CTE_Licenses ON CTE_Licenses.empid = CTE_RowNumbers.empid AND CTE_Licenses.rn = CTE_RowNumbers.rn
            LEFT JOIN CTE_Emails ON CTE_Emails.empid = CTE_RowNumbers.empid AND CTE_Emails.rn = CTE_RowNumbers.rn
    )
    

    We are almost done. I guess, it is possible that the main table has some empids that don't have any related data (no phones, no liceses, no e-mails), like empid = 4 in my sample data. To get these empids in the result set I'll left join the CTE_Data to the main table:

    SELECT
        Main.empid
        ,CTE_Data.phone
        ,CTE_Data.license
        ,CTE_Data.email
    FROM
        @TMain AS Main
        LEFT JOIN CTE_Data ON CTE_Data.empid = Main.empid
    ORDER BY Main.empid, phone, license, email;
    

    To get the full script just put all code blocks from this post together in the same order as they appear here.

    This is the result set:

    empid   phone   license email
    1   5551234 NULL    NULL
    2   5551111 L4455   xxx@abc
    2   5552222 NULL    yyy@abc
    2   5553333 NULL    zzz@abc
    2   5554444 NULL    NULL
    3   NULL    L7890   NULL
    4   NULL    NULL    NULL
    

提交回复
热议问题