Top N Per Group with Multiple Table Joins

礼貌的吻别 2020-12-15 12:54

Based on my research, this is a very common problem which generally has a fairly simple solution. My task is to alter several queries from get all results into

3 Answers
  • 2020-12-15 12:57

    The answer given by @RichardTheKiwi worked great and got me 99% of the way there! I am using MySQL, and only the first row of each group was getting a row number; the rest of the rows stayed NULL. As a result, the query returned only the top hit for each group rather than the first three rows. To fix this, I had to initialize @r in the initvars subquery. I changed

    from (select @g:=null) initvars

    to

    from (select @g:=null, @r:=null) initvars

    You could also initialize @r to 0 and it would work the same. For those less familiar with this type of syntax: the outer query reads through the rows of each sorted group. If a row has the same vendorid as the previous row (tracked in the @g variable), the IF() expression increments the row number stored in @r. When the scan reaches the next group with a new vendorid, the IF() condition no longer evaluates as true, so @r (and thereby RowNum) is reset to 1.
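    For anyone who finds the variable logic hard to follow, here is a minimal Python sketch of the same bookkeeping. It only mimics the control flow of @r := IF(@g=vendorid,@r+1,1); it does not reproduce MySQL's NULL handling or evaluation-order quirks, and the function name and sample rows are made up for illustration.

```python
def number_rows_per_group(rows):
    """Mimic @r := IF(@g = vendorid, @r + 1, 1) over rows pre-sorted by vendorid."""
    g = None  # plays the role of @g: the vendorid seen on the previous row
    r = None  # plays the role of @r: the running row number within the vendor
    numbered = []
    for vendorid, productid, num_sales in rows:
        # Same vendor as the previous row: keep counting. New vendor: restart at 1.
        r = r + 1 if g == vendorid else 1
        g = vendorid
        numbered.append((vendorid, productid, num_sales, r))
    return numbered


# Hypothetical sample data, already sorted by vendorid, NumSales DESC.
rows = [
    (1, 10, 50),
    (1, 11, 40),
    (1, 12, 30),
    (1, 13, 20),
    (2, 20, 70),
    (2, 21, 5),
]

numbered = number_rows_per_group(rows)
# Equivalent of WHERE RowNum <= 3 in the outer query.
top3 = [row for row in numbered if row[3] <= 3]
```

    The sketch depends on the input already being sorted by vendor, just as the SQL version depends on the ordering of the inner subquery.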

  • 2020-12-15 13:13

    Even though you specify LIMIT 100, this type of query requires a full scan: the grouped result has to be built in full, and every row inspected and numbered, before the final filter picks out the 100 rows you want to display.

    select
        vendorid, productid, NumSales
    from
    (
        select
            vendorid, productid, NumSales,
            @r := IF(@g=vendorid,@r+1,1) RowNum,
            @g := vendorid
        from (select @g:=null) initvars
        CROSS JOIN 
        (
            SELECT COUNT(oi.price) AS NumSales, 
                   p.productid, 
                   p.vendorid
            FROM products p
            INNER JOIN vendors v ON (p.vendorid = v.vendorid)
            INNER JOIN orders_items oi ON (p.productid = oi.productid)
            INNER JOIN orders o ON (oi.orderid = o.orderid)
            WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
            AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
            AND o.`Status` = 'SETTLED'
            AND o.Deleted = 0
            GROUP BY p.vendorid, p.productid
            ORDER BY p.vendorid, NumSales DESC
        ) T
    ) U
    WHERE RowNum <= 3
    ORDER BY NumSales DESC
    LIMIT 100;
    

    The approach here is

    1. Group by to get NumSales
    2. Use variables to number the product rows within each vendor
    3. Filter the numbered dataset to allow for a max of 3 per vendor
    4. Order the remaining by NumSales DESC and return only 100
  • 2020-12-15 13:17

    I like this elegant solution. However, when I run an adapted but similar query on my dev machine, I get a non-deterministic result set. I believe this is due to the way the MySQL optimiser handles assigning and reading user variables within the same statement.

    From the docs:

    As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement; in addition, this order is not guaranteed to be the same between releases of the MySQL Server.

    Just adding this note here in case someone else comes across this weird behaviour.
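    One possible way around this, assuming MySQL 8.0 or later (which added window functions), is to let ROW_NUMBER() do the numbering instead of user variables. Below is a minimal sketch, run through Python's sqlite3 module so that it is self-contained (SQLite 3.25+ also supports window functions). The sales table and its rows are hypothetical stand-ins for the grouped subquery T, and this has not been checked against the original schema.

```python
import sqlite3

# In-memory database; `sales` is a hypothetical stand-in for the grouped
# subquery T (one row per vendor/product with its NumSales count).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (vendorid INTEGER, productid INTEGER, NumSales INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 10, 50), (1, 11, 40), (1, 12, 30), (1, 13, 20), (2, 20, 70), (2, 21, 5)],
)

# ROW_NUMBER() numbers rows within each vendor in a defined order, so no
# user variable is both assigned and read in the same statement.
query = """
SELECT vendorid, productid, NumSales
FROM (
    SELECT vendorid, productid, NumSales,
           ROW_NUMBER() OVER (
               PARTITION BY vendorid
               ORDER BY NumSales DESC, productid
           ) AS RowNum
    FROM sales
) U
WHERE RowNum <= 3
ORDER BY NumSales DESC, vendorid, productid
LIMIT 100
"""
top_rows = conn.execute(query).fetchall()
conn.close()
```

    The extra productid in the window's ORDER BY is there only as a tie-breaker, so that products with equal NumSales are numbered the same way on every run.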
