How to limit contained associations per record/group?

Posted by 岁酱吖の on 2019-11-26 02:57:49

Question


I have a Model, Articles, which hasMany Abstracts. I want to load the 10 latest Articles, and for each Article, the Abstract with the highest number of points. My function looks like this:

public function getArticles($category, $viewName) {
    $subArticles = $this->Articles->findByCategory($category)->contain([
        'Abstracts' => function ($q) {
            return $q
                ->select(['body', 'points', 'article_id'])
                ->where(['Abstracts.approved' => true])
                ->limit(10)
                ->order(['Abstracts.points' => 'DESC']);
        }
    ])
    ->limit(10)
    ->order(['Articles.created' => 'DESC']);
    $this->set($viewName . 'Articles', $subArticles);
}

The result that I get is not what I intend though. Looking through the SQL, CakePHP first gets the articles.id of everything in the category (fine). Then CakePHP goes into the Abstracts table, using those 10 articles.id's it just found, and asks for the 10 Abstracts with the highest votes (that belong to those Articles).

The problem is that I want 1 Abstract for each Article, not the 10 Abstracts belonging to any Article in that category. How can I fix this? Thanks!

EDIT

ndm suggested that this was a duplicate of Using limit() on contained model so I attempted the solution there. Namely, I added this to my Model:

$this->hasOne('TopAbstract', [
    'className' => 'Abstracts',
    'foreignKey' => 'abstract_id',
    'strategy' => 'select',
    'sort' => ['TopAbstract.points' => 'DESC'],
    'conditions' => function ($e, $query) {
        $query->limit(1);
        return $e;
    }
]);

And then I try to find the Articles byCategory, with contain(['TopAbstract']), only this kills my SQL. It dies a horrible death:

Error: SQLSTATE[HY000]: General error: 1 near ")": syntax error

Debug doesn't even show the query that killed it, so I'm not sure how to debug this one?

EDIT

Talking to myself a bit, but the error is definitely in the 'conditions' part of the hasOne. I take that out, and it works fine. Can't find an example of how this is supposed to look on the interwebs.. anyone have any idea?


Answer 1:


What you are looking for is a solution to the greatest-n-per-group problem. You didn't mention a specific RDBMS, but see also http://dev.mysql.com/doc/refman/5.6/en/example-maximum-column-group-row.html

So let's give this a try. Here are three options that can be applied at the association level (defining the conditions could also be moved into custom finders), though you might not consider them all that "straightforward".


For something HasMany and BelongsToMany specific, scroll all the way down!


Select strategy - Using a join on a grouping, max-value subquery

$this->hasOne('TopAbstracts', [
    'className' => 'Abstracts',
    'strategy' => 'select',
    'conditions' => function (\Cake\Database\Expression\QueryExpression $exp, \Cake\ORM\Query $query) {
        $query->innerJoin(
            [
                'AbstractsFilter' => $query
                    ->connection()
                    ->newQuery()
                    ->select(['article_id', 'points' => $query->func()->max('points')])
                    ->from('abstracts')
                    ->group('article_id')
            ],
            [
                'TopAbstracts.article_id = AbstractsFilter.article_id',
                'TopAbstracts.points = AbstractsFilter.points'
            ]
        );
        return [];
    }
]);

This selects the top abstracts via a join against a subquery that computes the maximum points per article. The generated SQL will look something like

SELECT
    TopAbstracts.id AS `TopAbstracts__id`, ...
FROM
    abstracts TopAbstracts
INNER JOIN (
        SELECT
            article_id, (MAX(points)) AS `points`
        FROM
            abstracts
        GROUP BY
            article_id
    )
    AbstractsFilter ON (
        TopAbstracts.article_id = AbstractsFilter.article_id
        AND
        TopAbstracts.points = AbstractsFilter.points
    )
WHERE
    TopAbstracts.article_id in (1,2,3,4,5,6,7,8, ...)
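
One caveat worth checking with this approach: because the join matches on the points value, the SQL itself returns more than one row for an article whose best abstracts are tied on points. Here is a minimal sketch of that behaviour, assuming PHP with the pdo_sqlite extension and a made-up in-memory abstracts table. The helper names makeTiedFixture and topByMaxJoin and the fixture data are hypothetical and not part of CakePHP.

```php
<?php
// Hypothetical fixture: article 2 has two abstracts tied on 4 points.
function makeTiedFixture(): PDO
{
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE abstracts (id INTEGER PRIMARY KEY, article_id INTEGER, points INTEGER)');
    $db->exec('INSERT INTO abstracts (id, article_id, points) VALUES (1, 1, 5), (2, 1, 9), (3, 2, 4), (4, 2, 4)');
    return $db;
}

// Same shape as the query above: join against the per-article MAX(points).
function topByMaxJoin(PDO $db): array
{
    $sql = 'SELECT TopAbstracts.id
            FROM abstracts TopAbstracts
            INNER JOIN (
                SELECT article_id, MAX(points) AS points
                FROM abstracts
                GROUP BY article_id
            ) AbstractsFilter ON (
                TopAbstracts.article_id = AbstractsFilter.article_id
                AND TopAbstracts.points = AbstractsFilter.points
            )
            ORDER BY TopAbstracts.id';
    // Cast because older PDO versions return SQLite integers as strings.
    return array_map('intval', $db->query($sql)->fetchAll(PDO::FETCH_COLUMN));
}

// Abstract 2 wins for article 1; abstracts 3 and 4 both match for article 2.
print_r(topByMaxJoin(makeTiedFixture()));
```

If ties are possible in your data, an extra tie-breaking condition (for example on the primary key) is probably needed.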

Select strategy - Using left self-join filtering

$this->hasOne('TopAbstracts', [
    'className' => 'Abstracts',
    'strategy' => 'select',
    'conditions' => function (\Cake\Database\Expression\QueryExpression $exp, \Cake\ORM\Query $query) {
        $query->leftJoin(
            ['AbstractsFilter' => 'abstracts'],
            [
                'TopAbstracts.article_id = AbstractsFilter.article_id',
                'TopAbstracts.points < AbstractsFilter.points'
            ]);
        return $exp->add(['AbstractsFilter.id IS NULL']);
    }
]);

This uses a self-join and keeps only the rows for which no other abstract of the same article has more points (that is, no row satisfies a.points < b.points). The generated SQL will look something like

SELECT
    TopAbstracts.id AS `TopAbstracts__id`, ...
FROM 
    abstracts TopAbstracts
LEFT JOIN
    abstracts AbstractsFilter ON (
        TopAbstracts.article_id = AbstractsFilter.article_id
        AND
        TopAbstracts.points < AbstractsFilter.points
    )
WHERE
    (AbstractsFilter.id IS NULL AND TopAbstracts.article_id in (1,2,3,4,5,6,7,8, ...))
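
When several abstracts of one article share the highest points, the self-join as written keeps all of them, since none has a strictly better sibling. One possible variant, sketched below, extends the join condition with the primary key as a tie-breaker. It assumes PHP with the pdo_sqlite extension and a made-up in-memory table; the function name topBySelfJoin and the fixture data are hypothetical.

```php
<?php
// Left self-join with a tie-breaker: a row survives only when no sibling
// has more points, or equal points and a higher id.
function topBySelfJoin(PDO $db): array
{
    $sql = 'SELECT TopAbstracts.id
            FROM abstracts TopAbstracts
            LEFT JOIN abstracts AbstractsFilter ON (
                TopAbstracts.article_id = AbstractsFilter.article_id
                AND (
                    TopAbstracts.points < AbstractsFilter.points
                    OR (
                        TopAbstracts.points = AbstractsFilter.points
                        AND TopAbstracts.id < AbstractsFilter.id
                    )
                )
            )
            WHERE AbstractsFilter.id IS NULL
            ORDER BY TopAbstracts.id';
    // Cast because older PDO versions return SQLite integers as strings.
    return array_map('intval', $db->query($sql)->fetchAll(PDO::FETCH_COLUMN));
}

// Hypothetical fixture: abstracts 3 and 4 of article 2 are tied on 4 points.
$selfJoinDb = new PDO('sqlite::memory:');
$selfJoinDb->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$selfJoinDb->exec('CREATE TABLE abstracts (id INTEGER PRIMARY KEY, article_id INTEGER, points INTEGER)');
$selfJoinDb->exec('INSERT INTO abstracts (id, article_id, points) VALUES (1, 1, 5), (2, 1, 9), (3, 2, 4), (4, 2, 4)');

// One row per article: abstract 2 for article 1, abstract 4 for article 2.
print_r(topBySelfJoin($selfJoinDb));
```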

Join strategy - Using a subquery for the join condition

$this->hasOne('TopAbstracts', [
    'className' => 'Abstracts',
    'foreignKey' => false,
    'conditions' => function (\Cake\Database\Expression\QueryExpression $exp, \Cake\ORM\Query $query) {
        $subquery = $query
            ->connection()
            ->newQuery()
            ->select(['SubTopAbstracts.id'])
            ->from(['SubTopAbstracts' => 'abstracts'])
            ->where(['Articles.id = SubTopAbstracts.article_id'])
            ->order(['SubTopAbstracts.points' => 'DESC'])
            ->limit(1);

        return $exp->add(['TopAbstracts.id' => $subquery]);
    }
]);

This uses a correlated subquery with a rather specific select, with simple ordering and limiting, to pick the top abstract. Note that the foreignKey option is set to false in order to avoid an additional Articles.id = TopAbstracts.article_id condition being compiled into the join conditions.

The query will look something like

SELECT
    Articles.id AS `Articles__id`, ... ,
    TopAbstracts.id AS `TopAbstracts__id`, ...
FROM
    articles Articles
LEFT JOIN
    abstracts TopAbstracts ON (
        TopAbstracts.id = (
            SELECT
                SubTopAbstracts.id
            FROM
                abstracts SubTopAbstracts
            WHERE
                Articles.id = SubTopAbstracts.article_id
            ORDER BY
                SubTopAbstracts.points DESC
            LIMIT
                1
        )
    )
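
To see how this query shape behaves, here is a small sketch that runs it against made-up data, assuming PHP with the pdo_sqlite extension. The function name topByCorrelatedSubquery, the fixture rows, and the secondary ordering on id (added only to make ties deterministic) are assumptions and not part of the association above.

```php
<?php
// Returns a map of article id => id of its top abstract (or null).
function topByCorrelatedSubquery(PDO $db): array
{
    $sql = 'SELECT Articles.id AS article_id, TopAbstracts.id AS abstract_id
            FROM articles Articles
            LEFT JOIN abstracts TopAbstracts ON (
                TopAbstracts.id = (
                    SELECT SubTopAbstracts.id
                    FROM abstracts SubTopAbstracts
                    WHERE Articles.id = SubTopAbstracts.article_id
                    -- the id ordering is an assumed tie-breaker
                    ORDER BY SubTopAbstracts.points DESC, SubTopAbstracts.id ASC
                    LIMIT 1
                )
            )
            ORDER BY Articles.id';
    $result = [];
    foreach ($db->query($sql) as $row) {
        $abstractId = $row['abstract_id'];
        $result[(int)$row['article_id']] = $abstractId === null ? null : (int)$abstractId;
    }
    return $result;
}

// Hypothetical fixture: article 2 has a tie, article 3 has no abstracts.
$corrDb = new PDO('sqlite::memory:');
$corrDb->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$corrDb->exec('CREATE TABLE articles (id INTEGER PRIMARY KEY)');
$corrDb->exec('CREATE TABLE abstracts (id INTEGER PRIMARY KEY, article_id INTEGER, points INTEGER)');
$corrDb->exec('INSERT INTO articles (id) VALUES (1), (2), (3)');
$corrDb->exec('INSERT INTO abstracts (id, article_id, points) VALUES (1, 1, 5), (2, 1, 9), (3, 2, 4), (4, 2, 4)');

// Exactly one entry per article; article 3 maps to null.
print_r(topByCorrelatedSubquery($corrDb));
```

Because of the LIMIT 1, each article gets at most one abstract even when points are tied, and the LEFT JOIN keeps articles that have no abstracts at all.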

All three options will query and inject the records without any hackery; they're just not very "straightforward".


A manual approach

For the sake of completeness, it is of course always possible to manually load the associated records and format the results appropriately, for example using result formatters; see for example CakePHP Entity contain without foreign key
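
The formatting step of such a manual approach could look roughly like the following sketch. It uses plain PHP arrays in place of entities, and the function names pickTopAbstracts and attachTopAbstracts as well as the top_abstract key are hypothetical. In a real application this logic would presumably live in a result formatter.

```php
<?php
// Reduce a flat list of abstracts to the highest-scoring one per article.
// On equal points the abstract seen first is kept.
function pickTopAbstracts(array $abstracts): array
{
    $top = [];
    foreach ($abstracts as $abstract) {
        $articleId = $abstract['article_id'];
        if (!isset($top[$articleId]) || $abstract['points'] > $top[$articleId]['points']) {
            $top[$articleId] = $abstract;
        }
    }
    return $top;
}

// Attach the top abstract (or null) to each article under a hypothetical key.
function attachTopAbstracts(array $articles, array $abstracts): array
{
    $top = pickTopAbstracts($abstracts);
    foreach ($articles as $index => $article) {
        $articles[$index]['top_abstract'] = $top[$article['id']] ?? null;
    }
    return $articles;
}

// Made-up sample data for illustration.
$articles = [['id' => 1], ['id' => 2]];
$abstracts = [
    ['id' => 10, 'article_id' => 1, 'points' => 5],
    ['id' => 11, 'article_id' => 1, 'points' => 9],
];

print_r(attachTopAbstracts($articles, $abstracts));
```

This still loads every associated abstract, so it trades query complexity for more data transferred.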


Select strategy and reversed ordering

Just for reference, one of the weird solutions I stumbled over initially. This one really shouldn't be used!

This will select all associated abstracts, and then the ORM will iterate over them and for each article pick the first one with a matching article_id value. So in theory, when ordered descending on points, the ORM should pick the one with the most points.

While I would have expected this to work out of the box, it seems that the ORM iterates over the results in reversed order, which will cause the wrong rows to be picked. In order to get this working, the query needs to use the opposite of the order that would normally be used, i.e. ASC instead of DESC.

$this->hasOne('TopAbstracts', [
    'className' => 'Abstracts',
    'foreignKey' => 'article_id',
    'strategy' => 'select',
    'conditions' => function (\Cake\Database\Expression\QueryExpression $exp, \Cake\ORM\Query $query) {
        $query->order(['TopAbstracts.points' => 'ASC']);
        return [];
    }
]);

Also, the function needs to return an empty array instead of the expression as shown in the linked answer, since returning the expression causes invalid SQL to be compiled. Both of these behaviours, the reversed-order iteration and the invalid SQL, might be bugs.

While this will work, it will always select all associated abstracts, not only the top ones, which might be considered rather inefficient. The queries will look something like

SELECT
    Articles.id AS `Articles__id`, ...
FROM
    articles Articles

SELECT
    TopAbstracts.id AS `TopAbstracts__id`, ...
FROM
    abstracts TopAbstracts
WHERE
    TopAbstracts.article_id in (1,2,3,4,5,6,7,8, ...)
ORDER BY
    TopAbstracts.points ASC

HasMany associations

I gave HasMany associations a try, but I'm too busy right now to pursue this further... just put together a MySQL specific custom association for testing purposes, based on ROW_NUMBER() emulation similar to MySQL select top X records for each individual in table.

If anyone's interested, check https://gist.github.com/ndm2/039da4009df1c5bf1c262583603f8298
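
On databases with native window functions, the ROW_NUMBER() idea mentioned above can be written directly rather than emulated. The sketch below is not the custom association from the gist. It only illustrates the underlying query for "top N per group", assuming PHP with the pdo_sqlite extension and SQLite 3.25 or newer. The function name topNPerArticle and the fixture data are hypothetical.

```php
<?php
// Returns a map of article id => ids of its top $n abstracts by points.
function topNPerArticle(PDO $db, int $n): array
{
    $sql = 'SELECT id, article_id
            FROM (
                SELECT id, article_id,
                    ROW_NUMBER() OVER (
                        PARTITION BY article_id
                        ORDER BY points DESC, id ASC
                    ) AS row_num
                FROM abstracts
            ) AS ranked
            WHERE row_num <= :n
            ORDER BY article_id, row_num';
    $stmt = $db->prepare($sql);
    $stmt->bindValue(':n', $n, PDO::PARAM_INT);
    $stmt->execute();

    $grouped = [];
    foreach ($stmt as $row) {
        $grouped[(int)$row['article_id']][] = (int)$row['id'];
    }
    return $grouped;
}

// Made-up fixture: three abstracts for article 1, two for article 2.
$windowDb = new PDO('sqlite::memory:');
$windowDb->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$windowDb->exec('CREATE TABLE abstracts (id INTEGER PRIMARY KEY, article_id INTEGER, points INTEGER)');
$windowDb->exec('INSERT INTO abstracts (id, article_id, points) VALUES (1, 1, 5), (2, 1, 9), (3, 1, 7), (4, 2, 4), (5, 2, 6)');

// The two best abstracts per article, best first.
print_r(topNPerArticle($windowDb, 2));
```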


BelongsToMany associations

Here's an example for BelongsToMany associations that makes use of native window functions (unfortunately CakePHP doesn't yet support common table expressions): https://gist.github.com/ndm2/b417e3fa683a972e295dc0e24ef515e3



Source: https://stackoverflow.com/questions/30241975/how-to-limit-contained-associations-per-record-group
