Why do SELECT results differ between MySQL and SQLite?

Anonymous (unverified), submitted 2019-12-03 02:56:01

Question:

I'm re-asking this question in a simplified and expanded manner.

Consider these SQL statements:

create table foo (id INT, score INT);

insert into foo values (106, 4);
insert into foo values (107, 3);
insert into foo values (106, 5);
insert into foo values (107, 5);

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

Using sqlite, the select statement returns:

id          avg1
----------  ----------
106         4.5
107         4.0

and mysql returns:

+------+--------+
| id   | avg1   |
+------+--------+
|  106 | 4.5000 |
+------+--------+

As far as I can tell, mysql's results are correct and sqlite's are incorrect. I tried casting to real in sqlite, as in the following, but it still returns two records:

select T1.id, cast(avg(cast(T1.score as real)) as real) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, cast(avg(cast(T2.score as real)) as real) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

Why does sqlite return two records?
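
To make the expected result concrete, here is a minimal sketch in plain Python (not SQL) of what the NOT EXISTS query asks for: keep a group only if no group has a strictly higher average. The function names are hypothetical and only illustrate the intended semantics, not how either engine evaluates the query.

```python
from collections import defaultdict

# Same four rows as the table foo above: (id, score).
rows = [(106, 4), (107, 3), (106, 5), (107, 5)]


def group_averages(rows):
    """Return {id: average score}, mirroring GROUP BY id with avg(score)."""
    scores = defaultdict(list)
    for id_, score in rows:
        scores[id_].append(score)
    return {id_: sum(vals) / len(vals) for id_, vals in scores.items()}


def groups_with_no_higher_average(rows):
    """Keep a group only if no group's average is strictly greater.

    This mirrors: HAVING NOT EXISTS (... HAVING avg2 > avg1).
    """
    averages = group_averages(rows)
    return sorted(
        (id_, avg1)
        for id_, avg1 in averages.items()
        if not any(avg2 > avg1 for avg2 in averages.values())
    )


result = groups_with_no_higher_average(rows)
print(result)
```

Under this reading only the group with the highest average survives, which matches the single row that mysql returns.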

Quick update:

I ran the statement against the latest sqlite version (3.7.11) and still get two records.

Another update:

I sent an email to sqlite-users@sqlite.org about the issue.

Myself, I've been playing with VDBE and found something interesting. I split the execution trace of each loop of not exists (one for each avg group).

To have three avg groups, I used the following statements:

create table foo (id VARCHAR(1), score INT);

insert into foo values ('c', 1.5);
insert into foo values ('b', 5.0);
insert into foo values ('a', 4.0);
insert into foo values ('a', 5.0);

PRAGMA vdbe_listing = 1;
PRAGMA vdbe_trace=ON;

select avg(score) avg1
from foo
group by id
having not exists (
    select avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

We clearly see that somehow what should be r:4.5 has become i:5:

I'm now trying to see why that is.

Final edit:

So I've been playing enough with the sqlite source code. I understand the beast much better now, although I'll let the original developer sort it out as he seems to already be doing it:

http://www.sqlite.org/src/info/430bb59d79

Interestingly, to me at least, it seems that newer versions (some time after the version I'm using) support inserting multiple records in one statement, as used in a test case added in the aforementioned commit:

CREATE TABLE t34(x,y);
INSERT INTO t34 VALUES(106,4), (107,3), (106,5), (107,5);
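
As a quick way to check whether a given SQLite build accepts the multi-row VALUES form, the statements from the test case can be run through Python's standard `sqlite3` module. This is only a sketch: it assumes the SQLite library bundled with your Python is recent enough to support the syntax, and it prints the library version so you can see which one was used.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t34(x, y)")

# One INSERT statement carrying four rows (the multi-row VALUES form).
conn.execute("INSERT INTO t34 VALUES (106, 4), (107, 3), (106, 5), (107, 5)")

# Read the rows back in insertion order.
inserted = conn.execute("SELECT x, y FROM t34 ORDER BY rowid").fetchall()
print(sqlite3.sqlite_version, inserted)
conn.close()
```

On a build that lacks this syntax, the INSERT would raise `sqlite3.OperationalError` instead.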

Answer 1:

I experimented with some variants of the query.

It seems that sqlite has a bug in how it handles previously declared column aliases inside a nested HAVING expression.

In your example, avg1 inside the second HAVING is always equal to 5.0.

Look:

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    SELECT 1 AS col1 GROUP BY col1 HAVING avg1 = 5.0);

This one returns nothing, but execution of the following query returns both records:

... having not exists (
    SELECT 1 AS col1 GROUP BY col1 HAVING avg1 <> 5.0);

I cannot find any similar bug in the sqlite ticket list.



Answer 2:

Let's look at this two ways. I'll use postgres 9.0 as my reference database.

(1)

-- select rows from foo
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we don't have any rows from T2
having not exists (
    -- select rows from foo
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    -- where the average score for any row is greater than the average for
    -- any row in T1
    having avg2 > avg1);

 id  |        avg1
-----+--------------------
 106 | 4.5000000000000000
(1 row)

Then let's move some of the logic inside the subquery, getting rid of the 'not':

(2)

-- select rows from foo
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we do have rows from T2
having exists (
    -- select rows from foo
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    -- where the average score is less than or equal to the average for any row in T1
    having avg2 <= avg1);
-- I think this expression will be true for all rows, as we are in effect doing a
-- cartesian join
-- with the 'having'; only we don't display the cartesian row set

 id  |        avg1
-----+--------------------
 106 | 4.5000000000000000
 107 | 4.0000000000000000
(2 rows)

So you have to ask yourself what you actually mean when you put this correlated subquery inside a HAVING clause. If it evaluates every row against every row from the primary query, we are making a cartesian join, and I don't think we should be pointing fingers at the SQL engine.

If you want every row that is less than the maximum average, what you should be saying is:

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having avg1 not in
    (select max(avg1) from (select id, avg(score) avg1 from foo group by id))
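
To see what this kind of query selects, here is a small sketch using Python's standard `sqlite3` module with the sample data from the question. It is a variant of the query above, not a verbatim copy: it repeats the aggregate in HAVING rather than relying on the avg1 alias, and the inner column name group_avg is a hypothetical name chosen to avoid shadowing. Note that NOT IN keeps the groups below the maximum average, while IN keeps the group(s) at the maximum, which is what the original question was after.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table foo (id INT, score INT);
    insert into foo values (106, 4);
    insert into foo values (107, 3);
    insert into foo values (106, 5);
    insert into foo values (107, 5);
""")

# Template: {op} is replaced by "not in" or "in" below.
# group_avg is a hypothetical column name for the per-group average.
query = """
    select T1.id, avg(T1.score) as avg1
    from foo T1
    group by T1.id
    having avg(T1.score) {op} (
        select max(group_avg)
        from (select avg(score) as group_avg from foo group by id)
    )
    order by T1.id
"""

# Groups whose average is below the maximum average.
below_max = conn.execute(query.format(op="not in")).fetchall()
# Groups whose average equals the maximum average.
at_max = conn.execute(query.format(op="in")).fetchall()

print("below max:", below_max)
print("at max:   ", at_max)
conn.close()
```

Because the subquery here is not correlated with the outer query, it avoids the nested HAVING alias lookup that the question is about.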


Answer 3:

Have you tried this version?

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg(T2.score) > avg(T1.score));

Also this one (which should give the same results):

select T1.*
from
  ( select id, avg(score) avg1
    from foo
    group by id
  ) T1
where not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg(T2.score) > avg1);

The query can also be handled with derived tables instead of a subquery in the HAVING clause:

select ta.id, ta.avg1
from
  ( select id, avg(score) avg1
    from foo
    group by id
  ) ta
  JOIN
  ( select avg(score) avg1
    from foo
    group by id
    order by avg1 DESC
    LIMIT 1
  ) tmp
  ON tmp.avg1 = ta.avg1
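
A minimal way to try the derived-table version is to run it through Python's standard `sqlite3` module against the sample data from the question. This is a sketch under the assumption that the table is populated exactly as in the question; the variable name top_groups is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table foo (id INT, score INT);
    insert into foo values (106, 4);
    insert into foo values (107, 3);
    insert into foo values (106, 5);
    insert into foo values (107, 5);
""")

# ta:  one row per id with its average score.
# tmp: the single highest per-group average.
# The join keeps every group whose average equals that highest value.
top_groups = conn.execute("""
    select ta.id, ta.avg1
    from
      ( select id, avg(score) as avg1
        from foo
        group by id
      ) ta
      join
      ( select avg(score) as avg1
        from foo
        group by id
        order by avg1 desc
        limit 1
      ) tmp
      on tmp.avg1 = ta.avg1
    order by ta.id
""").fetchall()

print(top_groups)
conn.close()
```

Since the join compares each group's average with the single top value, groups tied for the highest average would all be returned, and no alias is referenced from inside a nested HAVING.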

