Select top using SQL Server returns different output than select *

Posted by 浪尽此生 on 2019-12-14 03:02:51

Question


I am trying to select the top n rows from a table, sorted so that values starting with a letter come first and values starting with a digit come after them.

When I try to get all data (select *), I get the correct output:

select nocust, share 
from TB_STOCK
where share = 'BBCA' 
  and concat(share, nocust) < 'ZZZZZZZZ'
order by 
    case when nocust like '[a-z]%' then 0 else 1 end


nocust | share
-------+--------
a522   | BBCA
b454   | BBCA
k007   | BBCA
p430   | BBCA
q797   | BBCA
s441   | BBCA
s892   | BBCA
u648   | BBCA
v107   | BBCA
4211   | BBCA
6469   | BBCA
6751   | BBCA

But when I select only the top n rows (for example, top 5), the output differs from what I expect (it does not match the first rows of the select * result):

select top 5 nocust, share 
from TB_STOCK
where share = 'BBCA' 
  and concat(share, nocust) < 'ZZZZZZZZ'
order by 
    case when nocust like '[a-z]%' then 0 else 1 end

nocust | share
-------+--------
k007   | BBCA
b454   | BBCA
a522   | BBCA
p430   | BBCA
q797   | BBCA

I suspect the mistake is somewhere between the concat and the order by. Can someone tell me how to get the right top 5 output, like this:

nocust | share
-------+--------
a522   | BBCA
b454   | BBCA
k007   | BBCA
p430   | BBCA
q797   | BBCA

Answer 1:


I will try to answer this from a different perspective.

First, it should be clear that the optimizer tries to produce the best possible plan quickly.

The optimizer uses or ignores an index depending on which choice it estimates to be most cost-effective.

I am using the AdventureWorks 2016 database, where Production.Product has 504 rows.

select [Name],ProductNumber from Production.Product
order by [Name]

It sorts the rows as expected.

select top 5 [Name],ProductNumber from Production.Product
order by [Name]

It sorts the rows as expected.

If I use a CASE expression in the ORDER BY:

select [Name],ProductNumber from Production.Product
order by case when [name] like '[a]%' then 1 else -1 end

It sorts the rows as intended. All 504 rows are processed.

If TOP asks for a small share of the rows (in my test, up to about 20% of the total), like this:

select Top 5 [Name],ProductNumber from Production.Product
order by case when [name] like '[a]%' then 1 else -1 end



Then it picks the first n rows it finds and returns them quickly.
The order was not what I expected.

If TOP asks for a larger share of the rows (in my test, more than about 20% of the total), like this:

select Top (101) [Name],ProductNumber from Production.Product
order by case when [name] like '[a]%' then 1 else -1 end

It processes all 504 rows and sorts them accordingly.

The sorted result is as expected.

In all of the above cases a Clustered Index Scan (on ProductID) is performed. In this example [Name] and ProductNumber each have their own nonclustered index, but neither index was selected.

You can do this:

;with CTE as (
    select nocust, share,
           case when nocust like '[a-z]%' then 0 else 1 end as SortCol
    from TB_STOCK
    where share = 'BBCA'
      and concat(share, nocust) < 'ZZZZZZZZ'
)
select top 5 * from CTE
order by SortCol, nocust  -- nocust breaks ties within each SortCol group



Answer 2:


You have a very strange ORDER BY: it only makes sure that entries starting with a letter are ordered before those starting with a number, but you're NOT actually ordering by the values themselves. Without an ORDER BY on the values, there's no guarantee as to how the rows within each group will be ordered, as you're seeing here.

You need to adapt your ORDER BY to:

 ORDER BY
     CASE WHEN nocust LIKE '[a-z]%' THEN 0 ELSE 1 END,
     nocust

NOW you're actually ordering by nocust, and now, I'm pretty sure, the outputs will be identical.
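The effect of the extra sort column can be tried out with a minimal sketch like the one below. It is an illustration under assumptions, not the asker's actual setup: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, so TOP 5 becomes LIMIT 5 and the bracket pattern is written with GLOB instead of LIKE. The sample values are the ones shown in the question, and the helper name top5 is made up for this example.

```python
import sqlite3

# Sample nocust values from the question (assumed unique here).
NOCUSTS = ["k007", "b454", "a522", "p430", "q797", "s441",
           "s892", "u648", "v107", "4211", "6469", "6751"]

def top5(rows):
    """Load rows in the given order and return the first 5 nocust values,
    letters first, then sorted by nocust within each group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TB_STOCK (nocust TEXT, share TEXT)")
    conn.executemany("INSERT INTO TB_STOCK VALUES (?, 'BBCA')",
                     [(n,) for n in rows])
    # SQLite assumptions: LIMIT replaces TOP, GLOB replaces LIKE '[a-z]%'.
    query = """
        SELECT nocust FROM TB_STOCK
        WHERE share = 'BBCA'
        ORDER BY CASE WHEN nocust GLOB '[a-z]*' THEN 0 ELSE 1 END,
                 nocust          -- tie-breaker inside each group
        LIMIT 5
    """
    result = [r[0] for r in conn.execute(query)]
    conn.close()
    return result

# Insertion order no longer matters once the ORDER BY is unambiguous.
print(top5(NOCUSTS))
print(top5(list(reversed(NOCUSTS))))
```

Both calls print the same five values, because every row now has exactly one possible position in the ordering.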




Answer 3:


Your ORDER BY is not a stable sort; it sorts the data broadly into one of two categories but doesn't specify in enough detail how items are then to be sorted within each category. This means that in the TOP 5 form SQL Server is free to choose a data access strategy that lets it stop as soon as it has found 5 rows for which the CASE WHEN returns 0.

Suppose you have this output from SELECT * ... ORDER BY Category

Category, Thing
Animal, Cat
Animal, Dog
Animal, Goat
Vegetable, Potato
Vegetable, Turnip
Vegetable, Swede

There is absolutely no guarantee that a SELECT TOP 2 * ... ORDER BY category will give you "Cat, Dog" in that order. You could reasonably get "Goat, Dog" today and "Cat, Goat" tomorrow, when SQL Server has shuffled its indexes around after new data was added. The only thing a TOP 2 ordered by category guarantees is that, so long as there are at least two animals in the db and there is no new category that is alphabetically earlier than "Animal", you'll get two animals.

It is this way because an optimization of TOP N means that SQL Server can stop early once it has N rows that meet the criteria; it doesn't need to access and sort a million rows if it has already found 5 rows whose category would come first in the sort. Imagine it knows the distinct values in the column, and the count of each, from its statistics: it can sort those distinct values to learn which ones come first, then go and find any 5 rows that have a value that sorts first, and return them. Essentially SQL Server may think "I know I have 3 'Animal' rows, animals come before everything else, and the user wants 2. I'll just start reading rows and stop after I get 2 animals" rather than "I'll read every Thing, sort all million of them on category, then take the first 2 rows".
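The reasoning above can be mimicked with a toy model. This is only a sketch of the idea, not SQL Server's real algorithm: the function name top_n_early_stop is invented for illustration, and it assumes the first category contains at least n rows.

```python
# Rows from the example above, as (category, thing) pairs.
THINGS = [("Animal", "Cat"), ("Animal", "Dog"), ("Animal", "Goat"),
          ("Vegetable", "Potato"), ("Vegetable", "Turnip"),
          ("Vegetable", "Swede")]

def top_n_early_stop(rows, n, sort_key):
    """Toy model: find the smallest sort-key value, then return the first n
    rows carrying it, in whatever order the rows happen to be read."""
    best = min(sort_key(r) for r in rows)
    picked = []
    for row in rows:
        if sort_key(row) == best:
            picked.append(row)
            if len(picked) == n:
                break  # stop early: enough rows from the first category
    return picked

by_category = lambda row: row[0]

# Same data, two different physical read orders.
today = top_n_early_stop(THINGS, 2, by_category)
tomorrow = top_n_early_stop(list(reversed(THINGS)), 2, by_category)

print([thing for _, thing in today])     # ['Cat', 'Dog']
print([thing for _, thing in tomorrow])  # ['Goat', 'Dog']
```

Both results are valid answers to "top 2 ordered by category"; they differ only because the rows were read in a different order.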

This could be hugely faster than sorting a million rows and then plucking the first X.

To get repeatable results each time, you have to make the sort stable by specifying sort conditions that guarantee the Thing within the Category is sorted right the way down to where there is no ambiguity.

Add more columns to your ORDER BY so that every row has a guaranteed place in the overall ordering; then your sort will be stable and TOP N will return the same rows each time. To make a sort stable, the collection of columns you sort by has to have a unique combination of values. You could sort by 20 columns, but if there are any rows where all 20 of those columns have identical values (and differentiation only occurs on the 21st value, which you don't order by), then the sort order isn't guaranteed.
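One way to check whether a set of sort columns is unique is to count how many rows share each sort-key value. The sketch below does this in plain Python on the sample values from the question; the helper name ambiguous_sort_keys is hypothetical. In SQL the same check would typically be a GROUP BY on the ORDER BY expressions with HAVING COUNT(*) > 1.

```python
from collections import Counter

# Sample nocust values from the question.
NOCUSTS = ["a522", "b454", "k007", "p430", "q797", "s441",
           "s892", "u648", "v107", "4211", "6469", "6751"]

def category(nocust):
    """Mirror the CASE expression: 0 for a leading letter a-z, else 1."""
    return 0 if "a" <= nocust[0] <= "z" else 1

def ambiguous_sort_keys(rows, sort_key):
    """Return each sort-key value shared by more than one row, with its count."""
    counts = Counter(sort_key(r) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Sorting on the category alone leaves ties inside both groups.
print(ambiguous_sort_keys(NOCUSTS, category))
# Adding nocust as a second sort column removes every tie.
print(ambiguous_sort_keys(NOCUSTS, lambda n: (category(n), n)))
```

An empty result means every row has a unique sort key, which is the condition this answer describes for TOP N to return the same rows each time.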



Source: https://stackoverflow.com/questions/58212106/select-top-using-sql-server-returns-different-output-than-select
