Question
I am trying to select the top n rows from a database table, ordered alphabetically and numerically. The output must be ordered by the alphabetic values first and by the numeric values after that.
When I select all the data (select *), I get the correct output:
select nocust, share
from TB_STOCK
where share = 'BBCA'
and concat(share, nocust) < 'ZZZZZZZZ'
order by
case when nocust like '[a-z]%' then 0 else 1 end
nocust | share
-------+--------
a522 | BBCA
b454 | BBCA
k007 | BBCA
p430 | BBCA
q797 | BBCA
s441 | BBCA
s892 | BBCA
u648 | BBCA
v107 | BBCA
4211 | BBCA
6469 | BBCA
6751 | BBCA
But when I select top n (e.g. top 5), I get a different output than expected (not in the same order as select * from table):
select top 5 nocust, share
from TB_STOCK
where share = 'BBCA'
and concat(share, nocust) < 'ZZZZZZZZ'
order by
case when nocust like '[a-z]%' then 0 else 1 end
nocust | share
-------+--------
k007 | BBCA
b454 | BBCA
a522 | BBCA
p430 | BBCA
q797 | BBCA
I suspect the mistake is somewhere in the concat or the order by. Can someone tell me how to get the correct top 5 output, like this:
nocust | share
-------+--------
a522 | BBCA
b454 | BBCA
k007 | BBCA
p430 | BBCA
q797 | BBCA
Answer 1:
I am trying to answer this from a different perspective.
First, it should be clear that the optimizer tries to come up with the best plan it can, quickly.
It decides whether or not to use an index based on what it estimates is most cost-effective.
I am using the AdventureWorks 2016 database, in which Production.Product has 504 rows.
select [Name],ProductNumber from Production.Product
order by [Name]
It sorts the rows as expected.
select top 5 [Name],ProductNumber from Production.Product
order by [Name]
It sorts the rows as expected.
If I use a CASE expression in the ORDER BY:
select [Name],ProductNumber from Production.Product
order by case when [name] like '[a]%' then 1 else -1 end
It sorts the records as intended. All 504 rows are processed.
If I use 20% or less of the total rows in TOP, like:
select Top 5 [Name],ProductNumber from Production.Product
order by case when [name] like '[a]%' then 1 else -1 end
then it picks the first n records it finds and returns them quickly. The sorting was not as expected.
If I use more than 20% of the total rows in TOP, like:
select Top (101) [Name],ProductNumber from Production.Product
order by case when [name] like '[a]%' then 1 else -1 end
it processes all 504 rows and sorts them accordingly. The sorting result is as expected.
In all of the above cases, a Clustered Index Scan (on ProductID) is performed.
In this example, [Name] and ProductNumber each have their own nonclustered index, but neither was selected.
You can do this:
;With CTE as(
select nocust, share,
case when nocust like '[a-z]%' then 0 else 1 end SortCol
from TB_STOCK
where share = 'BBCA'
and concat(share, nocust) < 'ZZZZZZZZ'
)
select top 5 * from CTE
order by SortCol
Answer 2:
You have a very strange ORDER BY: it only makes sure that entries starting with a letter are ordered before those starting with a number, but you're NOT actually ordering by the values themselves. Without a specific ORDER BY on the values, there's no guarantee as to how the rows within each group will be ordered, as you're seeing here.
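To make the "no guarantee" point concrete, here is a small Python sketch (not SQL Server code) that checks both of the question's TOP 5 results against the only thing the original ORDER BY actually specifies. The helper names sort_key and satisfies_order_by are hypothetical, and sort_key only approximates the T-SQL test CASE WHEN nocust LIKE '[a-z]%' by checking whether the first character is an ASCII letter, which is enough for the sample data.

```python
def sort_key(nocust: str) -> int:
    # Approximates: CASE WHEN nocust LIKE '[a-z]%' THEN 0 ELSE 1 END
    # (assumption: values contain only ASCII letters and digits, as in the sample)
    first = nocust[:1]
    return 0 if first.isascii() and first.isalpha() else 1


def satisfies_order_by(rows: list[str]) -> bool:
    # A result is valid for the original ORDER BY as long as the CASE keys
    # never decrease; the order of rows with equal keys is not checked at all.
    keys = [sort_key(r) for r in rows]
    return keys == sorted(keys)


full_result = ["a522", "b454", "k007", "p430", "q797", "s441", "s892",
               "u648", "v107", "4211", "6469", "6751"]
top5_observed = ["k007", "b454", "a522", "p430", "q797"]
top5_expected = ["a522", "b454", "k007", "p430", "q797"]

print(satisfies_order_by(full_result))    # True
print(satisfies_order_by(top5_observed))  # True
print(satisfies_order_by(top5_expected))  # True
```

Both TOP 5 lists pass, which is why SQL Server is allowed to return either one.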
You need to adapt your ORDER BY to:
ORDER BY
CASE WHEN nocust LIKE '[a-z]%' THEN 0 ELSE 1 END,
nocust
NOW you're actually ordering by nocust, and I'm pretty sure the outputs will now be identical.
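As a runnable sanity check, the sketch below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module and runs the corrected query. This is SQLite, not SQL Server, so the query is adapted: LIMIT 5 replaces TOP 5, || replaces concat (older SQLite versions lack concat), and GLOB '[a-z]*' stands in for LIKE '[a-z]%', because SQLite's LIKE has no character ranges. GLOB is case-sensitive, unlike LIKE under SQL Server's usual case-insensitive collations; the sample values are lowercase, so the result is the same here.

```python
import sqlite3

# Sample nocust values from the question, deliberately inserted in a shuffled order.
rows = ["s892", "4211", "k007", "v107", "b454", "6751", "a522",
        "q797", "u648", "6469", "p430", "s441"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TB_STOCK (nocust TEXT, share TEXT)")
conn.executemany("INSERT INTO TB_STOCK VALUES (?, 'BBCA')", [(r,) for r in rows])

# Letters first (0), digits second (1), then nocust itself as the tie-breaker.
query = """
SELECT nocust, share
FROM TB_STOCK
WHERE share = 'BBCA'
  AND (share || nocust) < 'ZZZZZZZZ'
ORDER BY
  CASE WHEN nocust GLOB '[a-z]*' THEN 0 ELSE 1 END,
  nocust
LIMIT 5
"""
top5 = [nocust for nocust, _ in conn.execute(query)]
conn.close()

print(top5)  # ['a522', 'b454', 'k007', 'p430', 'q797']
```

With nocust as the tie-breaker, the physical insertion order no longer affects which 5 rows come back.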
Answer 3:
Your ORDER BY is not a stable sort (one that gives every row a fixed, repeatable position); it sorts data broadly into one of two categories but doesn't specify in enough detail how items are then to be sorted within each category. This means that in the TOP 5 form, SQL Server is free to choose a data access strategy that lets it stop as soon as it has found 5 rows for which the CASE WHEN returns 0.
Suppose you have this output from SELECT * ... ORDER BY Category:
Category, Thing
Animal, Cat
Animal, Dog
Animal, Goat
Vegetable, Potato
Vegetable, Turnip
Vegetable, Swede
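The same effect can be imitated in Python. In this sketch (plain Python, not how SQL Server works internally), the two lists are hypothetical "physical" read orders and top_n is a hypothetical helper. Python's sorted is stable, so rows that tie on the key keep whatever order they were read in, which plays the role of the engine's unspecified handling of ties.

```python
data = [("Animal", "Cat"), ("Animal", "Dog"), ("Animal", "Goat"),
        ("Vegetable", "Potato"), ("Vegetable", "Turnip"), ("Vegetable", "Swede")]

# Two hypothetical orders in which the engine might happen to read the rows.
read_order_today = data
read_order_tomorrow = [data[2], data[5], data[0], data[3], data[1], data[4]]


def top_n(rows, n, key):
    # Ties on `key` keep their read order, so the result depends on that order.
    return sorted(rows, key=key)[:n]


def by_category(row):
    return row[0]


def by_category_then_thing(row):
    return (row[0], row[1])


print(top_n(read_order_today, 2, by_category))     # [('Animal', 'Cat'), ('Animal', 'Dog')]
print(top_n(read_order_tomorrow, 2, by_category))  # [('Animal', 'Goat'), ('Animal', 'Cat')]

# Adding Thing removes the ties, so both read orders give the same answer.
print(top_n(read_order_today, 2, by_category_then_thing))     # [('Animal', 'Cat'), ('Animal', 'Dog')]
print(top_n(read_order_tomorrow, 2, by_category_then_thing))  # [('Animal', 'Cat'), ('Animal', 'Dog')]
```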
There is absolutely no guarantee that if you do a SELECT TOP 2 * ... ORDER BY category you will get "Cat, Dog" in that order. You could reasonably get "Goat, Dog" today and "Cat, Goat" tomorrow, after SQL Server has shuffled its indexes around because new data was added. The only thing you can guarantee with a TOP 2 ... ORDER BY category is that, as long as there are at least two animals in the database and there is no new category that is alphabetically earlier than "Animal", you'll get two animals.
It is this way because an optimization of TOP N means that SQL Server can stop early once it has N rows that meet the criteria; it doesn't need to access and sort a million rows if it has already found 5 rows with a category that would come first in the sort. Let's imagine it knows the distinct values in the column, and how often each one occurs, as part of its statistics. It can sort those distinct values to know which ones will come first, then go and find any 5 random rows that have a value that sorts first, and return them. Essentially, SQL Server may think "I know I have 3 'animal', animals come before everything else, and the user wants 2. I'll just start reading rows and stop after I get 2 animals" rather than "I'll read every Thing, sort all million of them on category, then take the first 2 rows".
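Below is a toy Python version of the shortcut described above. It only illustrates the idea and is not SQL Server's actual TOP N algorithm: the function top_n_early_stop is hypothetical, and it simply assumes (as if from statistics) that first_key is the smallest key value and that at least n rows carry it.

```python
def top_n_early_stop(rows, n, key, first_key):
    """Scan rows in read order and stop once n rows whose key equals
    first_key have been found. Assumes first_key sorts before every other
    key value and that at least n rows have it (as if known from statistics)."""
    picked = []
    scanned = 0
    for row in rows:
        scanned += 1
        if key(row) == first_key:
            picked.append(row)
            if len(picked) == n:
                break  # no need to read or sort the remaining rows
    return picked, scanned


things = [("Animal", "Goat"), ("Vegetable", "Swede"), ("Animal", "Cat"),
          ("Vegetable", "Potato"), ("Animal", "Dog"), ("Vegetable", "Turnip")]

picked, scanned = top_n_early_stop(things, 2, key=lambda r: r[0], first_key="Animal")
print(picked)   # [('Animal', 'Goat'), ('Animal', 'Cat')]
print(scanned)  # 3
```

Which two animals come back depends purely on the order the rows happen to be read in, which is exactly the unpredictability described above.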
This could be hugely faster than sorting a million rows and then plucking the first X.
To get repeatable results each time, you have to make the sort stable by specifying sort conditions that guarantee the order of each Thing within its Category, all the way down to where there is no ambiguity.
Add more columns to your ORDER BY so that every row has a guaranteed place in the overall ordering; then your sort will be stable and TOP N will return the same rows each time. To make a sort stable, the collection of columns you sort by has to have a unique combination of values. You could sort by 20 columns, but if there are any rows where all 20 of those columns have identical values (and they differ only in a 21st column, which you don't order by), then the sort order isn't guaranteed.
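A quick way to see whether a set of sort columns can leave ties is to look for duplicate value combinations. The helper sort_columns_are_unique below is a hypothetical Python sketch using the Category/Thing example from this answer.

```python
def sort_columns_are_unique(rows, columns):
    """Return True if no two rows share the same values in `columns`,
    i.e. ordering by these columns leaves no ties to be broken arbitrarily."""
    seen = set()
    for row in rows:
        combo = tuple(row[c] for c in columns)
        if combo in seen:
            return False
        seen.add(combo)
    return True


table = [{"Category": "Animal", "Thing": "Cat"},
         {"Category": "Animal", "Thing": "Dog"},
         {"Category": "Animal", "Thing": "Goat"},
         {"Category": "Vegetable", "Thing": "Potato"},
         {"Category": "Vegetable", "Thing": "Turnip"},
         {"Category": "Vegetable", "Thing": "Swede"}]

print(sort_columns_are_unique(table, ["Category"]))           # False
print(sort_columns_are_unique(table, ["Category", "Thing"]))  # True
```

A check like this only shows there are no ties in the data you have now; only a unique key or unique constraint on those columns guarantees it stays that way as rows are added.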
Source: https://stackoverflow.com/questions/58212106/select-top-using-sql-server-returns-different-output-than-select