How do I Sort a varchar with numbers and letters without a specific format?

Posted by 坚强是说给别人听的谎言 on 2020-01-06 07:29:38

Question


I have a column named MR which is a varchar. When I run a query with an ORDER BY it doesn't seem to be ordered correctly.

select MR, LName, FName 
from users
order by MR

Results:

MR        | LNAME | FNAME
----------+-------+-------
1234-234  | HEN   | LO
2343MA2   | SY    | JACK
MR20001   | LINA  | MARY
MR200011  | TEST  | CASE
MR20002   | KO    | MIKE

Why does MR200011 appear before MR20002? Any idea how I can sort this properly? The format of MR is not fixed.


Answer 1:


You are sorting the values as strings, not by the numbers they contain. Strings are compared character by character, and the first character that differs is at position 7:

MR200011 
MR20002 
      ^

Because '1' < '2', MR200011 sorts before MR20002. The 8th character is never compared, because the order is already decided at position 7.
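
The comparison can be reproduced outside the database. The following is a minimal Python sketch, assuming plain code-point comparison (the column's real collation may treat letters differently, though digits typically compare in the same order). The helper name first_difference is hypothetical and only serves the illustration.

```python
def first_difference(a: str, b: str) -> int:
    """Return the 1-based position of the first differing character,
    or 0 if one string is a prefix of the other."""
    for i, (ca, cb) in enumerate(zip(a, b), start=1):
        if ca != cb:
            return i
    return 0

values = ["MR20002", "MR200011", "MR20001"]

# String comparison is decided at the first differing character.
pos = first_difference("MR200011", "MR20002")
print(pos)                      # position of the deciding character
print("MR200011" < "MR20002")   # compares '1' with '2' at that position
print(sorted(values))           # plain string ordering
```

Sorting the three sample values this way gives the same order as the ORDER BY in the question.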

To 'fix' this issue, create a stored function which takes your varchar value and returns a new 'sort string' in which each numeric component is left-padded with zeros to a fixed length.

For example, padding to seven digits:

MR20002  -> MR0020002
MR200011 -> MR0200011

More importantly, if a value contains two blocks of digits, each block is padded separately (here to 12 digits), so neither is corrupted:

A1234-234  -> A000000001234-000000000234
A1234-5123 -> A000000001234-000000005123
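
Before committing to a stored function, the padding idea can be tried out in a few lines of Python. This is only an illustrative sketch, not the database function: the name numeric_sort_key and the default width of 12 are assumptions chosen to match the examples above, and digit runs longer than the width are simply left unchanged.

```python
import re

def numeric_sort_key(value: str, width: int = 12) -> str:
    """Left-pad every run of ASCII digits in value with zeros to `width`
    characters. Runs longer than `width` are left as they are."""
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(width), value)

mrs = ["1234-234", "2343MA2", "MR20001", "MR200011", "MR20002"]

# Sort by the padded key, but display the original value next to its key.
for mr in sorted(mrs, key=numeric_sort_key):
    print(mr, "->", numeric_sort_key(mr))
```

With the padded key, MR20002 sorts before MR200011, because both numeric parts now have the same length and compare digit by digit.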

The following function performs this transformation on SQL Server, padding each block of digits to 12 characters. You would have to adapt it for MySQL:

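-- Builds a sort key from @value by left-padding every run of digits
-- with zeros to 12 characters; all other characters are copied as-is.
-- Note: digit runs longer than 12 characters, and keys longer than
-- 200 characters, are silently truncated by the variable sizes below.
create function dbo.get_numeric_sort_key(@value varchar(100)) 
    returns varchar(200)
as
begin
   declare @pad_characters varchar(12)
   declare @numeric_block varchar(12)
   declare @output varchar(200)
   set @pad_characters = '000000000000'
   set @output = ''
   set @numeric_block = ''

   declare @idx int
   declare @len int
   declare @char char(1)
   set @idx = 1
   set @len = len(@value)
   while @idx <= @len
   begin
     set @char = SUBSTRING(@value, @idx, 1)
     if @char in ('0','1','2','3','4','5','6','7','8','9') 
     begin
        -- digit: collect it into the current numeric block
        set @numeric_block = @numeric_block + @char
     end
     else
     begin
        -- non-digit: flush the pending numeric block (padded), then copy the character
        if (@numeric_block <> '')
        begin
          set @output = @output + right(@pad_characters + @numeric_block, 12)
          set @numeric_block = ''
        end
        set @output = @output + @char
     end
     set @idx = @idx + 1
   end

   -- flush a numeric block that runs to the end of the string
   if (@numeric_block <> '')
     set @output = @output + right(@pad_characters + @numeric_block, 12)

   return @output
end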

Then change your order by clause to use the new function:

select MR, LName, FName 
from users 
order by dbo.get_numeric_sort_key(MR)

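If you have a large amount of data, it would be worth adding a persisted computed column to your table definition (populated by this function) so that the sort key is not recalculated for every row each time you run this query. In SQL Server, a computed column can only be persisted or indexed if the function is deterministic, which requires creating it WITH SCHEMABINDING.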




Answer 2:


A mix of digits and letters sorts in the expected order only when all entries have the same fixed length. In your case, MR200011 and MR20002 have different lengths: MR20002 has no 8th character, so the order is decided by the first seven characters, MR20001 versus MR20002.




Answer 3:


This query isn't pretty, but it sorts the rows in the order you want:

select
  MR,
  LName,
  FName
from (
  select
    MR,
    LName,
    FName,
    least(
      case when locate('0', MR)>0 then locate('0', MR) else length(MR)+1 end,
      case when locate('1', MR)>0 then locate('1', MR) else length(MR)+1 end,
      case when locate('2', MR)>0 then locate('2', MR) else length(MR)+1 end,
      case when locate('3', MR)>0 then locate('3', MR) else length(MR)+1 end,
      case when locate('4', MR)>0 then locate('4', MR) else length(MR)+1 end,
      case when locate('5', MR)>0 then locate('5', MR) else length(MR)+1 end,
      case when locate('6', MR)>0 then locate('6', MR) else length(MR)+1 end,
      case when locate('7', MR)>0 then locate('7', MR) else length(MR)+1 end,
      case when locate('8', MR)>0 then locate('8', MR) else length(MR)+1 end,
      case when locate('9', MR)>0 then locate('9', MR) else length(MR)+1 end) pos
  from users
  ) users_pos
order by
  left(MR, pos-1),
  mid(MR, pos, length(MR)-pos+1)+0

The subquery users_pos calculates pos, the position of the first digit in MR (or length(MR)+1 if MR contains no digit). The outer query then orders by left(MR, pos-1), which is the non-numeric beginning of the string, and by mid(MR, pos, length(MR)-pos+1)+0, which is the rest of the string starting at the first digit. Adding 0 converts that substring to a number, so it is ordered numerically (20002 comes before 200011).
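
The same two-part ordering can be sketched in Python to check it against the sample data. This is an approximation under stated assumptions: split_key is a hypothetical name, and the leading-digits regex mimics MySQL's string-to-number conversion (which uses the leading numeric part, so '1234-234' is treated as 1234) for unsigned integers only.

```python
import re
from typing import Tuple

def split_key(mr: str) -> Tuple[str, int]:
    """Return (non-numeric prefix, number starting at the first digit)."""
    m = re.search(r"[0-9]", mr)
    pos = m.start() if m else len(mr)   # index of the first digit, or end of string
    prefix, rest = mr[:pos], mr[pos:]
    # Keep only the digits before any other character (assumed conversion rule).
    leading = re.match(r"[0-9]*", rest).group()
    return prefix, int(leading) if leading else 0

mrs = ["MR200011", "2343MA2", "MR20002", "1234-234", "MR20001"]
print(sorted(mrs, key=split_key))
```

Only the first block of digits takes part in the ordering, so values that share both the prefix and that first number (for example A1234-234 and A1234-5123) compare as equal under this key.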




Source: https://stackoverflow.com/questions/14497692/how-do-i-sort-a-varchar-with-numbers-and-letters-without-a-specific-format
