How to find out whether collation uses word sort or string sort?

耶瑟儿~ 2021-01-19 05:47

https://stackoverflow.com/a/361059/14731 discusses the differences between "word sort" and "string sort".

How does one query programmatically when an SQL

2 Answers
  •  轮回少年
    2021-01-19 06:35

    • srutzky's excellent answer reveals that, with the exception of non-Unicode types processed by SQL_ collators, all other data is sorted according to "Unicode Collation" rules.
    • Confusingly, Microsoft does not use the Unicode standard's sorting rules.
    • According to https://support.microsoft.com/en-us/kb/322112

      SQL Server 2000 supports two types of collations:

      • SQL collations
      • Windows collations

      [...]

      For a Windows collation, a comparison of non-Unicode data is implemented by using the same algorithm as Unicode data.

      [...]

      A SQL collation's rules for sorting non-Unicode data are incompatible with any sort routine that is provided by the Microsoft Windows operating system; however, the sorting of Unicode data is compatible with a particular version of the Windows sorting rules.

    • I interpret this as meaning that:

      • SQL_ collators are "SQL collations"
      • All other collators are "Windows collations".
      • With the exception of non-Unicode types processed by SQL_ collators, all other data is sorted according to "Windows collations".
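
    The interpretation above can be written down as a small decision rule. This is only a sketch of my reading of the KB article, not anything SQL Server exposes: the function names are hypothetical, and the only input is the collation name plus whether the column is a Unicode type.

```python
def is_sql_collation(collation_name: str) -> bool:
    """Return True for names carrying the SQL_ prefix ("SQL collations" in this reading)."""
    return collation_name.startswith("SQL_")


def sorting_rules(collation_name: str, is_unicode_type: bool) -> str:
    """Say which family of rules applies, per the interpretation in this answer.

    Hypothetical helper: it encodes the reading of KB 322112 given above,
    nothing more.
    """
    # Only non-Unicode data under a SQL_ collation uses the SQL collation's own rules.
    if is_sql_collation(collation_name) and not is_unicode_type:
        return "SQL collation rules"
    # Everything else falls through to the Windows collation rules.
    return "Windows collation rules"


if __name__ == "__main__":
    for name, is_unicode in [
        ("SQL_Latin1_General_CP1_CI_AS", False),
        ("SQL_Latin1_General_CP1_CI_AS", True),
        ("Latin1_General_CI_AS", False),
    ]:
        print(name, is_unicode, "->", sorting_rules(name, is_unicode))
```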

    So, let's dig into "Windows collations".

    • According to https://msdn.microsoft.com/en-us/library/ms143515(v=sql.105).aspx

      For Unicode data types, data comparisons are based on the Unicode code points.

    • winnls.h contains a brief overview of "word sort":

    //  Sorting Flags.
    //
    //    WORD Sort:    culturally correct sort
    //                  hyphen and apostrophe are special cased
    //                  example: "coop" and "co-op" will sort together in a list
    //
    //                        co_op     <-------  underscore (symbol)
    //                        coat
    //                        comb
    //                        coop
    //                        co-op     <-------  hyphen (punctuation)
    //                        cork
    //                        went
    //                        were
    //                        we're     <-------  apostrophe (punctuation)
    //
    //
    //    STRING Sort:  hyphen and apostrophe will sort with all other symbols
    //
    //                        co-op     <-------  hyphen (punctuation)
    //                        co_op     <-------  underscore (symbol)
    //                        coat
    //                        comb
    //                        coop
    //                        cork
    //                        we're     <-------  apostrophe (punctuation)
    //                        went
    //                        were
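
    To make the difference concrete, here is a toy emulation of the two orderings for the word list in that comment. It is a sketch under loud assumptions, not the Windows algorithm: "string sort" is approximated by plain code-point order (which happens to reproduce the list above for these lowercase ASCII words), and "word sort" by ignoring the hyphen and apostrophe in the primary key and using them only as a tie-breaker. The names `word_sort`, `string_sort`, and `IGNORED` are made up for illustration.

```python
# Characters that "word sort" special-cases: hyphen and apostrophe.
IGNORED = str.maketrans("", "", "-'")


def string_sort(words):
    """Approximate STRING sort: hyphen and apostrophe sort like any other symbol.

    Assumption: plain code-point order stands in for the real rules; for the
    lowercase ASCII sample below, symbols already precede letters.
    """
    return sorted(words)


def word_sort(words):
    """Approximate WORD sort: compare as if hyphen and apostrophe were absent."""
    def key(word):
        stripped = word.translate(IGNORED)
        # Primary: the word without hyphen/apostrophe.
        # Tie-break: the unpunctuated spelling first ("coop" before "co-op").
        return (stripped, word != stripped, word)
    return sorted(words, key=key)


WORDS = ["coat", "co-op", "went", "coop", "we're", "co_op", "cork", "were", "comb"]

if __name__ == "__main__":
    print("word sort:  ", word_sort(WORDS))
    print("string sort:", string_sort(WORDS))
```

    With word sort, "coop" and "co-op" end up adjacent, and so do "were" and "we're". With string sort, "co-op" and "we're" move ahead of the plain words that share their prefix.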
    
    • And finally, according to https://msdn.microsoft.com/en-us/library/windows/desktop/dd318144(v=vs.85).aspx

      [...] all punctuation marks and other nonalphanumeric characters, except for the hyphen and the apostrophe, come before any alphanumeric character. The hyphen and the apostrophe are treated differently from the other nonalphanumeric characters to ensure that words such as "coop" and "co-op" stay together in a sorted list.
