What's the difference between utf8_general_ci and utf8_unicode_ci?

暗喜 2020-11-22 01:38

Between utf8_general_ci and utf8_unicode_ci, are there any differences in terms of performance?

8 answers
  •  我在风中等你
    2020-11-22 01:53

    There are two big differences: sorting and character matching.

    Sorting:

    • utf8mb4_general_ci strips all accents and compares one character at a time, which can produce incorrect sort orders.
    • utf8mb4_unicode_ci sorts accurately, following the Unicode collation rules.
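
    One rough way to see the sorting difference is to order the same rows under each collation. This is only a sketch: the temporary table collation_demo and its sample words are made up for illustration, and it assumes a MySQL server and a utf8mb4 connection.

    ```sql
    -- Assumption: the client connection uses utf8mb4.
    SET NAMES utf8mb4;

    -- Hypothetical scratch table, not part of the original question.
    CREATE TEMPORARY TABLE collation_demo (w VARCHAR(32)) CHARACTER SET utf8mb4;

    INSERT INTO collation_demo (w) VALUES
      ('Muffler'), ('Müller'), ('MySQL'), ('Strase'), ('Strasse'), ('Straße'), ('Strate');

    -- Same rows, ordered by each collation in turn.
    SELECT w FROM collation_demo ORDER BY w COLLATE utf8mb4_general_ci;
    SELECT w FROM collation_demo ORDER BY w COLLATE utf8mb4_unicode_ci;
    ```

    Any rows that land in a different position between the two result sets are the ones affected by the collation choice.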

    Character matching:

    They match characters differently.

    For example, under utf8mb4_unicode_ci, i != ı, whereas under utf8mb4_general_ci, ı = i.

    For example, imagine you have a row with name="Yılmaz". Then

    select id from users where name='Yilmaz';
    

    would return the row if the collation is utf8mb4_general_ci, but would not return it if the collation is utf8mb4_unicode_ci!
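
    The comparison can also be checked without a table by forcing each collation on a string literal. A minimal sketch, assuming a utf8mb4 connection (the column aliases are just illustrative names):

    ```sql
    -- Assumption: the client connection uses utf8mb4, so the literals can take these collations.
    SET NAMES utf8mb4;

    -- COLLATE binds to the right-hand literal and decides how the whole comparison is done.
    SELECT 'Yılmaz' = 'Yilmaz' COLLATE utf8mb4_general_ci AS general_ci_match,
           'Yılmaz' = 'Yilmaz' COLLATE utf8mb4_unicode_ci AS unicode_ci_match;
    ```

    If the behavior described above holds, general_ci_match should come back as 1 and unicode_ci_match as 0.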

    On the other hand, a=ª and ß=ss hold in utf8mb4_unicode_ci but not in utf8mb4_general_ci. So imagine you have a row with name="ªßi". Then

    select id from users where name='assi';
    

    would return the row if the collation is utf8mb4_unicode_ci, but would not return it if the collation is utf8mb4_general_ci.
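
    If the column was created with one collation and you want the other's matching rules for a single query, the COLLATE clause can be applied in the WHERE condition. A sketch reusing the users table from the example, assuming name is a utf8mb4 column and the connection uses utf8mb4:

    ```sql
    -- Assumption: users.name is stored as utf8mb4 (for instance with utf8mb4_general_ci).
    SET NAMES utf8mb4;

    -- Compare under utf8mb4_unicode_ci for this query only; the column definition is unchanged.
    SELECT id
    FROM users
    WHERE name = 'assi' COLLATE utf8mb4_unicode_ci;
    ```

    Swapping in utf8mb4_general_ci gives the other matching behavior for the same data.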

    A full list of matches for each collation may be found here.
