What is the best MySQL collation for the German language?

甜味超标 2020-12-01 05:27

I am building a website in German, so I will be using characters like ä, ü, and ß. What are your recommendations?

3 answers
  •  时光取名叫无心
    2020-12-01 05:57

    To support the complete UTF-8 standard you have to use the charset utf8mb4 and the collation utf8mb4_unicode_ci in MySQL!

    Note: MySQL's so-called utf8 charset (an alias for utf8mb3) only supports characters that take 1 to 3 bytes! This is why emoji are not supported by it: they take 4 bytes in UTF-8.
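
    To make the byte counts concrete, here is a minimal Python sketch (Python and the helper name utf8_len are my own choices for illustration, not anything MySQL provides). It prints how many bytes a few characters need in UTF-8:

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes the string occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

# German characters need 2 bytes each; the emoji needs 4.
for ch in ["a", "ä", "ü", "ß", "€", "😀"]:
    size = utf8_len(ch)
    fits = "fits in utf8 (max 3 bytes)" if size <= 3 else "needs utf8mb4"
    print(f"{ch!r}: {size} byte(s), {fits}")
```

    So German text on its own would also fit into the old utf8 charset; utf8mb4 matters as soon as 4-byte characters such as emoji show up.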

    The only way to fully support the UTF-8 standard is to change the charset and collation of ALL tables and of the database itself to utf8mb4 and utf8mb4_unicode_ci. Furthermore, the database connection needs to use utf8mb4 as well.

    The MySQL server must use utf8mb4 as its default charset, which can be configured manually in /etc/mysql/conf.d/mysql.cnf:

    [client]
    default-character-set = utf8mb4
    
    [mysql]
    default-character-set = utf8mb4
    
    [mysqld]
    # character-set-client-handshake = FALSE  ## better not set this!
    character-set-server = utf8mb4
    collation-server = utf8mb4_unicode_ci
    

    Existing tables can be migrated to utf8mb4 using the following SQL statement (replace <table_name> with the name of the table):

    ALTER TABLE <table_name> CONVERT TO
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;
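
    Since every table has to be converted, it can help to generate the statement once per table. A minimal sketch in Python, assuming you already have the list of table names (the function name and the example tables users and orders are hypothetical). It only builds the SQL strings and does not talk to the server:

```python
def convert_statement(table: str,
                      charset: str = "utf8mb4",
                      collation: str = "utf8mb4_unicode_ci") -> str:
    """Build an ALTER TABLE ... CONVERT TO statement for one table."""
    # Quote the identifier with backticks; a literal backtick is escaped by doubling it.
    quoted = "`" + table.replace("`", "``") + "`"
    return (f"ALTER TABLE {quoted} CONVERT TO "
            f"CHARACTER SET {charset} COLLATE {collation};")

# Hypothetical table names, for illustration only.
for name in ["users", "orders"]:
    print(convert_statement(name))
```

    Review the generated statements before running them, and take a backup first, as converting a large table rewrites it.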
    

    Note:

    • To make sure JOINs between table columns are not slowed down by charset conversions, ALL tables have to be changed!
    • The length of an index key is limited in MySQL. Because a utf8mb4 character can take up to 4 bytes, the number of indexed characters multiplied by 4 must not exceed that limit, which is 3072 bytes at best.

    In older MySQL versions and row formats the limit is only 767 bytes (191 utf8mb4 characters). When the innodb_large_prefix configuration option is enabled, this limit is raised to 3072 bytes (768 utf8mb4 characters) for InnoDB tables that use the DYNAMIC or COMPRESSED row format.
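
    The arithmetic behind the index limit, as a minimal Python sketch. The 767-byte figure is the InnoDB limit that applies when the large prefix is not available; the function and constant names are hypothetical:

```python
UTF8MB4_MAX_BYTES_PER_CHAR = 4  # worst case for a single utf8mb4 character

def max_indexable_chars(limit_bytes: int,
                        bytes_per_char: int = UTF8MB4_MAX_BYTES_PER_CHAR) -> int:
    """Return how many characters fit into an index key of limit_bytes bytes."""
    return limit_bytes // bytes_per_char

print(max_indexable_chars(767))      # 191
print(max_indexable_chars(3072))     # 768
print(max_indexable_chars(767, 3))   # 255 (old 3-byte utf8)
```

    This is why an indexed VARCHAR(255) column that worked under utf8 often has to be shortened to VARCHAR(191), or indexed with a prefix, after the switch to utf8mb4 when the 767-byte limit applies.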

    To change the charset and default collation of the database, run this command (without a database name, ALTER DATABASE applies to the currently selected default database):

    ALTER DATABASE CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    

    Since utf8mb4 is fully backwards compatible with utf8, no mojibake or other forms of data loss should occur.
