Our column is currently collated to latin1_swedish_ci and special Unicode characters are, obviously, getting stripped out. We want to be able to accept chars su
The collation is the least of your worries; what you need to think about is the character set for the column/table/database. The collation (the rules governing how data is compared and sorted) is just a corollary of that choice.
MySQL supports several Unicode character sets, utf8 and utf8mb4 being the most interesting. utf8 supports only Unicode characters in the BMP (Basic Multilingual Plane), i.e. a subset of all of Unicode, using at most three bytes per character. utf8mb4, available since MySQL 5.5.3, supports all of Unicode.
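To make the BMP distinction concrete, here is a minimal Python sketch (not MySQL code) that checks whether a string contains characters outside the BMP, i.e. characters that would need utf8mb4 rather than utf8. The helper name `needs_utf8mb4` and the sample strings are made up for illustration.

```python
# Hypothetical helper for illustration; not part of MySQL or any driver.
BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def needs_utf8mb4(text: str) -> bool:
    """Return True if any character in text lies outside the BMP."""
    return any(ord(ch) > BMP_MAX for ch in text)


for sample in ("naïve", "日本語", "😀"):
    # Width in bytes of the widest single character when encoded as UTF-8.
    widest = max(len(ch.encode("utf-8")) for ch in sample)
    print(f"{sample!r}: widest character = {widest} bytes, "
          f"needs utf8mb4 = {needs_utf8mb4(sample)}")
```

Characters inside the BMP encode to at most three bytes in UTF-8, while anything beyond it (such as most emoji) takes four, which is where the "mb4" in the name comes from.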
The collation to be used with any of the Unicode encodings is most likely xxx_general_ci or xxx_unicode_ci, where xxx is the character set name (e.g. utf8mb4_unicode_ci). The former is a general, language-independent sorting and comparison algorithm; the latter is a more complete language-independent algorithm that supports more Unicode features (e.g. treating "ß" and "ss" as equivalent), which also makes it slower.
See https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-sets.html.
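As a rough analogy for the "ß" versus "ss" point, the following Python sketch contrasts simple lowercasing with full Unicode case folding. This is not how MySQL implements either collation; it only illustrates why a more complete comparison can treat one character as equivalent to two. The function names are hypothetical.

```python
# Analogy only: Python string methods, not MySQL collation internals.

def simple_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison using plain lowercasing."""
    return a.lower() == b.lower()


def full_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison using full Unicode case folding,
    which expands 'ß' to 'ss' before comparing."""
    return a.casefold() == b.casefold()


print(simple_equal("Straße", "STRASSE"))  # False: 'ß' is left as-is
print(full_equal("Straße", "STRASSE"))    # True: 'ß' folds to 'ss'
```

The more complete comparison has to expand and normalise characters before comparing them, which hints at why the richer collation costs more.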