I am new to multilingual data and my confession is that I never did tried it before. Currently I am working on a multilingual site, but I do not know which language will be
You should use a Unicode collation. You can set it by default on your system, or on each field of your tables. There are the following Unicode collation names, and this are their differences:
utf8_general_ci is a very simple collation. It just - removes all accents - then converts to upper case and uses the code of this sort of "base letter" result letter to compare.
utf8_unicode_ci uses the default Unicode collation element table.
The main differences are:
utf8_general_ci does not support expansions/ligatures, it sorts all these letters as single characters, and sometimes in the wrong order.
+/- The disadvantage of utf8_unicode_ci is that it is a little bit slower than utf8_general_ci.
So depending on, if you know or not, which specific languages/characters you are going to use I do recommend that you use utf8_unicode_ci which has a more ample coverage.
Extracted from MySQL forums.