Why is this false:
iex(1)> String.match?("汉语漢語", ~r/^[[:alpha:]]+$/)
false
But this is true?:
When you pass the string to a regex in non-Unicode mode, it is treated as a sequence of bytes, not as a Unicode string. Compare IO.puts byte_size("汉语漢語") (12, the number of bytes the input consists of: 230,177,137,232,175,173,230,188,162,232,170,158) with IO.puts String.length("汉语漢語") (4, the number of Unicode "letters"). Some of these bytes cannot be matched by the [:alpha:] POSIX character class. Thus the first expression, which requires every byte from start to end to match, returns false, while the second returns true because it only needs a single matching character.
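The byte/character difference described above can be checked directly. This is a minimal sketch; the variable name `s` is only illustrative, and the `charlists: :as_lists` option is passed so the byte list is always printed as integers:

```elixir
s = "汉语漢語"

# Number of raw UTF-8 bytes: this is what a regex without the u modifier works on
IO.inspect(byte_size(s))

# Number of Unicode graphemes: this is what you would intuitively call "letters"
IO.inspect(String.length(s))

# The individual bytes, printed as a plain list of integers
IO.inspect(:binary.bin_to_list(s), charlists: :as_lists)
```

Each of the four characters is encoded as three bytes in UTF-8, which accounts for the 12 versus 4.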
To properly match Unicode strings with the PCRE regex library (which Elixir uses), you need to enable Unicode mode with the u modifier:
IO.puts String.match?("汉语漢語", ~r/^[[:alpha:]]+$/u)
See the IDEONE demo (prints true)
See the Elixir regex reference:
unicode (u) - enables unicode specific patterns like \p and changes modifiers like \w, \W, \s and friends to also match on unicode. It expects valid unicode strings to be given on match.
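As a small sketch of what the u modifier enables, the snippet below runs the anchored pattern with and without it and also tries the Unicode property \p{L} (any letter), which is one of the \p patterns the reference mentions. The variable name `s` and the extra sample string "汉语 123" are illustrative additions, not part of the original question:

```elixir
s = "汉语漢語"

# Without u the pattern is applied to raw bytes (false in the question above)
IO.inspect(String.match?(s, ~r/^[[:alpha:]]+$/))

# With u the same pattern is applied to Unicode characters
IO.inspect(String.match?(s, ~r/^[[:alpha:]]+$/u))

# \p{L} matches any Unicode letter when u is enabled
IO.inspect(String.match?(s, ~r/^\p{L}+$/u))

# A space and digits are not letters, so the anchored match fails
IO.inspect(String.match?("汉语 123", ~r/^\p{L}+$/u))
```

If the goal is "letters only", \p{L} with u states that intent explicitly and does not depend on how the POSIX class is interpreted.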