Working with files and UTF-8 in PHP

无人共我 2020-12-16 03:18

Let's say I have a file called foo.txt encoded in UTF-8:

aoeu  
qjkx
ñpyf

And I want to get an array that contains all the lines in that file.

3 Answers
  •  悲&欢浪女
    2020-12-16 03:54

    In UTF-8, ñ is encoded as two bytes (0xC3 0xB1). PHP's string operations are normally byte-based, so when you preg_split the input, those two bytes end up in separate array items. Neither byte on its own equals the two-byte ñ found in $allowed_letters, so ñ never matches.
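
    To make the byte-level behaviour concrete, here is a minimal sketch. The empty pattern '//' is an assumption about how the input was being split, since the original call is not shown here, and the variable names are only illustrative. The ñ is written as its two bytes so the result does not depend on how the script file is saved.

```php
<?php
// "ñ" (U+00F1) spelled out as its two UTF-8 bytes.
$enye = "\xC3\xB1";

// strlen() counts bytes, not characters.
$byteCount = strlen($enye);

// Without the u modifier, an empty pattern splits between every byte.
// (The '//' pattern is an assumption; the original call is not shown.)
$pieces = preg_split('//', $enye, -1, PREG_SPLIT_NO_EMPTY);

// Each piece is a single byte, so none of them equals the whole character.
$foundWhole = in_array($enye, $pieces, true);

printf(
    "bytes: %d, pieces: %d, whole character found: %s\n",
    $byteCount,
    count($pieces),
    var_export($foundWhole, true)
);
```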

    As Yanick posted, the solution is to add the u modifier. This makes PHP's regex engine treat both the pattern and the input line as UTF-8 character sequences instead of raw bytes. It's lucky that PHP has special Unicode support here; elsewhere PHP's Unicode support is extremely spotty.
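
    A small sketch of the same split with the u modifier added. The contents of $allowed_letters are assumed for illustration (only the variable name appears in this thread), and '//u' is likewise an assumed pattern.

```php
<?php
// "ñpyf" with the ñ written as explicit UTF-8 bytes.
$line = "\xC3\xB1pyf";

// With the u modifier, the subject is interpreted as UTF-8, so the empty
// pattern splits between characters rather than between bytes.
$chars = preg_split('//u', $line, -1, PREG_SPLIT_NO_EMPTY);

// Assumed contents; only the name $allowed_letters comes from the thread.
$allowed_letters = ['a', 'o', 'e', 'u', "\xC3\xB1", 'p', 'y', 'f'];

// True when every character of the line is in the allowed set.
$allAllowed = count(array_diff($chars, $allowed_letters)) === 0;

var_dump(count($chars), $allAllowed);
```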

    A simpler and quicker way than splitting would be to match each line against a character-class regex. Again, this must be a u regex.

    if (preg_match('/^[aoeuñpyf]+$/u', $line)) {
        $lines[] = $line;
    }
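
    Putting it together with reading the file, a hedged end-to-end sketch could look like this. It writes the sample content to a temporary file so it runs on its own (the temp path and the $all variable are hypothetical), and it uses the PCRE escape \x{F1} for ñ so the pattern does not depend on the script's own encoding.

```php
<?php
// Create the sample file so the sketch is self-contained (hypothetical temp path).
$path = tempnam(sys_get_temp_dir(), 'foo');
file_put_contents($path, "aoeu\nqjkx\n\xC3\xB1pyf\n");

// file() returns one array element per line; the flag strips the newlines.
$all = file($path, FILE_IGNORE_NEW_LINES);

$lines = [];
foreach ($all as $line) {
    // Same character-class test as above; \x{F1} is ñ under the u modifier.
    if (preg_match('/^[aoeu\x{F1}pyf]+$/u', $line)) {
        $lines[] = $line;
    }
}

unlink($path);
print_r($lines);
```

    If every line is wanted regardless of its letters, the file() call alone already gives that array; the loop only adds the filtering step.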
    
