UTF-8 problems in PHP: var_export() returns \0 null characters, and ucfirst(), strtoupper(), etc. behave strangely

Submitted by 风流意气都作罢 on 2019-12-03 06:36:43
hakre

I suggest you verify the PHP binary you've got problems with. Check the compiler flags and the libraries it makes use of.

Normally PHP uses binary strings internally, which means that functions like ucfirst work byte by byte and only support what your locale supports (if and as configured). See Details of the String Type (Docs).

$ php -r "echo ucfirst('ñu');" 

returns

?u

This makes sense, ñ is

LATIN SMALL LETTER N WITH TILDE (U+00F1)    UTF8: \xC3\xB1

You have some locale configured that makes PHP change \xC3 into something else, breaking the UTF-8 byte sequence and making your shell display the � replacement character (Wikipedia).

If you really want to analyze the issue, I suggest you start with hexdumps alongside how things get displayed in the shell and elsewhere. Know that you can explicitly define binary strings with b"string" (that's for forward compatibility; maybe you've got some compile flag enabled and you're on the experimental Unicode build?), and you can also write strings literally, here the hex way for UTF-8:

 $ php -r "echo ucfirst(b\"\\xC3\\xB1u\");"
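The hexdump comparison suggested above could be sketched roughly like this. The helper name hex_bytes() is made up for illustration, and the output of ucfirst() is deliberately not predicted here, since it depends on the PHP version and the configured locale:

```php
<?php
// Hypothetical helper (not part of PHP): space-separated hex dump of a string's bytes.
function hex_bytes($s)
{
    return implode(' ', str_split(bin2hex($s), 2));
}

// "\xC3\xB1u" is the UTF-8 byte sequence for "ñu".
$input  = "\xC3\xB1u";
$output = ucfirst($input);

echo 'input : ', hex_bytes($input), PHP_EOL;
echo 'output: ', hex_bytes($output), PHP_EOL;

// If the first byte was altered, the result is usually no longer valid UTF-8.
echo 'valid UTF-8: ', var_export(mb_check_encoding($output, 'UTF-8'), true), PHP_EOL;
```

If the two dumps differ in the first byte, the locale-driven byte mapping is the likely culprit rather than the terminal.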

And there are a lot more settings that can play a role, I started to list some points in an answer to Preparing PHP application to use with UTF-8.


Example of a multibyte ucfirst variant:

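/**
 * multibyte ucfirst
 *
 * PHP 8.4 and later ship a native mb_ucfirst(), hence the function_exists() guard.
 *
 * @param string $str
 * @param string|null $encoding (optional) defaults to mb_internal_encoding()
 * @return string
 */
if (!function_exists('mb_ucfirst')) {
    function mb_ucfirst($str, $encoding = null)
    {
        // Older PHP versions reject a null encoding, so resolve it explicitly.
        if ($encoding === null) {
            $encoding = mb_internal_encoding();
        }
        $first = mb_substr($str, 0, 1, $encoding);
        // A null length means "to the end of the string" (PHP >= 5.4.8);
        // strlen() would count bytes, not characters.
        $rest = mb_substr($str, 1, null, $encoding);
        return mb_strtoupper($first, $encoding) . $rest;
    }
}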

See mb_strtoupper (Docs) as well as mb_convert_case (Docs).
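For comparison, here is a small sketch of the two built-in functions just mentioned, assuming the mbstring extension is loaded. The strings are written as byte escapes so the result does not depend on the encoding of the source file. Note that MB_CASE_TITLE is not a drop-in ucfirst: it capitalizes every word and lowercases the remaining letters.

```php
<?php
$s = "\xC3\xB1u"; // UTF-8 bytes for "ñu"

// Uppercase the whole string: "ÑU" is C3 91 55 in UTF-8.
$upper = mb_strtoupper($s, 'UTF-8');
echo bin2hex($upper), PHP_EOL; // c39155

// Title case: first letter of each word upper, the rest lower: "Ñu".
$title = mb_convert_case($s, MB_CASE_TITLE, 'UTF-8');
echo bin2hex($title), PHP_EOL; // c39175
```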

Try forcing UTF-8 in PHP:

<?php ini_set('default_charset', 'UTF-8'); ?>

Put it at the very top (the first line of code) of every page/template. It mostly fixes my special characters. I'm not sure it will help you too, but try it.
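A minimal sketch of that setup, with a check that the setting actually took effect. Also calling mb_internal_encoding() is an assumption added here, not part of the suggestion above; it only matters if you use the mb_* functions without passing an encoding:

```php
<?php
// Set the charset PHP advertises in Content-Type headers and uses as a
// default in functions such as htmlspecialchars().
ini_set('default_charset', 'UTF-8');

// Assumption: align the mbstring default as well.
mb_internal_encoding('UTF-8');

echo ini_get('default_charset'), PHP_EOL;
echo mb_internal_encoding(), PHP_EOL;
```

This changes defaults only; it does not convert any existing data or make ucfirst() multibyte-aware.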

sakhunzai

Probably all your servers are in a good state. In one of the comments you said that you only have issues with ucfirst() and var_export(). Depending on those responses, you might be looking at this Stack Overflow question. Most of the PHP string functions will not work properly with multibyte strings, which is why PHP has a separate set of functions (the mb_* family) to deal with them.
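To illustrate the difference between the byte-oriented and the multibyte functions, here is a short sketch (mbstring assumed to be available) using the same "ñu" string, written as byte escapes:

```php
<?php
$s = "\xC3\xB1u"; // UTF-8 bytes for "ñu": two characters, three bytes

echo strlen($s), PHP_EOL;             // 3 (bytes)
echo mb_strlen($s, 'UTF-8'), PHP_EOL; // 2 (characters)

// substr() cuts through the middle of the two-byte "ñ"; mb_substr() does not.
echo bin2hex(substr($s, 0, 1)), PHP_EOL;             // c3
echo bin2hex(mb_substr($s, 0, 1, 'UTF-8')), PHP_EOL; // c3b1
```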

This might be helpful

I normally use utf8_encode('ñu') for all the French characters. Note that utf8_encode() assumes ISO-8859-1 input and is deprecated as of PHP 8.2.
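As a rough sketch of what such a conversion does, the following uses mb_convert_encoding() with an explicit source encoding. That function is swapped in here for illustration and is not the poster's method. It also shows why converting a string that is already UTF-8 makes things worse:

```php
<?php
$latin1 = "\xF1u"; // "ñu" in ISO-8859-1: one byte per character

// ISO-8859-1 to UTF-8: F1 becomes C3 B1.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8), PHP_EOL; // c3b175

// Converting again treats each UTF-8 byte as a Latin-1 character,
// which double-encodes the string.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($double), PHP_EOL; // c383c2b175
```

So a conversion like this only helps when the input really is ISO-8859-1.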

PHPUnit tests for this are being added to https://gist.github.com/68f5781a83a8986b9d30. Can we build up a better unit test suite so that we can figure out what the expected output should be?
