Why does `strchr` seem to work with multibyte characters, despite the man page disclaimer?

江枫思渺然 submitted on 2019-12-04 10:20:57
Richard Pennington

strchr() only seems to work for your multi-byte character.

The actual string in memory is

... c, o, n, t, a, i, n, s, ' ', 0xC3, 0xA9, ' ', w ...

When you call strchr(), you are really only searching for 0xA9, the low 8 bits of the value you passed, because the argument is converted to a char. That's why pos[-1] holds the first byte of your multi-byte character: it was ignored during the search.
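As a rough illustration of the point above, here is a minimal sketch. The names `sample` and `match_offset` are made up for this example, and it assumes the usual behaviour in which converting the int 0xC3A9 to char keeps only the low byte 0xA9.

```c
#include <stddef.h>
#include <string.h>

/* UTF-8 bytes of "contains é w": the character is the pair 0xC3 0xA9.
 * Offsets: the first space is at 8, 0xC3 at 9, 0xA9 at 10. */
static const char sample[] = "contains \xC3\xA9 w";

/* Hypothetical helper: offset at which strchr() matches c in sample,
 * or -1 when nothing matches. */
static ptrdiff_t match_offset(int c)
{
    const char *pos = strchr(sample, c);
    return pos ? pos - sample : -1;
}
```

Under that assumption, `match_offset(0xC3A9)` and `match_offset(0xA9)` give the same offset, the second byte of the character, and the byte just before it is 0xC3.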

char is signed on your system, which is why your characters are sign-extended (the 0xFFFFFF prefix) when you print them out.

As for the warning, it seems that the compiler is trying to tell you that you are doing something odd, which you are. Don't ignore it.

That's the problem: it only seems to work. First, it's entirely up to the compiler what it puts in the string when you put multibyte characters in it, if indeed it compiles it at all. Clearly you are lucky (for some appropriate interpretation of lucky) in that it has filled your string with

.... c3, a9, ' ', 'w', etc

and that the value you are looking for is 0xc3a9, whose low byte it can find fairly easily. The man page on strchr says:

The strchr() function returns a pointer to the first occurrence of c (converted to a char) in string s

So you pass 0xc3a9 to this, which is converted to a char with value 0xa9. It finds the 0xa9 byte and returns a pointer to it.

The ffffff prefix appears because you are printing a signed char as a 32-bit hex number, so it gets sign-extended for you. This is as expected.
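The sign extension can be reproduced in isolation. The sketch below uses two hypothetical helpers, `hex_sign_extended` and `hex_byte_only`, and assumes a 32-bit int with two's complement representation. It uses `signed char` explicitly so the result does not depend on whether plain char is signed.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats c the way a signed char ends up when printed with %x:
 * it is widened to int first (0xA9 becomes -87), so the upper bits
 * are filled with ones. */
static void hex_sign_extended(char *out, size_t n, signed char c)
{
    snprintf(out, n, "%x", (unsigned int)c);
}

/* Formats only the byte itself by going through unsigned char first. */
static void hex_byte_only(char *out, size_t n, signed char c)
{
    snprintf(out, n, "%x", (unsigned int)(unsigned char)c);
}
```

With a 16-byte buffer and the byte 0xA9, the first helper should produce `ffffffa9` and the second `a9`, so casting to unsigned char before printing is one way to avoid the prefix.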

The problem is that 'undefined behaviour' is just that. It might work almost correctly. And it might not, depending on circumstances.

And again, it is only almost working. You are not getting a pointer to the multibyte character; you are getting a pointer into the middle of it (and I'm surprised you're interpreting that as working). If the multibyte character had evaluated to 0xff20, the search would have matched 0x20, an ordinary space, and you'd get pointed to somewhere much earlier in the string.
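To make the 0xff20 case concrete, the sketch below searches the same kind of string for a value whose low byte is 0x20. It also shows one possible alternative that is not part of the original answers: searching for the whole byte sequence with `strstr()`, assuming the string is UTF-8. The names `text`, `offset_of_low_byte` and `offset_of_sequence` are invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* UTF-8 bytes of "contains é w"; the first space sits at offset 8,
 * the two-byte character starts at offset 9. */
static const char text[] = "contains \xC3\xA9 w";

/* strchr() sees only the low byte of c after conversion to char. */
static ptrdiff_t offset_of_low_byte(int c)
{
    const char *pos = strchr(text, c);
    return pos ? pos - text : -1;
}

/* strstr() compares the full byte sequence, so a match points at the
 * first byte of the character rather than into the middle of it. */
static ptrdiff_t offset_of_sequence(const char *seq)
{
    const char *pos = strstr(text, seq);
    return pos ? pos - text : -1;
}
```

Here `offset_of_low_byte(0xFF20)` lands on the first space, well before the character, while `offset_of_sequence("\xC3\xA9")` lands on its first byte.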
