Can str_replace be safely used on a UTF-8 encoded string if it's only given valid UTF-8 encoded strings as arguments?

孤街浪徒 2020-12-11 00:40

PHP's str_replace() was intended only for ANSI strings and as such can mangle UTF-8 strings. However, given that it's binary-safe, would it work properly if it is only given valid UTF-8 strings as arguments?

5 answers

隐瞒了意图╮ (OP)

2020-12-11 01:11

Yes. UTF-8 is deliberately designed to allow this and other similar non-Unicode-aware processing.

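In UTF-8, every non-ASCII character is encoded as a multi-byte sequence that begins with a lead byte in the range \xC2-\xF4 and continues with one to three continuation bytes in the range \x80-\xBF. A lead byte can never appear in a continuation position, and neither kind of byte overlaps with ASCII (\x00-\x7F). As long as both the subject and the search string are valid UTF-8, a match can therefore only start and end on character boundaries; it can never match part of a different character.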
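As a minimal sketch of that property, the snippet below runs str_replace() on a UTF-8 string in which two characters share the same lead byte. The strings are written as byte escapes so the result does not depend on the source file's encoding. The helper is_valid_utf8() is a hypothetical name introduced here for illustration; it relies on the fact that preg_match() with the /u modifier fails on subjects that are not valid UTF-8.

```php
<?php
// "café à Paris" as explicit UTF-8 bytes: é = C3 A9, à = C3 A0.
$haystack = "caf\xC3\xA9 \xC3\xA0 Paris";

// The needle is a complete character (é). "à" shares the lead byte C3 but is
// left alone, because the full two-byte needle does not match it.
$ok = str_replace("\xC3\xA9", "e", $haystack);

// Counter-example: the needle is a lone continuation byte, which is NOT valid
// UTF-8 on its own, so the guarantee no longer applies.
$broken = str_replace("\xA9", "", $haystack);

// Hypothetical helper: an empty pattern with /u matches any valid UTF-8
// subject and returns false for an invalid one.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump($ok === "cafe \xC3\xA0 Paris"); // bool(true)
var_dump(is_valid_utf8($ok));            // bool(true)
var_dump(is_valid_utf8($broken));        // bool(false)
```

The second call shows why the question's precondition matters: the safety comes from the arguments being valid UTF-8, not from str_replace() itself.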

This is not the case for older multibyte encodings, where the bytes of a multi-byte sequence are indistinguishable from single-byte characters. That caused a lot of problems, for example when trying to replace an ASCII backslash in a Shift-JIS string, where byte \x5C might be the second byte of a two-byte sequence representing a different character.
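The Shift-JIS problem can be reproduced with raw bytes. The sketch below uses the character U+8868 (表), whose Shift-JIS encoding is 95 5C, so its second byte is identical to an ASCII backslash; the same character in UTF-8 is E8 A1 A8. The "\dir" suffix is just illustrative data, and the byte values are hard-coded rather than converted at runtime, so no extension beyond core PHP is assumed.

```php
<?php
// U+8868 (表) encoded two ways, written as raw bytes.
$sjis = "\x95\x5C";     // Shift-JIS: trail byte equals ASCII backslash (5C)
$utf8 = "\xE8\xA1\xA8"; // UTF-8: every byte is >= 0x80, none looks like ASCII

// Replace backslashes with forward slashes in "<char>\dir".
$sjisOut = str_replace("\\", "/", $sjis . "\\dir");
$utf8Out = str_replace("\\", "/", $utf8 . "\\dir");

// In the Shift-JIS string the character's own trail byte was rewritten too,
// which corrupts it; the UTF-8 string only loses the real backslash.
echo bin2hex($sjisOut), "\n"; // 952f2f646972
echo bin2hex($utf8Out), "\n"; // e8a1a82f646972
```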
