Least used Unicode delimiter

Submitted by 冷暖自知 on 2019-12-08 08:19:10

Question


I'm trying to tag my text with a delimiter at specific places, to be used later for parsing. I want a delimiter character that is used as rarely as possible. I'm currently looking at "\2", i.e. the U+0002 character. Is that safe enough to use? What other suggestions are there? The text is Unicode and will contain both English and non-English characters.

I want to use a character that PHP's explode() can still split on.

Edit:

I also want to be able to display this text on screen (in the browser) with the delimiter "invisible" to the user. I can certainly use str_replace() to get rid of visible delimiters, but if there are good invisible delimiters, no such processing is needed.


Answer 1:


If this is only for an internal representation (i.e. not for interchange and storage), then you can use a noncharacter code point such as U+FFFF. Java, for example, uses it as the signal that a CharacterIterator is done.
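
As a rough illustration of the idea, here is a minimal sketch in Python (not PHP; `str.split` plays the role of `explode()` and `str.replace` the role of `str_replace()`). The helper names `join_fields`, `split_fields` and `strip_for_display` are hypothetical, made up for this example. The sketch assumes the delimiter never legitimately occurs in the input, and it strips the delimiter before display, since a noncharacter is not meant to leave the program.

```python
# Assumed delimiter: U+FFFF, a Unicode noncharacter reserved for internal use.
DELIM = "\uffff"


def join_fields(fields):
    """Join fields with DELIM, refusing input that already contains it."""
    for field in fields:
        if DELIM in field:
            raise ValueError("field already contains the delimiter")
    return DELIM.join(fields)


def split_fields(tagged):
    """Split tagged text back into its fields (analogous to PHP's explode())."""
    return tagged.split(DELIM)


def strip_for_display(tagged):
    """Remove the delimiter before the text is sent to the browser."""
    return tagged.replace(DELIM, "")


tagged = join_fields(["hello", "世界", "naïve"])
print(split_fields(tagged))       # ['hello', '世界', 'naïve']
print(strip_for_display(tagged))  # hello世界naïve
```

The check in `join_fields` is a design choice rather than a requirement: it makes a collision with real data fail loudly instead of silently producing an extra field.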



Source: https://stackoverflow.com/questions/6493956/least-used-unicode-delimiter
