Must UTF-8 binaries include /utf8 in the binary literal in Erlang?


Question


In Erlang, when defining a UTF-8 binary string, I need to specify the encoding in the binary literal, like this:

Star = <<"★"/utf8>>.
> <<226,152,133>>
io:format("~ts~n", [Star]).
> ★
> ok

But if the /utf8 encoding is omitted, the Unicode characters are not handled correctly:

Star1 = <<"★">>.
> <<5>>
io:format("~ts~n", [Star1]).
> ^E
> ok

Is there a way that I can create literal binary strings like this without having to specify /utf8 in every binary I create? My code has quite a few binaries like this and things have become quite cluttered. Is there a way to set some sort of default encoding for binaries?


Answer 1:


This is a result of how Erlang treats string literals inside binaries. When you enter <<"★">>, what Erlang actually sees is <<[9733]>>, i.e. a list containing a single integer (the code point of ★). By default, each integer segment in a binary is 8 bits wide, so 9733 is truncated to its low 8 bits, which is 5 — hence the result <<5>>.

The /utf8 type specifier tells Erlang to encode each code point in that segment as UTF-8, so 9733 becomes the three bytes 226, 152, 133 instead of being truncated.
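A minimal shell sketch of the difference between the default 8-bit segment, an explicit size, and the /utf8 specifier (the variable name is just for illustration, and unicode:characters_to_binary/1 is shown only as a general standard-library way to get the same UTF-8 bytes):

Codepoint = 9733.
> 9733
<<Codepoint>>.           % default segment is an 8-bit integer, so only the low byte survives
> <<5>>
<<Codepoint:16>>.        % an explicit 16-bit segment keeps both bytes, but is not UTF-8
> <<38,5>>
<<Codepoint/utf8>>.      % UTF-8 encoding of the code point
> <<226,152,133>>
unicode:characters_to_binary("★").   % stdlib conversion, no /utf8 needed in the literal
> <<226,152,133>>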



Source: https://stackoverflow.com/questions/24315971/must-utf-8-binaries-include-utf8-in-the-binary-literal-in-erlang
