Encoding emoji in Erlang

Submitted by 拜拜、爱过 on 2019-12-11 19:19:39

Question


Assume I have a binary:

Message = <<"string containing emoji">>.

How do I properly encode it in Unicode? I tried doing:

Encoded = <<Message/utf16>>.

I get this warning when compiling the file:

Warning: binary construction will fail with a 'badarg' exception (invalid Unicode code point in a utf8/utf16/utf32 segment)

I tried this with /utf8 as well. Same warning.


Answer 1:


Assuming that the binary you start with is encoded according to UTF-8, and you need to encode it as little-endian UTF-16, this should work:

unicode:characters_to_binary(<<"string containing emoji">>, utf8, {utf16, little})

See the documentation for the unicode module for more information.
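unicode:characters_to_binary/3 does not always return a binary: on invalid or truncated input it returns an error or incomplete tuple instead. A minimal sketch of a wrapper that makes this explicit follows; the module name emoji_utf16, the function name to_utf16le and the error atoms are made up for illustration.

```erlang
-module(emoji_utf16).
-export([to_utf16le/1]).

%% Convert a UTF-8 binary to little-endian UTF-16.
%% The {ok, _} / {error, _} wrapping and the error atoms are illustrative
%% choices, not part of the unicode module's API.
to_utf16le(Utf8) when is_binary(Utf8) ->
    case unicode:characters_to_binary(Utf8, utf8, {utf16, little}) of
        Encoded when is_binary(Encoded) ->
            {ok, Encoded};
        {error, _Converted, _Rest} ->
            %% Input contains bytes that are not valid UTF-8.
            {error, invalid_utf8};
        {incomplete, _Converted, _Rest} ->
            %% Input ends in the middle of a multi-byte sequence.
            {error, truncated_utf8}
    end.
```

For the UTF-8 bytes of U+1F64C, emoji_utf16:to_utf16le(<<240,159,153,140>>) should give {ok, <<61,216,76,222>>}, which is the surrogate pair D83D DE4C with each 16-bit unit byte-swapped.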

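The reason <<Message/utf16>> fails is that the utf8, utf16 and utf32 type specifiers in the bit syntax each encode a single code point, given as an integer, not an entire string held in a variable. Message is a binary rather than an integer code point, hence the badarg warning. To encode the character U+1F64C, you could use: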

2> <<16#1f64c/utf8>>.
<<240,159,153,140>>
3> <<16#1f64c/utf16>>.
<<"\330=\336L">>
4> <<16#1f64c/utf32>>.
<<0,1,246,76>>
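Because each utf8/utf16 segment handles one code point, a whole binary can also be re-encoded with a binary comprehension that decodes one code point at a time and re-emits it. This is only a sketch; the module name emoji_reencode and the function name utf8_to_utf16be are made up for illustration.

```erlang
-module(emoji_reencode).
-export([utf8_to_utf16be/1]).

%% Re-encode a UTF-8 binary as big-endian UTF-16 (the bit syntax default),
%% one code point per iteration.
%% Caveat: as far as I understand, the binary generator stops at the first
%% byte sequence that does not match <<CodePoint/utf8>>, so invalid input is
%% silently cut short rather than reported. Prefer
%% unicode:characters_to_binary/3 when you need error handling.
utf8_to_utf16be(Utf8) when is_binary(Utf8) ->
    << <<CodePoint/utf16>> || <<CodePoint/utf8>> <= Utf8 >>.
```

Calling emoji_reencode:utf8_to_utf16be(<<240,159,153,140>>) should return <<216,61,222,76>>, the same bytes the shell prints above as <<"\330=\336L">>.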



Answer 2:


You may need to add the comment %% -*- coding: utf-8 -*- as the first line of your module, and use /utf8 on the string literal.

My guess is that you are using Erlang/OTP < 17, meaning files are considered latin-1 unless specified otherwise.
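A minimal sketch of such a module follows, assuming the file itself is saved as UTF-8. The module name emoji_literal and the function name hands/0 are made up for illustration, and the spelling utf-8 (with the hyphen) in the comment is the form I understand the preprocessor to recognize.

```erlang
%% -*- coding: utf-8 -*-
-module(emoji_literal).
-export([hands/0]).

%% With the source read as UTF-8, the literal below is a single code point
%% (U+1F64C), and /utf8 encodes it into four bytes.
%% Without /utf8, each code point would be truncated to a single byte.
hands() ->
    <<"🙌"/utf8>>.
```

On Erlang/OTP 17 and later the coding comment should be redundant, since UTF-8 is the default source encoding there, but it does no harm. emoji_literal:hands() should return <<240,159,153,140>>.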



Source: https://stackoverflow.com/questions/22188529/encoding-emoji-in-erlang
