How To Convert An Elixir Binary To A String?

夙愿已清 提交于 2019-11-28 12:04:52
bitwalker

There's a couple of things here:

1.) You have a list with a tuple containing one element, a binary. You can probably just extract the binary and have your string. Passing the current data structure to to_string is not going to work.

2.) The binary you used in your example contains 0, an unprintable character. In the shell, this will not be printed properly as a string, due to the fact that Elixir can't tell the difference between just a binary, and a binary representing a string, when the binary representing a string contains unprintable characters.

3.) You can use pattern matching to convert a binary to a particular type. For instance:

iex> raw = <<71,32,69,32,84,32>>
...> Enum.join(for <<c::utf8 <- raw>>, do: <<c::utf8>>)
"G E T "
...> <<c::utf8, _::binary>> = raw
"G"

Also, if you are getting binary data from a network connection, you probably want to use :erlang.iolist_to_binary, since the data will be an iolist, not a charlist. The difference is that iolists can contain binaries, nested lists, as well as just be a list of integers. Charlists are always just a flat list of integers. If you call to_string, on an iolist, it will fail.

user701847

Not sure if OP has since solved his problem, but in relation to his remark about his binary being utf16-le: for specifically that encoding, I found that the quickest (and to those more experienced with Elixir, probably-hacky) way was to use Enum.reduce:

# coercing it into utf8 gives us ["D", <<0>>, "e", <<0>>, "v", <<0>>, "a", <<0>>, "s", <<0>>, "t", <<0>>, "a", <<0>>, "t", <<0>>, "o", <<0>>, "r", <<0>>]
<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0>>  
|> String.codepoints()
|> Enum.reduce("", fn(codepoint, result) ->
                     << parsed :: 8>> = codepoint
                     if parsed == 0, do: result, else: result <> <<parsed>>
                   end)

# "Devastator"
|> IO.puts()

Assumptions:

  • utf16-le encoding

  • the codepoints are backwards-compatible with utf8 i.e. they use only 1 byte

Since I'm still learning Elixir, it took me a while to get to this solution. I looked into other libraries people made, even using something like iconv at a bash level.

Andres Garcia

I made a function to convert binary to string

def raw_binary_to_string(raw) do
   codepoints = String.codepoints(raw)  
      val = Enum.reduce(codepoints, 
                        fn(w, result) ->  
                            cond do 
                                String.valid?(w) -> 
                                    result <> w 
                                true ->
                                    << parsed :: 8>> = w 
                                    result <>   << parsed :: utf8 >>
                            end
                        end)

  end

Executed on iex console

iex(6)>raw=<<65, 241, 111, 32, 100, 101, 32, 70, 97, 99, 116, 117, 114, 97, 99, 105, 111, 110, 32, 65, 99, 116, 117, 97, 108>>
iex(6)>raw_binary_to_string(raw)
iex(6)>"Año de Facturacion Actual"

The last point definitely does change the issue, and explains it. Elixir uses binaries as strings but assumes and demands that they are UTF8 encoded, not UTF16.

In reference to http://erlang.org/pipermail/erlang-questions/2010-December/054885.html

You can use :unicode.characters_to_list(binary_string, {:utf16, :little}) to verify result and store too

IEX eval

iex(1)> y                                                
<<115, 0, 121, 0, 115, 0>>
iex(2)> :unicode.characters_to_list(y, {:utf16, :little})
'sys'

Note : Value printed as sys for <<115, 0, 121, 0, 115, 0>>

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!