So I'm trying to convert a binary to a string. This code:
t = [{<<71,0,69,0,84,0>>}]
String.from_char_list(t)
But I'm gettin
Not sure if OP has since solved his problem, but in relation to his remark about his binary being utf16-le: for that encoding specifically, I found that the quickest (and, to those more experienced with Elixir, probably hacky) way was to use Enum.reduce:
# coercing it into utf8 gives us ["D", <<0>>, "e", <<0>>, "v", <<0>>, "a", <<0>>, "s", <<0>>, "t", <<0>>, "a", <<0>>, "t", <<0>>, "o", <<0>>, "r", <<0>>]
<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0>>
|> String.codepoints()
|> Enum.reduce("", fn(codepoint, result) ->
  <<parsed::8>> = codepoint
  if parsed == 0, do: result, else: result <> <<parsed>>
end)
# "Devastator"
|> IO.puts()
Assumptions:
the binary uses utf16-le encoding
the codepoints are backwards-compatible with utf8, i.e. they use only 1 byte
Since I'm still learning Elixir, it took me a while to get to this solution. I looked into other libraries people made, even using something like iconv at a bash level.
I made a function to convert a binary to a string:
def raw_binary_to_string(raw) do
  raw
  |> String.codepoints()
  # Start from "" so that the first element is converted too
  |> Enum.reduce("", fn w, result ->
    if String.valid?(w) do
      result <> w
    else
      # Not valid UTF-8: read the single byte and re-encode it as a code point
      <<parsed::8>> = w
      result <> <<parsed::utf8>>
    end
  end)
end
Executed on the iex console:
iex> raw = <<65, 241, 111, 32, 100, 101, 32, 70, 97, 99, 116, 117, 114, 97, 99, 105, 111, 110, 32, 65, 99, 116, 117, 97, 108>>
iex> raw_binary_to_string(raw)
"Año de Facturacion Actual"
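The bytes in that example look like Latin-1 (ISO-8859-1), where 241 is ñ. If that assumption holds for your data, Erlang's :unicode module can probably do the same conversion without a hand-written reduce. This is only a sketch; the module name Latin1Sketch is made up for illustration.

```elixir
defmodule Latin1Sketch do
  # Assumption: every byte of `raw` is a Latin-1 code point (0..255).
  # :unicode.characters_to_binary/2 re-encodes it as UTF-8.
  def to_utf8(raw) when is_binary(raw) do
    :unicode.characters_to_binary(raw, :latin1)
  end
end

raw = <<65, 241, 111, 32, 100, 101>>
IO.puts(Latin1Sketch.to_utf8(raw))
# prints Año de
```

Unlike the reduce version, this treats every byte as Latin-1, so it should not be fed input that is already partly UTF-8.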
There are a couple of things here:
1.) You have a list with a tuple containing one element, a binary. You can probably just extract the binary and have your string. Passing the current data structure to to_string is not going to work.
2.) The binary you used in your example contains 0, an unprintable character. In the shell, this will not be printed as a string, because Elixir can't tell the difference between a plain binary and a binary representing a string when the latter contains unprintable characters.
3.) You can use pattern matching to convert a binary to a particular type. For instance:
iex> raw = <<71,32,69,32,84,32>>
"G E T "
iex> Enum.join(for <<c::utf8 <- raw>>, do: <<c::utf8>>)
"G E T "
iex> <<c::utf8, _::binary>> = raw
"G E T "
iex> <<c::utf8>>
"G"
Also, if you are getting binary data from a network connection, you probably want to use :erlang.iolist_to_binary, since the data may be an iolist, not a charlist. The difference is that iolists can contain binaries and nested lists as well as plain integers, whereas charlists are always just a flat list of integers. If you call to_string on an iolist, it can fail.
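To make the iolist point concrete, here is a small sketch. The chunks are hypothetical, just shaped like data that might arrive in pieces from a socket.

```elixir
# Hypothetical iolist: a binary, a nested list of integers, and another binary.
iolist = ["GE", [?T, 32], <<47>>]

# Flatten everything into one binary.
binary = :erlang.iolist_to_binary(iolist)
IO.inspect(binary)
# prints "GET /"
```

The result is a single binary; whether it is also a valid string still depends on the bytes being UTF-8.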
With reference to http://erlang.org/pipermail/erlang-questions/2010-December/054885.html: you can use :unicode.characters_to_list(binary_string, {:utf16, :little}) to verify the result and store it too.
iex session:
iex(1)> y
<<115, 0, 121, 0, 115, 0>>
iex(2)> :unicode.characters_to_list(y, {:utf16, :little})
'sys'
Note: the value is printed as 'sys' for <<115, 0, 121, 0, 115, 0>>.
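If you want an Elixir string (a UTF-8 binary) rather than a charlist, the same :unicode module also has characters_to_binary/2. A minimal sketch, assuming the input really is UTF-16LE; the module name Utf16Sketch and the {:ok, _} / :error return shape are my own choices for illustration.

```elixir
defmodule Utf16Sketch do
  # Returns {:ok, string} for well-formed UTF-16LE input, :error otherwise.
  # :unicode.characters_to_binary/2 returns a binary on success and an
  # {:error, _, _} or {:incomplete, _, _} tuple when the input is bad.
  def to_utf8(bin) when is_binary(bin) do
    case :unicode.characters_to_binary(bin, {:utf16, :little}) do
      result when is_binary(result) -> {:ok, result}
      _error_or_incomplete -> :error
    end
  end
end

IO.inspect(Utf16Sketch.to_utf8(<<115, 0, 121, 0, 115, 0>>))
# prints {:ok, "sys"}
IO.inspect(Utf16Sketch.to_utf8(<<71, 0, 69, 0, 84, 0>>))
# prints {:ok, "GET"}
```

This also copes with characters outside the 1-byte range, which the approaches that simply drop the 0 bytes do not.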
The last point definitely does change the issue, and explains it. Elixir uses binaries as strings but assumes and demands that they are UTF8 encoded, not UTF16.
You can use comprehensions:
defmodule TestModule do
  # Note: despite the name, `binary` is expected to be a list of byte values here
  def convert(binary) do
    for c <- binary, into: "", do: <<c>>
  end
end
TestModule.convert([71,32,69,32,84,32]) |> IO.puts