How do I load a UTF16-encoded text file in Julia?

Submitted by 徘徊边缘 on 2020-03-01 03:31:24

Question


I have a text file that I am pretty sure is encoded in UTF-16, but I don't know how to load it in Julia. Do I have to load it as bytes and then convert it with UTF16String?


Answer 1:


The simplest way is to read it as bytes and then convert:

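# Julia 0.3/0.4 API: utf16 and readbytes are no longer in Base in Julia 1.x
s = open(filename, "r") do f
    # readbytes returns the raw bytes; utf16 decodes them, honoring a BOM if present
    utf16(readbytes(f))
end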

Note that utf16 also checks for a byte-order mark (BOM), so it handles endianness for you and does not include the BOM in the resulting s.
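The utf16 function and UTF16String type used above are no longer part of Base in Julia 1.x (they live on in the LegacyStrings.jl package), and Julia's String type now stores UTF-8. The following is a minimal sketch of the same "read bytes, then decode" idea using only Base functions (read, reinterpret, ltoh, ntoh, transcode). The names decode_utf16 and read_utf16_file are hypothetical, and the sketch assumes little-endian data when no BOM is present.

```julia
# Sketch for Julia 1.x (assumption: not the original answer's API).
# Decode raw UTF-16 bytes to a Julia String, honoring an optional BOM.
# Without a BOM the data is assumed to be little-endian.
function decode_utf16(bytes::AbstractVector{UInt8})
    isodd(length(bytes)) &&
        throw(ArgumentError("UTF-16 data must contain an even number of bytes"))
    bigendian = false
    start = 1
    if length(bytes) >= 2
        if bytes[1] == 0xFF && bytes[2] == 0xFE      # little-endian BOM
            start = 3
        elseif bytes[1] == 0xFE && bytes[2] == 0xFF  # big-endian BOM
            bigendian = true
            start = 3
        end
    end
    # View the remaining bytes as UInt16 code units (host byte order),
    # then convert each unit from the file's byte order to the host's.
    raw = reinterpret(UInt16, bytes[start:end])
    units = bigendian ? ntoh.(raw) : ltoh.(raw)
    # transcode turns UTF-16 code units (including surrogate pairs) into UTF-8.
    return transcode(String, units)
end

# Hypothetical convenience wrapper: read the whole file, then decode it.
read_utf16_file(filename::AbstractString) = decode_utf16(read(filename))
```

Defaulting to little-endian when there is no BOM is a judgment call: the Unicode standard says BOM-less UTF-16 without a higher-level protocol should be read as big-endian, but many files produced on Windows are little-endian, so adjust the default to match your data.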

If you really want to avoid copying the data and you know it is native-endian, that is possible too, but you have to append a NUL terminator explicitly. Julia's UTF-16 string data internally ends with a NUL code point so that it can be passed to C routines that expect NUL-terminated data:

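# Julia 0.4-era API; assumes the file is in native byte order
s = open(filename, "r") do f
    b = readbytes(f)
    # append two zero bytes, i.e. one UInt16 NUL terminator
    resize!(b, length(b)+2)
    b[end] = b[end-1] = 0
    # view the bytes as UInt16 code units without copying them
    UTF16String(reinterpret(UInt16, b))
end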

However, typical UTF-16 text files start with a BOM, in which case the string s will include the BOM as its first character, which may not be what you want.
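In Julia 1.x, String stores UTF-8, so decoding UTF-16 always allocates new storage and the NUL-terminator trick no longer applies. Below is a minimal sketch of the native-endian variant using only Base functions, which also drops a leading BOM after decoding. It assumes the file really is in the host's byte order; read_native_utf16 is a hypothetical name.

```julia
# Sketch for Julia 1.x (assumption: the file is in the host's native byte
# order). read_native_utf16 is a hypothetical name, not a library function.
function read_native_utf16(filename::AbstractString)
    bytes = read(filename)                        # raw file contents as Vector{UInt8}
    # UInt16 code units with no byte swapping; reinterpret throws if the
    # byte count is odd.
    units = collect(reinterpret(UInt16, bytes))
    s = transcode(String, units)                  # decode UTF-16 into a UTF-8 String
    # A BOM stored in native order decodes to U+FEFF; drop it if present.
    if !isempty(s) && first(s) == '\ufeff'
        s = s[nextind(s, 1):end]
    end
    return s
end
```

If the BOM decodes as U+FFFE instead, the file is in the opposite byte order and the units need swapping (for example with bswap) before transcoding.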



Source: https://stackoverflow.com/questions/30061521/how-do-i-load-a-utf16-encoded-text-file-in-julia
