I'm working on a project where I need to convert text from an encoding (for example Windows-1256 Arabic) to UTF-8.
How do I do this in Go?
I checked out the docs, here, and I came up with a way to convert an array of bytes to (or from) UTF-8.
What I'm having a hard time with is that, so far, I haven't found an interface that would let me use a locale. Instead, the options seem to be limited to a predefined set of encodings.
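For what it's worth, Windows-1256 is part of that predefined set, under golang.org/x/text/encoding/charmap, so a byte-slice conversion looks roughly like this (a sketch; the helper name and the raw variable are just illustrative):

import "golang.org/x/text/encoding/charmap"

// windows1256ToUTF8 decodes a Windows-1256 byte slice into UTF-8 bytes.
func windows1256ToUTF8(raw []byte) ([]byte, error) {
    // charmap.Windows1256 implements encoding.Encoding; its decoder
    // converts from the code page to UTF-8.
    return charmap.Windows1256.NewDecoder().Bytes(raw)
}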
In my case, I needed to convert UTF-16 (really I have UCS-2 data, but it should still work) to UTF-8. To do that, I needed to check for the BOM and then do the conversion with golang.org/x/text/encoding/unicode:
// needs golang.org/x/text/encoding/unicode and errors; assumes len(buf) >= 2
// read the first two bytes as a little-endian code unit to detect the BOM
bom := uint16(buf[0]) + uint16(buf[1])*256
if bom == 0xFEFF {
    enc = unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM)
} else if bom == 0xFFFE {
    enc = unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
} else {
    return nil, errors.New("BOM missing") // assuming a ([]byte, error) return
}
e := enc.NewDecoder()
// convert UCS-2 (LE or BE) to UTF-8, skipping the two BOM bytes
utf8, err := e.Bytes(buf[2:])
It's unfortunate that I have to use IgnoreBOM, since in my case a BOM should instead be forbidden past the first character. But that's close enough for my situation. These functions were mentioned in a couple of places, but not shown in practice.
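As an aside, a sketch of an alternative I noticed in the same package (untested against my data): passing unicode.ExpectBOM instead of unicode.IgnoreBOM makes the decoder require a leading BOM, use it to pick the byte order, and strip it, returning unicode.ErrMissingBOM when it is absent, so the manual check above becomes unnecessary. It still won't reject a stray U+FEFF later in the stream, though.

import "golang.org/x/text/encoding/unicode"

// decodeUTF16WithBOM is a hypothetical helper: the decoder itself reads the
// leading BOM, chooses little- or big-endian accordingly, strips the BOM,
// and fails with unicode.ErrMissingBOM if the input does not start with one.
func decodeUTF16WithBOM(buf []byte) ([]byte, error) {
    dec := unicode.UTF16(unicode.LittleEndian, unicode.ExpectBOM).NewDecoder()
    return dec.Bytes(buf) // output is UTF-8
}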