How to convert from an encoding to UTF-8 in Go?

臣服心动 2020-12-30 06:23

I'm working on a project where I need to convert text from an encoding (for example Windows-1256 Arabic) to UTF-8.

How do I do this in Go?

4 answers
  •  余生分开走
    2020-12-30 07:03

    I checked out the docs and came up with a way to convert a slice of bytes to (or from) UTF-8.

    What I have a hard time with is that, so far, I have not found an interface that would let me pick the encoding from a locale. Instead, the choices seem to be limited to a predefined set of encodings.

    In my case, I needed to convert UTF-16 (really I have UCS-2 data, but it should still work) to UTF-8. To do that, I needed to check for the BOM and then do the conversion:

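    // Fragment from inside a function that returns an error. Needs "errors",
    // "golang.org/x/text/encoding" and "golang.org/x/text/encoding/unicode".
    if len(buf) < 2 {
        return errors.New("input too short for a BOM")
    }
    // Read the first two bytes as a little-endian 16-bit value.
    // Widen to uint16 first: byte arithmetic would overflow.
    bom := uint16(buf[0]) | uint16(buf[1])<<8

    var enc encoding.Encoding
    switch bom {
    case 0xFEFF: // bytes FF FE: little-endian
        enc = unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM)
    case 0xFFFE: // bytes FE FF: big-endian
        enc = unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
    default:
        return errors.New("BOM missing")
    }

    // Convert UCS-2 (LE or BE) to UTF-8, skipping the BOM already checked.
    // Decoder.Bytes returns the converted bytes and an error.
    out, err := enc.NewDecoder().Bytes(buf[2:])
    if err != nil {
        return err
    }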
    

    It is unfortunate that I have to use the "ignore" BOM policy, since in my case a BOM should instead be forbidden past the first character. But that is close enough for my situation. These functions were mentioned in a couple of places, but not shown in practice.
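
    For comparison, the same BOM check and UTF-16 to UTF-8 conversion can be sketched with only the standard library (`encoding/binary` and `unicode/utf16`). This is a minimal illustration, not the approach used above: the helper name `decodeUTF16` and the sample bytes are made up for the example, and it assumes the whole input fits in memory and starts with a BOM.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16 is a hypothetical helper for this sketch. It inspects the
// two-byte BOM, picks the byte order, and returns the remaining data as a
// Go string (Go strings built from runes are UTF-8 encoded).
func decodeUTF16(buf []byte) (string, error) {
	if len(buf) < 2 {
		return "", errors.New("input too short for a BOM")
	}

	var order binary.ByteOrder
	switch {
	case buf[0] == 0xFF && buf[1] == 0xFE:
		order = binary.LittleEndian
	case buf[0] == 0xFE && buf[1] == 0xFF:
		order = binary.BigEndian
	default:
		return "", errors.New("BOM missing")
	}

	body := buf[2:]
	if len(body)%2 != 0 {
		return "", errors.New("odd number of bytes after the BOM")
	}

	// Reassemble the 16-bit code units in the detected byte order.
	units := make([]uint16, len(body)/2)
	for i := range units {
		units[i] = order.Uint16(body[2*i:])
	}

	// utf16.Decode turns code units into runes (combining surrogate
	// pairs); converting the runes to a string encodes them as UTF-8.
	return string(utf16.Decode(units)), nil
}

func main() {
	// Little-endian BOM followed by U+0633 and U+0644 (two Arabic letters).
	sample := []byte{0xFF, 0xFE, 0x33, 0x06, 0x44, 0x06}

	s, err := decodeUTF16(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q (%d bytes of UTF-8)\n", s, len(s))
}
```

    Unlike the decoder with the "ignore" BOM policy, this sketch does not treat a later U+FEFF specially: it passes through as an ordinary character, so rejecting it would need an extra check on the decoded runes.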
