I\'m trying to invert a string in go but I\'m having trouble handling the characters. Unlike C, GO treats strings as vectors of bytes, rather than characters, which are call
If you want to take into account unicode combining characters, Andrew Sellers hs an interesting take in this gist.
It starts by listing the Unicode block range for all combining diacritical marks (CDM)
var combining = &unicode.RangeTable{
R16: []unicode.Range16{
{0x0300, 0x036f, 1}, // combining diacritical marks
{0x1ab0, 0x1aff, 1}, // combining diacritical marks extended
{0x1dc0, 0x1dff, 1}, // combining diacritical marks supplement
{0x20d0, 0x20ff, 1}, // combining diacritical marks for symbols
{0xfe20, 0xfe2f, 1}, // combining half marks
},
}
You can then read, rune after rune, your initial string:
sv := []rune(s)
But if you do so in reverse order, you will encounter combining diacritical marks (CDMs) first, and those need to preserve their order, to not be reversed
for ix := len(sv) - 1; ix >= 0; ix-- {
r := sv[ix]
if unicode.In(r, combining) {
cv = append(cv, r)
fmt.Printf("Detect combining diacritical mark ' %c'\n", r)
}
(note the space around the %c
combining rune: '%c'
without space would means combining the mark with the first 'ͤ'
: instead of ' ͤ '. I tried to use the CGJ Combining Grapheme Joiner \u034F
, but that does not work)
If you encounter finally a regular rune, you need to combine with those CDMs, before adding it to your reverse final rune array.
} else {
rrv := make([]rune, 0, len(cv)+1)
rrv = append(rrv, r)
rrv = append(rrv, cv...)
fmt.Printf("regular mark '%c' (with '%d' combining diacritical marks '%s') => '%s'\n", r, len(cv), string(cv), string(rrv))
rv = append(rv, rrv...)
cv = make([]rune, 0)
}
Where it gets even more complex is with emojis, and, for instance more recently, modifiers like the Medium-Dark Skin Tone, the type 5 on the Fitzpatrick Scale of skin tones.
If ignored, Reverse '
As you correctly identified, go strings are immutable, so you cannot assign to rune/character values at given indices.
Instead of reversing the string in-place one must create a copy of the runes in the string and reverse those instead, and then return the resulting string.
For example (Go Playground):
func reverse(s string) string {
rs := []rune(s)
for i, j := 0, len(rs)-1; i < j; i, j = i+1, j-1 {
rs[i], rs[j] = rs[j], rs[i]
}
return string(rs)
}
func main() {
fmt.Println(reverse("Hello, World!"))
// !dlroW ,olleH
fmt.Println(reverse("Hello, 世界!"))
// !界世 ,olleH
}
There are problems with this approach due to the intricacies of Unicode (e.g. combining diacritical marks) but this will get you started.