How to manipulate strings in Go to reverse them?

Submitted by 筅森魡賤 on 2020-08-26 09:08:59

Question


I'm trying to reverse a string in Go, but I'm having trouble handling the characters. Unlike C, Go treats strings as sequences of bytes rather than characters (which Go calls runes). I tried some type conversions to make the assignments work, but so far without success.

The idea is to generate 5 strings of random characters with lengths 100, 200, 300, 400 and 500, and then reverse their characters. I got this working easily in C, but in Go the compiler reports that the assignment is not possible.

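func inverte() {
    var c = "A" // unused: Go rejects this with "declared and not used"
    var strs, aux string

    rand.Seed(time.Now().UnixNano())
    // Generate 5 strings of 100, 200, 300, 400, and 500 characters
    for i := 1; i < 6; i++ {
        strs = randomString(i * 100)
        fmt.Print(strs)

        // (also note: i+1 should be i2+1, and j should start at len(strs)-1)
        for i2, j := 0, len(strs); i2 < j; i2, j = i+1, j-1 {
            aux = strs[i2]     // does not compile: strs[i2] is a byte, not a string
            strs[i2] = strs[j] // does not compile: strings are immutable
            strs[j] = aux
        }
    }
}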

Answer 1:


As you correctly identified, Go strings are immutable, so you cannot assign to individual characters (bytes) at given indices.

Instead of reversing the string in place, you must copy its runes into a slice, reverse that slice, and convert the result back to a string.

For example (Go Playground):

package main

import "fmt"

func reverse(s string) string {
  rs := []rune(s)
  for i, j := 0, len(rs)-1; i < j; i, j = i+1, j-1 {
    rs[i], rs[j] = rs[j], rs[i]
  }
  return string(rs)
}

func main() {
  fmt.Println(reverse("Hello, World!"))
  // !dlroW ,olleH
  fmt.Println(reverse("Hello, 世界!"))
  // !界世 ,olleH
}
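Applied to the question's original task, a minimal sketch could look like the following. The question does not show its randomString, so the version here (random ASCII letters via math/rand) is a hypothetical stand-in. The rand.Seed call is omitted: since Go 1.20 the global source is seeded automatically, and on older versions you would keep the question's seeding line.

```go
package main

import (
	"fmt"
	"math/rand"
)

// letters is the alphabet for the hypothetical randomString helper below.
const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// randomString returns n random ASCII letters (hypothetical helper; the
// question's own implementation is not shown).
func randomString(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = letters[rand.Intn(len(letters))]
	}
	return string(b)
}

// reverse copies the runes, swaps them from both ends, and converts back.
func reverse(s string) string {
	rs := []rune(s)
	for i, j := 0, len(rs)-1; i < j; i, j = i+1, j-1 {
		rs[i], rs[j] = rs[j], rs[i]
	}
	return string(rs)
}

func main() {
	// Generate 5 strings of 100, 200, 300, 400 and 500 characters.
	for i := 1; i <= 5; i++ {
		s := randomString(i * 100)
		fmt.Println(s)
		fmt.Println(reverse(s))
	}
}
```

Because the string is never modified, the new, reversed string is returned instead of being written back into s.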

This approach has problems due to the intricacies of Unicode (e.g. combining diacritical marks), but it will get you started.
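A small illustration of the combining-mark problem: "café" can be written with a precomposed é (U+00E9) or as e followed by COMBINING ACUTE ACCENT (U+0301). Both render the same, but rune-by-rune reversal moves the accent away from its e in the second form. The %+q verb prints non-ASCII runes as \u escapes so the difference is visible.

```go
package main

import "fmt"

// reverse is the rune-slice reversal from the answer above.
func reverse(s string) string {
	rs := []rune(s)
	for i, j := 0, len(rs)-1; i < j; i, j = i+1, j-1 {
		rs[i], rs[j] = rs[j], rs[i]
	}
	return string(rs)
}

func main() {
	precomposed := "caf\u00e9" // é as a single code point
	decomposed := "cafe\u0301" // e + COMBINING ACUTE ACCENT

	fmt.Printf("%+q\n", reverse(precomposed)) // "\u00e9fac": still correct
	fmt.Printf("%+q\n", reverse(decomposed))  // "\u0301efac": accent now leads the string
}
```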




Answer 2:


If you want to take Unicode combining characters into account, Andrew Sellers has an interesting take in a gist.

It starts by listing the Unicode block ranges for all combining diacritical marks (CDMs):

  • regular (inherited)
  • extended (containing diacritical marks used in German dialectology, i.e. Teuthonista)
  • supplement (for the Uralic Phonetic Alphabet, medievalist notations, and German dialectology, again Teuthonista)
  • for symbols (arrows, dots, enclosures, and overlays for modifying symbol characters)
  • half marks (diacritic mark parts for spanning multiple characters)

var combining = &unicode.RangeTable{
    R16: []unicode.Range16{
        {0x0300, 0x036f, 1}, // combining diacritical marks
        {0x1ab0, 0x1aff, 1}, // combining diacritical marks extended
        {0x1dc0, 0x1dff, 1}, // combining diacritical marks supplement
        {0x20d0, 0x20ff, 1}, // combining diacritical marks for symbols
        {0xfe20, 0xfe2f, 1}, // combining half marks
    },
}

You can then read your initial string rune by rune:

sv := []rune(s)

But if you do so in reverse order, you will encounter the combining diacritical marks (CDMs) before the base rune they modify, and those marks need to keep their original order rather than be reversed:

for ix := len(sv) - 1; ix >= 0; ix-- {
        r := sv[ix]
        if unicode.In(r, combining) {
            cv = append(cv, r)
            fmt.Printf("Detect combining diacritical mark ' %c'\n", r)
        }

(Note the space before %c in the format string: without it, the combining mark would attach to the opening quote, printing 'ͤ' instead of ' ͤ'. I tried using the Combining Grapheme Joiner (CGJ) \u034F instead, but that does not work.)

When you finally encounter a regular rune, combine it with the collected CDMs before appending it to your reversed result:

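        } else {
            rrv := make([]rune, 0, len(cv)+1)
            rrv = append(rrv, r)
            // cv was filled while walking backwards; append it in reverse
            // so the marks keep their original order
            for k := len(cv) - 1; k >= 0; k-- {
                rrv = append(rrv, cv[k])
            }
            fmt.Printf("regular mark '%c' (with '%d' combining diacritical marks '%s') => '%s'\n", r, len(cv), string(cv), string(rrv))
            rv = append(rv, rrv...)
            cv = make([]rune, 0)
        }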
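Putting the steps together, a self-contained sketch of the backward walk might look like this. It is a reconstruction for illustration, not the gist's exact code: the debug output is dropped, and because marks are buffered while walking backwards, they are emitted in reverse of the order they were collected so that multiple marks on one base keep their original sequence. A leftover buffer (a string that starts with a mark) is flushed at the end.

```go
package main

import (
	"fmt"
	"unicode"
)

// combining holds the combining-diacritical-mark blocks from the answer.
var combining = &unicode.RangeTable{
	R16: []unicode.Range16{
		{0x0300, 0x036f, 1},
		{0x1ab0, 0x1aff, 1},
		{0x1dc0, 0x1dff, 1},
		{0x20d0, 0x20ff, 1},
		{0xfe20, 0xfe2f, 1},
	},
}

// reverseCDM walks the runes backwards, buffering combining marks until it
// reaches the base rune they belong to, then emits base + marks.
func reverseCDM(s string) string {
	sv := []rune(s)
	rv := make([]rune, 0, len(sv))
	var cv []rune // marks seen so far, in backward-walk order

	// flush appends the buffered marks in their original (forward) order.
	flush := func() {
		for k := len(cv) - 1; k >= 0; k-- {
			rv = append(rv, cv[k])
		}
		cv = cv[:0]
	}

	for ix := len(sv) - 1; ix >= 0; ix-- {
		r := sv[ix]
		if unicode.In(r, combining) {
			cv = append(cv, r)
			continue
		}
		rv = append(rv, r)
		flush()
	}
	flush() // marks with no preceding base rune
	return string(rv)
}

func main() {
	fmt.Println(reverseCDM("Hello, World"))
	fmt.Println(reverseCDM("a\u0364o\u0367i\u0364")) // aͤoͧiͤ
}
```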

It gets even more complex with emoji and, more recently, with modifiers such as Medium-Dark Skin Tone (type 5 on the Fitzpatrick scale of skin tones).
If these are ignored, reversing '👩🏾‍🦰👱🏾🧑🏾‍⚖️' gives '️⚖‍🏾🧑🏾👱🦰‍🏾👩', losing the skin tone on the last two emoji.

👩🏾‍🦰 alone is made of (according to a Unicode-to-code-point converter):

  • 👩: woman (1f469)
  • medium-dark skin tone (1f3fe)
  • ZERO WIDTH JOINER (200d)
  • 🦰: red hair (1f9b0)

Those should remain in the exact same order.
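You can confirm that breakdown in Go itself: ranging over a string decodes UTF-8 and yields each rune together with its byte offset. The emoji is spelled with escapes here so the four code points are explicit.

```go
package main

import "fmt"

// redHairedWoman is 👩🏾‍🦰: woman, medium-dark skin tone,
// ZERO WIDTH JOINER, red hair.
const redHairedWoman = "\U0001F469\U0001F3FE\u200D\U0001F9B0"

func main() {
	// range over a string yields (byte offset, rune) pairs.
	for i, r := range redHairedWoman {
		fmt.Printf("byte %2d: %U\n", i, r)
	}
}
```

This prints `byte  0: U+1F469`, `byte  4: U+1F3FE`, `byte  8: U+200D` and `byte 11: U+1F9B0`: one visible glyph, four runes, 15 bytes.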

And don't get me started on the ZERO WIDTH JOINER (200D), which, according to Wisdom/Awesome-Unicode, forces adjacent characters to be joined together (e.g., Arabic characters or supported emoji). It can be used to compose sequentially combined emoji.

🧑🏾‍⚖️ is actually a single glyph composed of two emoji, which should not be inverted.
The full program correctly detects the zero width joiner and does not invert the emoji it combines.
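A minimal sketch of the same idea, assuming a simplified rule set rather than the gist's code: scan forward, attach each combining mark, skin-tone modifier (U+1F3FB to U+1F3FF), variation selector (U+FE0E/U+FE0F) or ZWJ to the preceding cluster, attach the rune after a ZWJ as well, and then emit the clusters in reverse order. Scanning forward keeps each cluster's internal order for free. This is far from full grapheme segmentation as defined in Unicode UAX #29 (flags and Hangul, for example, are not handled).

```go
package main

import (
	"fmt"
	"unicode"
)

// combining holds the combining-diacritical-mark blocks from the answer.
var combining = &unicode.RangeTable{
	R16: []unicode.Range16{
		{0x0300, 0x036f, 1},
		{0x1ab0, 0x1aff, 1},
		{0x1dc0, 0x1dff, 1},
		{0x20d0, 0x20ff, 1},
		{0xfe20, 0xfe2f, 1},
	},
}

const zwj = '\u200d' // ZERO WIDTH JOINER

// extendsPrevious reports whether r stays attached to the rune before it.
func extendsPrevious(r rune) bool {
	return unicode.In(r, combining) ||
		(r >= 0x1f3fb && r <= 0x1f3ff) || // Fitzpatrick skin-tone modifiers
		r == '\ufe0e' || r == '\ufe0f' || // variation selectors
		r == zwj
}

// reverseClusters groups runes into clusters while scanning forward, then
// emits the clusters in reverse order.
func reverseClusters(s string) string {
	var clusters [][]rune
	afterZWJ := false
	for _, r := range s {
		if len(clusters) > 0 && (afterZWJ || extendsPrevious(r)) {
			last := len(clusters) - 1
			clusters[last] = append(clusters[last], r)
		} else {
			clusters = append(clusters, []rune{r})
		}
		// the rune that follows a ZWJ belongs to the same cluster
		afterZWJ = r == zwj
	}
	out := make([]rune, 0, len(s))
	for i := len(clusters) - 1; i >= 0; i-- {
		out = append(out, clusters[i]...)
	}
	return string(out)
}

func main() {
	fmt.Println(reverseClusters("Hello, World"))
	fmt.Println(reverseClusters("\U0001F47D\U0001F476\u20E0\U0001F383")) // 👽👶⃠🎃
}
```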


Full example in the Go Playground:

Reverse 'Hello, World' => 'dlroW ,olleH'
Reverse '👽👶⃠🎃' => '🎃👶⃠👽'
Reverse '👩🏾‍🦰👱🏾🧑🏾‍⚖️' => '🧑🏾‍⚖️👱🏾👩🏾‍🦰'
Reverse 'aͤoͧiͤ  š́ž́ʟ́' => 'ʟ́ž́š́  iͤoͧaͤ'
Reverse 'H̙̖ell͔o̙̟͚͎̗̹̬ ̯W̖͝ǫ̬̞̜rḷ̦̣̪d̰̲̗͈' => 'd̰̲̗͈ḷ̦̣̪rǫ̬̞̜W̖͝ ̯o̙̟͚͎̗̹̬l͔leH̙̖'


Source: https://stackoverflow.com/questions/53244824/how-to-manipulate-strings-in-go-to-reverse-them
