What is a rune?

予麋鹿 2020-12-12 09:15

What is a rune in Go?

I've been googling, but Golang only says in one line: rune is an alias for int32.

But h

7 answers
  •  长情又很酷
    2020-12-12 10:03

    (I have a feeling that the answers above still don't state the differences and relationships between string and []rune very clearly, so I'll try to add another answer with an example.)

    As @Strangework's answer said, string and []rune are quite different.

    Differences - string & []rune:

    • A string value is a read-only byte slice, and a string literal is encoded in UTF-8. Each char in a string takes 1 to 4 bytes, while each rune always takes 4 bytes.
    • For string, both len() and indexing are based on bytes.
    • For []rune, both len() and indexing are based on runes (or int32).
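
The size difference can be checked directly. Below is a minimal sketch using the standard library's utf8.RuneLen; the helper name utf8Width is hypothetical and only used for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf8Width reports how many bytes r occupies when encoded as UTF-8.
// (Hypothetical helper name, for illustration only.)
func utf8Width(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	// Inside a string each char takes a variable number of bytes,
	// while the same char stored as a rune (int32) always takes 4.
	for _, r := range []rune{'h', 'é', '你', '😀'} {
		fmt.Printf("%c: %d byte(s) in a string, 4 bytes as a rune\n", r, utf8Width(r))
	}
}
```

The four sample chars take 1, 2, 3 and 4 bytes respectively when stored in a string.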

    Relationships - string & []rune:

    • When you convert from string to []rune, each UTF-8 char in the string becomes a rune.
    • Similarly, in the reverse conversion from []rune to string, each rune becomes a UTF-8 char in the string.
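
A small sketch of the two conversions, assuming the input is valid UTF-8. The helper replaceRune is hypothetical; it only illustrates that editing by char position is easy once the string has been converted to []rune.

```go
package main

import "fmt"

// replaceRune returns a copy of s with the char at rune index i replaced by r.
// (Hypothetical helper, for illustration only.)
func replaceRune(s string, i int, r rune) string {
	rs := []rune(s)   // string -> []rune: one rune per UTF-8 char
	rs[i] = r         // index i counts chars here, not bytes
	return string(rs) // []rune -> string: each rune is encoded back to UTF-8
}

func main() {
	s := "hello你好"
	fmt.Println(string([]rune(s)) == s)  // the round trip gives back the same string
	fmt.Println(replaceRune(s, 5, '您')) // replaces 你, the 6th char
}
```

The original string s is left unchanged, since the conversion copies the data into a new slice.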

    Tips:

    • You can convert between string and []rune, but they are still different in both type and overall size.

    (I would add an example to show that more clearly.)


    Code

    string_rune_compare.go:

    // Compare string and []rune.
    package main

    import "fmt"

    // stringAndRuneCompare prints the type, length and some elements
    // of a string and of the []rune converted from it.
    func stringAndRuneCompare() {
        // string: len() and indexing work on bytes
        s := "hello你好"

        fmt.Printf("%s, type: %T, len: %d\n", s, s, len(s))
        fmt.Printf("s[%d]: %v, type: %T\n", 0, s[0], s[0])
        li := len(s) - 1 // last byte index
        fmt.Printf("s[%d]: %v, type: %T\n\n", li, s[li], s[li])

        // []rune: len() and indexing work on runes
        rs := []rune(s)
        fmt.Printf("%v, type: %T, len: %d\n", rs, rs, len(rs))
    }

    func main() {
        stringAndRuneCompare()
    }
    

    Execute:

    go run string_rune_compare.go

    Output:

    hello你好, type: string, len: 11
    s[0]: 104, type: uint8
    s[10]: 189, type: uint8
    
    [104 101 108 108 111 20320 22909], type: []int32, len: 7
    

    Explanation:

    • The string hello你好 has length 11, because the first 5 chars each take only 1 byte, while the last 2 Chinese chars each take 3 bytes.

      • Thus, total bytes = 5 * 1 + 2 * 3 = 11.
      • Since len() on a string is based on bytes, the first line prints len: 11.
      • Since indexing a string is also based on bytes, the following 2 lines print values of type uint8 (byte is an alias of uint8 in Go).
    • When the string is converted to []rune, the conversion finds 7 UTF-8 chars, thus 7 runes.

      • Since len() on []rune is based on runes, the last line prints len: 7.
      • If you index into a []rune, you access it rune by rune.
        Since each rune comes from a UTF-8 char in the original string, you can also say that both len() and indexing on []rune are based on UTF-8 chars.
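
To make the byte-versus-char indexing concrete, here is a short sketch on the same string. The helper runeAt is hypothetical; utf8.RuneCountInString and range over a string are standard Go.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAt returns the i-th char of s, counting in runes rather than bytes.
// (Hypothetical helper, for illustration only.)
func runeAt(s string, i int) rune {
	return []rune(s)[i]
}

func main() {
	s := "hello你好"

	// Byte count versus char count of the same string.
	fmt.Println(len(s), utf8.RuneCountInString(s))

	// s[5] is only the first byte of 你, while runeAt(s, 5) is the whole char.
	fmt.Printf("s[5]: %v, runeAt(s, 5): %c\n", s[5], runeAt(s, 5))

	// range over a string decodes one rune per step;
	// i is the byte offset at which that rune starts.
	for i, r := range s {
		fmt.Printf("byte offset %d: %c\n", i, r)
	}
}
```

Counting with utf8.RuneCountInString or iterating with range avoids allocating a []rune when you only need to read the chars.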
