Extract the first letter of a UTF-8 string with Lua

广开言路 2020-12-06 13:06

Is there any way to extract the first letter of a UTF-8 encoded string with Lua?

Lua does not properly support Unicode, so string.sub("ÆØÅ", 2, 2) wil
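Here is the problem in concrete terms. Lua's # operator, string.byte and string.sub all work on bytes, and "ÆØÅ" is six bytes in UTF-8. The following is a minimal sketch, assuming the source file is saved as UTF-8:

```lua
-- In UTF-8: Æ = 195 134, Ø = 195 152, Å = 195 133 (decimal byte values).
local s = "ÆØÅ"

print(#s)            -- 6: the length in bytes, not in characters
print(s:byte(1, -1)) -- the six byte values listed above

-- string.sub counts bytes, so position 2..2 holds only the second byte
-- of "Æ" (a lone continuation byte), not "Ø".
print(s:sub(2, 2) == "\134")  -- true
print(s:sub(3, 4))            -- Ø (bytes 3 and 4)
```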

2 Answers
  •  野趣味 (OP)
    2020-12-06 13:14

    You can easily extract the first letter from a UTF-8 encoded string with the following code:

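    -- Returns the first UTF-8 encoded character found in str, as a string
    -- (nil if none is found). %z matches the NUL byte: Lua 5.1 patterns cannot contain "\0".
    local function firstLetter(str)
      return str:match("[%z\1-\127\194-\244][\128-\191]*")
    end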
    

    This works because each UTF-8 encoded code point is either a single byte from 0 to 127, or a lead byte from 194 to 244 followed by one to three continuation bytes from 128 to 191.
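    The same byte ranges also tell you how long each sequence is. Instead of using a pattern, you can inspect the lead byte and slice with string.sub. The following is a minimal sketch. seqLength and firstChar are hypothetical names, and unlike a full validator, it does not check that the continuation bytes are actually present.

```lua
-- Hypothetical helper: how many bytes the UTF-8 sequence starting with
-- lead byte b occupies, or nil if b cannot start a well-formed sequence.
local function seqLength(b)
  if b <= 127 then
    return 1   -- ASCII, single byte
  elseif b >= 194 and b <= 223 then
    return 2   -- U+0080..U+07FF
  elseif b >= 224 and b <= 239 then
    return 3   -- U+0800..U+FFFF
  elseif b >= 240 and b <= 244 then
    return 4   -- U+10000..U+10FFFF
  end
  return nil   -- 128-191 (continuation), 192-193 and 245-255 (never valid leads)
end

-- Hypothetical helper: first character of str, or nil for an empty string
-- or an invalid lead byte. Continuation bytes are not validated.
local function firstChar(str)
  local b = str:byte(1)
  if b == nil then return nil end
  local n = seqLength(b)
  if n == nil then return nil end
  return str:sub(1, n)
end

print(firstChar("ÆØÅ"))   -- Æ
print(firstChar("€uro"))  -- €
```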

    You can even iterate over UTF-8 code points in a similar manner:

    local str = "ÆØÅ"  -- any UTF-8 encoded string
    for char in str:gmatch("[%z\1-\127\194-\244][\128-\191]*") do
      print(char)  -- prints Æ, Ø and Å on separate lines
    end
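    Collecting the matches into a table gives you access by character index. This is roughly what the question attempts with string.sub("ÆØÅ", 2, 2). The following sketch makes the same assumptions as the answer (well-formed UTF-8). chars is a hypothetical helper name.

```lua
-- Same pattern as in the answer: one UTF-8 encoded character per match.
local UTF8_CHAR = "[%z\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: splits str into a table of single-character strings.
local function chars(str)
  local t = {}
  for ch in str:gmatch(UTF8_CHAR) do
    t[#t + 1] = ch
  end
  return t
end

local letters = chars("ÆØÅ")
print(#letters)    -- 3 characters (the string itself is 6 bytes)
print(letters[2])  -- Ø
```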
    

    Note that both examples yield each character as a string (its UTF-8 byte sequence), not as its numeric Unicode code point.
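    If you need the numeric code point, you can decode each matched string from its bytes. On Lua 5.3 and later, the standard utf8 library provides utf8.codepoint for this. The sketch below uses plain arithmetic instead, so it also runs on Lua 5.1. codepoint is a hypothetical name, and the function assumes well-formed input.

```lua
-- Hypothetical helper: decodes one well-formed UTF-8 character into its
-- code point. Uses multiplication instead of bit operators (Lua 5.1 has none).
local function codepoint(ch)
  local b1, b2, b3, b4 = ch:byte(1, 4)
  if b1 < 0x80 then
    return b1                                         -- 1 byte: 0xxxxxxx
  elseif b1 < 0xE0 then
    return (b1 - 0xC0) * 0x40 + (b2 - 0x80)           -- 2 bytes
  elseif b1 < 0xF0 then
    return ((b1 - 0xE0) * 0x40 + (b2 - 0x80)) * 0x40 + (b3 - 0x80)  -- 3 bytes
  else
    return (((b1 - 0xF0) * 0x40 + (b2 - 0x80)) * 0x40 + (b3 - 0x80)) * 0x40
           + (b4 - 0x80)                              -- 4 bytes
  end
end

for ch in ("ÆØÅ"):gmatch("[%z\1-\127\194-\244][\128-\191]*") do
  print(ch, codepoint(ch))  -- Æ 198, then Ø 216, then Å 197
end
```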
