How to iterate over individual characters in a Lua string?

陌清茗 2020-12-12 18:43

I have a string in Lua and want to iterate over the individual characters in it. But no code I've tried works, and the official manual only shows how to find and replace substrings.

6 answers
  • 2020-12-12 19:07

    Iterate over the string to build the source text of a table constructor, then turn that text into a table with load():

    itab=function(char)
     -- Build the source text of a table constructor, e.g. {"K","O",...}
     local result='{'
     for i=1,#char do
      -- %q quotes and escapes the character, so quotes and backslashes survive load()
      result=result..string.format('%q',char:sub(i,i))
      if i~=#char then
       result=result..','
      end
     end
     result=result..'}'
     -- load() compiles the text (Lua 5.2+; use loadstring() on Lua 5.1)
     return load('return '..result)()
    end
    
    dump=function(tab)
     -- ipairs visits the array part in index order
     for key,value in ipairs(tab) do
      io.write(string.format("%s=%s=%s\n",key,type(value),value))
     end
    end
    
    res=itab('KOYAANISQATSI')
    
    dump(res)
    

    This prints:

    1=string=K
    2=string=O
    3=string=Y
    4=string=A
    5=string=A
    6=string=N
    7=string=I
    8=string=S
    9=string=Q
    10=string=A
    11=string=T
    12=string=S
    13=string=I
    
  • 2020-12-12 19:12

    In Lua 5.1, you can iterate over the characters of a string in a couple of ways.

    The basic loop would be:

    for i = 1, #str do
        local c = str:sub(i,i)
        -- do something with c
    end
    

    But it may be more efficient to use a pattern with string.gmatch() to get an iterator over the characters:

    for c in str:gmatch"." do
        -- do something with c
    end
    

    Or even to use string.gsub() to call a function for each char:

    str:gsub(".", function(c)
        -- do something with c
    end)
    

    In all of the above, I've taken advantage of the fact that the string library table is set as the __index of the metatable shared by all string values, so its functions can be called as methods using the : notation. I've also used the # operator (new in 5.1, IIRC) to get the string length.
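
    As a minimal sketch of the point above (the helper name collect is hypothetical, not part of any library), the following checks that method calls on strings resolve through the string table and that the three loops visit the same characters. It assumes a stock Lua 5.1 or later where the string metatable has not been modified.

```lua
-- All string values share one metatable; its __index field is the string table.
local mt = getmetatable("")
print(mt.__index == string)

local str = "Lua"
-- Method syntax and plain function syntax are two spellings of the same call.
print(str:sub(1, 1) == string.sub(str, 1, 1))

-- Hypothetical helper: gather the characters of s with each of the three loops.
local function collect(s)
    local by_sub, by_gmatch, by_gsub = {}, {}, {}
    for i = 1, #s do
        by_sub[#by_sub + 1] = s:sub(i, i)
    end
    for c in s:gmatch(".") do
        by_gmatch[#by_gmatch + 1] = c
    end
    -- The callback returns nothing, so gsub leaves the text unchanged.
    s:gsub(".", function(c)
        by_gsub[#by_gsub + 1] = c
    end)
    return by_sub, by_gmatch, by_gsub
end

local a, b, c = collect(str)
print(table.concat(a, ","), table.concat(b, ","), table.concat(c, ","))
```

    Note that all three loops walk the string byte by byte, so a multi-byte UTF-8 character is seen as several separate one-byte strings.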

    The best answer for your application depends on a lot of factors, and benchmarks are your friend if performance is going to matter.

    You might want to evaluate why you need to iterate over the characters, and to look at one of the regular expression modules that have been bound to Lua, or for a modern approach look into Roberto's lpeg module, which implements Parsing Expression Grammars for Lua.

  • 2020-12-12 19:23

    Depending on the task at hand, it might be easier to use string.byte. It is also the fastest way, because it avoids creating a new substring, which happens to be pretty expensive in Lua: each new string is hashed and checked against the strings already known. You can precompute the codes of the characters you are looking for with the same string.byte to maintain readability and portability.

    local str = "ab/cd/ef"
    local target = string.byte("/")
    for idx = 1, #str do
       if str:byte(idx) == target then
          print("Target found at:", idx)
       end
    end
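
    The same idea can be wrapped in a small function. This is only a sketch; the name count_char is a hypothetical helper introduced for illustration, and it assumes the target is a single byte.

```lua
-- Hypothetical helper: count how often a single-byte character ch occurs in s.
local function count_char(s, ch)
    local target = string.byte(ch)   -- numeric code of the character, computed once
    local n = 0
    for i = 1, #s do
        if s:byte(i) == target then  -- compares numbers; no substring is created
            n = n + 1
        end
    end
    return n
end

print(count_char("ab/cd/ef", "/"))  --> 2
```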
    
  • 2020-12-12 19:23

    There are already a lot of good approaches in the other answers. If speed is what you are primarily looking for, you should definitely consider doing the job through Lua's C API, which is many times faster than raw Lua code. When working with preloaded chunks (e.g. via the load function), the difference is not that big, but still considerable.

    As for pure Lua solutions, let me share this small benchmark I've made. It covers every answer provided to date and adds a few optimizations. Still, the basic thing to consider is:

    How many times will you need to iterate over the characters in the string?

    • If the answer is "once", then you should look at the first part of the benchmark ("raw speed").
    • Otherwise, the second part will provide a more precise estimate, because it parses the string into a table, which is much faster to iterate over. You should also consider writing a simple function for this, as @Jarriz suggested.

    Here is the full code:

    -- Setup locals
    local str = "Hello World!"
    local attempts = 5000000
    local reuses = 10 -- For the second part of benchmark: Table values are reused 10 times. Change this according to your needs.
    local x, c, elapsed, tbl
    -- "Localize" funcs to minimize lookup overhead
    local stringbyte, stringchar, stringsub, stringgsub, stringgmatch = string.byte, string.char, string.sub, string.gsub, string.gmatch
    
    print("-----------------------")
    print("Raw speed:")
    print("-----------------------")
    
    -- Version 1 - string.sub in loop
    x = os.clock()
    for j = 1, attempts do
        for i = 1, #str do
            c = stringsub(str, i, i) -- i, i selects exactly one character
        end
    end
    elapsed = os.clock() - x
    print(string.format("V1: elapsed time: %.3f", elapsed))
    
    -- Version 2 - string.gmatch loop
    x = os.clock()
    for j = 1, attempts do
        for c in stringgmatch(str, ".") do end
    end
    elapsed = os.clock() - x
    print(string.format("V2: elapsed time: %.3f", elapsed))
    
    -- Version 3 - string.gsub callback
    x = os.clock()
    for j = 1, attempts do
        stringgsub(str, ".", function(c) end)
    end
    elapsed = os.clock() - x
    print(string.format("V3: elapsed time: %.3f", elapsed))
    
    -- For version 4
    local str2table = function(str)
        local ret = {}
        for i = 1, #str do
            ret[i] = stringsub(str, i, i) -- Note: This is a lot faster than using table.insert
        end
        return ret
    end
    
    -- Version 4 - function str2table
    x = os.clock()
    for j = 1, attempts do
        tbl = str2table(str)
        for i = 1, #tbl do -- Note: This type of loop is a lot faster than "pairs" loop.
            c = tbl[i]
        end
    end
    elapsed = os.clock() - x
    print(string.format("V4: elapsed time: %.3f", elapsed))
    
    -- Version 5 - string.byte
    x = os.clock()
    for j = 1, attempts do
        tbl = {stringbyte(str, 1, #str)} -- Note: This is about 15% faster than calling string.byte for every character.
        for i = 1, #tbl do
            c = tbl[i] -- Note: produces char codes instead of chars.
        end
    end
    elapsed = os.clock() - x
    print(string.format("V5: elapsed time: %.3f", elapsed))
    
    -- Version 5b - string.byte + conversion back to chars
    x = os.clock()
    for j = 1, attempts do
        tbl = {stringbyte(str, 1, #str)} -- Note: This is about 15% faster than calling string.byte for every character.
        for i = 1, #tbl do
            c = stringchar(tbl[i])
        end
    end
    elapsed = os.clock() - x
    print(string.format("V5b: elapsed time: %.3f", elapsed))
    
    print("-----------------------")
    print("Creating cache table ("..reuses.." reuses):")
    print("-----------------------")
    
    -- Version 1 - string.sub in loop
    x = os.clock()
    for k = 1, attempts do
        tbl = {}
        for i = 1, #str do
            tbl[i] = stringsub(str, i, i) -- Note: This is a lot faster than using table.insert
        end
        for j = 1, reuses do
            for i = 1, #tbl do
                c = tbl[i]
            end
        end
    end
    elapsed = os.clock() - x
    print(string.format("V1: elapsed time: %.3f", elapsed))
    
    -- Version 2 - string.gmatch loop
    x = os.clock()
    for k = 1, attempts do
        tbl = {}
        local tblc = 1 -- Note: This is faster than table.insert
        for c in stringgmatch(str, ".") do
            tbl[tblc] = c
            tblc = tblc + 1
        end
        for j = 1, reuses do
            for i = 1, #tbl do
                c = tbl[i]
            end
        end
    end
    elapsed = os.clock() - x
    print(string.format("V2: elapsed time: %.3f", elapsed))
    
    -- Version 3 - string.gsub callback
    x = os.clock()
    for k = 1, attempts do
        tbl = {}
        local tblc = 1 -- Note: This is faster than table.insert
        stringgsub(str, ".", function(c)
            tbl[tblc] = c
            tblc = tblc + 1
        end)
        for j = 1, reuses do
            for i = 1, #tbl do
                c = tbl[i]
            end
        end
    end
    elapsed = os.clock() - x
    print(string.format("V3: elapsed time: %.3f", elapsed))
    
    -- Version 4 - str2table func before loop
    x = os.clock()
    for k = 1, attempts do
        tbl = str2table(str)
        for j = 1, reuses do
            for i = 1, #tbl do -- Note: This type of loop is a lot faster than "pairs" loop.
                c = tbl[i]
            end
        end
    end
    elapsed = os.clock() - x
    print(string.format("V4: elapsed time: %.3f", elapsed))
    
    -- Version 5 - string.byte to create table
    x = os.clock()
    for k = 1, attempts do
        tbl = {stringbyte(str,1,#str)}
        for j = 1, reuses do
            for i = 1, #tbl do
                c = tbl[i]
            end
        end
    end
    elapsed = os.clock() - x
    print(string.format("V5: elapsed time: %.3f", elapsed))
    
    -- Version 5b - string.byte to create table + string.char loop to convert bytes to chars
    x = os.clock()
    for k = 1, attempts do
        tbl = {stringbyte(str, 1, #str)}
        for i = 1, #tbl do
            tbl[i] = stringchar(tbl[i])
        end
        for j = 1, reuses do
            for i = 1, #tbl do
                c = tbl[i]
            end
        end
    end
    elapsed = os.clock() - x
    print(string.format("V5b: elapsed time: %.3f", elapsed))
    

    Example output (Lua 5.3.4, Windows):

    -----------------------
    Raw speed:
    -----------------------
    V1: elapsed time: 3.713
    V2: elapsed time: 5.089
    V3: elapsed time: 5.222
    V4: elapsed time: 4.066
    V5: elapsed time: 2.627
    V5b: elapsed time: 3.627
    -----------------------
    Creating cache table (10 reuses):
    -----------------------
    V1: elapsed time: 20.381
    V2: elapsed time: 23.913
    V3: elapsed time: 25.221
    V4: elapsed time: 20.551
    V5: elapsed time: 13.473
    V5b: elapsed time: 18.046
    

    Result:

    In my case, string.byte and string.sub were fastest in terms of raw speed. When using a cache table and reusing it 10 times per loop, the string.byte version was fastest even when converting char codes back to chars (which isn't always necessary and depends on usage).

    As you have probably noticed, I've made some assumptions based on my previous benchmarks and applied them to the code:

    1. Library functions should always be localized if used inside loops, because local access is a lot faster.
    2. Inserting a new element into a Lua table is much faster with tbl[idx] = value than with table.insert(tbl, value).
    3. Looping through a table with for i = 1, #tbl is a bit faster than for k, v in pairs(tbl).
    4. Always prefer the version with fewer function calls, because each call adds a little to the execution time.
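
    To check points 1 and 2 on your own machine, a rough sketch like the following could be used. The helper names timed, fill_indexed and fill_insert and the size n are assumptions made for illustration, and the measured times depend on the machine and Lua version.

```lua
local clock, insert = os.clock, table.insert  -- localized library functions (point 1)

-- Hypothetical helper: run fn once; return the elapsed CPU time and fn's result.
local function timed(fn, ...)
    local start = clock()
    local result = fn(...)
    return clock() - start, result
end

local function fill_indexed(n)
    local t = {}
    for i = 1, n do t[i] = i end       -- direct index assignment (point 2)
    return t
end

local function fill_insert(n)
    local t = {}
    for i = 1, n do insert(t, i) end   -- table.insert, for comparison
    return t
end

local n = 1000000                      -- arbitrary size chosen for illustration
local t_indexed, a = timed(fill_indexed, n)
local t_insert, b = timed(fill_insert, n)
print(string.format("indexed: %.3f s, table.insert: %.3f s", t_indexed, t_insert))
```

    Both functions build the same table, so any difference in the printed times comes from how the elements are inserted.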

    Hope it helps.

  • 2020-12-12 19:31

    Everyone else suggests a less optimal method.

    This would be best:

        function chars(str)
            local strc = {} -- local, so the table does not leak into the global scope
            for i = 1, #str do
                table.insert(strc, string.sub(str, i, i))
            end
            return strc
        end
    
        str = "Hello world!"
        char = chars(str)
        print("Char 2: "..char[2]) -- prints the char 'e'
        print("-------------------\n")
        for i = 1, #str do -- testing printing all the chars
            if (char[i] == " ") then
                print("Char "..i..": [[space]]")
            else
                print("Char "..i..": "..char[i])
            end
        end
    
  • 2020-12-12 19:32

    If you're using Lua 5, try:

    for i = 1, string.len(str) do
        print( string.sub(str, i, i) )
    end
    