What can I do to increase the performance of a Lua program?

灰色年华 2020-12-07 09:03

I asked a question about Lua performance, and one of the responses asked:

Have you studied general tips for keeping Lua performance high? i.e. know tab

5 Answers
  • 2020-12-07 09:13

    It should also be pointed out that accessing array fields of a table is much faster than accessing fields with any other kind of key. This is because (almost) all Lua implementations (including LuaJ) keep a so-called "array part" inside each table. It is used for the table's array fields, so it neither stores the field key nor has to look it up ;).

    You can even imitate static aspects of other languages, such as a struct or a C++/Java class, this way. Locals and arrays are enough.
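
    A minimal sketch of that idea, assuming a made-up 2D point record. The names X, Y, new_point and length_squared are only illustrative and do not come from any library:

```lua
-- Hypothetical "struct": a point kept in the array part of a table.
-- The slot numbers live in locals so the code stays readable.
local X, Y = 1, 2

local function new_point(x, y)
  return { x, y }            -- array-style constructor, no string keys
end

local function length_squared(p)
  return p[X] * p[X] + p[Y] * p[Y]
end

local p = new_point(3, 4)
print(length_squared(p))     --> 25
```

    Whether this beats named fields enough to matter depends on the implementation and the workload, so it is worth measuring before giving up readable field names.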

  • 2020-12-07 09:16
    • Making the most-used functions local
    • Making good use of tables as hash sets
    • Reducing table creation by reusing tables
    • Using LuaJIT!
  • 2020-12-07 09:19

    In response to some of the other answers and comments:

    It is true that as a programmer you should generally avoid premature optimization. But. This is not so true for scripting languages where the compiler does not optimize much -- or at all.

    So, whenever you write something in Lua, and that is executed very often, is run in a time-critical environment or could run for a while, it is a good thing to know things to avoid (and avoid them).

    This is a collection of what I found out over time. Some of it I found out over the net, but being of a suspicious nature when the interwebs are concerned I tested all of it myself. Also, I have read the Lua performance paper at Lua.org.

    Some references:

    • Lua Performance Tips
    • Lua-users.org Optimisation Tips

    Avoid globals

    This is one of the most common hints, but stating it once more can't hurt.

    Globals are stored in a hashtable by their name. Accessing them means you have to access a table index. While Lua has a pretty good hashtable implementation, it's still a lot slower than accessing a local variable. If you have to use globals, assign their value to a local variable; this pays off from the second access on.

    do
      x = gFoo + gFoo;
    end
    do -- this actually performs better.
      local lFoo = gFoo;
      x = lFoo + lFoo;
    end
    

    (Note that simple testing may yield different results, e.g. local x; for i=1, 1000 do x=i; end. Here the for loop header actually takes more time than the loop body, so profiling results could be distorted.)
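
    If you want to check this on your own machine, a rough timing harness along the following lines may help. It is only a sketch: time_it, use_global and use_local are hypothetical helper names, os.clock measures CPU time at a coarse resolution, and the numbers will differ between Lua versions and LuaJIT.

```lua
-- Hypothetical helper: call fn() `reps` times, return elapsed CPU seconds.
local function time_it(fn, reps)
  local clock = os.clock
  local start = clock()
  for _ = 1, reps do fn() end
  return clock() - start
end

gFoo = 1  -- deliberately global, for the comparison

local function use_global()
  local x
  for _ = 1, 1000 do x = gFoo + gFoo end  -- two global lookups per pass
  return x
end

local function use_local()
  local lFoo = gFoo                        -- one global lookup in total
  local x
  for _ = 1, 1000 do x = lFoo + lFoo end
  return x
end

print("global:", time_it(use_global, 1000))
print("local: ", time_it(use_local, 1000))
```

    Both functions do the same arithmetic inside the same kind of loop, so the loop overhead is shared and the difference between the two timings comes mostly from the variable access.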

    Avoid string creation

    Lua hashes all strings on creation. This makes comparing them and using them as table keys very fast, and it reduces memory use since every string is stored internally only once. But it makes string creation more expensive.

    A popular option to avoid excessive string creation is using tables. For example, if you have to assemble a long string, create a table, put the individual strings in there and then use table.concat to join them once.

    -- do NOT do something like this
    local ret = "";
    for i=1, C do
      ret = ret..foo();
    end
    

    If foo() returned only the character A, this loop would create a series of strings like "", "A", "AA", "AAA", etc. Each string would be hashed and would stay in memory until the garbage collector reclaims it. See the problem here?

    -- this is a lot faster
    local ret = {};
    for i=1, C do
      ret[#ret+1] = foo();
    end
    ret = table.concat(ret);
    

    This method does not create any new strings inside the loop itself: the strings are created in the function foo, and only references are copied into the table. Afterwards, concat creates one more string "AAAAAA..." (depending on how large C is). Note that you could use i instead of #ret+1, but often you don't have such a convenient loop and there is no loop counter you can use.
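
    For the case where there is no loop counter, one option is to keep your own. A small sketch, where join_words is a hypothetical helper and the pieces come from string.gmatch instead of foo():

```lua
-- Collect pieces in a table with an explicit counter, join once at the end.
local function join_words(text, sep)
  local parts, n = {}, 0
  for word in text:gmatch("%S+") do  -- generic for: no numeric index here
    n = n + 1
    parts[n] = word
  end
  return table.concat(parts, sep)
end

print(join_words("a  b   c", "-"))   --> a-b-c
```

    The counter n also avoids evaluating #parts on every iteration.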

    Another trick I found somewhere on lua-users.org is to use gsub if you have to parse a string:

    some_string:gsub(".", function(m)
      return "A";
    end);
    

    This looks odd at first. The benefit is that gsub builds the string "at once" in C, and the result is only hashed when gsub returns and passes it back to Lua. This avoids table creation, but possibly has more function call overhead (not if you call foo() anyway, but it does if foo() is actually an expression).
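
    A slightly more concrete sketch of the gsub approach, assuming a made-up template format with $name placeholders. The function expand and the placeholder syntax are only for illustration:

```lua
-- Replace $name placeholders using a lookup table.
local function expand(template, vars)
  -- gsub returns the new string and the match count; keep only the string.
  local result = template:gsub("%$(%w+)", function(key)
    return vars[key]   -- returning nil leaves the matched text unchanged
  end)
  return result
end

print(expand("Hello $who, meet $other", { who = "Lua" }))
--> Hello Lua, meet $other
```

    The whole result is assembled inside gsub, so no intermediate strings or tables are built on the Lua side apart from what the callback itself returns.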

    Avoid function overhead

    Use language constructs instead of functions where possible

    function ipairs

    When iterating a table, the function overhead from ipairs does not justify its use. To iterate a table, instead use

    for k=1, #tbl do local v = tbl[k]; --[[ use k and v here ]] end
    

    For a sequence without holes it does exactly the same, without the function call overhead (ipairs actually returns another function, which is then called for every element in the table, while #tbl is only evaluated once). It's a lot faster, even if you need the value. And if you don't...

    Note for Lua 5.2: In 5.2 you can actually define a __ipairs field in the metatable, which does make ipairs useful in some cases. However, Lua 5.2 also makes the __len field work for tables, so you might still prefer the above code to ipairs as then the __len metamethod is only called once, while for ipairs you would get an additional function call per iteration.
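
    To make the comparison concrete, here are both loop styles side by side as a sketch (sum_numeric and sum_ipairs are hypothetical names). For a proper sequence they give the same result. For a table with holes they may not, because ipairs stops at the first nil while # can return any border.

```lua
-- Numeric for: #tbl is evaluated once, elements are fetched by index.
local function sum_numeric(tbl)
  local total = 0
  for k = 1, #tbl do
    local v = tbl[k]
    total = total + v
  end
  return total
end

-- Generic for: the iterator returned by ipairs is called per element.
local function sum_ipairs(tbl)
  local total = 0
  for _, v in ipairs(tbl) do
    total = total + v
  end
  return total
end

local seq = { 10, 20, 30 }
print(sum_numeric(seq))  --> 60
print(sum_ipairs(seq))   --> 60
```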

    functions table.insert, table.remove

    Simple uses of table.insert and table.remove can be replaced by using the # operator instead. Basically this is for simple push and pop operations. Here are some examples:

    table.insert(foo, bar);
    -- does the same as
    foo[#foo+1] = bar;
    
    local x = table.remove(foo);
    -- does the same as
    local x = foo[#foo];
    foo[#foo] = nil;
    

    For shifts (eg. table.remove(foo, 1)), and if ending up with a sparse table is not desirable, it is of course still better to use the table functions.

    Use tables for SQL-IN-like comparisons

    You might - or might not - have decisions in your code like the following:

    if a == "C" or a == "D" or a == "E" or a == "F" then
       ...
    end
    

    Now this is a perfectly valid case. However (from my own testing), starting with 4 comparisons and excluding table generation, this is actually faster:

    local compares = { C = true, D = true, E = true, F = true };
    if compares[a] then
       ...
    end
    

    And since hash tables have constant look up time, the performance gain increases with every additional comparison. On the other hand if "most of the time" one or two comparisons match, you might be better off with the Boolean way or a combination.
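
    Since the table generation is excluded from that measurement, the lookup table should be built once and kept outside any hot code path. A small sketch, with make_set, GRADES and is_grade as hypothetical names:

```lua
-- Turn a list of values into a set (value -> true).
local function make_set(list)
  local set = {}
  for i = 1, #list do
    set[list[i]] = true
  end
  return set
end

-- Built once, when the module is loaded, not on every call.
local GRADES = make_set({ "C", "D", "E", "F" })

local function is_grade(a)
  return GRADES[a] == true
end

print(is_grade("D"))  --> true
print(is_grade("Z"))  --> false
```

    If the table were created inside is_grade, every call would pay for a new table and the advantage over the chained comparisons would most likely be lost.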

    Avoid frequent table creation

    This is discussed thoroughly in Lua Performance Tips. Basically, the problem is that Lua allocates your table on demand, and creating a new table this way actually takes more time than clearing its contents and filling it again.

    However, this is a bit of a problem, since Lua itself does not provide a method for removing all elements from a table, and pairs() is not exactly a performance beast itself. I have not done any performance testing on this problem myself yet.

    If you can, define a C function that clears a table. This should be a good solution for table reuse.
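
    If a C function is not an option, a pure Lua fallback could look like the sketch below. It is untested for speed and assumes that the caller is done with the old contents; clear, scratch and squares are hypothetical names. Setting existing fields to nil while traversing with pairs is allowed by the reference manual.

```lua
-- Remove every key from t so the same table can be refilled.
local function clear(t)
  for k in pairs(t) do
    t[k] = nil
  end
  return t
end

local scratch = {}  -- one table, reused across calls

local function squares(n)
  clear(scratch)
  for i = 1, n do scratch[i] = i * i end
  return scratch    -- note: callers all share this table
end

local a = squares(3)
print(a[3])         --> 9
local b = squares(2)
print(b[3])         --> nil
print(a == b)       --> true
```

    The last line shows the price of reuse: every caller gets the same table, so a result must be consumed or copied before the next call.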

    Avoid doing the same over and over

    This is the biggest problem, I think. While a compiler in a non-interpreted language can easily optimize away a lot of redundancies, Lua will not.

    Memoize

    Using tables, this can be done quite easily in Lua. For single-argument functions you can even replace them with a table and an __index metamethod. Even though this destroys transparency, performance is better on cached values because there is one less function call.

    Here is an implementation of memoization for a single argument using a metatable. (Important: This variant does not support a nil value argument, but is pretty damn fast for existing values.)

    function tmemoize(func)
        return setmetatable({}, {
            __index = function(self, k)
                local v = func(k);
                self[k] = v
                return v;
            end
        });
    end
    -- usage (does not support nil values!)
    local mf = tmemoize(myfunc);
    local v  = mf[x];
    

    You could actually modify this pattern for multiple input values.
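
    One possible way to do that for two arguments is a cache of nested tables, one level per argument. This is only a sketch: memoize2 is a hypothetical name, it uses a plain closure instead of the metatable trick, it does not support nil (or NaN) arguments, and a function that returns nil is simply called again each time.

```lua
-- Memoize a two-argument function with a two-level cache: cache[a][b].
local function memoize2(func)
  local cache = {}
  return function(a, b)
    local row = cache[a]
    if row == nil then
      row = {}
      cache[a] = row
    end
    local v = row[b]
    if v == nil then
      v = func(a, b)
      row[b] = v
    end
    return v
  end
end

-- usage (does not support nil arguments!)
local calls = 0
local slow_add = memoize2(function(a, b)
  calls = calls + 1
  return a + b
end)

print(slow_add(2, 3))  --> 5
print(slow_add(2, 3))  --> 5
print(calls)           --> 1
```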

    Partial application

    The idea is similar to memoization, which is to "cache" results. But here, instead of caching the results of the function, you cache intermediate values by putting their calculation in a constructor function that defines the calculation function in its block. In reality I would just call it clever use of closures.

    -- Normal function
    function foo(a, b, x)
        return cheaper_expression(expensive_expression(a,b), x);
    end
    -- foo(a,b,x1);
    -- foo(a,b,x2);
    -- ...
    
    -- Partial application
    function foo(a, b)
        local C = expensive_expression(a,b);
        return function(x)
            return cheaper_expression(C, x);
        end
    end
    -- local f = foo(a,b);
    -- f(x1);
    -- f(x2);
    -- ...
    

    This way it is possible to easily create flexible functions that cache some of their work without too much impact on program flow.

    An extreme variant of this would be Currying, but that is actually more a way to mimic functional programming than anything else.

    Here is a more extensive ("real world") example with some code omissions, otherwise it would easily take up the whole page here (namely, get_color_values actually does a lot of value checking and accepts mixed values):

    function LinearColorBlender(col_from, col_to)
        local cfr, cfg, cfb, cfa = get_color_values(col_from);
        local ctr, ctg, ctb, cta = get_color_values(col_to);
        if not cfr or not ctr then
            error("One of given arguments is not a color.");
        end
        -- compute the deltas only after both colors are known to be valid
        local cdr, cdg, cdb, cda = ctr-cfr, ctg-cfg, ctb-cfb, cta-cfa;
    
        return function(pos)
            if type(pos) ~= "number" then
                error("arg1 (pos) must be a number in range 0..1");
            end
            if pos < 0 then pos = 0; end;
            if pos > 1 then pos = 1; end;
            return cfr + cdr*pos, cfg + cdg*pos, cfb + cdb*pos, cfa + cda*pos;
        end
    end
    -- Call 
    local blender = LinearColorBlender({1,1,1,1},{0,0,0,1});
    object:SetColor(blender(0.1));
    object:SetColor(blender(0.3));
    object:SetColor(blender(0.7));
    

    You can see that once the blender was created, the function only has to sanity-check a single value instead of up to eight. I even extracted the difference calculation. Though it probably does not improve things a lot, I hope it shows what this pattern tries to achieve.

  • 2020-12-07 09:32

    Keep tables short: the larger the table, the longer the search time. Along the same lines, iterating over numerically indexed tables (= arrays) is faster than iterating over key-based tables (thus ipairs is faster than pairs).

  • 2020-12-07 09:34

    If your Lua program is really too slow, use a Lua profiler and clean up the expensive stuff or migrate it to C. But if you're not sitting there waiting, your time is wasted.

    The first law of optimization: Don't.

    I'd love to see a problem where you have a choice between ipairs and pairs and can measure the effect of the difference.

    The one easy piece of low-hanging fruit is to remember to use local variables within each module. It's generally not worth doing stuff like

    local strfind = string.find
    

    unless you can find a measurement telling you otherwise.
