Avoid memory allocation when indexing an array in Julia

半阙折子戏 2021-01-01 21:08

Question: I would like to index into an array without triggering memory allocation, especially when passing the indexed elements into a function. From readi

2 answers
  •  死守一世寂寞
    2021-01-01 21:45

    EDIT: Read tholy's answer too to get a full picture!

    When using an array of indices, the situation is not great right now on Julia 0.4-pre (start of Feb 2015):

    julia> N = 10000000;
    julia> x = randn(N);
    julia> inds = [1:N];
    julia> @time mean(x)
    elapsed time: 0.010702729 seconds (96 bytes allocated)
    elapsed time: 0.012167155 seconds (96 bytes allocated)
    julia> @time mean(x[inds])
    elapsed time: 0.088312275 seconds (76 MB allocated, 17.87% gc time in 1 pauses with 0 full sweep)
    elapsed time: 0.073672734 seconds (76 MB allocated, 3.27% gc time in 1 pauses with 0 full sweep)
    elapsed time: 0.071646757 seconds (76 MB allocated, 1.08% gc time in 1 pauses with 0 full sweep)
    julia> xs = sub(x,inds);  # Only works on 0.4
    julia> @time mean(xs)
    elapsed time: 0.057446177 seconds (96 bytes allocated)
    elapsed time: 0.096983673 seconds (96 bytes allocated)
    elapsed time: 0.096711312 seconds (96 bytes allocated)
    julia> using ArrayViews
    julia> xv = view(x, 1:N)  # Note use of a range, not [1:N]!
    julia> @time mean(xv)
    elapsed time: 0.012919509 seconds (96 bytes allocated)
    elapsed time: 0.013010655 seconds (96 bytes allocated)
    elapsed time: 0.01288134 seconds (96 bytes allocated)
    julia> xs = sub(x,1:N)  # Works on 0.3 and 0.4
    julia> @time mean(xs)
    elapsed time: 0.014191482 seconds (96 bytes allocated)
    elapsed time: 0.014023089 seconds (96 bytes allocated)
    elapsed time: 0.01257188 seconds (96 bytes allocated)
    
    • So while sub(x, inds) avoids the memory allocation, it is still no faster than x[inds], and in two of the three runs it is actually slower(!).
    • The issue is indexing by an array, as opposed to a range. You can't use sub for this on 0.3, but you can on 0.4.
    • If we can index by a range, then we can use ArrayViews.jl on 0.3, and it's built in on 0.4. This case is pretty much as fast as the original mean(x).
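
    For readers on a current Julia release (1.x is assumed here, which goes beyond the 0.3/0.4 versions measured above), sub no longer exists and view is part of Base, while [1:N] no longer expands to a vector of indices, so collect(1:N) stands in for it. A minimal sketch of the range-versus-array distinction under those assumptions; the helper names alloc_copy and alloc_view are made up for illustration:

    ```julia
    using Statistics  # on Julia 1.x, mean lives in the Statistics stdlib

    N = 1_000_000
    x = randn(N)
    r = 1:N              # a range: compact, no per-index storage
    inds = collect(r)    # a Vector{Int}, what [1:N] produced on 0.3/0.4

    xr = view(x, r)      # view indexed by a range
    xi = view(x, inds)   # view indexed by an array of indices

    # Both views alias x rather than copying it, so a write to x is visible.
    x[1] = 42.0
    println(xr[1] == 42.0 && xi[1] == 42.0)

    # Hypothetical helpers: measure bytes allocated inside a function,
    # so that non-constant globals do not distort the count.
    alloc_copy(x, inds) = @allocated mean(x[inds])
    alloc_view(x, inds) = @allocated mean(view(x, inds))

    alloc_copy(x, inds); alloc_view(x, inds)   # warm-up calls (compilation)
    println("copy: ", alloc_copy(x, inds), " bytes")
    println("view: ", alloc_view(x, inds), " bytes")
    ```

    The copy has to allocate a new array for the selected elements, whereas the view only wraps x and the index collection. Whether the view is also faster depends on the index type, as the timings above suggest, so it is worth measuring on your own Julia version.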

    I noticed that when fewer indices are used (instead of the whole range), the gap is much smaller and the memory allocation stays low, so sub might be worth it:

    N = 100000000
    x = randn(N)
    inds = [1:div(N,10)]
    
    @time mean(x)
    @time mean(x)
    @time mean(x)
    @time mean(x[inds])
    @time mean(x[inds])
    @time mean(x[inds])
    xi = sub(x,inds)
    @time mean(xi)
    @time mean(xi)
    @time mean(xi)
    

    gives

    elapsed time: 0.092831612 seconds (985 kB allocated)
    elapsed time: 0.067694917 seconds (96 bytes allocated)
    elapsed time: 0.066209038 seconds (96 bytes allocated)
    elapsed time: 0.066816927 seconds (76 MB allocated, 20.62% gc time in 1 pauses with 1 full sweep)
    elapsed time: 0.057211528 seconds (76 MB allocated, 19.57% gc time in 1 pauses with 0 full sweep)
    elapsed time: 0.046782848 seconds (76 MB allocated, 1.81% gc time in 1 pauses with 0 full sweep)
    elapsed time: 0.186084807 seconds (4 MB allocated)
    elapsed time: 0.057476269 seconds (96 bytes allocated)
    elapsed time: 0.05733602 seconds (96 bytes allocated)
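
    The first timing in some of the groups above (985 kB allocated for mean(x), 4 MB and 0.186 s for mean(xi)) probably includes JIT compilation of the call rather than its steady-state cost, which is why each expression is timed three times. A minimal sketch of that warm-up-then-measure pattern, written for Julia 1.x as an assumption (so view replaces sub); best_time is a hypothetical helper, not part of any library:

    ```julia
    using Statistics

    # Hypothetical helper: call f once to trigger compilation,
    # then return the fastest of n timed runs, in seconds.
    function best_time(f, n = 3)
        f()                                  # warm-up, not timed
        return minimum(@elapsed(f()) for _ in 1:n)
    end

    N = 10_000_000                           # smaller than above to limit memory use
    x = randn(N)
    inds = collect(1:div(N, 10))             # index only a tenth of x

    t_full = best_time(() -> mean(x))
    t_copy = best_time(() -> mean(x[inds]))          # copies the selection
    t_view = best_time(() -> mean(view(x, inds)))    # no copy of the data

    println("full: ", t_full, "  copy: ", t_copy, "  view: ", t_view)
    ```

    Taking the minimum over a few runs reduces noise from garbage collection pauses, which show up in the copying case above as the "gc time" percentages.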
    
