slower to mix logical variables with double?


Question


I have 0-1 valued vectors that I need to do some matrix operations on. They are not very sparse (only half of the values are 0), but storing them as logical instead of double uses one eighth of the memory: 1 byte per logical element versus 8 bytes per double-precision element.
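
For reference, the footprint difference is easy to confirm with whos; a quick sketch (the vector length here is illustrative, not from the original question):

>> x = double(rand(1,1e6) > 0.5); xl = logical(x);  % ~8 MB vs ~1 MB
>> whos x xl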

Would it be any slower to do matrix multiplications of a logical vector and a double matrix than to use both as double? See my preliminary results below:

>> x = [0 1 0 1 0 1 0 1]; A = rand(numel(x)); xl = logical(x);
>> tic; for k = 1:10000; x * A * x'; end; toc
Elapsed time is 0.017682 seconds.
>> tic; for k = 1:10000; xl * A * xl'; end; toc
Elapsed time is 0.026810 seconds.
>> xs = sparse(x);
>> tic; for k = 1:10000; xs * A * xs'; end; toc
Elapsed time is 0.039566 seconds.

It seems that the logical representation is much slower (and sparse is slower still). Can someone explain why? Is it type-casting overhead? Is it a limitation of the CPU/FPU instruction set?

EDIT: My system is MATLAB R2012b on Mac OS X 10.8.3, Intel Core i7 3.4 GHz

EDIT2: A few comments suggest that this is only a problem on Mac OS X. I would like to compile results from diverse architectures and OSes if possible.

EDIT3: My actual problem requires computation with a huge portion of all possible binary vectors of length m, where m can be too large for 8 * m * 2^m bytes to fit in memory.
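
For what it's worth, one way to keep the footprint bounded is to stream over the 2^m vectors in blocks rather than materializing them all at once. Below is a minimal sketch, not part of the original question; the names and block size are hypothetical, and it assumes the quantity needed per vector is the quadratic form x * A * x':

m = 20;                          % assumed problem size
A = randn(m);                    % assumed double matrix
blockSize = 2^12;                % hypothetical; tune to available memory
total = 2^m;
results = zeros(total,1);        % one scalar result per binary vector
for first = 0:blockSize:total-1
    idx = first : min(first+blockSize,total)-1;
    X = dec2bin(idx,m) - '0';            % block of 0/1 rows, stored as double
    results(idx+1) = sum((X*A).*X, 2);   % row-wise x*A*x'
end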


Answer 1:


I'll start by posting a slightly better benchmark. I'm using the TIMEIT function from Steve Eddins to get more accurate timings:

function [t,err] = test_mat_mult()
    %# data
    N = 4000; sparsity = 0.7;    %# adjust size and sparsity of data
    x = double(rand(1,N) > sparsity);
    xl = logical(x);
    xs = sparse(x);
    A = randn(N);

    %# functions
    f = cell(3,1);
    f{1} = @() mult_func(x,A);
    f{2} = @() mult_func(xl,A);
    f{3} = @() mult_func(xs,A);

    %# timeit
    t = cellfun(@timeit, f);

    %# check results
    v = cellfun(@feval, f, 'UniformOutput',true);
    err = max(abs(v-mean(v)));  %# maximum error
end

function v = mult_func(x,A)
    v = x * A * x';
end

Here are the results on my machine (WinXP 32-bit, R2013a) with N=4000 and sparsity=0.7:

>> [t,err] = test_mat_mult
t =
     0.031212    %# double
     0.031970    %# logical
     0.071998    %# sparse
err =
   7.9581e-13

You can see that double is only slightly faster than logical on average, while sparse is slower than both, as expected (its focus is efficient memory usage, not speed).


Now note that MATLAB relies on BLAS implementations optimized for your platform to perform full-matrix multiplication (think DGEMM). In general, BLAS provides routines for single/double types but not for booleans, so a conversion to double must occur first, which would explain why the logical case is slower.
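
One way to make the hidden conversion visible is to perform it explicitly and compare timings. A sketch (not from the original answer; it reuses TIMEIT from above):

N = 4000;
xl = rand(1,N) > 0.5;                       %# logical row vector
A = randn(N);
t_implicit = timeit(@() xl * A);            %# conversion happens internally
t_explicit = timeit(@() double(xl) * A);    %# conversion made explicit
%# if the two timings are close, the logical penalty is mostly conversion cost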

On Intel processors, the BLAS/LAPACK routines are provided by the Intel MKL library. I'm not sure about AMD, but I think MATLAB uses the equivalent ACML there:

>> internal.matlab.language.versionPlugins.blas
ans =
Intel(R) Math Kernel Library Version 10.3.11 Product Build 20120606 for 32-bit applications

Of course, the sparse case is a different story. (I know MATLAB uses the SuiteSparse package for many of its sparse operations, but I'm not sure which routines handle sparse multiplication.)




Answer 2:


I think the results are reasonably explained by the different representations.

A non-sparse double array is simple and efficient for representing a small body of data that fits very easily in cache.

A logical array is more space-efficient, using only one byte per element instead of 8 bytes, but that gains nothing when you only have 8 elements. On the other hand, it has to be converted to double before double arithmetic can be done on it, adding a step.

A sparse array uses a more complicated representation, designed to save space when most of the array is zero. It requires more operations either to decide that the element at a given index is zero or to obtain its non-zero value. Using it for a 50%-non-zero array that fits easily in even the smallest caches is a misuse. It is at its best for reducing the memory and data-transfer cost of a large array that is almost all zero. See Sparse vs Normal Array Matlab
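
To see why 50% non-zero is a poor fit for sparse storage, compare the footprints directly. A rough sketch (column vectors; sizes are illustrative; MATLAB's sparse format costs roughly 16 bytes per non-zero for the value plus its row index):

N = 1e6;
x_full = double(rand(N,1) > 0.5);           % 8 bytes/element -> ~8 MB
x_half = sparse(x_full);                    % ~50% non-zero -> ~8 MB, no saving
x_rare = sparse(double(rand(N,1) > 0.99));  % ~1% non-zero -> ~0.16 MB
whos x_full x_half x_rare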

If you are really dealing with 8-element arrays, you should stick with non-sparse arrays of doubles. If your real work involves larger arrays, you need to benchmark with similar sizes. You also need to make sure the sparsity of your test data matches the real data.




Answer 3:


When you are working with data that fits entirely in cache and isn't too sparse (as in your benchmark), doing extra work (like converting between a logical type and double, or using sparse storage schemes) to try to reduce memory footprint will only slow your code down (as you have noticed).

Data accesses from the L1 cache are fast enough to be "effectively free" when a sufficient amount of computational work is done per data element loaded (as is the case in your example). When this happens, execution speed is limited by computation, not by load/store traffic; by using logical variables, you are doing more computation, which slows down your benchmark.

How big is the working set in the problem that you actually want to solve? If it is not at least bigger than the L2 cache on your processor, you should just use normal double matrices. The exact threshold at which using logical variables becomes advantageous is likely considerably larger, but would require some experimentation to determine. (It will also depend on exactly how MATLAB handles the conversion; you would want the conversion done as part of the tiling for the multiplications. If MATLAB doesn't do that, it will likely never be faster than using double, no matter how big the data set is.)
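
A simple way to look for that threshold empirically is to sweep the problem size, along the lines of the benchmark in Answer 1. A hypothetical sketch (sizes chosen so the largest A is about 512 MB):

for N = [1000 2000 4000 8000]
    x  = double(rand(1,N) > 0.5);
    xl = logical(x);
    A  = randn(N);
    td = timeit(@() x  * A * x');
    tl = timeit(@() xl * A * xl');
    fprintf('N = %5d   double: %.4f s   logical: %.4f s\n', N, td, tl);
end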



Source: https://stackoverflow.com/questions/16527212/slower-to-mix-logical-variables-with-double
