Efficient dot products of large memory-mapped arrays

轻奢々 2020-12-02 22:41

I'm working with some rather large, dense numpy float arrays that currently reside on disk in PyTables CArrays. I need to be able to perform efficient dot products using these arrays.

3 answers
  •  执念已碎
    2020-12-02 23:25

    I recommend you use PyTables instead of numpy.memmap. Also read their presentations about compression; it sounded strange to me at first, but it seems that the sequence "compress -> transfer -> uncompress" is faster than just transferring the data uncompressed.

    Also use np.dot with an MKL-linked numpy. I don't know how numexpr (PyTables also seems to have something similar) can be used for matrix multiplication, but for calculating a Euclidean norm, for example, it is the fastest way (compared with plain numpy).
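
    A minimal sketch of that norm comparison (the array `a`, its size, and dtype are illustrative, not taken from the original):

    import numpy as np
    import numexpr as ne

    a = np.random.rand(10**7).astype(np.float32)   # size and dtype are illustrative

    # plain numpy: materialises the temporary a*a before summing
    norm_np = np.sqrt((a*a).sum())

    # numexpr: single multithreaded pass over a, no full-size temporary array
    norm_ne = float(np.sqrt(ne.evaluate('sum(a*a)')))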

    Try to benchmark this sample code:

    import numpy as np
    import tables
    import time
    n_row=1000
    n_col=1000
    n_batch=100
    def test_hdf5_disk():
        rows = n_row
        cols = n_col
        batches = n_batch
        #settings for all hdf5 files
        atom = tables.Float32Atom()
        filters = tables.Filters(complevel=9, complib='blosc') # tune parameters
        Nchunk = 4*1024  # chunk edge length; value to tune
        chunkshape = (Nchunk, Nchunk)
        chunk_multiple = 1
        block_size = chunk_multiple * Nchunk
    
        fileName_A = 'carray_A.h5'
        shape_A = (n_row*n_batch, n_col)  # predefined size
        h5f_A = tables.open_file(fileName_A, 'w')
        A = h5f_A.create_carray(h5f_A.root, 'CArray', atom, shape_A, chunkshape=chunkshape, filters=filters)
        for i in range(batches):
            data = np.random.rand(n_row, n_col)
            A[i*n_row:(i+1)*n_row] = data[:]
        rows = n_col
        cols = n_row
        batches = n_batch
        fileName_B = 'carray_B.h5'
        shape_B = (rows, cols*batches)  # predefined size
        h5f_B = tables.open_file(fileName_B, 'w')
        B = h5f_B.create_carray(h5f_B.root, 'CArray', atom, shape_B, chunkshape=chunkshape, filters=filters)
        sz = rows // batches  # rows per batch; integer division so slicing works in Python 3
        for i in range(batches):
            data = np.random.rand(sz, cols*batches)
            B[i*sz:(i+1)*sz] = data[:]
        fileName_C = 'CArray_C.h5'
        shape = (A.shape[0], B.shape[1])
        h5f_C = tables.open_file(fileName_C, 'w')
        C = h5f_C.create_carray(h5f_C.root, 'CArray', atom, shape, chunkshape=chunkshape, filters=filters)
        sz = block_size
        t0 = time.time()
        # blocked matrix multiplication: read chunk-sized blocks of A and B from disk,
        # multiply them in core with np.dot and accumulate into the matching block of C
        for i in range(0, A.shape[0], sz):
            for j in range(0, B.shape[1], sz):
                for k in range(0, A.shape[1], sz):
                    C[i:i+sz,j:j+sz] += np.dot(A[i:i+sz,k:k+sz], B[k:k+sz,j:j+sz])
        print(time.time() - t0)
        h5f_A.close()
        h5f_B.close()
        h5f_C.close()
    

    The problem is that I don't know how to tune the chunk size and compression level for a given machine, so the performance will depend on these parameters.
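
    A rough sketch of such a parameter sweep (the file name 'tune_test.h5', the default array shape, and the candidate values are assumptions to adapt; note that random data compresses much worse than most real data):

    import time
    import numpy as np
    import tables

    def time_write(complib, complevel, nchunk, rows=10000, cols=1000):
        # write one random float32 CArray with the given settings and return the elapsed seconds
        filters = tables.Filters(complevel=complevel, complib=complib)
        data = np.random.rand(rows, cols).astype(np.float32)
        with tables.open_file('tune_test.h5', 'w') as h5f:
            carr = h5f.create_carray(h5f.root, 'CArray', tables.Float32Atom(),
                                     (rows, cols), chunkshape=(nchunk, cols),
                                     filters=filters)
            t0 = time.time()
            carr[:] = data
            return time.time() - t0

    for complib in ('blosc', 'zlib'):
        for complevel in (1, 5, 9):
            for nchunk in (256, 1024, 4096):
                print(complib, complevel, nchunk,
                      round(time_write(complib, complevel, nchunk), 3))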

    Also note that all of the matrices in the sample code are stored on disk; if some of them fit in RAM, I think it will be faster.
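
    For example, a sketch of the variant where only A stays on disk (this reuses the 'carray_A.h5' file and 'CArray' node written by the sample above; B's width of 1000 columns and the block size are arbitrary choices):

    import numpy as np
    import tables

    # variant: only A stays on disk; B and the result C live in RAM
    h5f_A = tables.open_file('carray_A.h5', 'r')
    A = h5f_A.root.CArray                                     # big on-disk operand

    B = np.random.rand(A.shape[1], 1000).astype(np.float32)   # small enough for RAM
    C = np.empty((A.shape[0], B.shape[1]), dtype=np.float32)  # in-core result

    block = 4*1024                                            # rows of A read per step
    for i in range(0, A.shape[0], block):
        # one block read from disk, then a single in-core BLAS dot
        C[i:i+block] = np.dot(A[i:i+block], B)

    h5f_A.close()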

    By the way, I'm using a 32-bit machine, and with numpy.memmap I run into limitations on matrix size (I'm not sure, but it seems a view can only be about 2 GB), while PyTables has no such limitation.
