Efficient dot products of large memory-mapped arrays

轻奢々 2020-12-02 22:41

I'm working with some rather large, dense numpy float arrays that currently reside on disk in PyTables CArrays. I need to be able to perform efficient dot products using these arrays.

3 answers
  •  执念已碎
    2020-12-02 23:25

    I recommend you use PyTables instead of numpy.memmap. Also read their presentations about compression; it sounded strange to me at first, but it seems that the sequence "compress -> transfer -> uncompress" is faster than just transferring the data uncompressed.

    Also use np.dot with an MKL-linked numpy. I don't know how numexpr (PyTables also seems to have something similar) can be used for matrix multiplication, but for calculating a Euclidean norm, for example, it is the fastest way (compared with plain numpy).
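
    A minimal sketch of that norm comparison (the array `a`, its size, and dtype are illustrative, not taken from the original):

    import numpy as np
    import numexpr as ne

    a = np.random.rand(10**7).astype(np.float32)   # size and dtype are illustrative

    # plain numpy: materialises the temporary a*a before summing
    norm_np = np.sqrt((a*a).sum())

    # numexpr: single multithreaded pass over a, no full-size temporary array
    norm_ne = float(np.sqrt(ne.evaluate('sum(a*a)')))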

    Try to benchmark this sample code:

    import numpy as np
    import tables
    import time
    n_row=1000
    n_col=1000
    n_batch=100
    def test_hdf5_disk():
        rows = n_row
        cols = n_col
        batches = n_batch
        #settings for all hdf5 files
        atom = tables.Float32Atom()
        filters = tables.Filters(complevel=9, complib='blosc') # tune parameters
        Nchunk = 4*1024  # chunk edge length; value to tune
        chunkshape = (Nchunk, Nchunk)
        chunk_multiple = 1
        block_size = chunk_multiple * Nchunk
    
        fileName_A = 'carray_A.h5'
        shape_A = (n_row*n_batch, n_col)  # predefined size
        h5f_A = tables.open_file(fileName_A, 'w')
        A = h5f_A.create_carray(h5f_A.root, 'CArray', atom, shape_A, chunkshape=chunkshape, filters=filters)
        for i in range(batches):
            data = np.random.rand(n_row, n_col)
            A[i*n_row:(i+1)*n_row] = data[:]
        rows = n_col
        cols = n_row
        batches = n_batch
        fileName_B = 'carray_B.h5'
        shape_B = (rows, cols*batches)  # predefined size
        h5f_B = tables.open_file(fileName_B, 'w')
        B = h5f_B.create_carray(h5f_B.root, 'CArray', atom, shape_B, chunkshape=chunkshape, filters=filters)
        sz = rows // batches  # rows per batch; integer division so slicing works in Python 3
        for i in range(batches):
            data = np.random.rand(sz, cols*batches)
            B[i*sz:(i+1)*sz] = data[:]
        fileName_C = 'CArray_C.h5'
        shape = (A.shape[0], B.shape[1])
        h5f_C = tables.open_file(fileName_C, 'w')
        C = h5f_C.create_carray(h5f_C.root, 'CArray', atom, shape, chunkshape=chunkshape, filters=filters)
        sz = block_size
        t0 = time.time()
        # blocked matrix multiplication: read chunk-sized blocks of A and B from disk,
        # multiply them in core with np.dot and accumulate into the matching block of C
        for i in range(0, A.shape[0], sz):
            for j in range(0, B.shape[1], sz):
                for k in range(0, A.shape[1], sz):
                    C[i:i+sz,j:j+sz] += np.dot(A[i:i+sz,k:k+sz], B[k:k+sz,j:j+sz])
        print(time.time() - t0)
        h5f_A.close()
        h5f_B.close()
        h5f_C.close()
    

    The problem is that I don't know how to tune the chunk size and compression level for a given machine, so the performance will depend on these parameters.
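
    A rough sketch of such a parameter sweep (the file name 'tune_test.h5', the default array shape, and the candidate values are assumptions to adapt; note that random data compresses much worse than most real data):

    import time
    import numpy as np
    import tables

    def time_write(complib, complevel, nchunk, rows=10000, cols=1000):
        # write one random float32 CArray with the given settings and return the elapsed seconds
        filters = tables.Filters(complevel=complevel, complib=complib)
        data = np.random.rand(rows, cols).astype(np.float32)
        with tables.open_file('tune_test.h5', 'w') as h5f:
            carr = h5f.create_carray(h5f.root, 'CArray', tables.Float32Atom(),
                                     (rows, cols), chunkshape=(nchunk, cols),
                                     filters=filters)
            t0 = time.time()
            carr[:] = data
            return time.time() - t0

    for complib in ('blosc', 'zlib'):
        for complevel in (1, 5, 9):
            for nchunk in (256, 1024, 4096):
                print(complib, complevel, nchunk,
                      round(time_write(complib, complevel, nchunk), 3))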

    Also note that all of the matrices in the sample code are stored on disk; if some of them fit in RAM, I think it will be faster.
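
    For example, a sketch of the variant where only A stays on disk (this reuses the 'carray_A.h5' file and 'CArray' node written by the sample above; B's width of 1000 columns and the block size are arbitrary choices):

    import numpy as np
    import tables

    # variant: only A stays on disk; B and the result C live in RAM
    h5f_A = tables.open_file('carray_A.h5', 'r')
    A = h5f_A.root.CArray                                     # big on-disk operand

    B = np.random.rand(A.shape[1], 1000).astype(np.float32)   # small enough for RAM
    C = np.empty((A.shape[0], B.shape[1]), dtype=np.float32)  # in-core result

    block = 4*1024                                            # rows of A read per step
    for i in range(0, A.shape[0], block):
        # one block read from disk, then a single in-core BLAS dot
        C[i:i+block] = np.dot(A[i:i+block], B)

    h5f_A.close()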

    By the way, I'm using a 32-bit machine, and with numpy.memmap I run into limitations on matrix size (I'm not sure, but it seems a view can only be about 2 GB), while PyTables has no such limitation.
