Improve performance of converting numpy array to MATLAB double

说谎 2020-12-06 11:52

Calling MATLAB from Python is bound to cost some performance, which I could avoid by rewriting (a lot of) code in Python. However, this isn't a realistic option for

3 Answers
  • 2020-12-06 12:04

    Passing numpy arrays efficiently

    Take a look at the file mlarray_sequence.py in the folder PYTHONPATH\Lib\site-packages\matlab\_internal. There you will find the construction of the MATLAB array object. The performance problem comes from copying data with loops within the generic_flattening function.
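To illustrate why per-element copying is the bottleneck, here is a rough sketch that contrasts a Python-level loop with a single NumPy call. Note that `loop_flatten` is a hypothetical stand-in written for this example, not the actual body of generic_flattening, and the typecode 'd' is an assumption for double data.

```python
import array
import numpy as np

def loop_flatten(a):
    """Hypothetical per-element flattening loop (column-major order)."""
    rows, cols = a.shape
    out = array.array('d')
    for j in range(cols):        # MATLAB stores each column contiguously
        for i in range(rows):
            out.append(float(a[i, j]))
    return out

def vectorized_flatten(a):
    """Flatten in column-major order with one NumPy call, then copy once."""
    return array.array('d', np.ravel(a, order='F'))

a = np.arange(6, dtype=np.float64).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
slow = loop_flatten(a)
fast = vectorized_flatten(a)
```

Both functions yield the same column-major sequence; the second one leaves the reordering to NumPy instead of indexing every element from Python, which is the idea behind the patch.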

    To avoid this behavior we will edit the file a bit. This fix should work on complex and non-complex datatypes.

    1. Make a backup of the original file in case something goes wrong.
    2. Add import numpy as np to the other imports at the beginning of the file.
    3. In line 38 you should find:

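      init_dims = _get_size(initializer)

      #Replace this with:

      try:
          # numpy arrays expose their dimensions directly
          init_dims = initializer.shape
      except AttributeError:
          # lists, tuples, etc. have no .shape, so fall back to the original helper
          init_dims = _get_size(initializer)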
      
    4. In line 48 you should find:

      if is_complex:
          complex_array = flat(self, initializer,
                               init_dims, typecode)
          self._real = complex_array['real']
          self._imag = complex_array['imag']
      else:
          self._data = flat(self, initializer, init_dims, typecode)
      
      #Replace this with:
      
      if is_complex:
          try:
              # Flatten once in column-major (MATLAB) order, then split real/imag
              raveled = np.ravel(initializer, order='F')
              self._real = array.array(typecode, raveled.real)
              self._imag = array.array(typecode, raveled.imag)
          except Exception:
              # Fall back to the original loop-based flattening
              complex_array = flat(self, initializer, init_dims, typecode)
              self._real = complex_array['real']
              self._imag = complex_array['imag']
      else:
          try:
              self._data = array.array(typecode, np.ravel(initializer, order='F'))
          except Exception:
              # Fall back to the original loop-based flattening
              self._data = flat(self, initializer, init_dims, typecode)
      

    Now you can pass a numpy array directly to the MATLAB array creation method.
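For the complex branch of the patch, the following minimal sketch shows what ends up in the real and imaginary buffers. The helper name `split_complex` is made up for this example, and the typecode 'd' is assumed for double precision.

```python
import array
import numpy as np

def split_complex(a, typecode='d'):
    """Return (real, imag) buffers of a complex array in column-major order."""
    raveled = np.ravel(a, order='F')             # column-major, as MATLAB expects
    real = array.array(typecode, raveled.real)   # real parts only
    imag = array.array(typecode, raveled.imag)   # imaginary parts only
    return real, imag

z = np.array([[1 + 2j, 3 + 4j],
              [5 + 6j, 7 + 8j]])
real, imag = split_complex(z)
```

For this 2x2 input the column-major order is 1+2j, 5+6j, 3+4j, 7+8j, so the two buffers hold the real and imaginary parts in that same order.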

    import numpy as np
    import matlab

    data1 = np.random.uniform(low = 0.0, high = 30000.0, size = (1000000,))
    #faster
    data1m = matlab.double(data1)
    #or slower method
    data1m = matlab.double(data1.tolist())
    
    data2 = np.random.uniform(low = 0.0, high = 30000.0, size = (1000000,)).astype(np.complex128)
    #faster
    data2m = matlab.double(data2,is_complex=True)
    #or slower method
    data2m = matlab.double(data2.tolist(),is_complex=True)
    
    

    With this change, MATLAB array creation is about 15 times faster, and the interface is easier to use because no .tolist() call is needed.

  • 2020-12-06 12:14

    My situation was a bit different (a Python script called from MATLAB), but for me converting the ndarray into an array.array massively sped up the process. It is very similar to Alexandre Chabot's solution, but without the need to alter any files:

    #untested, i.e. only deduced from my "MATLAB calls Python" situation
    import numpy
    import array
    import matlab
    
    data1 = numpy.random.uniform(low = 0.0, high = 30000.0, size = (1000000,))
    #flatten in column-major (MATLAB) order before building the array.array
    ar = array.array('d',data1.flatten('F').tolist())
    p = matlab.double(ar)
    C = matlab.reshape(p,data1.shape) #I am definitely not sure whether this part works like that
    
    

    At least when done from MATLAB, the combination of "array.array" and "double" is relatively fast. Tested with MATLAB 2016b + Python 3.5.4 64-bit.
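A possible variation on the snippet above skips the intermediate Python list by filling the array.array from raw bytes. This is only a sketch: it assumes float64 data on the same machine, and whether the MATLAB side consumes the resulting buffer equally fast is not verified here. The round trip through `np.frombuffer` is just a check that the column-major layout is preserved.

```python
import array
import numpy as np

data = np.arange(6, dtype=np.float64).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# tobytes(order='F') emits the elements in column-major order; array.array
# accepts a bytes initializer and copies it without creating Python floats.
buf = array.array('d', data.astype(np.float64).tobytes(order='F'))

# Sanity check: view the buffer again and restore the original shape.
restored = np.frombuffer(buf, dtype=np.float64).reshape(data.shape, order='F')
```

The `.tolist()` route creates one Python float object per element, which the bytes route avoids.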

  • 2020-12-06 12:20

    While awaiting better suggestions, I'll post the best trick I've come up with so far. It comes down to saving the file with `scipy.io.savemat` and then loading this file in MATLAB.

    This is not the prettiest hack and it requires some care to ensure different processes relying on the same script don't end up writing and loading each other's .mat files, but the performance gain is worth it for me.

    As a test case I wrote two simple, almost identical MATLAB functions that require 2 numpy arrays (I tested with length 1000000) and one int as input.

    function d = test(x, y, fs_signal)
    d = sum((x + y))./double(fs_signal);
    
    function d = test2(path)
    load(path)
    d = sum((x + y))./double(fs_signal);
    

    The function test requires conversion, while test2 requires saving.
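The saving step for test2 could look roughly like the sketch below. The helper name `save_inputs` is invented for this example; it writes to a unique temporary file so that parallel processes do not overwrite each other's .mat files. The call into MATLAB appears only as a comment, since it assumes an already running engine object (here called `eng`).

```python
import os
import tempfile
import numpy as np
from scipy.io import savemat, loadmat

def save_inputs(x, y, fs_signal):
    """Write the inputs to a unique .mat file and return its path."""
    fd, path = tempfile.mkstemp(suffix='.mat')   # unique name per call
    os.close(fd)                                 # savemat reopens the file itself
    savemat(path, {'x': x, 'y': y, 'fs_signal': fs_signal})
    return path

x = np.arange(5, dtype=np.float64)
y = np.ones(5)
path = save_inputs(x, y, 100)

# With a running engine this is where MATLAB would be called, e.g.:
#     d = eng.test2(path)      # 'eng' is an assumption, not defined here

# Stand-in check in Python: reload the file and repeat test2's arithmetic.
check = loadmat(path)
total = float(check['x'].sum() + check['y'].sum()) / float(check['fs_signal'][0, 0])
os.remove(path)                                  # clean up the temporary file
```

The variable names stored in the file must match the ones test2 expects after load(path), namely x, y and fs_signal.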

    Testing test: Converting the two numpy arrays takes about 40 s on my system. The total time to prepare for and run test comes to 170 s.

    Testing test2: Saving the arrays and the int takes about 0.35 s on my system. Surprisingly, loading the .mat file in MATLAB is extremely efficient (or, more surprisingly, MATLAB is extremely inefficient at dealing with its doubles)... The total time to prepare for and run test2 comes to 0.38 s.

    That's a performance gain of almost 450x...
