Efficient pairwise DTW calculation using numpy or cython

断了今生、忘了曾经 submitted on 2021-02-18 10:10:54

Question


I am trying to calculate the pairwise distances between multiple time series stored in a numpy array. Please see the code below.

print(type(sales))
print(sales.shape)

<class 'numpy.ndarray'>
(687, 157)

So, sales contains 687 time series of length 157. I am using pdist to calculate the DTW distances between the time series.

import fastdtw
import scipy.spatial.distance as sd

def my_fastdtw(sales1, sales2):
    return fastdtw.fastdtw(sales1,sales2)[0]

distance_matrix = sd.pdist(sales, my_fastdtw)

---EDIT: tried doing it without pdist()-----

distance_matrix = []
m = len(sales)
for i in range(0, m - 1):
    for j in range(i + 1, m):
        # fastdtw returns (distance, path); keep only the distance
        distance_matrix.append(fastdtw.fastdtw(sales[i], sales[j])[0])
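
The nested loop visits each pair (i, j) with i < j exactly once, which is the same row-by-row ordering that pdist uses for its condensed output. A minimal sketch of expanding such a flat list into a full symmetric matrix, assuming each entry is a single distance value. The name condensed_to_square is a hypothetical helper; for NumPy arrays, scipy.spatial.distance.squareform does the same job.

```python
def condensed_to_square(condensed, m):
    """Expand a flat list of pairwise distances (i < j order) into an m x m matrix."""
    square = [[0.0] * m for _ in range(m)]
    k = 0
    for i in range(m - 1):
        for j in range(i + 1, m):
            # distances are symmetric, so fill both triangles
            square[i][j] = square[j][i] = condensed[k]
            k += 1
    return square


# Three series give three pairs: (0,1), (0,2), (1,2)
print(condensed_to_square([1.0, 2.0, 3.0], 3))
```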

---EDIT: parallelizing the inner for loop-----

from joblib import Parallel, delayed
import multiprocessing
import fastdtw

num_cores = multiprocessing.cpu_count() - 1
N = 687

def my_fastdtw(sales1, sales2):
    return fastdtw.fastdtw(sales1,sales2)[0]

results = [[] for i in range(N)]
for i in range(0, N - 1):
    results[i] = Parallel(n_jobs=num_cores)(delayed(my_fastdtw)(sales[i], sales[j]) for j in range(i + 1, N))

All the methods are very slow. The parallel method takes around 12 minutes. Can someone please suggest an efficient way?
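
For comparison, the per-row parallel calls can be flattened into a single pool over all pairs, so the workers are started only once. This is a minimal sketch that swaps joblib for the standard library's concurrent.futures; upper_triangle_pairs and pairwise_parallel are hypothetical helper names, and dist is assumed to be a module-level function (so it can be pickled) taking two series and returning a number. It only reduces scheduling overhead and does nothing about the cost of each individual distance call.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations


def upper_triangle_pairs(m):
    """All index pairs (i, j) with i < j, in the same order as the nested loops."""
    return list(combinations(range(m), 2))


def pairwise_parallel(series, dist, workers=None, chunksize=256):
    """Apply dist to every pair of series using one process pool."""
    pairs = upper_triangle_pairs(len(series))
    left = (series[i] for i, _ in pairs)
    right = (series[j] for _, j in pairs)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map preserves input order; chunksize batches the tasks sent to each worker
        return list(pool.map(dist, left, right, chunksize=chunksize))
```

On platforms that start workers by spawning a fresh interpreter, the call to pairwise_parallel should sit under an `if __name__ == "__main__":` guard.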

---EDIT: Following the steps mentioned in the answer below---

Here is what the lib folder looks like:

VirtualBox:~/anaconda3/lib/python3.6/site-packages/fastdtw-0.3.2-py3.6-linux-x86_64.egg/fastdtw$ ls
_fastdtw.cpython-36m-x86_64-linux-gnu.so  fastdtw.py   __pycache__
_fastdtw.py                               __init__.py

So, there is a cython version of fastdtw in there. I did not receive any errors during installation. Even so, when I press CTRL-C during program execution, I can see that the pure-python version is being used (fastdtw.py):

/home/vishal/anaconda3/lib/python3.6/site-packages/fastdtw/fastdtw.py in fastdtw(x, y, radius, dist)

/home/vishal/anaconda3/lib/python3.6/site-packages/fastdtw/fastdtw.py in __fastdtw(x, y, radius, dist)

The code remains slow like before.


Answer 1:


TL;DR

Your fastdtw failed to install the fast C++ version and silently falls back to a pure-python version, which is slow.

You need to fix the installation of the fastdtw-package.


The whole calculation is done inside fastdtw, so you cannot really speed it up from the outside. Parallelization in Python is also not such an easy thing (yet?).

The fastdtw documentation says it needs about O(n) operations per comparison, so for your whole data set it will need on the order of 10^9 operations, which should finish within a few seconds if implemented in, for example, C. The performance you see is nowhere near that.
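
As a rough check of the scale involved, the number of pairs and the pairs-times-length product can be computed directly from the shape given in the question. This is only a sketch: it ignores the constant amount of work fastdtw does per element (which depends on its radius parameter and is not estimated here), so the true operation count is some multiple of the second number.

```python
n_series = 687     # number of time series (sales.shape[0] in the question)
series_len = 157   # length of each series (sales.shape[1] in the question)

n_pairs = n_series * (n_series - 1) // 2   # unordered pairs with i < j
element_steps = n_pairs * series_len        # pairs times series length

print(n_pairs)         # 235641
print(element_steps)   # 36995637
```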

If we look at the code of fastdtw, we see that there are two versions: a fast cython/C++ version, which is imported when available, and a slow pure-python fallback. If the fast version isn't present, the slow python version is silently used.

So run your calculation, interrupt it with Ctrl+C, and you will see that you are somewhere in python code. You can also go to your lib folder and see that only the pure-python version is inside.

So your installation of the fast fastdtw version failed. Actually, I think the wheel package is botched; at least for my version, only the pure-python code is present.
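
The same check can be done without interrupting a run. The sketch below asks Python's import machinery where a module would be loaded from and whether that file is a compiled extension. describe_module is a hypothetical helper name; the module name fastdtw._fastdtw is taken from the extension name in setup.py and from the file names in the question's folder listing.

```python
import importlib.machinery
import importlib.util


def describe_module(name):
    """Return 'compiled', 'python', 'builtin' or 'missing' for the module `name`."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # raised when the parent package is missing or fails to import
        return "missing"
    if spec is None or spec.origin is None:
        return "missing"
    if spec.origin in ("built-in", "frozen"):
        return "builtin"
    if spec.origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "compiled"   # e.g. a .so or .pyd file
    return "python"


# Reports 'missing' if the package is not installed in this environment
print(describe_module("fastdtw._fastdtw"))
```

If this reports 'python' or 'missing' for fastdtw._fastdtw, the interpreter running the program is not picking up the compiled extension. Printing fastdtw.__file__ shows which installation is actually imported, which matters when more than one copy sits in site-packages.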

What to do?

  1. Get the source code, e.g. via git clone https://github.com/slaypni/fastdtw
  2. Go into the fastdtw folder and run python setup.py build
  3. Watch out for errors. Mine was

fatal error: numpy/npy_math.h: No such file or directory

  4. Fix it.

For me, the fix was to change the following lines in setup.py:

import numpy # THIS ADDED
extensions = [Extension(
        'fastdtw._fastdtw',
        [os.path.join('fastdtw', '_fastdtw' + ext)],
        language="c++",
        include_dirs=[numpy.get_include()], # AND ADDED numpy.get_include()
        libraries=["stdc++"]
    )]

  5. Repeat steps 3 and 4 until the build succeeds
  6. Run python setup.py install

Now your program should be about 100 times faster.




Answer 2:


To be honest, fastdtw is not fast at all:

import numpy as np
from cdtw import pydtw
from dtaidistance import dtw
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean

s1 = np.array([1, 2, 3, 4], dtype=np.double)
s2 = np.array([4, 3, 2, 1], dtype=np.double)

%timeit dtw.distance_fast(s1, s2)
4.1 µs ± 28.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit d2 = pydtw.dtw(s1,s2,pydtw.Settings(step = 'p0sym', window = 'palival', param = 2.0, norm = False, compute_path = True)).get_dist()
45.6 µs ± 3.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit d3,_=fastdtw(s1, s2, dist=euclidean)
901 µs ± 9.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In this benchmark, fastdtw is about 220 times slower than dtaidistance and about 20 times slower than cdtw.
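
For orientation, here is what an exact (unwindowed) DTW computes, written as a plain dynamic program. This is a minimal reference sketch, not the implementation of any of the libraries above: it assumes the absolute difference as local cost and the standard three-way step pattern, and dtw_distance is a hypothetical name. The libraries in the benchmark may use other local costs, windows, or normalization (and fastdtw is an approximation), so their numbers need not match this one.

```python
def dtw_distance(a, b):
    """Exact DTW cost between sequences a and b, using |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: step in a, step in b, or step in both
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


print(dtw_distance([1, 2, 3, 4], [4, 3, 2, 1]))   # 8.0
```

The double loop makes the cost quadratic in the series length, which is why a compiled implementation matters so much when it is called for every pair.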

Consider switching libraries. Here is the dtaidistance git repository:

https://github.com/wannesm/dtaidistance

To install, just:

pip install dtaidistance


Source: https://stackoverflow.com/questions/44994866/efficient-pairwise-dtw-calculation-using-numpy-or-cython
