How to get a distance matrix using dynamic time warping?

Question

I have 6 time series as follows.

import numpy as np
series = np.array([
     [0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1],
     [0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1]])

I want to compute the dynamic time warping (DTW) distance matrix so that I can cluster the series. I used the dtaidistance library for that as follows.

from dtaidistance import dtw
ds = dtw.distance_matrix_fast(series)

The output I got was as follows.

array([[       inf, 1.41421356, 2.23606798, 0.        , 1.41421356, 2.23606798],
       [       inf,        inf, 1.73205081, 1.41421356, 0.        , 1.73205081],
       [       inf,        inf,        inf, 2.23606798, 1.73205081, 0.        ],
       [       inf,        inf,        inf,        inf, 1.41421356, 2.23606798],
       [       inf,        inf,        inf,        inf,        inf, 1.73205081],
       [       inf,        inf,        inf,        inf,        inf,        inf]])

It seems to me that the output I get is wrong. For instance, as I understand it, the diagonal values of the output should be 0 (since each series matches itself perfectly).

I want to know where I am going wrong and how to fix it. I am also happy to get answers that use other Python libraries.

I am happy to provide more details if needed.

Answer 1:

Everything is correct. As per the docs:

The result is stored in a matrix representation. Since only the upper triangular matrix is required this representation uses more memory then necessary.

All diagonal elements are 0, and the lower triangular matrix is the same as the upper triangular matrix mirrored at the diagonal. As all these values can be deduced from the upper triangular matrix, they are not computed; the inf entries in the output are just placeholders.
You can even use the compact=True argument to get only the values from the upper triangular matrix, concatenated into a 1D array.
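To see why the diagonal is 0 and the matrix is symmetric, here is a minimal pure-NumPy sketch of DTW. It is not dtaidistance's implementation: the helper name dtw_distance is hypothetical, and the sketch assumes squared differences as the local cost, a square root at the end, and no warping window.

```python
import numpy as np

def dtw_distance(a, b):
    """Plain O(n*m) DTW (hypothetical helper, not part of dtaidistance)."""
    n, m = len(a), len(b)
    # cost[i, j] = cheapest accumulated cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))

series = np.array([
    [0., 0, 1, 2, 1, 0, 1, 0, 0],
    [0., 1, 2, 0, 0, 0, 0, 0, 0],
    [1., 2, 0, 0, 0, 0, 0, 1, 1],
    [0., 0, 1, 2, 1, 0, 1, 0, 0],
    [0., 1, 2, 0, 0, 0, 0, 0, 0],
    [1., 2, 0, 0, 0, 0, 0, 1, 1]])

n = len(series)
full = np.zeros((n, n))  # the diagonal stays 0: a series aligned with itself costs nothing
for i in range(n):
    for j in range(i + 1, n):
        # compute each pair once and write it to both triangles
        full[i, j] = full[j, i] = dtw_distance(series[i], series[j])
```

Only the pairs with i < j are ever computed, which is the same work the upper triangle of the library's output represents.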

You can convert the result to a full matrix like this:

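ds[ds == np.inf] = 0  # replace the inf placeholders (diagonal and lower triangle) with 0
ds += ds.T            # mirror the upper triangle into the lower triangle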
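If you would rather leave ds untouched, a non-mutating variant of the same conversion is sketched below. It uses only NumPy and hard-codes the matrix from the question so that it runs without dtaidistance; the variable names upper and full are just illustrative.

```python
import numpy as np

inf = np.inf
# The upper-triangular output shown in the question.
ds = np.array([
    [inf, 1.41421356, 2.23606798, 0.0,        1.41421356, 2.23606798],
    [inf, inf,        1.73205081, 1.41421356, 0.0,        1.73205081],
    [inf, inf,        inf,        2.23606798, 1.73205081, 0.0],
    [inf, inf,        inf,        inf,        1.41421356, 2.23606798],
    [inf, inf,        inf,        inf,        inf,        1.73205081],
    [inf, inf,        inf,        inf,        inf,        inf]])

# Keep only the entries strictly above the diagonal; everything else becomes 0.
upper = np.triu(ds, k=1)
# Adding the transpose fills the lower triangle; the diagonal stays 0.
full = upper + upper.T
```

The resulting full matrix is symmetric with a zero diagonal, which is the form most clustering routines expect for a precomputed distance matrix.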

Answer 2:

In dtw.py, the default value for the elements of the distance matrix is np.inf. The function only computes the pairwise distances between different sequences (the upper triangle), so the diagonal and the lower triangle are never filled in and keep their default np.inf values.

Try running dtw.distance_matrix_fast(series, compact=True) to avoid seeing this filler.
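If you later need the square matrix again, the compact vector can be expanded with plain NumPy. The sketch below assumes the compact output lists the upper triangle row by row (d01, d02, ..., d05, d12, ...); the helper name condensed_to_square is hypothetical, and the vector is hard-coded from the question's values rather than produced by dtaidistance.

```python
import numpy as np

def condensed_to_square(condensed, n):
    """Expand a row-major upper-triangle vector into a full symmetric matrix.

    Hypothetical helper; assumes the vector holds the n*(n-1)/2 distances
    for pairs (i, j) with i < j, ordered row by row.
    """
    condensed = np.asarray(condensed, dtype=float)
    if condensed.size != n * (n - 1) // 2:
        raise ValueError("vector length does not match n*(n-1)/2")
    full = np.zeros((n, n))                # diagonal stays 0
    rows, cols = np.triu_indices(n, k=1)   # row-major indices above the diagonal
    full[rows, cols] = condensed
    full[cols, rows] = condensed           # mirror into the lower triangle
    return full

# Upper-triangle values from the question, read row by row.
compact = np.array([
    1.41421356, 2.23606798, 0.0, 1.41421356, 2.23606798,
    1.73205081, 1.41421356, 0.0, 1.73205081,
    2.23606798, 1.73205081, 0.0,
    1.41421356, 2.23606798,
    1.73205081])

full = condensed_to_square(compact, 6)
```

Check the ordering assumption against the dtaidistance documentation for your installed version before relying on it.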

Source: https://stackoverflow.com/questions/62211066/how-to-get-distance-matrix-using-dynamic-time-wraping

Tags

python

time-series

dtw