Clustering Time Series Data of Different Length

人走茶凉 提交于 2019-12-07 13:46:57

问题


I have time series data of different length of series. I want to cluster based upon DTW distance but could not find ant library regarding it. sklearn give straight error while tslearn kmeans gave wrong answer.

My problem is solving if I pad it with zeros but I am not sure if this is correct to pad time-series data while clustering.

The suggestion about other clustering technique about time series data are welcomed.

max_length = 0

for i in train_1:
    if(len(i)>max_length):
        max_length = len(i)
print(max_length)

train_1 = sequence.pad_sequences(train_1, maxlen=max_length)
km3 = TimeSeriesKMeans(n_clusters = 4, metric="dtw",verbose = False,random_state = 0).fit(train_1)

print(km3.labels_)

回答1:


You can try custom made k-means(clustering algorithm) or other. Source code is easily available at the sklearn library. Padding is really not a great option as it will change the question problem itself. You can also use tslearn and pyclustering(for optimal clusters) as an alternative, but remember to use DTW distance rather than Euclidean distance.



来源:https://stackoverflow.com/questions/56478455/clustering-time-series-data-of-different-length

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!