Question
I have time series data in which the series have different lengths. I want to cluster them based on DTW distance, but I could not find any library for it. sklearn
raises an error straight away, while tslearn's k-means gave a wrong answer.
The problem goes away if I pad the series with zeros, but I am not sure whether padding time-series data is valid when clustering.
Suggestions for other clustering techniques for time series data are welcome.
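from keras.preprocessing import sequence  # assumed: Keras is where `sequence` comes from
from tslearn.clustering import TimeSeriesKMeans

# train_1 is a list of series with different lengths
max_length = max(len(s) for s in train_1)
print(max_length)

# Zero-pad every series to the longest length
# (by default pad_sequences pads at the front and casts to int32)
train_1 = sequence.pad_sequences(train_1, maxlen=max_length)

km3 = TimeSeriesKMeans(n_clusters=4, metric="dtw", verbose=False, random_state=0).fit(train_1)
print(km3.labels_)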
Answer 1:
You can try a custom-made k-means (or another clustering algorithm); the source code is readily available in the sklearn library. Padding is not a good option because it changes the problem itself: the added zeros become part of every series being compared. As an alternative, you can use tslearn and pyclustering (for optimal clusters), but remember to use DTW distance rather than Euclidean distance.
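To illustrate the "custom-made" route without padding, here is a minimal pure-Python sketch. It is not the asker's code and not how sklearn or tslearn implement anything. It swaps k-means for k-medoids, since k-medoids only needs pairwise distances and never has to average series of unequal length. The names `dtw_distance` and `k_medoids`, the farthest-first initialisation, and the toy data are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two 1-D sequences that may differ in length."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative squared cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def k_medoids(dist, k, n_iter=100):
    """Alternating k-medoids on a precomputed distance matrix (illustrative sketch)."""
    n = len(dist)
    # Deterministic farthest-first initialisation (a choice made for this sketch)
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: each series joins its nearest medoid
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: the new medoid minimises total distance within its cluster
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            new_medoids.append(min(members, key=lambda p: sum(dist[p][q] for q in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Toy data: two groups of series with different lengths, no padding applied
series = [
    [0, 0, 1, 2, 1, 0],
    [0, 1, 2, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 2, 1],
    [5, 5, 6, 7, 6],
    [5, 6, 7, 6, 5, 5, 5],
    [5, 5, 5, 5, 6, 7, 6, 5],
]
dist = [[dtw_distance(s, t) for t in series] for s in series]
labels, medoids = k_medoids(dist, k=2)
print(labels)
```

The DTW computation here is quadratic in the series length and the distance matrix is quadratic in the number of series, so this is only practical for small data sets. For anything larger, a library DTW implementation is the better choice.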
Source: https://stackoverflow.com/questions/56478455/clustering-time-series-data-of-different-length