Question
Problem
I have speed time series for several vehicles. My ultimate objective is to cluster the vehicles by how similar their speeds are over time. So I need to produce a distance matrix in which each cell contains the distance between a pair of vehicle speed time series. I want to use Dynamic Time Warping (DTW) as the distance metric, so I need to apply DTW to each pair of speed time series.
Data
Here are some sample data that contain only 8 observations per car and only 3 cars:
> dput(c)
structure(list(file.ID2 = c("Cars_03", "Cars_03", "Cars_03",
"Cars_03", "Cars_03", "Cars_03", "Cars_03", "Cars_03", "Cars_04",
"Cars_04", "Cars_04", "Cars_04", "Cars_04", "Cars_04", "Cars_04",
"Cars_04", "Cars_05", "Cars_05", "Cars_05", "Cars_05", "Cars_05",
"Cars_05", "Cars_05", "Cars_05"), speed.kph.ED = c(129.3802848,
129.4022304, 129.424176, 129.4461216, 129.4680672, 129.47904,
129.5009856, 129.5229312, 127.8770112, 127.8221472, 127.7672832,
127.7124192, 127.6575552, 127.6026912, 127.5478272, 127.4929632,
134.1095616, 134.1205344, 134.1315072, 134.1534528, 134.1644256,
134.1753984, 134.1863712, 134.197344)), row.names = c(NA, -24L
), class = c("tbl_df", "tbl", "data.frame"), .Names = c("file.ID2",
"speed.kph.ED"))
What I tried
I can find the dtw::dtw() distance for one pair as follows:
library(dplyr)
library(dtw)
c3 <- c %>% filter(file.ID2=="Cars_03")
c4 <- c %>% filter(file.ID2=="Cars_04")
query <- c4$speed.kph.ED
reference <- c3$speed.kph.ED
dtw_results <- dtw(x = query, y = reference)
dtw_results$distance
But my question is: is there a way to automatically compute dtw()$distance for every pair and generate a distance matrix? In this example, that means these pairs:
Cars_03 - Cars_03
Cars_03 - Cars_04
Cars_03 - Cars_05
Cars_04 - Cars_03
Cars_04 - Cars_04
Cars_04 - Cars_05
and so on
I know a for loop is one way to do this. But since dtw itself requires a lot of RAM, a for loop could slow the process down further. Are there any alternatives? I'm sorry if this is a silly question, but I'm quite new to using dtw.
Answer 1:
The following works (df is the data frame from the question, where it is called c).
Split your data frame into a list by file.ID2:
ds <- split(df, df$file.ID2)
Use expand.grid to make all combinations of your names (file.ID2) and of your values:
Names <- expand.grid(unique(df$file.ID2), unique(df$file.ID2))
Values <- expand.grid(ds, ds)
purrr::map_dbl iterates through all row combinations of Values and returns a vector of doubles:
library(dtw)
library(purrr)
Dist <- map_dbl(1:nrow(Values), ~dtw(x = Values[.x,]$Var1[[1]]$speed.kph.ED, y = Values[.x,]$Var2[[1]]$speed.kph.ED)$distance)
Bind the result to Names:
library(dplyr)
ans <- Names %>%
mutate(distance = Dist)
Output
Var1 Var2 distance
1 Cars_03 Cars_03 0.00000
2 Cars_04 Cars_03 25.66538
3 Cars_05 Cars_03 69.72117
4 Cars_03 Cars_04 25.66538
5 Cars_04 Cars_04 0.00000
6 Cars_05 Cars_04 96.00103
7 Cars_03 Cars_05 69.72117
8 Cars_04 Cars_05 96.00103
9 Cars_05 Cars_05 0.00000
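Since the goal in the question is clustering, the long table above can also be built directly as a square matrix. The following is a minimal sketch, not part of the original answer: it assumes DTW with the default step pattern is symmetric, so it computes each unordered pair only once with combn and mirrors the value. The names series, ids, dist_mat, hc and clusters are illustrative, and the choice of hclust with k = 2 is only an example.

```r
library(dtw)

# Sample data from the question (8 observations for each of 3 cars).
df <- data.frame(
  file.ID2 = rep(c("Cars_03", "Cars_04", "Cars_05"), each = 8),
  speed.kph.ED = c(
    129.3802848, 129.4022304, 129.424176, 129.4461216,
    129.4680672, 129.47904, 129.5009856, 129.5229312,
    127.8770112, 127.8221472, 127.7672832, 127.7124192,
    127.6575552, 127.6026912, 127.5478272, 127.4929632,
    134.1095616, 134.1205344, 134.1315072, 134.1534528,
    134.1644256, 134.1753984, 134.1863712, 134.197344
  ),
  stringsAsFactors = FALSE
)

# Named list with one numeric vector per vehicle.
series <- split(df$speed.kph.ED, df$file.ID2)
ids <- names(series)

# Square matrix initialised to 0, so the diagonal needs no computation.
dist_mat <- matrix(0, nrow = length(ids), ncol = length(ids),
                   dimnames = list(ids, ids))

# Each column of `pairs` is one unordered pair of vehicle ids.
pairs <- combn(ids, 2)
for (k in seq_len(ncol(pairs))) {
  a <- pairs[1, k]
  b <- pairs[2, k]
  # distance.only = TRUE skips the warping-path backtracking.
  d <- dtw(series[[a]], series[[b]], distance.only = TRUE)$distance
  dist_mat[a, b] <- d
  dist_mat[b, a] <- d
}

# The matrix can be handed to a clustering routine via as.dist().
hc <- hclust(as.dist(dist_mat))
clusters <- cutree(hc, k = 2)
```

For n series this calls dtw n(n-1)/2 times instead of n^2 times, which roughly halves the work compared with evaluating every ordered pair.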
Answer 2:
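DTW only takes a lot of memory if it is implemented recursively. An iterative implementation that needs only the distance can keep just two rows of the cumulative cost matrix at a time, so its memory overhead is small (linear in the series length rather than quadratic).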
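To make the memory point concrete, here is a minimal base R sketch of an iterative DTW that keeps only two rows of the cumulative cost matrix. It is an illustration, not the dtw package's implementation: the function name dtw_distance_two_rows is hypothetical, the local cost is the absolute difference, and the step pattern is the plain unweighted minimum of the three neighbours, so its values will generally differ from dtw()$distance with the package defaults.

```r
# Iterative DTW distance using two rows of the cumulative cost matrix.
# Assumptions: univariate numeric input, |x - y| as local cost,
# unweighted steps (up, left, diagonal), no window constraint.
dtw_distance_two_rows <- function(x, y) {
  n <- length(x)
  m <- length(y)

  # prev[j + 1] holds D[i - 1, j]; row 0 is 0 at the origin and Inf elsewhere.
  prev <- c(0, rep(Inf, m))

  for (i in seq_len(n)) {
    # curr[1] is D[i, 0], which is unreachable, hence Inf.
    curr <- c(Inf, numeric(m))
    for (j in seq_len(m)) {
      cost <- abs(x[i] - y[j])
      curr[j + 1] <- cost + min(prev[j + 1],  # step from D[i - 1, j]
                                curr[j],      # step from D[i, j - 1]
                                prev[j])      # step from D[i - 1, j - 1]
    }
    prev <- curr
  }

  # After the last row, the final entry is D[n, m].
  prev[m + 1]
}

d_same <- dtw_distance_two_rows(c(1, 2, 3), c(1, 2, 2, 3))
d_shift <- dtw_distance_two_rows(c(0, 0), c(1, 1))
```

Only two vectors of length m + 1 are alive at any time, whereas storing the full matrix (which is needed to recover the warping path) takes n * m cells.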
With a warping window width constraint, you can build the distance matrix for, say, 300 time series of length 1,000 in a few minutes (at most). If you have even more data, try TADPOLE.
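As a sketch of what a window constraint looks like with the dtw package, the example below uses its window.type and window.size arguments to apply a Sakoe-Chiba band. The two random-walk series and the band width of 50 are made-up values for illustration only; a suitable width depends on the data.

```r
library(dtw)

# Hypothetical data: two random walks of length 1,000.
set.seed(1)
x <- cumsum(rnorm(1000))
y <- cumsum(rnorm(1000))

# Unconstrained DTW: the warping path may wander anywhere in the matrix.
d_full <- dtw(x, y, distance.only = TRUE)$distance

# Sakoe-Chiba band: point i of x may only be matched to points of y
# within 50 positions of i (window.size is an illustrative choice).
d_band <- dtw(x, y,
              window.type = "sakoechiba",
              window.size = 50,
              distance.only = TRUE)$distance
```

Because the band only removes candidate warping paths, the constrained distance can never be smaller than the unconstrained one. A narrow band can also leave no valid path when the two series differ a lot in length, so the width should be chosen with that in mind.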
I suggest you read this tutorial:
http://www.cs.unm.edu/~mueen/DTW.pdf
Source: https://stackoverflow.com/questions/45945769/how-to-apply-dtw-algorithm-on-multiple-time-series-in-r