Question
I have a dataset that tracks numerical changes in different types of objects over time. It has columns for ID, measurement, yearmonth, and change. The change column is TRUE when a value differs from the previous value; values that stay constant before and after that point are marked FALSE until another change happens.
I want to be able to do the following:
Set a threshold, so that any value that moves past a specific number is flagged. For example, with a threshold of 5, flag anything that goes above or below 5, but not cases that only change from 2 to 4.
Make a column with the amount of change. For example, -2 for a decrease from 5 to 3, and 2 for an increase from 5 to 7.
#    <chr> <int> <int>  <lgl>
#  1 A         2 2019-2 FALSE
#  2 A         2 2019-3 FALSE
#  3 A         2 2019-4 FALSE
#  4 A         5 2019-5 TRUE
#  5 A         5 2019-5 FALSE
#  6 A         4 2019-8 TRUE
#  7 A         4 2019-9 TRUE
#  8 B        23 2019-5 FALSE
#  9 B         7 2019-9 TRUE
# 10 B         7 2020-5 FALSE
# … with 11 more rows
Answer 1:
This dplyr solution gives you the change amount in a column, and also creates a logical column that is TRUE whenever the threshold value is crossed in either direction. For example, in row 4 the value has increased from 2 to 5, so the threshold has been crossed. In row 5 the value remains at 5, so the threshold has not been crossed between rows 4 and 5. In row 6 the value has dropped to 4, which is below the threshold, so this row is TRUE again.
I have set the threshold to 4.5 for clarity.
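library(dplyr)

threshold <- 4.5

df %>%
  group_by(group) %>%
  mutate(
    # difference from the previous row; 0 for the first row of each group
    change_amount = c(0, diff(value)),
    # TRUE when the value moved to the other side of the threshold
    crossed_thresh = sign(lag(value - threshold)) != sign(value - threshold),
    # the first row of each group has no previous value, so treat NA as FALSE
    crossed_thresh = ifelse(is.na(crossed_thresh), FALSE, crossed_thresh)
  )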
#> # A tibble: 10 x 6
#> # Groups:   group [2]
#>    group value month  change change_amount crossed_thresh
#>    <chr> <int> <chr>  <lgl>          <dbl> <lgl>
#>  1 A         2 2019-2 FALSE              0 FALSE
#>  2 A         2 2019-3 FALSE              0 FALSE
#>  3 A         2 2019-4 FALSE              0 FALSE
#>  4 A         5 2019-5 TRUE               3 TRUE
#>  5 A         5 2019-5 FALSE              0 FALSE
#>  6 A         4 2019-8 TRUE              -1 TRUE
#>  7 A         4 2019-9 TRUE               0 FALSE
#>  8 B        23 2019-5 FALSE              0 FALSE
#>  9 B         7 2019-9 TRUE             -16 FALSE
#> 10 B         7 2020-5 FALSE              0 FALSE
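As a side note on why the threshold is set to 4.5 rather than 5: sign() returns 0 for a value that sits exactly on the threshold, so the sign-based test treats "on the threshold" as a state of its own. The base R sketch below shows that behaviour and, as one possible alternative rather than the method used above, a strict "greater than" comparison. The helper name crossed_threshold is hypothetical and introduced only for illustration.

```r
# Hypothetical helper (not part of the answer above): flags positions where
# a vector moves from one side of `threshold` to the other, using a strict
# "greater than" comparison instead of sign().
crossed_threshold <- function(value, threshold) {
  above <- value > threshold                    # TRUE when strictly above
  prev  <- c(above[1], above[-length(above)])   # previous state; first element compares to itself
  above != prev
}

value <- c(2L, 2L, 2L, 5L, 5L, 4L, 4L)  # group A from the example data

# A value exactly on the threshold gives sign 0, neither -1 nor 1
signs_at_5 <- sign(value - 5)

# With 4.5 no integer value can land on the threshold
flags_4_5 <- crossed_threshold(value, 4.5)

# With 5 and a strict comparison, reaching 5 does not count as going above it
flags_5 <- crossed_threshold(value, 5)
```

Whether a value equal to the threshold should count as "crossed" is a decision for the analysis; choosing a threshold between the possible values, as the answer does, avoids the question altogether.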
Data
df <- structure(list(group = c("A", "A", "A", "A", "A", "A", "A", "B",
"B", "B"), value = c(2L, 2L, 2L, 5L, 5L, 4L, 4L, 23L, 7L, 7L),
month = c("2019-2", "2019-3", "2019-4", "2019-5", "2019-5",
"2019-8", "2019-9", "2019-5", "2019-9", "2020-5"), change = c(FALSE,
FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE
)), class = "data.frame", row.names = c(NA, -10L))
df
#>    group value  month change
#> 1      A     2 2019-2  FALSE
#> 2      A     2 2019-3  FALSE
#> 3      A     2 2019-4  FALSE
#> 4      A     5 2019-5   TRUE
#> 5      A     5 2019-5  FALSE
#> 6      A     4 2019-8   TRUE
#> 7      A     4 2019-9   TRUE
#> 8      B    23 2019-5  FALSE
#> 9      B     7 2019-9   TRUE
#> 10     B     7 2020-5  FALSE
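If the direction of the crossing matters as well, the same idea can be extended with case_when(). This is only a sketch under assumptions: the column names above, was_above and crossing are made up for illustration, and a strict "greater than" comparison is used in place of sign().

```r
library(dplyr)

# Same groups and values as the example data above
df <- data.frame(
  group = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B"),
  value = c(2L, 2L, 2L, 5L, 5L, 4L, 4L, 23L, 7L, 7L)
)

threshold <- 4.5

result <- df %>%
  group_by(group) %>%
  mutate(
    # difference from the previous row; the first row of each group gets 0
    change_amount = value - lag(value, default = first(value)),
    # which side of the threshold the current and previous values are on
    above     = value > threshold,
    was_above = lag(above, default = first(above)),
    # hypothetical labels for the direction of the crossing
    crossing = case_when(
      above & !was_above ~ "up",
      !above & was_above ~ "down",
      TRUE               ~ "none"
    )
  ) %>%
  ungroup()

result
```

Because the lag is taken within each group, the jump from the last row of A to the first row of B is never compared.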
回答2:
The data.table library can be helpful here, its shift function in particular.
library(data.table)

# reproduction of the dataset
df <- data.table(col1 = c('A','A','A','A','A','A','A','A','A','A'),
                 col2 = c(2,2,2,5,5,4,4,23,7,7))
Add two columns.
First, check whether each row is identical to the previous one:
df[, Identical := (col2 == shift(col2))]
Second, add the difference between each value and the previous one:
df[, change := col2 - shift(col2, 1)]
This gives the desired output:
    col1 col2 Identical change
 1:    A    2        NA     NA
 2:    A    2      TRUE      0
 3:    A    2      TRUE      0
 4:    A    5     FALSE      3
 5:    A    5      TRUE      0
 6:    A    4     FALSE     -1
 7:    A    4      TRUE      0
 8:    A   23     FALSE     19
 9:    A    7     FALSE    -16
10:    A    7      TRUE      0
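The reproduction above puts every row in group A, so row 8 is compared with row 7. When the data has several groups, as in the question, a by argument keeps each comparison inside its own group. The following is a minimal sketch under assumptions: the table name dt, the column name crossed, the threshold of 4.5 (borrowed from Answer 1), and the B labels on the last three rows are not part of this answer.

```r
library(data.table)

# Same values as above, but with the last three rows assigned to group B
dt <- data.table(
  col1 = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B"),
  col2 = c(2, 2, 2, 5, 5, 4, 4, 23, 7, 7)
)

threshold <- 4.5  # assumed value, matching Answer 1

# by = col1 restarts the lag at the first row of each group;
# fill = col2[1] makes the first row of a group compare against itself
dt[, change := col2 - shift(col2, fill = col2[1]), by = col1]

# TRUE when the value moved from one side of the threshold to the other
dt[, crossed := (col2 > threshold) != shift(col2 > threshold, fill = col2[1] > threshold),
   by = col1]

dt[]
```

With the grouping in place, the first row of B gets a change of 0 instead of 19.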
Source: https://stackoverflow.com/questions/65344638/identifying-type-of-change-in-continuous-values-r