Identifying type of change in continuous values - R

一个人想着一个人 posted on 2021-01-28 11:21:40

Question


I have a dataset that tracks numerical changes in different types of objects over time. It has columns for ID, measurement, yearmonth, and change. The change column is TRUE when a value differs from the previous value; values that stay constant before and after that point are marked FALSE until another change happens.

I want to be able to do the following things:

  1. Set a threshold and flag any cases where the value crossed it. For example, if the threshold is 5, mark anything that went above or below it, but do not mark cases that only changed from 2 to 4.

  2. Make a column with the amount of change. For example, -2 for a decrease from 5 to 3, and 2 for an increase from 5 to 7.

    #    <chr> <int> <int>  <lgl>
    #  1 A         2 2019-2 FALSE
    #  2 A         2 2019-3 FALSE
    #  3 A         2 2019-4 FALSE
    #  4 A         5 2019-5 TRUE
    #  5 A         5 2019-5 FALSE
    #  6 A         4 2019-8 TRUE
    #  7 A         4 2019-9 TRUE
    #  8 B        23 2019-5 FALSE
    #  9 B         7 2019-9 TRUE
    # 10 B         7 2020-5 FALSE
    # … with 11 more rows
    

Answer 1:


This dplyr solution puts the change amount in one column and creates a logical column that is TRUE whenever the threshold is crossed in either direction. For example, in row 4 the value has increased from 2 to 5, so the threshold has been crossed. In row 5 the value remains at 5, so the threshold has not been crossed between rows 4 and 5. In row 6 the value has dropped to 4, which is below the threshold, so this row is TRUE again.

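I have set the threshold to 4.5 for clarity: because the values are integers, none of them can equal the threshold, so every value is unambiguously above or below it.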

library(dplyr)

threshold <- 4.5

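df %>%
  group_by(group) %>%
  mutate(
    # difference from the previous row within the group (0 for the first row)
    change_amount  = c(0, diff(value)),
    # TRUE when the value is on the other side of the threshold than the previous row
    crossed_thresh = sign(lag(value - threshold)) !=
                     sign(value - threshold),
    # the first row of each group has no previous value, so NA becomes FALSE
    crossed_thresh = ifelse(is.na(crossed_thresh), FALSE,
                            crossed_thresh))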
#> # A tibble: 10 x 6
#> # Groups:   group [2]
#>    group value month  change change_amount crossed_thresh
#>    <chr> <int> <chr>  <lgl>          <dbl> <lgl>         
#>  1 A         2 2019-2 FALSE              0 FALSE         
#>  2 A         2 2019-3 FALSE              0 FALSE         
#>  3 A         2 2019-4 FALSE              0 FALSE         
#>  4 A         5 2019-5 TRUE               3 TRUE          
#>  5 A         5 2019-5 FALSE              0 FALSE         
#>  6 A         4 2019-8 TRUE              -1 TRUE          
#>  7 A         4 2019-9 TRUE               0 FALSE         
#>  8 B        23 2019-5 FALSE              0 FALSE         
#>  9 B         7 2019-9 TRUE             -16 FALSE         
#> 10 B         7 2020-5 FALSE              0 FALSE
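
The sign comparison can also be checked in isolation with base R. The following is only a minimal sketch on group A's values, not part of the original answer. The `>=` variant at the end is an assumption about how a value that lands exactly on an integer threshold (such as 5) should be treated; here it counts as being above the threshold.

```r
# Values for group A from the example data
value <- c(2L, 2L, 2L, 5L, 5L, 4L, 4L)
threshold <- 4.5

# Side of the threshold for each value: -1 below, 1 above
side <- sign(value - threshold)

# Compare each side with the previous one; the first value has no predecessor
crossed <- c(FALSE, side[-1] != side[-length(side)])

# Amount of change relative to the previous value (0 for the first value)
change_amount <- c(0L, diff(value))

# Assumed variant for an integer threshold of 5:
# a value equal to the threshold is treated as "at or above" it
at_or_above <- value >= 5
crossed_at_5 <- c(FALSE, at_or_above[-1] != at_or_above[-length(at_or_above)])

print(data.frame(value, change_amount, crossed, crossed_at_5))
```

With this data both variants flag the same rows (4 and 6), but they would differ for a series that moves from 5 to 6, for example, so the choice of comparison should follow how the threshold is meant to be read.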

Data

df <- structure(list(group = c("A", "A", "A", "A", "A", "A", "A", "B", 
"B", "B"), value = c(2L, 2L, 2L, 5L, 5L, 4L, 4L, 23L, 7L, 7L), 
    month = c("2019-2", "2019-3", "2019-4", "2019-5", "2019-5", 
    "2019-8", "2019-9", "2019-5", "2019-9", "2020-5"), change = c(FALSE, 
    FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE
    )), class = "data.frame", row.names = c(NA, -10L))

df
#>    group value  month change
#> 1      A     2 2019-2  FALSE
#> 2      A     2 2019-3  FALSE
#> 3      A     2 2019-4  FALSE
#> 4      A     5 2019-5   TRUE
#> 5      A     5 2019-5  FALSE
#> 6      A     4 2019-8   TRUE
#> 7      A     4 2019-9   TRUE
#> 8      B    23 2019-5  FALSE
#> 9      B     7 2019-9   TRUE
#> 10     B     7 2020-5  FALSE




Answer 2:


The data.table package can be helpful here, in particular its shift function.

library(data.table)

# reproduction of the dataset (values only; every row is labelled 'A' here)
df <- data.table(col1 = c('A','A','A','A','A','A','A','A','A','A'),
                 col2 = c(2,2,2,5,5,4,4,23,7,7))

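Add two columns.

First, check whether each value is identical to the previous one:

df[, Identical := (col2 == shift(col2))]

Second, add the difference between each value and the previous one:

df[, change := col2 - shift(col2, 1)]

This gives the output below. The first row is NA in both columns because it has no previous value.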

    col1 col2 Identical change
 1:    A    2    NA     NA
 2:    A    2  TRUE      0
 3:    A    2  TRUE      0
 4:    A    5 FALSE      3
 5:    A    5  TRUE      0
 6:    A    4 FALSE     -1
 7:    A    4  TRUE      0
 8:    A   23 FALSE     19
 9:    A    7 FALSE    -16
10:    A    7  TRUE      0
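
The two steps above do not separate groups and do not apply a threshold. A possible extension is sketched below, assuming the group labels from the question (rows 8 to 10 belong to B) and the threshold of 4.5 used in Answer 1. The column names `group`, `value`, `change_amount`, and `crossed_thresh` are chosen here for illustration and are not from this answer.

```r
library(data.table)

# Example data with the group labels from the question
dt <- data.table(group = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B"),
                 value = c(2, 2, 2, 5, 5, 4, 4, 23, 7, 7))
threshold <- 4.5  # assumed, as in Answer 1

# Difference from the previous value within each group;
# filling with the group's first value makes the first row 0 instead of NA
dt[, change_amount := value - shift(value, fill = value[1]), by = group]

# TRUE when the value is on the other side of the threshold than the
# previous value in the same group; the first row of a group is FALSE
dt[, crossed_thresh := (value > threshold) !=
       shift(value > threshold, fill = value[1] > threshold),
   by = group]

print(dt)
```

Because of `by = group`, row 8 (the first B row) is compared with nothing rather than with the last A row, so its change is 0 rather than 19.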


Source: https://stackoverflow.com/questions/65344638/identifying-type-of-change-in-continuous-values-r
