Detect a pattern in a column with R

Submitted by 白昼怎懂夜的黑 on 2019-12-11 17:39:34

Question


I am trying to calculate how many times a person moved from one job to another. A move is counted every time the Job column shows the pattern 1 -> 0 -> 1.

In this example, one rotation occurred:

Person Job
  A     1
  A     0
  A     1
  A     1

In this second example, person B also had one rotation:

Person Job
  A     1
  A     0
  A     1
  A     1
  B     1
  B     0
  B     0 
  B     1

What would be a good approach to record this pattern, per person, in a new column 'Rotation'?

    Person Job  Rotation
      A     1      0
      A     0      0
      A     1      1
      A     1      1
      B     1      0
      B     0      0
      B     0      0
      B     1      1

Answer 1:


You can use a regular expression to find each 1 -> 0 -> 1 group and count it as one rotation. The pattern "(?<=1)0+(?=1)" matches a run of zeros only if it is preceded by a 1 and also followed by a 1.

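library(tidyverse)
df %>%
   group_by(Person) %>%
   # accumulate() builds each person's running Job history as a string ("1", "10", "101", ...);
   # str_count() then counts the zero runs enclosed by 1s in each history
   mutate(Rotation = str_count(accumulate(Job, str_c, collapse = ""), "(?<=1)0+(?=1)"))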
# A tibble: 12 x 3
# Groups:   Person [3]
   Person   Job Rotation
   <fct>  <int>    <int>
 1 A          1        0
 2 A          0        0
 3 A          1        1
 4 A          1        1
 5 B          1        0
 6 B          0        0
 7 B          0        0
 8 B          1        1
 9 C          0        0
10 C          1        0
11 C          0        0
12 C          1        1
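
To see what the lookaround pattern matches in isolation, here is a minimal sketch that counts matches on a few hand-written job histories. The helper name count_rotations and the sample strings are illustrative assumptions, not part of the answer, and base R's gregexpr with perl = TRUE is swapped in for stringr's str_count so that no package is needed.

```r
# Pattern from the answer: a run of zeros with a 1 on each side.
pattern <- "(?<=1)0+(?=1)"

# Hypothetical helper: count pattern matches in each string (base R, PCRE).
count_rotations <- function(s) {
  lengths(regmatches(s, gregexpr(pattern, s, perl = TRUE)))
}

# Made-up job histories, one string per person.
histories <- c("1", "10", "101", "1001", "0101", "10101")
counts <- count_rotations(histories)
print(setNames(counts, histories))
```

Because the lookarounds do not consume the surrounding 1s, the middle 1 in "10101" closes one rotation and opens the next, so that history counts as two rotations.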



Answer 2:


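One solution is to use lag with default = 0 and take the cumulative sum of the rows where Job changes from 0 to 1, which counts every job start. Subtracting 1 from that cumsum discounts the first job and leaves the number of rotations. This assumes each person's first row has Job == 1; for a person who starts with 0, the rows before their first job would come out as -1.

A dplyr solution: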

library(dplyr)

df %>% group_by(Person) %>%
  mutate(Rotation = cumsum(lag(Job, default = 0) == 0 & Job ==1) - 1) %>%
  as.data.frame()

#   Person Job Rotation
# 1      A   1        0
# 2      A   0        0
# 3      A   1        1
# 4      A   1        1
# 5      B   1        0
# 6      B   0        0
# 7      B   0        0
# 8      B   1        1

Data:

df <- read.table(text ="
Person Job
A     1
A     0
A     1
A     1
B     1
B     0
B     0 
B     1",
header = TRUE, stringsAsFactors = FALSE)
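
The intermediate steps of the lag/cumsum idea can be traced on a single vector. The sketch below uses only base R; the helper name rotation_lag is hypothetical, and the final pmax clamp is one possible guard for histories that start with 0, an assumption that is not part of the answer.

```r
# Hypothetical helper mirroring the answer's expression for one person's Job vector.
rotation_lag <- function(job) {
  prev <- c(0L, head(job, -1L))       # same values as dplyr::lag(job, default = 0)
  started <- prev == 0L & job == 1L   # TRUE on rows where a job begins
  cumsum(started) - 1L                # the first job start is not a rotation
}

b <- rotation_lag(c(1L, 0L, 0L, 1L))      # person B from the question
c_raw <- rotation_lag(c(0L, 1L, 0L, 1L))  # a history that starts unemployed
c_clamped <- pmax(c_raw, 0L)              # assumed guard: never report fewer than 0

print(b)
print(c_raw)
print(c_clamped)
```

For person B this gives 0 0 0 1. The history that starts with 0 gives -1 0 0 1 before clamping and 0 0 0 1 after.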



Answer 3:


Here is an option with data.table:

library(data.table)
setDT(df)[, Rotation := +(grepl("101", do.call(paste0,
                       shift(Job, 0:.N, fill = 0)))), Person]
df
#    Person Job Rotation
# 1:      A   1       0
# 2:      A   0       0
# 3:      A   1       1
# 4:      A   1       1
# 5:      B   1       0
# 6:      B   0       0
# 7:      B   0       0
# 8:      B   1       0
# 9:      C   0       0
#10:      C   1       0
#11:      C   0       0
#12:      C   1       1

A base R option would be

f1 <- function(x) Reduce(paste0, x, accumulate = TRUE)
df$Rotation <- with(df, +grepl("101", ave(Job, Person, FUN = f1)))

Data:

df <- data.frame(Person = rep(c("A", "B", "C"), each = 4L),
                 Job = as.integer(c(1,0,1,1,
                                    1,0,0,1,
                                    0,1,0,1)))
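
To make the string-building step concrete, the sketch below applies f1 to person B's Job values and tests two patterns against the result. The fixed pattern "101" only matches a single 0 between two 1s, which is why B ends at 0 in the output above, whereas the question expects 1. The alternative pattern "10+1" is an assumption added here for comparison, not part of the answer. In both cases grepl yields a 0/1 flag rather than a running count.

```r
# Helper from the answer: running concatenation of the values seen so far.
f1 <- function(x) Reduce(paste0, x, accumulate = TRUE)

hist_b <- f1(c(1L, 0L, 0L, 1L))   # person B's history, row by row
print(hist_b)

single_zero <- +grepl("101", hist_b)   # pattern used in the answer
any_zeros <- +grepl("10+1", hist_b)    # assumed variant: one or more zeros

print(single_zero)
print(any_zeros)
```

Here hist_b is "1", "10", "100", "1001"; the answer's pattern flags no row, while the variant flags the last one.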



Answer 4:


I'm assuming that if a person starts unemployed, the first job they get doesn't count as a rotation. In that case:

library(dplyr)

rotation <- function(x) {
    # this will have 1 when a person got a new job
    dif <- c(0L, diff(x))
    dif[dif < 0L] <- 0L
    if (x[1L] == 0L) {
        # unemployed at the beginning,
        # first job doesn't count as change from one to another
        dif[which.max(dif)] <- 0L
    }
    # return
    cumsum(dif)
}

df <- data.frame(Person = rep(c("A", "B", "C"), each = 4L),
                 Job = as.integer(c(1,0,1,1,
                                    1,0,0,1,
                                    0,1,0,1)))

df %>%
    group_by(Person) %>%
    mutate(Rotation = rotation(Job))
# A tibble: 12 x 3
# Groups:   Person [3]
   Person   Job Rotation
   <fct>  <int>    <int>
 1 A          1        0
 2 A          0        0
 3 A          1        1
 4 A          1        1
 5 B          1        0
 6 B          0        0
 7 B          0        0
 8 B          1        1
 9 C          0        0
10 C          1        0
11 C          0        0
12 C          1        1
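
The body of rotation() can be followed step by step on person C, who starts unemployed. The sketch below repeats the function's statements outside the function for that one vector; the variable names first_start and result are illustrative.

```r
x <- c(0L, 1L, 0L, 1L)           # person C's Job values

dif <- c(0L, diff(x))            # change from the previous row: 0 1 -1 1
dif[dif < 0L] <- 0L              # drop job losses, keep job starts: 0 1 0 1

# x[1] is 0, so the first job start is removed:
first_start <- which.max(dif)    # index of the first 1, here 2
dif[first_start] <- 0L           # 0 0 0 1

result <- cumsum(dif)            # running number of rotations
print(result)
```

The result is 0 0 0 1, matching rows 9 to 12 of the output above.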


Source: https://stackoverflow.com/questions/51178870/detect-a-pattern-in-a-column-with-r
