Splitting a single column into four columns and counting repeated patterns in R

Question

The aim of this project is to understand how information is acquired while looking at an object. Imagine an object has elements a, b, c, d, e and f. A person might look at a, move on to b, and so forth. We want to plot and understand how that person navigated across the different elements of a given stimulus. I have data that captures this movement in a single column, but I need to split it into several columns to get the navigation pattern. Please see the example below.

I have a column extracted from a data frame. It now has to be split into four columns: the element moved from, the element moved to, and how many times in a row each of them was viewed.

a <- c( "a", "b", "b", "b", "a", "c", "a", "b", "d", "d", "d", "e", "f", "f", "e", "e", "f")
a <- as.data.frame(a)

Expected output

from   to   countfrom   countto
a      b      1           3
b      a      3           1
a      c      1           1
c      a      1           1
a      b      1           1
b      d      1           3
d      e      3           1
e      f      1           2
f      e      2           2
e      f      2           1

Note: I used dplyr to extract the column from the data frame.

Answer 1:

Use rle (run-length encoding) to get the runs of each letter and their lengths, then pair each run with the one that follows it:

r <- rle(a$a)
## or `r <- rle(as.character(a$a))` in R < 4.0, where as.data.frame() turns strings into factors
setNames(
    data.frame(lapply(r, head, -1), lapply(r, tail, -1)),
    c("countfrom","from","countto","to")
)
##   countfrom from countto to
##1          1    a       3  b
##2          3    b       1  a
##3          1    a       1  c
##4          1    c       1  a
##5          1    a       1  b
##6          1    b       3  d
##7          3    d       1  e
##8          1    e       2  f
##9          2    f       2  e
##10         2    e       1  f
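
To see why this works, it may help to look at what rle returns before the pieces are glued together. The following is only a minimal sketch on the question's data; the names x, from and to are illustrative and not part of the original answer.

```r
# Same letters as in the question, kept as a plain character vector
x <- c("a", "b", "b", "b", "a", "c", "a", "b", "d",
       "d", "d", "e", "f", "f", "e", "e", "f")

r <- rle(x)
r$values   ## "a" "b" "a" "c" "a" "b" "d" "e" "f" "e" "f"  (one entry per run)
r$lengths  ## 1 3 1 1 1 1 3 1 2 2 1                        (length of each run)

# head(v, -1) drops the last element and tail(v, -1) drops the first,
# so position i of `from` lines up with run i and position i of `to` with run i + 1
from <- head(r$values, -1)
to   <- tail(r$values, -1)
```

With 11 runs there are 10 consecutive pairs, which matches the 10 rows in the expected output.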

Answer 2:

Or, in the tidyverse:

library(tidyverse)
a <- c( "a", "b", "b", "b", "a", "c", "a", "b", "d", 
        "d", "d", "e", "f", "f", "e", "e", "f")
foo <- rle(a)

answ <- tibble(from = foo$values, to = lead(foo$values),
               fromCount = foo$lengths, toCount = lead(foo$lengths)) %>% 
  filter(!is.na(to))  # the last run has nothing after it, so drop that row

answ
# A tibble: 10 x 4
   from  to    fromCount toCount
   <chr> <chr>     <int>   <int>
 1 a     b             1       3
 2 b     a             3       1
 3 a     c             1       1
 4 c     a             1       1
 5 a     b             1       1
 6 b     d             1       3
 7 d     e             3       1
 8 e     f             1       2
 9 f     e             2       2
10 e     f             2       1
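
If the goal is also to count how often each move is repeated, the from/to pairs can be tallied afterwards. This is a hedged base R sketch, not something either answer proposed; the names transitions and counts are made up for illustration, and it assumes a "move" means going from one run of a letter to the next.

```r
x <- c("a", "b", "b", "b", "a", "c", "a", "b", "d",
       "d", "d", "e", "f", "f", "e", "e", "f")
r <- rle(x)

# One row per move between consecutive runs
transitions <- data.frame(from = head(r$values, -1),
                          to   = tail(r$values, -1))

# Cross-tabulate from/to, then keep only the moves that actually occur
counts <- as.data.frame(table(transitions), stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]
counts
```

On this data the moves a to b and e to f each occur twice, and every other observed move occurs once.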

Source: https://stackoverflow.com/questions/61967678/splitting-single-column-into-four-columns-and-count-repeated-pattern-in-r

Tags

string

dplyr