R pivot_longer combining columns based on the end of column names

Question

I have a dataframe with multiple columns whose names vary in length and structure (so I am not sure how to capture them with a regex). Each column name ends with either .t1 or .t3.

I want to combine columns based on names without the t1/t3, with an additional column of Time based on that suffix.

So, for example, a dataframe such as:

df<-data.frame("Subject"= c(1:10),
"intercept.freq.acc.t1" = c(1:10),
"intercept.freq.acc.t3" = c(1:10),
"freq.rt.t1" = c(1:10), 
"freq.rt.t3" = c(1:10),
"vowel.con.acc.t1" = c(1:10),
"vowel.con.acc.t3" = c(1:10))

I want to turn it into

df<-data.frame("Subject"= rep(1:10,2),
"Time" = rep(c('t1','t3'), each = 10),
"intercept.freq.acc" = rep(1:10, 2),
"freq.rt" = rep(1:10,2), 
"vowel.con.acc" = rep(1:10, 2))

How do I go about doing this?

Answer 1:

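You can use pivot_longer with names_pattern. The first capture group becomes the output column name (.value) and the second becomes the Time column: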

tidyr::pivot_longer(df, 
             cols = -Subject, 
             names_to = c('.value', 'Time'), 
             names_pattern = '(.*)\\.(t\\d+)')

#   Subject Time  intercept.freq.acc freq.rt vowel.con.acc
#     <int> <chr>              <int>   <int>         <int>
# 1       1 t1                     1       1             1
# 2       1 t3                     1       1             1
# 3       2 t1                     2       2             2
# 4       2 t3                     2       2             2
# 5       3 t1                     3       3             3
# 6       3 t3                     3       3             3
# 7       4 t1                     4       4             4
# 8       4 t3                     4       4             4
# 9       5 t1                     5       5             5
#10       5 t3                     5       5             5
#11       6 t1                     6       6             6
#12       6 t3                     6       6             6
#13       7 t1                     7       7             7
#14       7 t3                     7       7             7
#15       8 t1                     8       8             8
#16       8 t3                     8       8             8
#17       9 t1                     9       9             9
#18       9 t3                     9       9             9
#19      10 t1                    10      10            10
#20      10 t3                    10      10            10
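
To see why this pattern copes with names that contain a varying number of dots, it can help to apply the same regex to a few column names directly. The following is only a sketch in base R; the vector `nms` and the use of `sub` are illustrative assumptions and not part of the answer itself. The greedy `(.*)` takes as much as it can, so the split happens at the last `.t<digits>` in each name.

```r
# A few column names from the question, used here only for illustration.
nms <- c("intercept.freq.acc.t1", "freq.rt.t3", "vowel.con.acc.t1")

# Same pattern as in names_pattern above.
pattern <- "(.*)\\.(t\\d+)"

# Group 1 is what pivot_longer would use as .value, group 2 as Time.
value_part <- sub(pattern, "\\1", nms)
time_part  <- sub(pattern, "\\2", nms)

print(data.frame(name = nms, .value = value_part, Time = time_part))
```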

Answer 2:

You could make use of the pivot_longer_spec function. It takes a data frame template in which you map each input column to its output column, and you then feed this template into pivot_longer_spec.

This is usually very helpful when your columns have no nice and easy split pattern. Personally, I find it easier to use such a template than to figure out the regex for splitting up the columns (in this case, the regex is still OK, though):

library(tidyverse)
template <- data.frame(.name  = colnames(df)[-1],
                       .value = c("intercept.freq.acc", "intercept.freq.acc", "freq.rt", "freq.rt", "vowel.con.acc", "vowel.con.acc"),
                       Time   = c("t1", "t3", "t1", "t3", "t1", "t3"))

The template looks as follows:

                  .name             .value Time
1 intercept.freq.acc.t1 intercept.freq.acc   t1
2 intercept.freq.acc.t3 intercept.freq.acc   t3
3            freq.rt.t1            freq.rt   t1
4            freq.rt.t3            freq.rt   t3
5      vowel.con.acc.t1      vowel.con.acc   t1
6      vowel.con.acc.t3      vowel.con.acc   t3

And then you can do an easy pivot_longer:

dat_long <- df %>%
  pivot_longer_spec(template)

which gives:

# A tibble: 20 x 5
   Subject Time  intercept.freq.acc freq.rt vowel.con.acc
     <int> <chr>              <int>   <int>         <int>
 1       1 t1                     1       1             1
 2       1 t3                     1       1             1
 3       2 t1                     2       2             2
 4       2 t3                     2       2             2
 5       3 t1                     3       3             3
 6       3 t3                     3       3             3
 7       4 t1                     4       4             4
 8       4 t3                     4       4             4
 9       5 t1                     5       5             5
10       5 t3                     5       5             5
11       6 t1                     6       6             6
12       6 t3                     6       6             6
13       7 t1                     7       7             7
14       7 t3                     7       7             7
15       8 t1                     8       8             8
16       8 t3                     8       8             8
17       9 t1                     9       9             9
18       9 t3                     9       9             9
19      10 t1                    10      10            10
20      10 t3                    10      10            10
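
The template above was typed out by hand. When there are many columns, one option is to derive it from the column names instead. This is a minimal sketch in base R, assuming every measurement column ends in a `.t<digits>` suffix as in the question; the vector `nms` stands in for `colnames(df)[-1]`.

```r
# Column names as in the question (everything except Subject).
nms <- c("intercept.freq.acc.t1", "intercept.freq.acc.t3",
         "freq.rt.t1", "freq.rt.t3",
         "vowel.con.acc.t1", "vowel.con.acc.t3")

# Strip the time suffix to get .value; keep only the part after the last dot for Time.
template <- data.frame(.name  = nms,
                       .value = sub("\\.t\\d+$", "", nms),
                       Time   = sub("^.*\\.", "", nms),
                       stringsAsFactors = FALSE)

print(template)
```

The resulting data frame has the same three columns as the hand-written template, so it could be passed to pivot_longer_spec in the same way.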

Answer 3:

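We can use melt from data.table. Note that the resulting variable column holds the index of the measurement (1, 2) rather than t1/t3: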

library(data.table)
# Anchor the patterns with ^: a bare 'freq' would also match intercept.freq.acc.*
melt(setDT(df), id.vars = 'Subject', measure.vars = patterns('^intercept', '^freq', '^vowel'),
     value.name = c('intercept.freq.acc', 'freq.rt', 'vowel.con.acc'))
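
When grouping columns by regex like this, it is worth checking which names each pattern actually selects, because a short pattern can match more columns than intended. The sketch below uses base R `grep` on the question's column names as a stand-in for the matching that `patterns` performs; the variable names are hypothetical.

```r
# Measurement column names from the question.
nms <- c("intercept.freq.acc.t1", "intercept.freq.acc.t3",
         "freq.rt.t1", "freq.rt.t3",
         "vowel.con.acc.t1", "vowel.con.acc.t3")

# Unanchored: also picks up the intercept.freq.acc columns.
unanchored <- grep("freq", nms, value = TRUE)

# Anchored at the start of the name: only the freq.rt columns.
anchored <- grep("^freq", nms, value = TRUE)

print(unanchored)
print(anchored)
```

Each pattern should select the same number of columns (two here, one per time point) for the groups to line up.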

Source: https://stackoverflow.com/questions/65158050/r-pivot-longer-combining-columns-based-on-the-end-of-column-names

Tags

dataframe

tidyr