Tidy data.frame with repeated column names

99封情书 提交于 2019-11-28 01:51:23

You can use the make.unique-function to create unique column names. After that you can use melt from the data.table-package which is able to create multiple value-columns based on patterns in the columnnames:

# make the column names unique
names(toy) <- make.unique(names(toy))
# let the 'Condition' column start with a small letter 'c'
# so it won't be detected by the patterns argument from melt
names(toy)[2] <- tolower(names(toy)[2])

# load the 'data.table' package
library(data.table)
# tidy the data into long format
tidy_toy <- melt(setDT(toy), 
                 measure.vars = patterns('^A','^B','^C','^ID'), 
                 value.name = c('A','B','C','ID'))

which gives:

 > tidy_toy
                  file_path condition Trial.Num variable  A B  C    ID
 1:     root/some.extension  Baseline         1        1  2 3  5   car
 2:    root/thing.extension  Baseline         2        1  3 6 45   car
 3:     root/else.extension  Baseline         3        1  4 4  6   car
 4: root/uniquely.extension Treatment         1        1  5 3  7   car
 5:  root/defined.extension Treatment         2        1  6 7  3   car
 6:     root/some.extension  Baseline         1        2  2 1  7  bike
 7:    root/thing.extension  Baseline         2        2  5 4  4  bike
 8:     root/else.extension  Baseline         3        2  7 5  4  bike
 9: root/uniquely.extension Treatment         1        2  1 7 37  bike
10:  root/defined.extension Treatment         2        2  4 6  8  bike
11:     root/some.extension  Baseline         1        3  4 9  0 plane
12:    root/thing.extension  Baseline         2        3  9 5  4 plane
13:     root/else.extension  Baseline         3        3 68 7 56 plane
14: root/uniquely.extension Treatment         1        3  9 8  7 plane
15:  root/defined.extension Treatment         2        3  9 0  8 plane

Another option is to use a list of column-indexes for measure.vars:

tidy_toy <- melt(setDT(toy), 
                 measure.vars = list(c(4,8,12), c(5,9,13), c(6,10,14), c(7,11,15)), 
                 value.name = c('A','B','C','ID'))

Making the column-names unique isn't necessary then.


A more complicated method that creates names that are better distinguishable by the patterns argument:

# select the names that are not unique
tt <- table(names(toy))
idx <- which(names(toy) %in% names(tt)[tt > 1])
nms <- names(toy)[idx]

# make them unique
names(toy)[idx] <- paste(nms, 
                         rep(seq(length(nms) / length(names(tt)[tt > 1])), 
                             each = length(names(tt)[tt > 1])), 
                         sep = '.')

# your columnnames are now unique:
> names(toy)
 [1] "file_path" "Condition" "Trial.Num" "A.1"       "B.1"       "C.1"       "ID.1"      "A.2"      
 [9] "B.2"       "C.2"       "ID.2"      "A.3"       "B.3"       "C.3"       "ID.3"     

# tidy the data into long format
tidy_toy <- melt(setDT(toy), 
                 measure.vars = patterns('^A.\\d','^B.\\d','^C.\\d','^ID.\\d'), 
                 value.name = c('A','B','C','ID'))

which will give the same end-result.


As mentioned in the comments, the janitor-package can be helpful for this problem as well. The clean_names() works similar as the make.unique function. See here for an explanation.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!