Tidy data.frame with repeated column names

心在旅途 2020-12-07 01:08

I have a program that gives me data in this format:

toy
                file_path Condition Trial.Num A B  C  ID A B  C   ID  A B  C    ID
1     root/some.ext         


        
2 Answers
  •  时光取名叫无心
    2020-12-07 01:35

    You can use the make.unique function to create unique column names. After that, you can use melt from the data.table package, which can create multiple value columns based on patterns in the column names:

    # make the column names unique
    names(toy) <- make.unique(names(toy))
    # lowercase 'Condition' to 'condition' so that the '^C' pattern
    # passed to melt() below does not also match it
    names(toy)[2] <- tolower(names(toy)[2])
    
    # load the 'data.table' package
    library(data.table)
    # tidy the data into long format
    tidy_toy <- melt(setDT(toy), 
                     measure.vars = patterns('^A','^B','^C','^ID'), 
                     value.name = c('A','B','C','ID'))
    

    which gives:

     > tidy_toy
                      file_path condition Trial.Num variable  A B  C    ID
     1:     root/some.extension  Baseline         1        1  2 3  5   car
     2:    root/thing.extension  Baseline         2        1  3 6 45   car
     3:     root/else.extension  Baseline         3        1  4 4  6   car
     4: root/uniquely.extension Treatment         1        1  5 3  7   car
     5:  root/defined.extension Treatment         2        1  6 7  3   car
     6:     root/some.extension  Baseline         1        2  2 1  7  bike
     7:    root/thing.extension  Baseline         2        2  5 4  4  bike
     8:     root/else.extension  Baseline         3        2  7 5  4  bike
     9: root/uniquely.extension Treatment         1        2  1 7 37  bike
    10:  root/defined.extension Treatment         2        2  4 6  8  bike
    11:     root/some.extension  Baseline         1        3  4 9  0 plane
    12:    root/thing.extension  Baseline         2        3  9 5  4 plane
    13:     root/else.extension  Baseline         3        3 68 7 56 plane
    14: root/uniquely.extension Treatment         1        3  9 8  7 plane
    15:  root/defined.extension Treatment         2        3  9 0  8 plane
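    To see why the answer lowercases `Condition`, here is a small base-R sketch that applies `make.unique()` to the question's header and checks which names the `'^C'` pattern matches. The vector `hdr` is simply the header typed out by hand (an illustrative name, not part of the answer), and `grepl()` is used with the same regular expression that is passed to `patterns()`.

    ```r
    # Header from the question, typed out by hand (A, B, C, ID repeat three times)
    hdr <- c("file_path", "Condition", "Trial.Num",
             rep(c("A", "B", "C", "ID"), times = 3))

    # make.unique() keeps the first occurrence as-is and appends .1, .2, ... to later ones
    uniq <- make.unique(hdr)
    print(uniq)

    # '^C' also matches "Condition": four C columns versus three for A, B and ID
    c_before <- uniq[grepl("^C", uniq)]
    print(c_before)

    # After lowercasing "Condition", '^C' matches only the three C columns
    uniq[2] <- tolower(uniq[2])
    c_after <- uniq[grepl("^C", uniq)]
    print(c_after)
    ```

    Only the C group is affected, because no other fixed column name starts with A, B or ID.
    
    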
    

    Another option is to pass a list of column indexes to measure.vars:

    tidy_toy <- melt(setDT(toy), 
                     measure.vars = list(c(4,8,12), c(5,9,13), c(6,10,14), c(7,11,15)), 
                     value.name = c('A','B','C','ID'))
    

    With this approach, making the column names unique isn't necessary.
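    The index list does not have to be typed by hand. Assuming `toy` has the question's column layout, this base-R sketch derives the list from the (still duplicated) names with `which()`. The names `hdr`, `value_names` and `measure_idx` are illustrative, not from the answer.

    ```r
    # Header with repeated names, as in the question (no make.unique() needed)
    hdr <- c("file_path", "Condition", "Trial.Num",
             rep(c("A", "B", "C", "ID"), times = 3))

    # For each value column, collect the positions whose name matches exactly;
    # exact matching with == cannot confuse "C" with "Condition"
    value_names <- c("A", "B", "C", "ID")
    measure_idx <- lapply(value_names, function(v) which(hdr == v))
    str(measure_idx)

    # With data.table loaded and `toy` laid out as in the question, the result
    # could then replace the hard-coded list:
    # melt(setDT(toy), measure.vars = measure_idx, value.name = value_names)
    ```

    This yields the same positions as `list(c(4,8,12), c(5,9,13), c(6,10,14), c(7,11,15))`, but it keeps working if the fixed columns before the repeated groups change.
    
    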


    A more elaborate method creates names that the patterns argument can distinguish more reliably:

    # select the names that are not unique
    tt <- table(names(toy))
    idx <- which(names(toy) %in% names(tt)[tt > 1])
    nms <- names(toy)[idx]
    
    # make them unique
    names(toy)[idx] <- paste(nms, 
                             rep(seq(length(nms) / length(names(tt)[tt > 1])), 
                                 each = length(names(tt)[tt > 1])), 
                             sep = '.')
    
    # the column names are now unique:
    names(toy)
    #>  [1] "file_path" "Condition" "Trial.Num" "A.1"       "B.1"       "C.1"       "ID.1"      "A.2"      
    #>  [9] "B.2"       "C.2"       "ID.2"      "A.3"       "B.3"       "C.3"       "ID.3"     
    
    # tidy the data into long format
    tidy_toy <- melt(setDT(toy), 
                     measure.vars = patterns('^A\\.\\d','^B\\.\\d','^C\\.\\d','^ID\\.\\d'), 
                     value.name = c('A','B','C','ID'))
    

    which gives the same result.
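    The renaming step can also be written without `table()`. The base-R sketch below is an alternative to the answer's code, not taken from it. It uses `ave()` to number each occurrence of a name, so it does not assume that every repeated name appears in contiguous blocks of the same size. As before, `hdr` is an illustrative stand-in for `names(toy)`.

    ```r
    hdr <- c("file_path", "Condition", "Trial.Num",
             rep(c("A", "B", "C", "ID"), times = 3))

    # Running count of each name: 1 for its first occurrence, 2 for the second, ...
    occurrence <- ave(seq_along(hdr), hdr, FUN = seq_along)

    # Only suffix names that occur more than once; "Condition" etc. stay untouched
    is_repeated <- hdr %in% hdr[duplicated(hdr)]
    hdr[is_repeated] <- paste(hdr[is_repeated], occurrence[is_repeated], sep = ".")
    print(hdr)

    # Escaping the dot makes it match a literal "." followed by digits
    a_cols <- grep("^A\\.\\d+$", hdr, value = TRUE)
    print(a_cols)
    ```

    To apply this to the data itself, compute it from `names(toy)` and assign the result back with `names(toy) <- hdr`.
    
    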


    As mentioned in the comments, the janitor package can also help with this problem. Its clean_names() function makes duplicated names unique in a similar way to make.unique(). See here for an explanation.
