How can you gather() multiple columns at the same time in dplyr (R)?

怎甘沉沦 提交于 2020-01-06 06:48:55

问题


I am trying to gather untidy data from wide to long format. I have 748 variables, that need to be condensed to approximately 30.

In this post, I asked: how to tidy my wide data? The answer: use gather().

However, I am still struggling to gather multiple columns and was hoping you could pinpoint where I'm going wrong.

Reproducible example:

tb1 <- tribble(~x1,~x2,~x3,~y1,~y2,~y3,
       1,NA,NA,NA,1,NA,
       NA,1,NA,NA,NA,1,
       NA,NA,1,NA,NA,1)

# A tibble: 3 x 6
#     x1    x2    x3 y1       y2    y3
#  <dbl> <dbl> <dbl> <lgl> <dbl> <dbl>
#1     1    NA    NA NA        1    NA
#2    NA     1    NA NA       NA     1
#3    NA    NA     1 NA       NA     1

with x1-y3 having the following characteristics:

1 x1    Green 
2 x2    Yellow
3 x3    Orange
4 y1    Yes   
5 y2    No    
6 y3    Maybe 

I tried this:

tb1 %>%
  rename("Green" =x1,
         "Yellow"=x2,
         "Orange"=x3,
         "Yes"=y1,
         "No"=y2,
         "Maybe"=y3) %>%
  gather(X,val,-Green,-Yellow,-Orange) %>%
  gather(Y,val,-X) %>%
  select(-val)

I did get an output that I wanted for these variables, but I can't imagine how to do this for 700+ variables?! Is there a more effective way?

tb1 %>%
  rename("Green" =x1,
         "Yellow"=x2,
         "Orange"=x3,
         "Yes"=y1,
         "No"=y2,
         "Maybe"=y3) %>%
  gather(X,val,-Green,-Yellow,-Orange) %>%
  filter(!is.na(val)) %>%
  select(-val) %>%
  gather(Y,val,-X) %>%
  filter(!is.na(val)) %>%
  select(-val)

# A tibble: 3 x 2
  X     Y     
  <chr> <chr> 
1 No    Green 
2 Maybe Yellow
3 Maybe Orange

I think I might be just not acquainted enough with gather() so this is probably a stupid question - would appreciate the help. Thanks!


回答1:


I’m assuming the issue here is with manually specify all the different variable names. Luckily, tidyverse has the ?select_helpers which make it easier to select columns based on different rules.

Instead of renaming the variables at the beginning, we can rename them at the end. This lets us use starts_with to get all columns starting with x or y and gather them together in one step. Then we can use ends_with to select the value columns from those gather steps and filter and drop them.

Finally, we replace all values of x1, y1 etc. with their true values in one step using mutate_all and a lookup table

# Make lookup table to match X and Y variables with Values
  # the initial values should be the `names` (first) and the values to change them to
  # should be the `values` (after the =)
lookup <- c('x1' = 'Green',
            'x2' = 'Yellow',
            'x3' = 'Orange',
            'y1' = 'Yes',
            'y2' = 'No',
            'y3' = 'Maybe')

tb1 %>%
    gather(X, Xval, starts_with('x')) %>%    # Gather all variables that start with ‘x'
    gather(Y, Yval, starts_with('y')) %>%    # Gather all variables that start with ‘y'
    filter_at(vars(ends_with('val')),        # Looking in columns ending with ‘val'
              all_vars(!is.na(.))) %>% %>%    # Drop rows if ANY of these cols are NA
    select(-ends_with('val')) %>%            # Drop columns ending in ‘val'
    mutate_all(~lookup[.])                   # Replace value from lookup table in all cols

# A tibble: 3 x 2
  X      Y 
  <chr>  <chr>
1 Green  No   
2 Yellow Maybe
3 Orange Maybe

One tricky thing with select_helpers is knowing when you an use them alone and when you need to “register” them with vars. In gather and select, you can use them as is. In mutate, filter, summarize, etc. you need to surround them with vars



来源:https://stackoverflow.com/questions/56024422/how-can-you-gather-multiple-columns-at-the-same-time-in-dplyr-r

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!