问题
I am trying to gather untidy data from wide to long format. I have 748 variables, that need to be condensed to approximately 30.
In this post, I asked: how to tidy my wide data? The answer: use gather().
However, I am still struggling to gather multiple columns and was hoping you could pinpoint where I'm going wrong.
Reproducible example:
tb1 <- tribble(~x1,~x2,~x3,~y1,~y2,~y3,
1,NA,NA,NA,1,NA,
NA,1,NA,NA,NA,1,
NA,NA,1,NA,NA,1)
# A tibble: 3 x 6
# x1 x2 x3 y1 y2 y3
# <dbl> <dbl> <dbl> <lgl> <dbl> <dbl>
#1 1 NA NA NA 1 NA
#2 NA 1 NA NA NA 1
#3 NA NA 1 NA NA 1
with x1-y3 having the following characteristics:
1 x1 Green
2 x2 Yellow
3 x3 Orange
4 y1 Yes
5 y2 No
6 y3 Maybe
I tried this:
tb1 %>%
rename("Green" =x1,
"Yellow"=x2,
"Orange"=x3,
"Yes"=y1,
"No"=y2,
"Maybe"=y3) %>%
gather(X,val,-Green,-Yellow,-Orange) %>%
gather(Y,val,-X) %>%
select(-val)
I did get an output that I wanted for these variables, but I can't imagine how to do this for 700+ variables?! Is there a more effective way?
tb1 %>%
rename("Green" =x1,
"Yellow"=x2,
"Orange"=x3,
"Yes"=y1,
"No"=y2,
"Maybe"=y3) %>%
gather(X,val,-Green,-Yellow,-Orange) %>%
filter(!is.na(val)) %>%
select(-val) %>%
gather(Y,val,-X) %>%
filter(!is.na(val)) %>%
select(-val)
# A tibble: 3 x 2
X Y
<chr> <chr>
1 No Green
2 Maybe Yellow
3 Maybe Orange
I think I might be just not acquainted enough with gather() so this is probably a stupid question - would appreciate the help. Thanks!
回答1:
I’m assuming the issue here is with manually specify all the different variable names. Luckily, tidyverse
has the ?select_helpers
which make it easier to select columns based on different rules.
Instead of renaming the variables at the beginning, we can rename them at the end. This lets us use starts_with
to get all columns starting with x
or y
and gather them together in one step. Then we can use ends_with
to select the value columns from those gather steps and filter and drop them.
Finally, we replace all values of x1
, y1
etc. with their true values in one step using mutate_all
and a lookup table
# Make lookup table to match X and Y variables with Values
# the initial values should be the `names` (first) and the values to change them to
# should be the `values` (after the =)
lookup <- c('x1' = 'Green',
'x2' = 'Yellow',
'x3' = 'Orange',
'y1' = 'Yes',
'y2' = 'No',
'y3' = 'Maybe')
tb1 %>%
gather(X, Xval, starts_with('x')) %>% # Gather all variables that start with ‘x'
gather(Y, Yval, starts_with('y')) %>% # Gather all variables that start with ‘y'
filter_at(vars(ends_with('val')), # Looking in columns ending with ‘val'
all_vars(!is.na(.))) %>% %>% # Drop rows if ANY of these cols are NA
select(-ends_with('val')) %>% # Drop columns ending in ‘val'
mutate_all(~lookup[.]) # Replace value from lookup table in all cols
# A tibble: 3 x 2
X Y
<chr> <chr>
1 Green No
2 Yellow Maybe
3 Orange Maybe
One tricky thing with select_helpers is knowing when you an use them alone and when you need to “register” them with vars
. In gather
and select
, you can use them as is. In mutate
, filter
, summarize
, etc. you need to surround them with vars
来源:https://stackoverflow.com/questions/56024422/how-can-you-gather-multiple-columns-at-the-same-time-in-dplyr-r