Question
I have a dataset with the mean and sd of each variable as columns, but I want to convert it into "long" format as follows:
library(tidyverse)
iris %>%
  group_by(Species) %>%
  summarize_all(list(mean = mean, sd = sd))
#> # A tibble: 3 x 9
#> Species Sepal.Length_me~ Sepal.Width_mean Petal.Length_me~
#> <fct> <dbl> <dbl> <dbl>
#> 1 setosa 5.01 3.43 1.46
#> 2 versic~ 5.94 2.77 4.26
#> 3 virgin~ 6.59 2.97 5.55
#> # ... with 5 more variables: Petal.Width_mean <dbl>,
#> # Sepal.Length_sd <dbl>, Sepal.Width_sd <dbl>, Petal.Length_sd <dbl>,
#> # Petal.Width_sd <dbl>
# Desired output:
#
# tribble(~Species, ~Variable, ~Mean, ~SD,
# #-------------------------------
# ... )
I feel like tidyr::gather() would be good to use here; however, I am not sure how the syntax would work for having two values per key. Or perhaps I need to use two gathers and column-bind the results?
Answer 1:
To convert your data after summarise_all(), you can do the following:
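df %>%                                                          # df: the summarised table from the question
  gather(key, val, -Species) %>%                                # wide -> long: one row per Species x column
  separate(key, into = c("Variable", "metric"), sep = "_") %>%  # "Sepal.Length_mean" -> "Sepal.Length", "mean"
  spread(metric, val)                                           # long -> wide: one column per metric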
## A tibble: 12 x 4
# Species Variable mean sd
# <fct> <chr> <dbl> <dbl>
# 1 setosa Petal.Length 1.46 0.174
# 2 setosa Petal.Width 0.246 0.105
# 3 setosa Sepal.Length 5.01 0.352
# 4 setosa Sepal.Width 3.43 0.379
# 5 versicolor Petal.Length 4.26 0.470
# 6 versicolor Petal.Width 1.33 0.198
# 7 versicolor Sepal.Length 5.94 0.516
# 8 versicolor Sepal.Width 2.77 0.314
# 9 virginica Petal.Length 5.55 0.552
#10 virginica Petal.Width 2.03 0.275
#11 virginica Sepal.Length 6.59 0.636
#12 virginica Sepal.Width 2.97 0.322
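In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same three steps with the newer verbs, assuming dplyr 1.0.0 or later for across(); it rebuilds df itself so it runs on its own:

```r
library(dplyr)
library(tidyr)

# Summarised wide data: one row per species, columns such as Sepal.Length_mean
df <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))

long <- df %>%
  # wide -> long: one row per Species x original column name
  pivot_longer(-Species, names_to = "key", values_to = "val") %>%
  # split "Sepal.Length_mean" into Variable = "Sepal.Length", metric = "mean"
  separate(key, into = c("Variable", "metric"), sep = "_") %>%
  # long -> wide again: one column per metric (mean, sd)
  pivot_wider(names_from = metric, values_from = val)

print(long)
```

The result has the same 12 rows and the columns Species, Variable, mean and sd.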
But it's actually faster and shorter to reshape the data from wide to long right from the start, and then summarise:
iris %>%
  gather(Variable, val, -Species) %>%
  group_by(Species, Variable) %>%
  summarise(Mean = mean(val), SD = sd(val))
## A tibble: 12 x 4
## Groups: Species [?]
# Species Variable Mean SD
# <fct> <chr> <dbl> <dbl>
# 1 setosa Petal.Length 1.46 0.174
# 2 setosa Petal.Width 0.246 0.105
# 3 setosa Sepal.Length 5.01 0.352
# 4 setosa Sepal.Width 3.43 0.379
# 5 versicolor Petal.Length 4.26 0.470
# 6 versicolor Petal.Width 1.33 0.198
# 7 versicolor Sepal.Length 5.94 0.516
# 8 versicolor Sepal.Width 2.77 0.314
# 9 virginica Petal.Length 5.55 0.552
#10 virginica Petal.Width 2.03 0.275
#11 virginica Sepal.Length 6.59 0.636
#12 virginica Sepal.Width 2.97 0.322
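The reshape-first approach can be sketched the same way with pivot_longer(). This assumes dplyr 1.0.0 or later for the .groups argument; .groups = "drop" returns an ungrouped tibble, whereas the output above is still grouped by Species.

```r
library(dplyr)
library(tidyr)

res <- iris %>%
  # wide -> long on the raw data: 150 rows x 4 measurements = 600 rows
  pivot_longer(-Species, names_to = "Variable", values_to = "val") %>%
  group_by(Species, Variable) %>%
  # one row per Species x Variable, with the grouping removed afterwards
  summarise(Mean = mean(val), SD = sd(val), .groups = "drop")

print(res)
```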
Answer 2:
Here is an option with pivot_longer() from the dev version of tidyr (pivot_longer() has since been released in tidyr 1.0.0).
library(dplyr)
library(stringr)  # str_replace()
library(tidyr)    # tidyr_0.8.3.9000
df %>%
  rename_at(-1, ~ str_replace(., "(.*)_(.*)", "\\2_\\1")) %>%  # "Sepal.Length_mean" -> "mean_Sepal.Length"
  pivot_longer(-Species, names_to = c(".value", "Variable"), names_sep = "_")
# A tibble: 12 x 4
# Species Variable mean sd
# <fct> <chr> <dbl> <dbl>
# 1 setosa Sepal.Length 5.01 0.352
# 2 setosa Sepal.Width 3.43 0.379
# 3 setosa Petal.Length 1.46 0.174
# 4 setosa Petal.Width 0.246 0.105
# 5 versicolor Sepal.Length 5.94 0.516
# 6 versicolor Sepal.Width 2.77 0.314
# 7 versicolor Petal.Length 4.26 0.470
# 8 versicolor Petal.Width 1.33 0.198
# 9 virginica Sepal.Length 6.59 0.636
#10 virginica Sepal.Width 2.97 0.322
#11 virginica Petal.Length 5.55 0.552
#12 virginica Petal.Width 2.03 0.275
Data:
data(iris)
df <- iris %>%
  group_by(Species) %>%
  summarize_all(list(mean = mean, sd = sd))
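Because the summarised column names already have the form Variable_metric, pivot_longer() can probably split them in their original order, which would make the rename_at()/str_replace() step unnecessary. A minimal sketch, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later for across():

```r
library(dplyr)
library(tidyr)

# Summarised wide data, e.g. Sepal.Length_mean, Sepal.Length_sd, ...
df <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))

res <- df %>%
  pivot_longer(-Species,
               # first part of the name -> Variable column,
               # second part (".value") -> name of the output column
               names_to = c("Variable", ".value"),
               names_sep = "_")

print(res)
```

The special ".value" entry tells pivot_longer() to use that part of each column name as an output column name, so mean and sd end up side by side in the same row.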
Source: https://stackoverflow.com/questions/54816348/tidyr-gathering-two-values-per-key