Question
set.seed(8)
df <- data.frame(
  A = sample(1:3, 10, replace = TRUE),
  B = sample(1:3, 10, replace = TRUE),
  C = sample(1:3, 10, replace = TRUE),
  D = sample(1:3, 10, replace = TRUE),
  E = sample(1:3, 10, replace = TRUE),
  F = sample(1:3, 10, replace = TRUE))
I would like to pass a subset of columns into a dplyr mutate() and make a row-wise calculation, for instance cor(), to get the correlation between columns A-C and D-F for each row, but I cannot figure out how. I found SO inspiration here, here and here, but still failed to produce working code. For instance, I tried this:
require(plyr)
require(dplyr)
df %>%
rowwise() %>%
mutate(c=cor(.[[1:3]],.[[4:6]]))
Answer 1:
You could try do(), unlisting each row's columns so that cor() receives plain vectors:
df %>%
rowwise() %>%
do(data.frame(., Cor=cor(unlist(.[1:3]), unlist(.[4:6]))))
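As a possible alternative to do(), here is a minimal sketch that assumes dplyr 1.0.0 or later, where c_across() is available inside rowwise() pipelines. The result name `res` is only illustrative, and the data frame is the one from the question. Rows in which one of the two column groups has no variation yield NA (with a warning), because the correlation is undefined there.

```r
library(dplyr)

# Same example data as in the question
set.seed(8)
df <- data.frame(
  A = sample(1:3, 10, replace = TRUE),
  B = sample(1:3, 10, replace = TRUE),
  C = sample(1:3, 10, replace = TRUE),
  D = sample(1:3, 10, replace = TRUE),
  E = sample(1:3, 10, replace = TRUE),
  F = sample(1:3, 10, replace = TRUE))

# c_across() collects the selected columns of the current row into a vector,
# so cor() gets two length-3 vectors per row
res <- df %>%
  rowwise() %>%
  mutate(Cor = cor(c_across(A:C), c_across(D:F))) %>%
  ungroup()

print(res)
```

Selecting by name (A:C, D:F) rather than by position keeps the call readable if the column order changes; ungroup() drops the row-wise grouping so later verbs behave as usual.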
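Answer 2:
Here is another solution, from FAY (2017). It computes the correlation between every pair of columns, shown here on the airquality dataset.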
> library(tidystringdist)
> library(purrr)   # pmap(), map_df()
> library(dplyr)   # %>%, bind_cols()
> comb <- tidy_comb_all(names(airquality))
> comb
# A tibble: 15 x 2
   V1      V2
 * <chr>   <chr>
 1 Ozone   Solar.R
 2 Ozone   Wind
 3 Ozone   Temp
 4 Ozone   Month
 5 Ozone   Day
 6 Solar.R Wind
 7 Solar.R Temp
 8 Solar.R Month
 9 Solar.R Day
10 Wind    Temp
11 Wind    Month
12 Wind    Day
13 Temp    Month
14 Temp    Day
15 Month   Day
This gives every pairwise combination of the column names. Next, run cor.test() on each pair and tidy the results with broom:
> bulk_cor <-
+ comb %>%
+ pmap(~ cor.test(airquality[[.x]], airquality[[.y]])) %>%
+ map_df(broom::tidy) %>%
+ bind_cols(comb, .)
> bulk_cor
# A tibble: 15 x 10
   V1      V2      estimate statistic  p.value parameter conf.low conf.high method       alternative
   <chr>   <chr>      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <fct>        <fct>
 1 Ozone   Solar.R    0.348      3.88 1.79e- 4       109    0.173     0.502 Pearson's p~ two.sided
 2 Ozone   Wind      -0.602     -8.04 9.27e-13       114   -0.706    -0.471 Pearson's p~ two.sided
 3 Ozone   Temp       0.698      10.4 2.93e-18       114    0.591     0.781 Pearson's p~ two.sided
 4 Ozone   Month      0.165      1.78 7.76e- 2       114  -0.0183     0.337 Pearson's p~ two.sided
 5 Ozone   Day      -0.0132    -0.141 8.88e- 1       114   -0.195     0.169 Pearson's p~ two.sided
 6 Solar.R Wind     -0.0568    -0.683 4.96e- 1       144   -0.217     0.107 Pearson's p~ two.sided
 7 Solar.R Temp       0.276      3.44 7.52e- 4       144    0.119     0.419 Pearson's p~ two.sided
 8 Solar.R Month    -0.0753    -0.906 3.66e- 1       144   -0.235    0.0882 Pearson's p~ two.sided
 9 Solar.R Day       -0.150     -1.82 7.02e- 2       144   -0.305    0.0125 Pearson's p~ two.sided
10 Wind    Temp      -0.458     -6.33 2.64e- 9       151   -0.575    -0.323 Pearson's p~ two.sided
11 Wind    Month     -0.178     -2.23 2.75e- 2       151   -0.328   -0.0202 Pearson's p~ two.sided
12 Wind    Day       0.0272     0.334 7.39e- 1       151   -0.132     0.185 Pearson's p~ two.sided
13 Temp    Month      0.421      5.70 6.03e- 8       151    0.281     0.543 Pearson's p~ two.sided
14 Temp    Day       -0.131     -1.62 1.08e- 1       151   -0.283    0.0287 Pearson's p~ two.sided
15 Month   Day     -0.00796   -0.0978 9.22e- 1       151   -0.166     0.151 Pearson's p~ two.sided
Now you can use dplyr::filter to subset the results you want.
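To illustrate the filtering step, here is a rough, self-contained sketch. It rebuilds a reduced version of the results table with base R (combn() and cor.test()) instead of tidystringdist, purrr and broom, so it is a swapped-in technique rather than the code from FAY (2017). The 0.05 threshold and the name `significant` are assumptions chosen purely for illustration.

```r
library(dplyr)

# All pairwise combinations of column names, one pair per row
pairs <- t(combn(names(airquality), 2))
bulk_cor <- data.frame(V1 = pairs[, 1], V2 = pairs[, 2],
                       stringsAsFactors = FALSE)

# Run cor.test() on each pair; incomplete pairs are dropped by cor.test()
tests <- Map(function(x, y) cor.test(airquality[[x]], airquality[[y]]),
             bulk_cor$V1, bulk_cor$V2)

# Keep only the correlation estimate and the p-value
bulk_cor$estimate <- vapply(tests, function(t) unname(t$estimate),
                            numeric(1), USE.NAMES = FALSE)
bulk_cor$p.value <- vapply(tests, function(t) t$p.value,
                           numeric(1), USE.NAMES = FALSE)

# Subset to the pairs below the (illustrative) significance threshold
significant <- bulk_cor %>%
  filter(p.value < 0.05)

print(significant)
```

The same filter() call should work unchanged on the bulk_cor tibble built above, since it also has a p.value column; conditions on V1 or V2 can be used to pick out specific pairs instead.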
Bibliography
FAY, Colin. 2017. “A Crazy Little Thing Called purrr - Part 6 : Doing Statistics.” https://colinfay.me/purrr-statistics/.
Source: https://stackoverflow.com/questions/28807266/row-wise-cor-on-subset-of-columns-using-dplyrmutate