Question
I have explored various options using quosures, symbols, and evaluation, but I can't seem to get the right syntax. Here is an example dataframe.
x <- data.frame("A" = letters[1:4], "B" = letters[26:23], "C" = letters[c(1,3,5,7)], "D" = letters[c(2,4,6,8)], "pastecols" = c("B, C","B, D", "B, C, D", NA))
  A B C D pastecols
1 a z a b      B, C
2 b y c d      B, D
3 c x e f   B, C, D
4 d w g h      <NA>
Now suppose I want to paste values from different columns based on the lookup string in pastecols, and I always want to include column A. This is my desired result:
  A B C D pastecols  result
1 a z a b      B, C   a z a
2 b y c d      B, D   b y d
3 c x e f   B, C, D c x e f
4 d w g h      <NA>       d
Ideally this could be done in dplyr. This is the closest I have gotten:
x %>% mutate(result = lapply(lapply(str_split(pastecols, ", "), c, "A"), na.omit))
  A B C D pastecols     result
1 a z a b      B, C    B, C, A
2 b y c d      B, D    B, D, A
3 c x e f   B, C, D B, C, D, A
4 d w g h      <NA>          A
Answer 1:
Here's one way to do it using `pmap`. `pmap` can work on a dataframe row by row by capturing each row as a named vector; you can then get the desired column names for indexing, stored as `cols`, by selecting them with `["pastecols"]`.
Most of the anonymous function syntax is not `tidyverse` stuff, just basic R. To walk through it:
- Pass the dataframe as the list to the `.l` argument of `pmap_chr`. Remember that dataframes are lists of columns!
- Capture all the `...` arguments with `c(...)`. Basically we are calling the function with each row of the dataframe as its arguments; now `row` is a named vector containing the row. Note that this will break if you have list-columns (but so will a lot of other things here, so I assume there aren't any).
- We can find out which values of `row` we want from `row["pastecols"]`, but we need to turn (say) `"B, C"` into `c("A", "B", "C")` to do that. The next line prepends `"A"`, replaces missing values with `"A"`, splits the string into pieces if there are any, and then indexes back down into the list. The `` `[[` `` part is just how you do `list[[1]]` in a pipe chain; it's the prefix form of the operator. You need this because `str_split` returns a list and we just want the vector.
- Use this `cols` vector to get the desired values from `row` and return them, collapsed into a length-1 character vector!
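library(tidyverse)
tbl <- tibble("A" = letters[1:4], "B" = letters[26:23], "C" = letters[c(1,3,5,7)], "D" = letters[c(2,4,6,8)], "pastecols" = c("B, C","B, D", "B, C, D", NA))
tbl %>%
  mutate(result = pmap_chr(
    .l = .,
    .f = function(...){
      # Each row arrives as named arguments; collect them into a named vector
      row <- c(...)
      # Prepend "A", fall back to "A" when the lookup is NA, then split into column names
      cols <- row["pastecols"] %>% str_c("A, ", .) %>% replace_na("A") %>% str_split(", ") %>% `[[`(1)
      # Index the row by those names and collapse to a single string
      vals <- row[cols] %>% str_c(collapse = ", ")
      return(vals)
    }
  ))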
#> # A tibble: 4 x 6
#>   A     B     C     D     pastecols result
#>   <chr> <chr> <chr> <chr> <chr>     <chr>
#> 1 a     z     a     b     B, C      a, z, a
#> 2 b     y     c     d     B, D      b, y, d
#> 3 c     x     e     f     B, C, D   c, x, e, f
#> 4 d     w     g     h     <NA>      d
Created on 2018-12-03 by the reprex package (v0.2.0).
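To see the steps from the walk-through in isolation, here is a minimal base R sketch that works on a single row, written as the named character vector that `c(...)` would produce inside `pmap_chr`. The helper name `lookup_cols` is made up for illustration and is not part of the answer; it uses `strsplit` and `paste` in place of the stringr calls, so it only approximates what the pipe chain does.

```r
# One row of the example data as a named character vector,
# mimicking what c(...) yields inside the pmap_chr() call.
row <- c(A = "a", B = "z", C = "a", D = "b", pastecols = "B, C")

# Hypothetical helper (not from the answer): turn a lookup string
# such as "B, C" into the column names to paste, always led by "A".
lookup_cols <- function(lookup) {
  if (is.na(lookup)) {
    return("A")  # a missing lookup means "only column A"
  }
  # strsplit() returns a list, so [[1]] pulls out the character vector
  c("A", strsplit(lookup, ", ", fixed = TRUE)[[1]])
}

cols <- lookup_cols(row[["pastecols"]])
vals <- paste(row[cols], collapse = ", ")

print(cols)
print(vals)
```

Indexing a named vector by a character vector of names is the whole trick here; everything else is just preparing that vector of names.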
Answer 2:
Not the most elegant solution, but it gets the job done with just base R. If column `A` never shows up in `pastecols`, you can remove `unique()` from the code.
for(r in seq_len(nrow(df))) {
  df$result[r] <- paste(
    df[r, na.omit(unique(c("A", unlist(strsplit(df$pastecols[r], ", ")))))],
    collapse = " "
  )
}
df
  A B C D pastecols  result
1 a z a b      B, C   a z a
2 b y c d      B, D   b y d
3 c x e f   B, C, D c x e f
4 d w g h      <NA>       d
Data -
df <- data.frame(
  "A" = letters[1:4],
  "B" = letters[26:23],
  "C" = letters[c(1,3,5,7)],
  "D" = letters[c(2,4,6,8)],
  "pastecols" = c("B, C","B, D", "B, C, D", NA), stringsAsFactors = FALSE
)
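If you'd rather not grow `result` inside a `for` loop, roughly the same lookup can be sketched with `vapply`. This is a variant under my own assumptions, not the answer's code: `paste_lookup` and its `always` and `sep` arguments are hypothetical names, and the data frame is repeated so the block runs by itself.

```r
df <- data.frame(
  A = letters[1:4],
  B = letters[26:23],
  C = letters[c(1, 3, 5, 7)],
  D = letters[c(2, 4, 6, 8)],
  pastecols = c("B, C", "B, D", "B, C, D", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: paste the values of row r from the columns
# named in pastecols, always including the column given in `always`.
paste_lookup <- function(data, r, always = "A", sep = " ") {
  wanted <- unlist(strsplit(data$pastecols[r], ", ", fixed = TRUE))
  # Drop the NA left by a missing lookup; unique() guards against
  # `always` also being listed in pastecols.
  cols <- unique(c(always, wanted[!is.na(wanted)]))
  paste(unlist(data[r, cols, drop = FALSE]), collapse = sep)
}

# vapply() checks that every row yields exactly one string.
df$result <- vapply(
  seq_len(nrow(df)),
  function(r) paste_lookup(df, r),
  character(1)
)
print(df)
```

Using `drop = FALSE` keeps the single-column case (row 4) a data frame, so the same `unlist` call handles every row.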
Answer 3:
Here's a different way that doesn't rely on iterating functions in the `apply` or `map` families, if you would prefer to avoid them, and that tries to leverage the `tidyr` side of the `tidyverse`. The approach is basically to expand the dataframe with `gather` and `separate_rows` into each combination of `pastecols` and actual columns, and then `filter` so we only keep the ones that match for each `rowid`. Once we have that, we can `group_by` and `summarise` to bring it back to one row per `rowid`. There is a bunch of housekeeping to deal with the fact that you always have column `A`; note that I leave `A` in the output `pastecols`, but you can remove that if you want to.
library(tidyverse)
tbl <- tibble("A" = letters[1:4], "B" = letters[26:23], "C" = letters[c(1,3,5,7)], "D" = letters[c(2,4,6,8)], "pastecols" = c("B, C","B, D", "B, C, D", NA))
tbl %>%
  rowid_to_column() %>%
  # Housekeeping: always include column A, even when pastecols is NA
  mutate(
    pastecols = str_c("A, ", pastecols),
    pastecols = if_else(is.na(pastecols), "A", pastecols)
  ) %>%
  # Long format: one row per (rowid, colname) with its value
  gather(colname, value, -pastecols, -rowid) %>%
  # One row per requested column name
  separate_rows(pastecols) %>%
  # Keep only the pairs where the requested name matches the actual column
  filter(pastecols == colname) %>%
  group_by(rowid) %>%
  summarise(
    pastecols = str_c(pastecols, collapse = ", "),
    result = str_c(value, collapse = ", ")
  )
#> # A tibble: 4 x 3
#>   rowid pastecols  result
#>   <int> <chr>      <chr>
#> 1     1 A, B, C    a, z, a
#> 2     2 A, B, D    b, y, d
#> 3     3 A, B, C, D c, x, e, f
#> 4     4 A          d
Created on 2018-12-03 by the reprex package (v0.2.0).
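For readers without the tidyverse installed, the expand-then-filter idea can also be sketched in base R, with `merge` standing in for the `filter` step and `aggregate` for `group_by` plus `summarise`. This is only an illustration of the same idea, assuming the value columns are exactly `A` to `D`; the names `values_long`, `wanted_long`, and `matched` are made up here.

```r
tbl <- data.frame(
  A = letters[1:4],
  B = letters[26:23],
  C = letters[c(1, 3, 5, 7)],
  D = letters[c(2, 4, 6, 8)],
  pastecols = c("B, C", "B, D", "B, C, D", NA),
  stringsAsFactors = FALSE
)
value_cols <- c("A", "B", "C", "D")  # assumed set of value columns

# Long table of actual values: one row per (rowid, colname).
# unlist() walks the columns in order, hence times/each below.
values_long <- data.frame(
  rowid = rep(seq_len(nrow(tbl)), times = length(value_cols)),
  colname = rep(value_cols, each = nrow(tbl)),
  value = unlist(tbl[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Long table of requested columns: "A" is always included and an
# NA lookup contributes nothing beyond "A".
wanted <- lapply(
  strsplit(tbl$pastecols, ", ", fixed = TRUE),
  function(x) unique(c("A", x[!is.na(x)]))
)
wanted_long <- data.frame(
  rowid = rep(seq_along(wanted), times = lengths(wanted)),
  colname = unlist(wanted),
  stringsAsFactors = FALSE
)

# Keep only matching (rowid, colname) pairs, restore column order,
# then collapse back to one row per rowid.
matched <- merge(wanted_long, values_long, by = c("rowid", "colname"))
matched <- matched[order(matched$rowid, match(matched$colname, value_cols)), ]
result <- aggregate(value ~ rowid, data = matched, FUN = paste, collapse = ", ")
names(result)[2] <- "result"
print(result)
```

The explicit `order` call matters because `merge` does not promise to keep the values in the original column order within each `rowid`.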
Source: https://stackoverflow.com/questions/53599684/dplyr-mutate-specific-columns-by-evaluating-lookup-cell-value