I'm confused about some tidyr behavior. I can unnest a single response like this:
library(tidyr)
resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)
tidy <- data %>%
  transform(resp1 = strsplit(resp1, "; ")) %>%
  unnest()
# Source: local data frame [6 x 3]
#
# resp2 resp3 resp1
# (chr) (chr) (chr)
# 1 C; D; F NA A
# 2 NA NA B
# 3 NA NA A
# 4 C; F G; H; I B
# 5 D H; I NA
# 6 E I B
But I need to unnest multiple columns in my dataset, and the columns have varying numbers of NAs. I tried this and it threw an error:
data %>%
  transform(resp1 = strsplit(resp1, "; "),
            resp2 = strsplit(resp2, "; "),
            resp3 = strsplit(resp3, "; ")) %>%
  unnest()
# Error: All nested columns must have the same number of elements.
I expected the code above would give me the same output as the following:
# unnesting multiple response (desired output / is there a better way?)
data %>%
  transform(resp1 = strsplit(resp1, "; ")) %>%
  unnest() %>%
  transform(resp2 = strsplit(resp2, "; ")) %>%
  unnest() %>%
  transform(resp3 = strsplit(resp3, "; ")) %>%
  unnest()
# resp1 resp2 resp3
# (chr) (chr) (chr)
# 1 A C NA
# 2 A D NA
# 3 A F NA
# 4 B NA NA
# 5 A NA NA
# 6 B C G
# 7 B C H
# 8 B C I
# 9 B F G
# 10 B F H
# 11 B F I
# 12 NA D H
# 13 NA D I
# 14 B E I
I'm new to R, but this feels clunky and makes me wonder if I'm abusing something I shouldn't be abusing. What's going on with the failed attempt to unnest multiple columns at once?
Check this link, which shows a case of unnesting multiple columns that differs from yours. Judging by the documentation and that link, unless there is some clever way to do this, unnest() may simply be defined to handle one column at a time in a case like yours, to avoid ambiguity about how the rows should be combined.
So you may have to unnest your columns one by one. The code below is still somewhat cumbersome, but it simplifies things a little.
> resp1 <- c("A", "B; A", "B", NA, "B")
> resp2 <- c("C; D; F", NA, "C; F", "D", "E")
> resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
> data <- data.frame(resp1, resp2, resp3, stringsAsFactors = F)
> data
resp1 resp2 resp3
1 A C; D; F <NA>
2 B; A <NA> <NA>
3 B C; F G; H; I
4 <NA> D H; I
5 B E I
library(tidyr)
library(dplyr)
data %>%
  transform(resp1 = strsplit(resp1, "; "),
            resp2 = strsplit(resp2, "; "),
            resp3 = strsplit(resp3, "; ")) %>%
  unnest(resp1) %>%
  unnest(resp2) %>%
  unnest(resp3)
resp1 resp2 resp3
1 A C <NA>
2 A D <NA>
3 A F <NA>
4 B <NA> <NA>
5 A <NA> <NA>
6 B C G
7 B C H
8 B C I
9 B F G
10 B F H
11 B F I
12 <NA> D H
13 <NA> D I
14 B E I
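To see why the single unnest() call complains, it may help to count how many pieces each cell splits into. The following is only a base R sketch for diagnosis, not part of the solution above; the names counts, mismatch and expected_rows are made up for illustration.

```r
resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")

# Number of elements per row after splitting each column.
# strsplit() returns a length-1 NA for a missing string, so NA cells count as 1.
counts <- data.frame(
  resp1 = lengths(strsplit(resp1, "; ")),
  resp2 = lengths(strsplit(resp2, "; ")),
  resp3 = lengths(strsplit(resp3, "; "))
)
print(counts)

# Rows where the three columns disagree on the element count.
mismatch <- counts$resp1 != counts$resp2 | counts$resp2 != counts$resp3
print(which(mismatch))

# Unnesting one column after another multiplies the counts within each row,
# so the total number of rows is the sum of the per-row products.
expected_rows <- sum(counts$resp1 * counts$resp2 * counts$resp3)
print(expected_rows)
```

Rows with differing counts are the ones a simultaneous unnest cannot line up, while the sum of the per-row products matches the 14 rows produced by unnesting column by column.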
In addition to Psidom's answer: by default, unnest drops additional list columns (if row duplication is required). Use the .drop = FALSE argument to keep other columns.
The line
unnest(resp1) %>% unnest(resp2) %>% unnest(resp3)
becomes:
unnest(resp1, .drop = FALSE) %>% unnest(resp2, .drop = FALSE) %>% unnest(resp3)
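For readers on a newer tidyr: as far as I know, from tidyr 1.0.0 onward the .drop argument is deprecated (list columns are kept by default) and unnest() expects the columns to be named via cols. A minimal sketch of the same chain under that assumption, using dplyr's mutate() in place of transform(); this is an adaptation, not the code from the answers above, and the name tidy is only for illustration.

```r
library(tidyr)
library(dplyr)

data <- data.frame(
  resp1 = c("A", "B; A", "B", NA, "B"),
  resp2 = c("C; D; F", NA, "C; F", "D", "E"),
  resp3 = c(NA, NA, "G; H; I", "H; I", "I"),
  stringsAsFactors = FALSE
)

# Assumes tidyr >= 1.0.0: split every column into a list column first,
# then unnest one column at a time so differing element counts are allowed.
tidy <- data %>%
  mutate(
    resp1 = strsplit(resp1, "; "),
    resp2 = strsplit(resp2, "; "),
    resp3 = strsplit(resp3, "; ")
  ) %>%
  unnest(cols = resp1) %>%
  unnest(cols = resp2) %>%
  unnest(cols = resp3)

print(tidy)
```

Naming one column per call keeps the behavior of the chained version above; passing all three columns to a single unnest() would still require matching element counts in every row.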
Source: https://stackoverflow.com/questions/36816426/tidyr-multiple-unnesting-with-varying-na-counts