tidyr: multiple unnesting with varying NA counts

戏子无情 提交于 2019-11-29 07:59:54

Check this link, which shows a different situation of unnesting multiple columns from yours. According to the documentation and the link given, unless there is some clever way to do this, the function might be just defined for a single column to avoid the ambiguity.

So you may have to unnest your columns one by one, and the code given below might be still cumbersome but simplifies a little bit.

> resp1 <- c("A", "B; A", "B", NA, "B")
> resp2 <- c("C; D; F", NA, "C; F", "D", "E")
> resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
> data <- data.frame(resp1, resp2, resp3, stringsAsFactors = F)
> data
  resp1   resp2   resp3
1     A C; D; F    <NA>
2  B; A    <NA>    <NA>
3     B    C; F G; H; I
4  <NA>       D    H; I
5     B       E       I
library(tidyr)
library(dplyr)
data %>%
transform(resp1 = strsplit(resp1, "; "),
          resp2 = strsplit(resp2, "; "),
          resp3 = strsplit(resp3, "; ")) %>%
unnest(resp1) %>% unnest(resp2) %>% unnest(resp3)
   resp1 resp2 resp3
1      A     C  <NA>
2      A     D  <NA>
3      A     F  <NA>
4      B  <NA>  <NA>
5      A  <NA>  <NA>
6      B     C     G
7      B     C     H
8      B     C     I
9      B     F     G
10     B     F     H
11     B     F     I
12  <NA>     D     H
13  <NA>     D     I
14     B     E     I

In addition to Psidom answer: by default, unnest drops additional list columns (if row duplication is required).

Use .drop = FALSE argument to keep other columns.

Line unnest(resp1) %>% unnest(resp2) %>% unnest(resp3) becomes:

unnest(resp1, .drop = FALSE) %>% unnest(resp2, .drop = FALSE) %>% unnest(resp3)
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!