tidyr: multiple unnesting with varying NA counts

Submitted by 不羁的心 on 2019-11-30 08:46:48

Question


I'm confused about some tidyr behavior. I can unnest a single response like this:

library(tidyr)

resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)

tidy <- data %>%
  transform(resp1 = strsplit(resp1, "; ")) %>%
  unnest()

# Source: local data frame [6 x 3]
#
#      resp2   resp3 resp1
#      (chr)   (chr) (chr)
# 1 C; D; F      NA     A
# 2      NA      NA     B
# 3      NA      NA     A
# 4    C; F G; H; I     B
# 5       D    H; I    NA
# 6       E       I     B

But I need to unnest multiple columns in my dataset, and the columns have varying numbers of NAs. I tried this and it threw an error:

data %>%
  transform(resp1 = strsplit(resp1, "; "),
            resp2 = strsplit(resp2, "; "),
            resp3 = strsplit(resp3, "; ")) %>%
  unnest()
# Error: All nested columns must have the same number of elements.

I expected the code above would give me the same output as the following:

# unnesting multiple response (desired output / is there a better way?)
data %>%
  transform(resp1 = strsplit(resp1, "; ")) %>%
  unnest() %>%
  transform(resp2 = strsplit(resp2, "; ")) %>%
  unnest() %>%
  transform(resp3 = strsplit(resp3, "; ")) %>%
  unnest()

#     resp1 resp2 resp3
#     (chr) (chr) (chr)
# 1      A     C    NA
# 2      A     D    NA
# 3      A     F    NA
# 4      B    NA    NA
# 5      A    NA    NA
# 6      B     C     G
# 7      B     C     H
# 8      B     C     I
# 9      B     F     G
# 10     B     F     H
# 11     B     F     I
# 12    NA     D     H
# 13    NA     D     I
# 14     B     E     I

I'm new to R, but this feels clunky and makes me wonder if I'm abusing something I shouldn't be. What's going on with the failed multiple unnest attempt?


Answer 1:


Check this link, which covers a different case of unnesting multiple columns from yours. Judging by the documentation and that link, unless there is some clever way to do this, unnest() seems to handle several columns at once only when every row has the same number of elements in each of them, presumably to avoid ambiguity about how the pieces should be combined.
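
One way to see where the combined call runs into trouble is to count how many pieces each row yields per column. The following is a minimal sketch in base R that reuses the question's data; the names pieces, counts and same_per_row are illustrative only and are not part of tidyr.

```r
resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")

# Split every column on "; ". An NA entry stays a single NA piece.
pieces <- lapply(list(resp1 = resp1, resp2 = resp2, resp3 = resp3),
                 strsplit, split = "; ", fixed = TRUE)

# One row per original row, one column per response column.
counts <- sapply(pieces, lengths)
counts
#      resp1 resp2 resp3
# [1,]     1     3     1
# [2,]     2     1     1
# [3,]     1     2     3
# [4,]     1     1     2
# [5,]     1     1     1

# TRUE only where all three columns yield the same number of pieces.
same_per_row <- apply(counts, 1, function(r) length(unique(r)) == 1)
same_per_row
```

Only the last row has matching counts in all three columns, which is consistent with the "same number of elements" error in the question.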

So you may have to unnest the columns one at a time. The code below is still somewhat cumbersome, but it is a little simpler:

> resp1 <- c("A", "B; A", "B", NA, "B")
> resp2 <- c("C; D; F", NA, "C; F", "D", "E")
> resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
> data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)
> data
  resp1   resp2   resp3
1     A C; D; F    <NA>
2  B; A    <NA>    <NA>
3     B    C; F G; H; I
4  <NA>       D    H; I
5     B       E       I
> library(tidyr)
> library(dplyr)
> data %>%
+   transform(resp1 = strsplit(resp1, "; "),
+             resp2 = strsplit(resp2, "; "),
+             resp3 = strsplit(resp3, "; ")) %>%
+   unnest(resp1) %>% unnest(resp2) %>% unnest(resp3)
   resp1 resp2 resp3
1      A     C  <NA>
2      A     D  <NA>
3      A     F  <NA>
4      B  <NA>  <NA>
5      A  <NA>  <NA>
6      B     C     G
7      B     C     H
8      B     C     I
9      B     F     G
10     B     F     H
11     B     F     I
12  <NA>     D     H
13  <NA>     D     I
14     B     E     I
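
To make the one-column-at-a-time idea concrete, here is a rough base R sketch of what sequential unnesting amounts to: split one column, repeat the remaining columns once per piece, then move on to the next column. The helper split_rows is hypothetical (it is not a tidyr function) and assumes the separator is the literal string "; ".

```r
resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)

# Hypothetical helper: expand one character column into one row per piece.
split_rows <- function(df, col, sep = "; ") {
  pieces <- strsplit(df[[col]], sep, fixed = TRUE)  # NA stays a single NA piece
  # Repeat each row once for every piece found in `col`.
  out <- df[rep(seq_len(nrow(df)), lengths(pieces)), , drop = FALSE]
  out[[col]] <- unlist(pieces)
  rownames(out) <- NULL
  out
}

# Apply the helper to resp1, then resp2, then resp3.
result <- Reduce(split_rows, c("resp1", "resp2", "resp3"), data)
nrow(result)
# [1] 14
```

Because each step only ever expands a single column, the differing piece counts across columns never have to be reconciled within one row.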



Answer 2:


In addition to Psidom's answer: by default, unnest() drops any additional list columns when unnesting requires rows to be duplicated.

Use the .drop = FALSE argument to keep the other columns.

The line unnest(resp1) %>% unnest(resp2) %>% unnest(resp3) becomes:

unnest(resp1, .drop = FALSE) %>% unnest(resp2, .drop = FALSE) %>% unnest(resp3)


Source: https://stackoverflow.com/questions/36816426/tidyr-multiple-unnesting-with-varying-na-counts
