In R: tidyr split and swing value into column name using regex

Submitted by 懵懂的女人 on 2019-12-12 04:55:40

Question


I'm trying to get accustomed to the tidyr package, and am struggling with a variable that is a concatenation of several variables. In the minimal example below, I would like to split variable v2 into its constituent variables v3 and v4 and then swing these so I end up with the four variables v1-v4.

require(plyr)
require(dplyr)
require(stringr)
require(tidyr)    
data <- 
      data.frame(
        v1=c(1,2),
        v2=c("v3 cheese; v4 200", "v3 ham; v4 150")) %>%
      tbl_df()

If I split v2 into a new temp I get only v3:

mutate(data, 
      temp=unlist(sapply(str_split(data$v2, pattern=";"), "[", 1)))

  v1                v2      temp
1  1 v3 cheese; v4 200 v3 cheese
2  2    v3 ham; v4 150    v3 ham

My problems are:

  1) How do I split and swing v3 AND v4 up as column names using tidyr?
  2) In my real data I do not know the variable names (or there are too many of them), but they all have the structure "var value", and I would like to use a regex to identify them automatically and swing them as in 1).

I was inspired by this SO answer, but could not get it to work with regex code for the variable names.

UPDATE: My output would be something like this (v2 could be skipped, as it's now redundant with v3 and v4):

  v1                v2     v3  v4
1  1 v3 cheese; v4 200 cheese 200
2  2    v3 ham; v4 150    ham 150

Answer 1:


Split the data by ";", convert the split output to a long form, split the data again by " " (but in a wide form this time) and spread the values out to the wide form you desire.

Here it is using "dplyr" + "tidyr" + "stringi":

library(dplyr)
library(tidyr)
library(stringi)

data %>%
  mutate(v2 = stri_split_fixed(as.character(v2), ";")) %>%
  unnest(v2) %>%
  mutate(v2 = stri_trim_both(v2)) %>%
  separate(v2, into = c("var", "val")) %>%
  spread(var, val)
# Source: local data frame [2 x 3]
# 
#   v1     v3  v4
# 1  1 cheese 200
# 2  2    ham 150
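
For the second part of the question (variable names that are not known in advance), the same split / long / wide steps can probably be written with an explicit regex. The following is only a sketch, not part of the original answer. It assumes tidyr 1.0.0 or later (for pivot_wider), that every piece has the shape "name value" with a name containing no whitespace, and that each name occurs at most once per row. The result name and the two regexes are illustrative choices.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question
data <- tibble(
  v1 = c(1, 2),
  v2 = c("v3 cheese; v4 200", "v3 ham; v4 150")
)

result <- data %>%
  # One row per "name value" piece; the regex also drops spaces around ";"
  separate_rows(v2, sep = "\\s*;\\s*") %>%
  # Two capture groups: the name (no whitespace), then the rest as the value
  tidyr::extract(v2, into = c("var", "val"), regex = "^(\\S+)\\s+(.*)$") %>%
  # Swing the captured names up into column names
  pivot_wider(names_from = var, values_from = val)

result
```

All swung columns come back as character (v4 holds "200" and "150"), so a conversion step such as type.convert() may be wanted afterwards. Because the names are captured by the regex rather than listed, no variable name has to be typed out.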

Alternatively, using cSplit from my "splitstackshape" package (which doesn't presently work with tbl_dfs, hence the as.data.frame call):

library(dplyr)
library(tidyr)
library(splitstackshape)

as.data.frame(data) %>%
  cSplit("v2", ";", "long") %>%
  cSplit("v2", " ") %>%
  spread(v2_1, v2_2) 
#    v1     v3  v4
# 1:  1 cheese 200
# 2:  2    ham 150


Source: https://stackoverflow.com/questions/29120787/in-r-tidyr-split-and-swing-value-into-column-name-using-regex
