Question
I'm trying to get familiar with the tidyr package, and I am struggling with a variable that is a concatenation of several variables. In the minimal example below, I would like to split variable v2 into its constituent variables v3 and v4, and then swing (spread) these into columns so that I end up with the four variables v1-v4.
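library(dplyr)
library(stringr)
library(tidyr)
data <- data.frame(
  v1 = c(1, 2),
  v2 = c("v3 cheese; v4 200", "v3 ham; v4 150"),
  stringsAsFactors = FALSE  # keep v2 as character (factor was the default before R 4.0)
) %>%
  as_tibble()  # tbl_df() is deprecated; as_tibble() is the current equivalent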
If I split v2 into a new column, temp, I only get v3:
mutate(data,
temp=unlist(sapply(str_split(data$v2, pattern=";"), "[", 1)))
v1 v2 temp
1 1 v3 cheese; v4 200 v3 cheese
2 2 v3 ham; v4 150 v3 ham
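Only v3 survives because of the `"[", 1` argument. `sapply()` applies the extraction function `[` with index 1 to every element of the list returned by the split, so only the first piece of each row is kept. The minimal base-R sketch below makes this visible on the same toy strings. It uses `strsplit()` in place of stringr's `str_split()`; that substitution is an assumption of the sketch, not the code above.

```r
# Same toy values as v2 in the question
v2 <- c("v3 cheese; v4 200", "v3 ham; v4 150")

# strsplit() returns a list holding one character vector per row
parts <- strsplit(v2, ";", fixed = TRUE)

# `[` with index 1 keeps only the first piece of each vector,
# which is why temp above contains just the v3 part
first <- sapply(parts, `[`, 1)   # "v3 cheese" "v3 ham"

# index 2 picks the second piece, which still carries the space after ";"
second <- sapply(parts, `[`, 2)  # " v4 200" " v4 150"

# trimws() removes that leading space
second_trimmed <- trimws(second) # "v4 200" "v4 150"
```

Picking pieces by position like this only works when every row lists its variables in the same order. Reshaping by variable name does not depend on that.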
My problems are:
1) How do I split v2 and swing v3 AND v4 up as column names using tidyr?
2) In my real data I do not know the variable names (or there are too many of them), but they all have the structure "var value". I would like to use some regex to identify them automatically and swing them as in 1).
I got inspired by this SO answer, but could not get it to work with regex code for the variable names.
UPDATE:
My desired output would be something like this (v2 could be dropped, as it is now redundant with v3 and v4):
v1 v2 v3 v4
1 1 v3 cheese; v4 200 cheese 200
2 2 v3 ham; v4 150 ham 150
Answer 1:
Split the data by ";" and convert the split output to long form. Then split each piece again by " ", this time into separate name and value columns. Finally, spread the values out to the wide form you want.
Here it is using "dplyr" + "tidyr" + "stringi":
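library(dplyr)
library(tidyr)
library(stringi)
data %>%
  # split each v2 string on ";" into a list column
  mutate(v2 = stri_split_fixed(as.character(v2), ";")) %>%
  # long form: one row per "var value" piece
  unnest(v2) %>%
  # drop the space left over after each ";"
  mutate(v2 = stri_trim_both(v2)) %>%
  # split each piece into a name column and a value column
  separate(v2, into = c("var", "val")) %>%
  # wide form: one column per variable name
  spread(var, val)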
# Source: local data frame [2 x 3]
#
# v1 v3 v4
# 1 1 cheese 200
# 2 2 ham 150
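For comparison, the four steps described at the start of this answer can also be sketched in base R alone. This version never needs the variable names in advance, which addresses question 2. It is an illustrative alternative, not the answer's method. It swaps `stri_split_fixed()`/`separate()`/`spread()` for `strsplit()`, a `sub()` regex and `reshape()`. It also assumes that every piece has the form "name value" and that the name contains no whitespace.

```r
# Toy data mirroring the question (assumption: v2 is character, not factor)
data <- data.frame(
  v1 = c(1, 2),
  v2 = c("v3 cheese; v4 200", "v3 ham; v4 150"),
  stringsAsFactors = FALSE
)

# Step 1: split each v2 string on ";" and trim surrounding spaces
pieces <- lapply(strsplit(data$v2, ";", fixed = TRUE), trimws)

# Step 2: long form, one row per "name value" piece, repeating v1 as the key
long <- data.frame(
  v1 = rep(data$v1, lengths(pieces)),
  piece = unlist(pieces),
  stringsAsFactors = FALSE
)

# Step 3: split each piece at its FIRST run of whitespace with a regex,
# so names are discovered from the data and values may contain spaces
pattern <- "^(\\S+)\\s+(.*)$"
long$var <- sub(pattern, "\\1", long$piece, perl = TRUE)
long$val <- sub(pattern, "\\2", long$piece, perl = TRUE)

# Step 4: wide form, one column per discovered variable name
wide <- reshape(
  long[, c("v1", "var", "val")],
  idvar = "v1", timevar = "var", direction = "wide"
)
# reshape() names the new columns "val.<name>"; strip that prefix
names(wide) <- sub("^val\\.", "", names(wide))
rownames(wide) <- NULL
wide
```

This should give the same v1, v3 and v4 columns as the tidyr pipeline. A piece that does not match the pattern (for example, one with no space) is returned unchanged by `sub()` as both its name and its value, so such input would need cleaning first.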
Alternatively, you can use cSplit from my "splitstackshape" package. It doesn't presently work with tbl_dfs, hence the as.data.frame() call:
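library(dplyr)
library(tidyr)
library(splitstackshape)
as.data.frame(data) %>%
  # split v2 on ";" into separate rows
  cSplit("v2", ";", "long") %>%
  # split each "var value" piece on " " into columns v2_1 and v2_2
  cSplit("v2", " ") %>%
  # spread the names in v2_1 into columns filled from v2_2
  spread(v2_1, v2_2)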
# v1 v3 v4
# 1: 1 cheese 200
# 2: 2 ham 150
Source: https://stackoverflow.com/questions/29120787/in-r-tidyr-split-and-swing-value-into-column-name-using-regex