问题
I have a data that looks like the following dataframe, but every combo has about ten fields, starting with name1, adress1, city1, etc
id name1 adress1 name2 adress2 name3 adress3
1 1 John street a Burt street d chris street 1
2 2 Jack street b Ben street e connor street 2
3 3 Joey <NA> Bob street f <NA> <NA>
Now I would like to rearrange this data so it is a bit more useful and it should look like so, but with the information from which entry it came from:
id origin names adresses
1 1 1 John street a
2 2 1 Jack street b
3 3 1 Joey <NA>
4 1 2 Burt street d
5 2 2 Ben street e
6 3 2 Bob street f
7 1 3 chris street 1
8 2 3 connor street 2
Using tidyr I can get a long format, but then I have a key column that contains all the variable names, name1, name2, name3, street1, etc.
I also tried using separate dataframes, one for each combination, e.g. one dataframe for the names, one for the streets, etc. But then joining everything back together results in the wrong records, because you can only join on id and in a long format this ID is replicated. I have also been looking into Reshape2, but that results in the same issue.
All the conversions of wide to long I have seen are when you have one column you want to convert to. I'm looking for the end result in 10 columns, or as in the example 2 columns.
Is there a function that I'm overlooking?
#code to generete the dataframes:
df <- data.frame(id = c(1,2,3),
name1 = c("John", "Jack", "Joey"),
adress1 = c("street a", "street b", NA),
name2 = c("Burt", "Ben", "Bob"),
adress2 = c("street d", "street e", "street f"),
name3 = c("chris", "connor", NA),
adress3 = c("street 1", "street 2", NA),
stringsAsFactors = FALSE)
expecteddf <- data.frame(id = c(1,2,3,1,2,3,1,2),
origin = c(rep(1, 3), rep(2, 3), rep(3, 2)),
names = c("John", "Jack", "Joey", "Burt", "Ben", "Bob", "chris", "connor"),
adresses = c("street a", "street b", NA, "street d", "street e", "street f", "street 1", "street 2"),
stringsAsFactors = FALSE
)
回答1:
We could use melt
from the devel
version of data.table which can take multiple patterns
for the measure
columns. Instructions to install the devel version of 'data.table' is here
We convert the 'data.frame' to 'data.table' (setDT(df)
), melt
, and specify the regex
in the patterns
of measure
argument. Remove the rows that are NA
for the 'names' and 'address' column.
library(data.table)#v1.9.5+
dM <- melt(setDT(df), measure=patterns(c('^name', '^adress')),
value.name=c('names', 'address') )
dM[!(is.na(names) & is.na(address))]
# id variable names address
#1: 1 1 John street a
#2: 2 1 Jack street b
#3: 3 1 Joey NA
#4: 1 2 Burt street d
#5: 2 2 Ben street e
#6: 3 2 Bob street f
#7: 1 3 chris street 1
#8: 2 3 connor street 2
Or we can use reshape
from base R
.
dM2 <- reshape(df, idvar='id', varying=list(grep('name', names(df)),
grep('adress', names(df))), direction='long')
The NA
rows can be removed as in the data.table
solution by using standard 'data.frame' indexing after we create the logical index with is.na
.
来源:https://stackoverflow.com/questions/32297690/from-wide-format-to-long-format-with-results-in-multiple-columns