I have the following structure:
key | category_x | 2009 | category_y | 2010
test
Example data, as requested:
set.seed(24)
df <- data.frame(
  key = 1:10,
  category_x = paste0("stock_", 0:9),
  '2008' = rnorm(10, 0, 10),
  category_y = paste0("stock_", 0:9),
  '2009' = rnorm(10, 0, 10),
  category_z = paste0("stock_", 0:9),
  '2010' = rnorm(10, 0, 10),
  check.names = FALSE
)
How do I change that into:
key | category | year
I know I can use:
library(magrittr)
library(dplyr)
library(tidyr)
df %>% gather(key, category, starts_with("category_"))
but that doesn't deal with the year columns. I looked at "Gather multiple sets of columns", but I don't understand the extract and spread commands.
If we are using gather, we can do this in two steps. First, we reshape from 'wide' to 'long' format for the column names that start with 'category', and in the next step we do the same with the numeric column names by selecting them with matches. matches can take regex patterns, so a pattern of ^[0-9]+$ means we match one or more digits ([0-9]+) from the start (^) to the end ($) of the string. We can remove the columns that are not needed with select.
library(tidyr)
library(dplyr)
gather(df, key, category, starts_with('category_')) %>%
  gather(key2, year, matches('^[0-9]+$')) %>%
  select(-starts_with('key'))
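To check which columns each selector picks up, the two patterns can be tried on the column names alone. This is a minimal base R sketch, not part of the original answer: the vector cols simply repeats the column names from the question's example data, and startsWith and grepl stand in for the tidyr helpers starts_with and matches.

```r
# Column names of the example data frame from the question
cols <- c("key", "category_x", "2008", "category_y", "2009",
          "category_z", "2010")

# Stand-in for starts_with("category_"): names beginning with that prefix
cat_cols <- cols[startsWith(cols, "category_")]

# Stand-in for matches("^[0-9]+$"): names made up of digits only
year_cols <- cols[grepl("^[0-9]+$", cols)]

print(cat_cols)
print(year_cols)
```

The anchors ^ and $ matter here: without them, a name that merely contains a digit somewhere would also match.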
Or, using the devel version of data.table, this would be much easier, as melt can take multiple patterns for the measure columns. We convert the 'data.frame' to a 'data.table' (setDT(df)), use melt, and specify the patterns within the measure argument. We can also change the column names of the 'value' columns. The 'variable' column is set to NULL as it is not needed in the expected output.
library(data.table)  # v1.9.5+
melt(setDT(df), measure = patterns(c('^category', '^[0-9]+$')),
     value.name = c('category', 'year'))[, variable := NULL][]
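For readers without data.table, the pairing that melt performs with several patterns can be sketched in base R. This is only an illustration, assuming that the i-th category column belongs with the i-th numeric column; the names cat_cols, year_cols and long are made up for this example and do not come from the original answer.

```r
# Rebuild the example data from the question
set.seed(24)
df <- data.frame(
  key = 1:10,
  category_x = paste0("stock_", 0:9),
  `2008` = rnorm(10, 0, 10),
  category_y = paste0("stock_", 0:9),
  `2009` = rnorm(10, 0, 10),
  category_z = paste0("stock_", 0:9),
  `2010` = rnorm(10, 0, 10),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Find the two sets of columns, in their order of appearance
cat_cols  <- grep("^category", names(df), value = TRUE)
year_cols <- grep("^[0-9]+$", names(df), value = TRUE)

# Assumption: the i-th category column pairs with the i-th numeric column.
# Build one long block per pair and stack the blocks.
long <- do.call(rbind, lapply(seq_along(cat_cols), function(i) {
  data.frame(
    key      = df$key,
    category = df[[cat_cols[i]]],
    year     = df[[year_cols[i]]],
    stringsAsFactors = FALSE
  )
}))

dim(long)
head(long)
```

With the example data this gives 30 rows (10 keys times 3 column pairs). Gathering the two sets one after the other, by contrast, combines every category column with every numeric column, so the pairing is not preserved.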
Source: https://stackoverflow.com/questions/32228220/how-do-i-gather-2-sets-of-columns-in-tidyr