Question
I'm looking for help writing a function that automatically subsets a data frame based on the values of a column. For example,
df$x contains the values a, b, c, d
I want to make separate data frames named a, b, c, d, each containing all rows where x == 'a', x == 'b', etc. I know several ways to do this manually, but I'm hoping for guidance on how to automate it. Thank you!
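Answer 1:
Maybe not the best way to do it, but it will get the job done.
library(dplyr)  # provides %>% and filter()
# as.character() so that factor values are used as names
vars_df <- unique(as.character(df$x))
for (i in seq_along(vars_df)) {
  # create an object in the global environment named after each value of x
  assign(vars_df[i], df %>% filter(x == vars_df[i]), envir = .GlobalEnv)
}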
Answer 2:
The split function returns a list of subsetted data frames:
split(df, df$x)
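As a rough illustration of what split returns, here is a minimal sketch on made-up data (the value column and the target_env environment are assumptions for the example, not part of the question). It also shows base R's list2env, which can turn the list elements into separate objects without a loop:

```r
# Hypothetical example data; only the column name x comes from the question.
df <- data.frame(x = c("a", "b", "a", "c", "d", "b"), value = 1:6)

pieces <- split(df, df$x)   # named list, one data frame per value of x
names(pieces)
pieces[["a"]]               # the rows where x == "a"

# Optionally turn each list element into its own object.
# A fresh environment is used here to avoid cluttering the workspace;
# pass envir = .GlobalEnv instead to get top-level objects a, b, c, d.
target_env <- new.env()
list2env(pieces, envir = target_env)
ls(target_env)
```

Keeping the pieces in the list is often easier to work with afterwards, since lapply can be run over all of them at once.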
EDIT:
If you want a new object for each subsetted data frame:
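# unique(as.character(...)) also works when x is a character column, where levels() returns NULL
for (i in unique(as.character(df$x))) {
  # build a command such as: a <- subset(df, x == 'a')
  command <- paste0(i, " <- subset(df, x == '", i, "')")
  eval(parse(text = command))
}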
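One caveat with building the command as text: it only parses if every value of x is a valid R name. A minimal sketch of one way around that, assuming some hypothetical values like "1b" or "my group", is to sanitize the names with make.names and use assign instead of eval(parse()). The data and the target_env environment here are made up for illustration:

```r
# Hypothetical values of x that are not valid R names as-is.
raw_values <- c("a", "1b", "my group")
safe_names <- make.names(raw_values, unique = TRUE)
safe_names

df <- data.frame(x = raw_values, value = 1:3)

# A separate environment keeps the example from touching the workspace;
# use envir = .GlobalEnv to create top-level objects instead.
target_env <- new.env()
for (k in seq_along(raw_values)) {
  rows <- df[df$x == raw_values[k], , drop = FALSE]
  assign(safe_names[k], rows, envir = target_env)
}
ls(target_env)
```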
EDIT 2:
To split by two or more variables, a more automated solution is to write a function that takes a data frame and the names of the columns to split by:
library(magrittr)  # provides %>%

create_new_df <- function(dataframe, vars) {
  # Creates one new data frame in the global environment per combination of the variables in 'vars'
  split(dataframe, dataframe[vars], drop = TRUE) %>%
    lapply(function(subset_dataframe) {
      # Name the object after the values in the first row of each subset, joined by "_"
      key_values <- vapply(subset_dataframe[1, vars, drop = FALSE], as.character, character(1))
      new_object_name <- paste(key_values, collapse = "_")
      # The double arrowed '<<-' creates a new object in the global environment
      command <- paste0(new_object_name, " <<- subset_dataframe")
      eval(parse(text = command))
    }) %>%
    invisible()
}
This function can then be used to create new objects with any combination of variables:
variables <- c("x", "y", "z")
create_new_df(df, variables)
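For comparison, a sketch of the same idea without eval(parse()): build one key per row with interaction and split on it. The example data and the names key and pieces are assumptions for illustration only:

```r
# Hypothetical example data with two grouping columns.
df <- data.frame(
  x = c("a", "a", "b", "b"),
  y = c("u", "v", "u", "u"),
  value = 1:4
)
vars <- c("x", "y")

# One key per row, such as "a_u"; drop = TRUE omits combinations
# that never occur in the data.
key <- interaction(df[vars], sep = "_", drop = TRUE)
pieces <- split(df, key)
names(pieces)
pieces[["b_u"]]
```

The resulting list can be passed to list2env if separate objects are really needed.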
Source: https://stackoverflow.com/questions/44422397/automatically-subset-data-frame-by-factor