Automatically subset data frame by factor

Submitted by 一个人想着一个人 on 2019-12-13 09:25:02

Question


I'm looking for help writing a function that automatically subsets a data frame based on the values of a column. For example:

df$x contains values a, b, c, d

I want to make separate data frames named a, b, c, d, each containing all rows where x == 'a', x == 'b', and so on. I know several ways to do this manually, but I'm hoping for guidance on how to automate it. Thank you!


Answer 1:


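Maybe not the best way to do it, but it will get the job done.

library(dplyr)  # provides %>% and filter()

# One entry per distinct value of x; as.character() also covers factor columns
vars_df <- as.character(unique(df$x))

for (v in vars_df) {
    # Create an object named after the value, holding the matching rows
    assign(v, df %>% filter(x == v), envir = .GlobalEnv)
}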



Answer 2:


The split function returns a list of subsetted data frames:

split(df, df$x)
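
As a minimal sketch of what that call returns, assuming a toy data frame (the `df` below and its `value` column are made up for illustration): each element of the list is named after one value of x and can be pulled out by name.

```r
# Toy data, invented for illustration only
df <- data.frame(x = c("a", "b", "a", "c", "d", "b"),
                 value = 1:6,
                 stringsAsFactors = TRUE)

# One list element per level of x, each a data frame
pieces <- split(df, df$x)

names(pieces)         # the names are the values of x
pieces[["a"]]         # the rows where x == "a"
sapply(pieces, nrow)  # number of rows in each subset
```

Keeping the subsets together in one list often makes later steps easier, since lapply() or sapply() can then be run over all groups at once.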

EDIT:

If you want a new object for each subsetted data frame:

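for (i in levels(factor(df$x))) {
    # factor() makes this work when x is a character column as well;
    # assign() creates the object directly instead of parsing code built as text
    assign(i, subset(df, x == i))
}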
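
A loop-free alternative, sketched here under the assumption of a toy `df`, is to pass the list from split() to base R's list2env(), which creates one variable per named list element. The environment `groups` is a hypothetical name used so that nothing in the workspace is overwritten.

```r
# Toy data, invented for illustration only
df <- data.frame(x = c("a", "b", "a", "c", "d", "b"), value = 1:6)

# Create one object per value of x inside a fresh environment
groups <- list2env(split(df, df$x), envir = new.env())

ls(groups)                # names of the objects that were created
get("b", envir = groups)  # the rows where x == "b"

# To create the objects in the workspace instead:
# list2env(split(df, df$x), envir = .GlobalEnv)
```

With envir = .GlobalEnv, any existing object that shares a name with a value of x is replaced without warning, so it is worth checking the names first.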

EDIT 2:

To split by two or more variables, a more automated solution is a function that takes a data frame and the names of the columns to split on:

library(magrittr)  # provides %>%

create_new_df <- function(dataframe, vars) {
    # Creates one data frame in the global environment for each combination of
    # values in the 'vars' columns, named after those values joined by "_"
    split(dataframe, as.list(dataframe[vars]), drop = TRUE) %>%
        lapply(function(subset_dataframe) {
            # Convert column by column so factors give their labels, not integer codes
            key <- vapply(subset_dataframe[1, vars, drop = FALSE], as.character, character(1))
            new_object_name <- paste(key, collapse = "_")
            # assign() with envir = .GlobalEnv creates the object in the global environment
            assign(new_object_name, subset_dataframe, envir = .GlobalEnv)
        }) %>%
        invisible()
}

This function can then be used to create new objects with any combination of variables:

variables <- c("x", "y", "z")
create_new_df(df, variables)
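
If separate objects are not strictly required, the multi-column case can also stay in a single named list. The following is only a sketch on made-up data (the `df`, `vars` and `value` names are assumptions): interaction() builds one combined key per row, and split() then groups on that key.

```r
# Toy data, invented for illustration only
df <- data.frame(x = c("a", "a", "b", "b", "a"),
                 y = c("p", "q", "p", "p", "p"),
                 value = 1:5)
vars <- c("x", "y")

# One key per row, e.g. "a_p"; drop = TRUE discards combinations that never occur
key <- interaction(df[vars], sep = "_", drop = TRUE)
pieces <- split(df, key, drop = TRUE)

names(pieces)    # one name per combination present in the data
pieces[["a_p"]]  # the rows where x == "a" and y == "p"
```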


Source: https://stackoverflow.com/questions/44422397/automatically-subset-data-frame-by-factor
