R remove rows from panel while keeping the panel balanced

面向向阳花 2021-01-01 01:29

Is there an elegant way to balance an unbalanced panel data set? I would like to start with an unbalanced panel (i.e., some individuals are missing some data) and end up with

4 answers
  •  鱼传尺愫
    2021-01-01 01:54

    A solution I've used is to temporarily reshape the data frame into wide format with years as columns and units as rows, and then check for complete cases by row. This is easiest to do if you have a single variable of interest that--if missing--means the entire observation is missing.

    I use the following libraries:

    library(data.table)
    library(reshape2)
    

    First, take a subset of your main data frame (unbal) that contains just the ID variable ("NAME"), the time variable ("YEAR"), and a variable of interest ("X" or "Y").

    df <- unbal[c("NAME", "YEAR", "X")]
    

    Second, reshape the new data frame to make it wide format. This makes a data frame in which each "NAME" is a single row, and "X" for each year is a column.

    df <- dcast(df, NAME ~ YEAR, value.var = "X")
    

    Third, run complete.cases for each row. Any NAME with missing data will be entirely removed.

    df <- df[complete.cases(df),]
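
    To illustrate what this step does, here is a minimal base R sketch on a made-up wide data frame. The name wide and all the values are assumptions chosen for illustration, not data from the question.

    ```r
    # Hypothetical wide frame: one row per unit, one column per year.
    wide <- data.frame(
      NAME   = c("a", "b", "c"),
      `2000` = c(1, 4, 6),
      `2001` = c(2, 5, NA),
      `2002` = c(3, NA, 8),
      check.names = FALSE  # keep the year column names as they are
    )

    # TRUE only for rows with no NA in any column.
    ok <- complete.cases(wide)

    # Units "b" and "c" each have a missing year, so only "a" is kept.
    wide_complete <- wide[ok, ]
    ```

    A single missing year is enough to drop the whole unit, which is what keeps the remaining panel balanced.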
    

    Fourth, reshape the data frame back into long format (by default, this gives your variables generic names, so you may want to change the names back to what they were before).

    df <- melt(df, id.vars = "NAME")
    setnames(df, "variable", "YEAR")
    

    NOTE: YEAR becomes a factor variable by default with this approach. If your original YEAR variable is numeric, convert it back, going through character first so that you get the year labels rather than the factor's internal codes. For example:

    df$YEAR <- as.character(df$YEAR)
    df$YEAR <- as.numeric(df$YEAR)
    

    Fifth and sixth, keep only the "NAME" and "YEAR" variables in the data frame you created, and then merge it with your original data frame. With all.x = TRUE and df as the first argument, every row of df is kept, and cases in the original data frame that aren't found in df are dropped.

    df <- df[c("NAME", "YEAR")]
    balanced <- merge.data.frame(df, unbal, by = c("NAME", "YEAR"), all.x = TRUE)
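
    As a cross-check, the same result can be sketched in base R without reshaping, by counting non-missing observations per unit. This is an alternative to the steps above, not the same method, and it assumes each NAME/YEAR pair appears at most once and that every year of interest shows up for at least one unit. The toy data are made up for illustration.

    ```r
    # Hypothetical unbalanced panel: "b" has no row for 2002, "c" has a missing X.
    unbal <- data.frame(
      NAME = c("a", "a", "a", "b", "b", "c", "c", "c"),
      YEAR = c(2000, 2001, 2002, 2000, 2001, 2000, 2001, 2002),
      X    = c(1, 2, 3, 4, 5, 6, NA, 8),
      stringsAsFactors = FALSE
    )

    # Number of non-missing X values per unit.
    obs   <- unbal[!is.na(unbal$X), ]
    n_obs <- table(obs$NAME)

    # A unit is complete when it has one non-missing value per distinct year.
    n_years <- length(unique(unbal$YEAR))
    keep    <- names(n_obs)[as.vector(n_obs) == n_years]

    # Keep every row (and every column) of the complete units only.
    balanced <- unbal[unbal$NAME %in% keep, ]
    ```

    On this toy data only unit "a" survives, with one row for each of the three years.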
    
