Question
I have a weird problem when estimating a random effects model with the plm package in R.
Here is a link to a dput of part of my data: https://pastebin.com/raw/mTdh26dg
My code is:
library(plm)
library(haven)
pmales <- pdata.frame(males_part, index = c("NR", "YEAR"))
random <- plm(WAGE ~ SCHOOL + EXPER + EXPER2 + BLACK + HISP + MAR + UNION +
                RUR + NE + NC + S + factor(YEAR),
              data = pmales, model = "random")
The reason I included library(haven) is that my original data set is a .dta file.
When I run this code I get this error:
Error in is.pbalanced.default(x) :
argument "y" is missing, with no default
The weird thing is that if I start with a clean R session, don't load haven, and then import the data from the dput, I don't get this error. I do get the error if I import from the dput but load haven anyway. I also don't get the error when estimating within or pooling models (even with haven loaded).
Here is my sessionInfo():
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 19.3
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=nl_NL.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=nl_NL.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] haven_2.2.0 plm_2.2-3
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 rstudioapi_0.11 Formula_1.2-3 magrittr_1.5 hms_0.5.3 MASS_7.3-51.5 lattice_0.20-41 rlang_0.4.5
[9] bibtex_0.4.2.2 fansi_0.4.1 stringr_1.4.0 tools_3.6.3 grid_3.6.3 nlme_3.1-144 cli_2.0.2 ellipsis_0.3.0
[17] maxLik_1.3-8 miscTools_0.6-26 assertthat_0.2.1 lmtest_0.9-37 digest_0.6.25 lifecycle_0.2.0 tibble_3.0.0 crayon_1.3.4
[25] bdsmatrix_1.3-4 vctrs_0.2.4 Rdpack_0.11-1 gbRd_0.4-11 glue_1.4.0 sandwich_2.5-1 stringi_1.4.6 pillar_1.4.3
[33] compiler_3.6.3 forcats_0.5.0 pkgconfig_2.0.3 zoo_1.8-7
Is this a bug in plm or haven? Or some sort of incompatibility of the two (or their dependencies)?
Answer 1:
I think the issue is that your data males_part is a tibble, but you don't have the tibble package loaded until you attach haven. If you don't have tibble loaded, then you won't have any methods for the tibble classes "tbl_df" and "tbl", and it will act exactly like a data frame. Once tibble is loaded, it will start to act like a tibble.
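The point about methods only existing once tibble is loaded can be checked directly. The sketch below assumes a fresh R session in which nothing has loaded the tibble namespace yet; the object x is a hand-built stand-in (not the poster's data) that carries the tibble classes without tibble being loaded. loadNamespace() is used here as a stand-in for what attaching haven does indirectly, since haven imports tibble.

```r
# Assumption: run in a fresh R session, before haven, tibble, or any package
# importing tibble has been loaded. x only *claims* to be a tibble.
x <- structure(
  list(a = 1:3, b = c("x", "y", "z")),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -3L)
)

# No `[` method exists for "tbl_df" or "tbl" yet, so dispatch falls through
# to `[.data.frame`, which drops a single extracted column to a vector.
before <- class(x[, "a"])
print(before)

# Loading the namespace registers tibble's S3 methods, including the one
# for `[` on "tbl_df", so the same call now returns a one-column tibble.
loadNamespace("tibble")
after <- class(x[, "a"])
print(after)
```

In a fresh session the first print shows "integer" and the second shows the tibble classes; if tibble was already loaded (some IDEs or profiles load it), both show the tibble classes.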
This is an issue because tibbles and data frames aren't identical, even though the class of a tibble includes "data.frame". I'd guess what's happening is that plm assumes that extracting a single column from a data frame (e.g. data[, "NR"]) gives a vector, but with a tibble it gives another tibble.
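The difference described above is easy to see side by side. This is a minimal sketch with made-up toy columns (the names NR and WAGE are borrowed from the question only for readability); it assumes the tibble package is installed.

```r
library(tibble)

# Toy values for illustration only, not the poster's data.
df <- data.frame(NR = c(1, 1, 2), WAGE = c(1.0, 1.5, 2.0))
tb <- as_tibble(df)

# A tibble claims to be a data frame...
stopifnot(inherits(tb, "data.frame"))

# ...but single-column extraction with `[` behaves differently.
col_df <- df[, "WAGE"]   # data.frame: dropped to a plain numeric vector
col_tb <- tb[, "WAGE"]   # tibble: stays a one-column tibble

print(class(col_df))
print(class(col_tb))

# `[[` returns the bare column for both classes.
print(identical(df[["WAGE"]], tb[["WAGE"]]))
```

Code that was written against data frames and relies on the vector result of x[, j] will therefore receive a data frame instead once tibble's methods are active.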
The workaround for you is pretty simple. Just run males_part <- as.data.frame(males_part) before calling pdata.frame() to remove the tibble class, and then haven won't matter.
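A sketch of the workaround on a small stand-in object. Assumptions: males_part here is a toy tibble with illustrative values standing in for what haven::read_dta() returns, and the pdata.frame() call is left as a comment so the snippet runs without plm installed.

```r
library(tibble)

# Hypothetical stand-in for the imported data (toy values).
males_part <- tibble(NR = c(1, 1, 2), YEAR = c(1, 2, 1),
                     WAGE = c(1.0, 1.5, 2.0))

# Strip the "tbl_df"/"tbl" classes so plm sees an ordinary data frame.
males_part <- as.data.frame(males_part)

print(class(males_part))
print(class(males_part[, "WAGE"]))   # single-column `[` drops to a vector again

# With plm installed, the original code then proceeds unchanged:
# pmales <- pdata.frame(males_part, index = c("NR", "YEAR"))
```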
Conceivably this is worth reporting to the maintainer of plm. It's a design flaw in tibble that is causing the problem (if tibbles inherit from data.frame, they should act like data frames), but tibbles are pretty common nowadays, and that design is unlikely to change. plm could protect itself against this by putting data <- as.data.frame(data) early in the pdata.frame() function, or by adding drop = TRUE to every single-column extraction.
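The two defences suggested above can be sketched as small helpers. These functions (get_index_v1, get_index_v2) are hypothetical illustrations of the pattern, not code from plm; they assume the tibble package is installed and use toy index columns.

```r
library(tibble)

# Option 1 (hypothetical): coerce once, at the top of the function.
get_index_v1 <- function(data, index) {
  data <- as.data.frame(data)             # drops the "tbl_df"/"tbl" classes
  lapply(index, function(nm) data[, nm])  # data.frame `[` drops to a vector
}

# Option 2 (hypothetical): leave the input alone, but make every
# single-column extraction request a vector explicitly.
get_index_v2 <- function(data, index) {
  lapply(index, function(nm) data[, nm, drop = TRUE])
}

tb <- tibble(NR = c(1, 1, 2), YEAR = c(1, 2, 1))   # toy panel index
v1 <- get_index_v1(tb, c("NR", "YEAR"))
v2 <- get_index_v2(tb, c("NR", "YEAR"))

str(v1)                  # a list of two numeric vectors
print(identical(v1, v2))
```

Option 1 is a one-line change at the entry point; option 2 keeps the input's class intact but has to be applied at every extraction site, which is easier to miss.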
Source: https://stackoverflow.com/questions/61249692/error-when-estimating-random-effects-model-with-plm-package-when-haven-is-loaded