Weird case with data tables in R, column names are mixed

旧时模样 submitted on 2021-02-16 19:58:32

Question


I created a column called mc_split_device inside the data.table called mc_with_devices. However, if I type mc_with_devices$mc_split, I get the values of the column mc_split_device, even though I never created a column named mc_split.


Answer 1:


See Hadley Wickham's Advanced R:

$ is a shorthand operator, where x$y is equivalent to x[["y", exact = FALSE]]. It’s often used to access variables in a data frame, as in mtcars$cyl or diamonds$carat.

So exact = FALSE is the reason why $mc_split works even though no column has that exact name: the partial name is a unique prefix of mc_split_device.
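The equivalence quoted above can be checked directly. This is a minimal sketch, assuming a plain data.frame as a stand-in for the asker's data.table (so it runs without the data.table package); the column values are hypothetical.

```r
# Hypothetical stand-in for the asker's table; a plain data.frame is used
# so the sketch runs without the data.table package.
df <- data.frame(mc_split_device = c("phone", "tablet", "desktop"))

# `$` matches partially: "mc_split" is a unique prefix of "mc_split_device".
via_dollar <- df$mc_split

# The spelled-out form quoted from Advanced R.
via_inexact <- df[["mc_split", exact = FALSE]]

# `[[` defaults to exact = TRUE, so the partial name finds nothing.
via_exact <- df[["mc_split"]]

print(identical(via_dollar, df$mc_split_device))
print(identical(via_inexact, df$mc_split_device))
print(is.null(via_exact))
```

Using [[ with its default exact matching is one way to avoid picking up a column by accident.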

As an aside, I don't believe mc_with_devices[, .(mc_split)] will work without double quotes. The following will work:

mc_with_devices[,"mc_split_resp"]




Answer 2:


The column name is matched partially. From ?Extract:

names : For extraction, this is normally (see under ‘Environments’) partially matched to the names of the object.

Character indices can in some circumstances be partially matched (see pmatch) to the names or dimnames of the object being subsetted

Thus the default behaviour is to use partial matching only when extracting from recursive objects (except environments) by $.

Hence, when you do

mtcars$m

You get

#[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4
#[17] 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4

which is the same as mtcars$mpg.

This can sometimes be confusing. If you want to be notified whenever such partial matching happens, you can turn on a warning with

options(warnPartialMatchDollar = TRUE)
mtcars$m
# [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4
#[17] 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4

Warning message:
In `$.data.frame`(mtcars, m) : Partial match of 'm' to 'mpg' in data frame
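As a small sketch of how the option above might be used in a script, the warning can be captured with tryCatch so that an accidental partial match is detected programmatically. The variable names msg and old are illustrative, and the previous option value is restored afterwards.

```r
# Turn the warning on, remembering the previous setting.
old <- options(warnPartialMatchDollar = TRUE)

# Capture the warning raised by the partial match instead of printing it.
msg <- tryCatch(
  {
    mtcars$m          # partial match of "m" to "mpg"
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)

# Restore the previous option value.
options(old)

print(msg)
```

If no partial match had occurred, msg would hold the string "no warning".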




Answer 3:


According to ?Extract

name - A literal character string or a name (possibly backtick quoted). For extraction, this is normally (see under ‘Environments’) partially matched to the names of the object.

and exact

exact - Controls possible partial matching of [[ when extracting by a character vector (for most objects, but see under ‘Environments’). The default is no partial matching. Value NA allows partial matching but issues a warning when it occurs. Value FALSE allows partial matching without any warning.

So, when we do

mtcars$m
#[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3
#[27] 26.0 30.4 15.8 19.7 15.0 21.4

mtcars$d
#NULL

The result is NULL because more than one column name starts with 'd' (disp and drat), so the match is ambiguous:

 names(mtcars)
 #[1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"

If the prefix is long enough to be unambiguous, it partially matches the 'disp' column:

mtcars$di
#[1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6 275.8 275.8 275.8 472.0 460.0 440.0  78.7  75.7  71.1 120.1
#[22] 318.0 304.0 350.0 400.0  79.0 120.3  95.1 351.0 145.0 301.0 121.0
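The ambiguity rule can be illustrated with pmatch, which ?Extract points to for partial matching of character indices. This is a sketch of the matching rule only, not a claim about how $ is implemented internally.

```r
cols <- names(mtcars)

# A unique prefix resolves to the position of the matching name.
m_pos <- pmatch("m", cols)     # only "mpg" starts with "m"

# An ambiguous prefix gives NA: both "disp" and "drat" start with "d".
d_pos <- pmatch("d", cols)

# A longer prefix removes the ambiguity.
di_pos <- pmatch("di", cols)

print(cols[m_pos])
print(d_pos)
print(cols[di_pos])
```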


Source: https://stackoverflow.com/questions/53943939/weird-case-with-data-tables-in-r-column-names-are-mixed
