Question
Could someone please explain to me the difference in column referencing between matrix, data.frame, and data.table? I'm getting my head around which syntax to use for each class, but I don't understand how or why they're different.
Take a 10x10 matrix:
foo <- matrix( nrow = 10, ncol = 10 )
I'll just fill the 2nd column to demonstrate:
foo[,2] <- rnorm(10)
head( foo, 3 )
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] NA -0.4688874 NA NA NA NA NA NA NA NA
[2,] NA -1.0273370 NA NA NA NA NA NA NA NA
[3,] NA -0.3981627 NA NA NA NA NA NA NA NA
Now I can reference the 2nd column with foo[,2], but foo[[2]] returns only 1 cell, which in this case is NA:
foo[,2]
[1] 0.18340527 0.46511236 -2.43277107 0.13260218 0.20227436 -0.57518392 -0.62211864 2.00239088 -0.09561907 0.67536428
foo[[2]]
[1] NA
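Why foo[[2]] lands on an NA: a matrix is stored as one vector filled column by column, and [[k]] returns the k-th element of that vector. A minimal base-R sketch of that mapping; m and its values are illustrative stand-ins for the question's foo and rnorm(10):

```r
# Illustrative stand-in for the question's foo: 10x10, only column 2 filled
m <- matrix(NA_real_, nrow = 10, ncol = 10)
m[, 2] <- (1:10) / 10   # deterministic values instead of rnorm(10)

# [[k]] indexes the underlying vector in column-major order:
# element 2 is row 2 of column 1, which is still NA
m[[2]]

# arrayInd() converts a linear index into (row, col) coordinates
arrayInd(2, dim(m))    # row 2, col 1
arrayInd(12, dim(m))   # row 2, col 2

# So the k-th cell of column 2 sits at linear index 10 + k
m[[12]] == m[2, 2]     # TRUE
```

In general, for a matrix with n rows, cell [i, j] is element (j - 1) * n + i of the underlying vector.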
If I change the matrix to a dataframe, both referencing methods work:
foo <- data.frame( foo )
foo[,2]
[1] -0.4688874 -1.0273370 -0.3981627 -0.2207062 0.5711004 1.1085851 -1.3343338 0.2337622 -1.0632469 -0.9783714
foo[[2]]
[1] -0.4688874 -1.0273370 -0.3981627 -0.2207062 0.5711004 1.1085851 -1.3343338 0.2337622 -1.0632469 -0.9783714
Now if I convert to a data.table, only the second method works, and the first method returns the value 2 (which isn't in the table at all):
foo[,2]
[1] 2
foo[[2]]
[1] -0.4688874 -1.0273370 -0.3981627 -0.2207062 0.5711004 1.1085851 -1.3343338 0.2337622 -1.0632469 -0.9783714
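The 2 is not a stray value. In data.table the j argument is evaluated as an expression inside the table, so in older releases foo[, 2] simply evaluated the constant 2 (the usual workaround was foo[, 2, with = FALSE]), while foo[, X2] returns the X2 column as a vector. Base R's with() follows the same evaluation rule without needing the package; the sketch below is only an analogy, not data.table's actual code, and df is an illustrative name:

```r
df <- data.frame(X1 = c(NA, NA, NA), X2 = c(-0.47, -1.03, -0.40))

# with() evaluates its second argument using the columns as variables,
# much as data.table evaluates j
with(df, 2)    # just the constant 2, whatever the data contain
with(df, X2)   # a column name used as a variable: the X2 vector

# Selecting a column by position needs [[ ]] (or [ , ] on a data.frame)
df[[2]]
```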
So my question is, why the different syntax for different classes? And is there a particular syntax that would work for all 3 classes, or do we need to know/check the tabular class before knowing how to call a reference?
EDIT: Also interesting is that row referencing is more consistent across classes. For matrix, data.frame, and data.table respectively:
foo[2,]
[1] NA 0.4651124 NA NA NA NA NA NA NA NA
foo <- data.frame( foo )
foo[2,]
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
2 NA 0.4651124 NA NA NA NA NA NA NA NA
setDT( foo )
foo[2,]
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1: NA 0.4651124 NA NA NA NA NA NA NA NA
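Row indexing looks consistent, but the class of the result still differs: a matrix row drops to a plain vector, whereas a data.frame row stays a one-row data.frame because its columns may hold different types. A base-R sketch with illustrative names:

```r
m <- matrix(NA_real_, nrow = 10, ncol = 10)
m[, 2] <- (1:10) / 10
df <- data.frame(m)

row_m  <- m[2, ]    # numeric vector of length 10, no dimensions
row_df <- df[2, ]   # data.frame with 1 row and 10 columns

class(row_m)        # "numeric"
class(row_df)       # "data.frame"
dim(row_df)         # 1 10

# drop = FALSE keeps the matrix structure if you need it
dim(m[2, , drop = FALSE])   # 1 10
```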
Answer 1:
A few things I've learned since posting this question here (with a lot of help from the commenters!), which have helped me to understand the differences I mentioned. If anyone could please clarify anything I'm getting wrong here, I'd appreciate it:
- Objects of class data.frame and data.table are list objects under the hood, but a matrix is not: it is a single atomic vector with a dim attribute. That difference matters.
- Each column of a data.frame or data.table object is an element of the list "under its hood", so a column can be extracted the same way as any list element. Hence foo[[2]] works well for getting the second column in both of those classes.
- A matrix differs in that every cell is an element of the underlying vector, so foo[[2]] retrieves only one cell rather than a column (which brings us to...).
- The elements making up the matrix are filled column-wise (top to bottom, then left to right), so foo[[2]] retrieves the second element, which here sits in row 2 of column 1.
- Since the matrix also has dimensions, foo[,2] is accepted as referring to the second column, as it is for a data.frame object.
- A data.table object (until recently, see the next point) responded to foo[,2] by returning the value 2 regardless of the data. This is because data.table evaluates j as an expression within the table, and the expression 2 is just the number 2; the workaround at the time was foo[, 2, with = FALSE].
- As of recent updates to the data.table package (as of 1.9.8, I think? Thank you maintainers!), foo[,2] now selects the second column as it does for a data.frame, so some of the confusion that led to my question has been superseded!
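The storage difference is easy to check directly. A minimal base-R sketch (m and df are illustrative names): it shows that a matrix is an atomic vector carrying a dim attribute, while a data.frame is a list of columns; a data.table extends data.frame and is also a list.

```r
m  <- matrix(1:6, nrow = 2)   # 2 x 3 integer matrix
df <- data.frame(m)

# A matrix is an atomic vector plus a dim attribute
typeof(m)          # "integer"
is.list(m)         # FALSE
attributes(m)      # $dim: 2 3
length(m)          # 6: one element per cell

# A data.frame is a list with one element per column
typeof(df)         # "list"
is.list(df)        # TRUE
length(df)         # 3: one element per column

# Hence [[ ]] reaches a single cell in one case and a whole column in the other
m[[2]]             # 2 (row 2, column 1)
df[[2]]            # 3 4 (column X2)
```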
So in conclusion:
- data.frame and data.table objects are actually lists (which means I now get @N8TRO's joke in the comments, to which I was naively oblivious before), containing a list element for each column, whereas a matrix is a single vector with an element for each cell (this makes the [[ call make sense to me now).
- All of the objects mentioned have dimensions, which means (as of recent data.table package updates) the foo[,2] syntax is accepted by all 3 classes. YAY! One remaining difference: matrix and data.frame return a plain vector, while data.table returns a one-column data.table, so foo[[2]] is the form to use when you want a vector from a data.frame or data.table.
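If you want a plain vector back whatever the class, one option is a small wrapper. get_col below is a hypothetical helper (not a function from base R or data.table), sketched under the assumption that the input is a matrix, data.frame or data.table: [, j] is used for matrices, and [[j]] for anything built on a list, which covers data.table because it extends data.frame.

```r
# Hypothetical helper: return column j as a plain vector for a matrix,
# data.frame or data.table (j may be a position or a column name)
get_col <- function(x, j) {
  if (is.matrix(x)) {
    x[, j]      # matrices: index by dimension; [[j]] would return one cell
  } else if (is.list(x)) {
    x[[j]]      # data.frame / data.table: each column is a list element
  } else {
    stop("expected a matrix, data.frame or data.table")
  }
}

m  <- matrix(1:6, nrow = 2)
df <- data.frame(m)
get_col(m, 2)       # 3 4
get_col(df, 2)      # 3 4
get_col(df, "X2")   # 3 4
```

Checking is.matrix() first matters, since a matrix answers FALSE to is.list() only by accident of storage; testing class explicitly keeps the intent clear.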
Thank you so much to the commenters for pointing me in the right direction (and making jokes that I now get). I hope this might help someone in the future who comes across the same confusion I did.
Source: https://stackoverflow.com/questions/38263734/column-referencing-i-vs-i-for-matrix-dataframe-and-data-table