I just want to understand if there is a difference between names and colnames when working with a data.frame. Both seem to behave the s
As far as I can tell, the only difference between names() and colnames() for a data.frame input is that they allocate memory slightly differently. For instance, consider the code chunk below:
df <- data.frame(x=1:5, y=6:10, z=11:15)
tracemem(df)
names(df) <- c("A", "B", "C")
colnames(df) <- c('a','b','c')
If you run this code, tracemem() reports that df is copied only once during the names() assignment, whereas it is copied twice during the colnames() assignment. (The exact number of copies reported can vary with the R version.)
Are they the same for data.frames? YES
Are they the same in general? Not quite. The big difference is that colnames also works for matrices, whereas names does not return or set the column names of a matrix (it does so only for data frames).
In addition, you can use names to set/get the names of vectors (and, for obvious reasons, you can't do this with colnames--the result is NULL for getting and an error for setting).
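The vector and matrix cases can be checked with a minimal sketch like the one below. The objects v, m and res are made up for illustration and are not part of the original answer.

```r
# Hypothetical example objects (not from the original post)
v <- c(1, 2, 3)
m <- matrix(1:6, nrow = 2)

# names() gets and sets the names of a plain vector
names(v) <- c("a", "b", "c")
print(names(v))

# colnames() on a vector returns NULL when getting ...
print(colnames(v))

# ... and signals an error when setting; tryCatch() records which happened
res <- tryCatch({
  colnames(v) <- c("a", "b", "c")
  "no error"
}, error = function(e) "error")
print(res)

# colnames() labels the columns of a matrix
colnames(m) <- c("c1", "c2", "c3")
print(colnames(m))

# The matrix still has no names attribute
print(names(m))
```

Wrapping the failing assignment in tryCatch() is only there so the script keeps running; at the console you would simply see the error message.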
On a matrix, names() creates a names attribute, whereas colnames() simply names the columns.
For example:
Create a temp variable.
> temp <- rbind(cbind(1,2,3,4,5),
+ cbind(6,7,8,9,10))
> temp
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
Create the names.temp object.
> names.temp <- temp
Use names() on names.temp
> names(names.temp) <- paste(c("First col", "Second col", "Third col",
+ "Fourth Col", "Fifth col"))
> names.temp
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
attr(,"names")
 [1] "First col"  "Second col" "Third col"  "Fourth Col" "Fifth col"  NA           NA           NA
 [9] NA           NA
We see here that we can actually retrieve the fifth element of the names attribute of names.temp.
> names(names.temp)[5]
[1] "Fifth col"
Repeat with a second object but this time create the colnames.temp object.
> colnames.temp <- temp
Use colnames() on colnames.temp
> colnames(colnames.temp) <- paste(c("First col", "Second col", "Third col",
+ "Fourth Col", "Fifth col"))
> colnames.temp
     First col Second col Third col Fourth Col Fifth col
[1,]         1          2         3          4         5
[2,]         6          7         8          9        10
Now the names attribute is NULL.
> names(colnames.temp)[5]
NULL
FINALLY. Let's look at our trusty str() command. We can see there is a structural difference between names.temp and colnames.temp. Specifically, colnames.temp has a dimnames attribute, not a names attribute.
> str(names.temp)
 num [1:2, 1:5] 1 6 2 7 3 8 4 9 5 10
 - attr(*, "names")= chr [1:10] "First col" "Second col" "Third col" "Fourth Col" ...
> str(colnames.temp)
 num [1:2, 1:5] 1 6 2 7 3 8 4 9 5 10
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:5] "First col" "Second col" "Third col" "Fourth Col" ...
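The structural difference also shows up when indexing by name. Here is a minimal sketch that rebuilds the two objects from this answer; col_labels is a helper name introduced here only to avoid repeating the labels.

```r
# Rebuild the matrix used in this answer
temp <- rbind(cbind(1, 2, 3, 4, 5),
              cbind(6, 7, 8, 9, 10))

# Helper vector introduced for this sketch (not in the original answer)
col_labels <- c("First col", "Second col", "Third col",
                "Fourth Col", "Fifth col")

names.temp <- temp
names(names.temp) <- col_labels        # padded with NA up to length 10

colnames.temp <- temp
colnames(colnames.temp) <- col_labels  # stored in dimnames[[2]]

# names() labels individual elements in column-major order, so
# "Second col" refers to the element in row 2, column 1
print(names.temp["Second col"])

# colnames() labels whole columns, so the same label selects column 2
print(colnames.temp[, "Second col"])
```

In other words, names() on a matrix attaches labels to single cells of the underlying vector, which is rarely what you want when the goal is to label columns.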
If you look at the beginning of the source code of the colnames and colnames<- functions:
R> colnames
function (x, do.NULL = TRUE, prefix = "col")
{
    if (is.data.frame(x) && do.NULL)
        return(names(x))
(...)
R> `colnames<-`
function (x, value)
{
    if (is.data.frame(x)) {
        names(x) <- value
    }
(...)
You can see that for data frames, colnames just calls the names function. So yes, for data frames they are strictly equivalent.
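A quick way to confirm this equivalence at the console is a sketch like the following, which uses a small example data frame:

```r
# Small example data frame
df <- data.frame(x = 1:5, y = 6:10, z = 11:15)

# Getting: both functions return the same character vector
print(identical(names(df), colnames(df)))

# Setting through colnames<- is visible through names() ...
colnames(df) <- c("a", "b", "c")
print(names(df))

# ... and setting through names<- is visible through colnames()
names(df) <- c("A", "B", "C")
print(colnames(df))
```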