Should I use a data.frame or a matrix?

我在风中等你 2020-11-28 00:59

When should one use a data.frame, and when is it better to use a matrix?

Both keep data in a rectangular format, so it is sometimes unclear which one to choose.

6 answers
  •  刺人心 (OP)
    2020-11-28 01:27

    Something not mentioned by @Michal is that a matrix is not only smaller than the equivalent data frame; code that operates on matrices is often considerably more efficient than code that operates on data frames. That is one reason why many R functions internally coerce data frames to matrices.
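
    To get a rough feel for the difference, here is a minimal sketch that stores the same numbers both ways and compares memory use and row-indexing time. The dimensions and the loop are arbitrary choices for illustration, and the timings will vary with machine and R version, so no expected output is shown.

    ```r
    # Same 1000 x 10 block of numbers stored both ways (sizes chosen arbitrarily).
    set.seed(1)
    m  <- matrix(rnorm(10000), nrow = 1000, ncol = 10)
    df <- as.data.frame(m)

    # Memory footprint: the data frame carries extra per-column overhead.
    size_m  <- object.size(m)
    size_df <- object.size(df)
    print(size_m)
    print(size_df)

    # Row-wise indexing in a loop; timings depend on machine and R version.
    time_m  <- system.time(for (i in 1:1000) m[i, ])
    time_df <- system.time(for (i in 1:1000) df[i, ])
    print(time_m)
    print(time_df)
    ```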

    Data frames are often far more convenient, though: a matrix can hold only a single atomic type, and one doesn't always have data of just one type lying around.

    Note that you can have a character matrix; you don't just have to have numeric data to build a matrix in R.
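
    A short sketch of that point, using made-up values: a matrix of strings is valid, mixing types in a matrix coerces everything to one common type, and a data frame keeps each column's own type.

    ```r
    # A matrix of strings is perfectly valid.
    cm <- matrix(letters[1:6], nrow = 2)
    is.character(cm)

    # Mixing types in a matrix coerces everything to one common type (here character).
    mixed <- cbind(id = 1:3, name = c("a", "b", "c"))
    typeof(mixed)

    # A data frame keeps each column's own type.
    df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
    sapply(df, class)
    ```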

    When converting a data frame to a matrix, note that there is a data.matrix() function, which handles factors by converting them to their internal integer codes. Coercing via as.matrix() instead yields a character matrix whenever the data frame contains a factor or any other non-numeric column. Compare:

    > head(as.matrix(data.frame(a = factor(letters), B = factor(LETTERS))))
         a   B  
    [1,] "a" "A"
    [2,] "b" "B"
    [3,] "c" "C"
    [4,] "d" "D"
    [5,] "e" "E"
    [6,] "f" "F"
    > head(data.matrix(data.frame(a = factor(letters), B = factor(LETTERS))))
         a B
    [1,] 1 1
    [2,] 2 2
    [3,] 3 3
    [4,] 4 4
    [5,] 5 5
    [6,] 6 6
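
    One case the comparison above does not cover is a factor whose labels look like numbers. In the sketch below (made-up data), as.matrix() still returns the labels as strings, while data.matrix() returns the integer codes rather than the numbers the labels spell out; going through as.character() is one common way to recover the numeric values.

    ```r
    # Factor whose labels look like numbers (made-up example data).
    df <- data.frame(x = factor(c("10", "20", "10")))

    cm <- as.matrix(df)    # character matrix holding the labels "10", "20", "10"
    dm <- data.matrix(df)  # integer codes 1, 2, 1 (positions in levels(df$x))

    # To recover the labels as numbers, convert via character first.
    vals <- as.numeric(as.character(df$x))  # 10, 20, 10
    ```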
    

    I nearly always use a data frame for my data analysis tasks, as I often have more than just numeric variables. When I code functions for packages, I almost always coerce to a matrix internally and then format the results back out as a data frame, because data frames are convenient to work with.
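
    The coerce-then-return pattern might look roughly like this. The function name centre_columns and the example data are hypothetical, chosen only to illustrate the idea; this is a sketch, not code from any package.

    ```r
    # Hypothetical helper: centre every column of a data frame.
    centre_columns <- function(df) {
      m <- data.matrix(df)                 # coerce once; factors become integer codes
      centred <- sweep(m, 2, colMeans(m))  # subtract each column's mean (sweep defaults to "-")
      as.data.frame(centred)               # hand the result back as a data frame
    }

    res <- centre_columns(data.frame(a = c(1, 2, 3), b = c(10, 20, 60)))
    ```

    The numeric work happens on the matrix, while the caller passes in and gets back a data frame.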
