Table by row with R

拈花ヽ惹草, submitted 2019-12-31 03:02:30

Question


I would like to tabulate the values in each row of a data frame. Using table() inside apply() gives adequate results for the following example:

df.1 <- read.table(text = '
  state  county  city  year1  year2  year3  year4  year5
      1       2     4      0      0      0      1      2
      2       5     3     10     20     10     NA     10
      2       7     1    200    200     NA     NA    200
      3       1     1     NA     NA     NA     NA     NA
', na.strings = "NA", header=TRUE)

tdf <- t(df.1)
apply(tdf[4:nrow(tdf),1:nrow(df.1)], 2, function(x) {table(x, useNA = "ifany")})

Here are the results:

[[1]]
x
0 1 2 
3 1 1 

[[2]]
x
  10   20 <NA> 
   3    1    1 

[[3]]
x
 200 <NA> 
   3    2 

[[4]]
x
<NA> 
   5 
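As an aside, the transpose is not strictly needed. Below is a minimal sketch, assuming the same df.1, that applies over rows directly with MARGIN = 1; apply() converts the selected data frame columns to a matrix itself. Note that the result is a list here only because the four tables have different lengths, so this form runs into exactly the same problem on df.2.

```r
# Same data as df.1 above
df.1 <- read.table(text = '
  state  county  city  year1  year2  year3  year4  year5
      1       2     4      0      0      0      1      2
      2       5     3     10     20     10     NA     10
      2       7     1    200    200     NA     NA    200
      3       1     1     NA     NA     NA     NA     NA
', na.strings = "NA", header = TRUE)

# MARGIN = 1 walks over rows, so no t() call is needed;
# apply() turns the year columns into a matrix before looping
res.1 <- apply(df.1[4:8], 1, function(x) table(x, useNA = "ifany"))
res.1
```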

However, in the following example each row contains only a single repeated value:

df.2 <- read.table(text = '
  state  county  city  year1  year2  year3  year4  year5
      1       2     4      0      0      0      0      0
      2       5     3      1      1      1      1      1
      2       7     1      2      2      2      2      2
      3       1     1     NA     NA     NA     NA     NA
', na.strings = "NA", header=TRUE)

tdf.2 <- t(df.2)
apply(tdf.2[4:nrow(tdf.2),1:nrow(df.2)], 2, function(x) {table(x, useNA = "ifany")})

The output I obtain is:

# [1] 5 5 5 5

From this output I cannot tell that the first 5 counts 0s, the second counts 1s, the third counts 2s and the last counts NAs. Is there a way to have R report which value each 5 represents in the second example?


Answer 1:


Here's a table solution:

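table(
    rep(rownames(df.1), 5),   # row label for every cell (unlist() goes column by column)
    unlist(df.1[, 4:8]),      # all year values, stacked column by column
    useNA = "ifany")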

This gives

    0 1 2 10 20 200 <NA>
  1 3 1 1  0  0   0    0
  2 0 0 0  3  1   0    1
  3 0 0 0  0  0   3    2
  4 0 0 0  0  0   0    5

...and for your df.2:

    0 1 2 <NA>
  1 5 0 0    0
  2 0 5 0    0
  3 0 0 5    0
  4 0 0 0    5

Well, this is a solution unless you really like having a list of tables for some reason.
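The call above hard-codes both the column range 4:8 and the repeat count 5. Here is a sketch of the same single-table idea that derives both from the data, assuming (as in the question) that every column to tabulate has a name starting with "year". It uses base R's row() to label each cell with its row index instead of rep().

```r
# df.2 from the question
df.2 <- read.table(text = '
  state  county  city  year1  year2  year3  year4  year5
      1       2     4      0      0      0      0      0
      2       5     3      1      1      1      1      1
      2       7     1      2      2      2      2      2
      3       1     1     NA     NA     NA     NA     NA
', na.strings = "NA", header = TRUE)

# Assumption: the columns to tabulate are exactly those named "year..."
vals <- as.matrix(df.2[grepl("^year", names(df.2))])

# row(vals) holds each cell's row index; c() flattens both matrices in the
# same column-major order, so every value stays paired with its own row
tab <- table(row = c(row(vals)), value = c(vals), useNA = "ifany")
tab
```

Because the column labels come from the values themselves, each 5 in the df.2 result sits under the value it counts.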




Answer 2:


You can use lapply, which always returns a list, so the tables are never collapsed into a plain vector. You have to loop over the row indices:

sub.df <- as.matrix(df.2[grepl("year", names(df.2))])
lapply(seq_len(nrow(sub.df)), 
       function(i)table(sub.df[i, ], useNA = "ifany"))
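A variant of the same lapply idea, swapping the index loop for base R's split(): splitting the matrix by row(sub.df) gives one list element per row, named "1", "2", and so on. This is a sketch assuming the df.2 from the question; extra arguments to lapply (here useNA) are passed on to table().

```r
df.2 <- read.table(text = '
  state  county  city  year1  year2  year3  year4  year5
      1       2     4      0      0      0      0      0
      2       5     3      1      1      1      1      1
      2       7     1      2      2      2      2      2
      3       1     1     NA     NA     NA     NA     NA
', na.strings = "NA", header = TRUE)

sub.df <- as.matrix(df.2[grepl("year", names(df.2))])

# split() treats the matrix as a vector; grouping by row(sub.df)
# collects the cells of each row into its own list element
rows <- split(sub.df, row(sub.df))

# One table per row; useNA is forwarded to table()
res <- lapply(rows, table, useNA = "ifany")
res
```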



Answer 3:


Protect the result from simplification by wrapping it in list():

apply(tdf.2[4:nrow(tdf.2), 1:nrow(df.2)], 2,
      function(x) { list(table(x, useNA = "ifany")) })
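In more recent versions of R (the simplify argument of apply() was added in R 4.1.0, so this sketch assumes at least that version), the wrapping trick can be replaced by switching simplification off explicitly. The call is otherwise the one from the question.

```r
df.2 <- read.table(text = '
  state  county  city  year1  year2  year3  year4  year5
      1       2     4      0      0      0      0      0
      2       5     3      1      1      1      1      1
      2       7     1      2      2      2      2      2
      3       1     1     NA     NA     NA     NA     NA
', na.strings = "NA", header = TRUE)

tdf.2 <- t(df.2)

# simplify = FALSE (R >= 4.1.0) makes apply() return a list of the
# individual tables instead of collapsing length-1 results into a vector
res <- apply(tdf.2[4:nrow(tdf.2), 1:nrow(df.2)], 2,
             function(x) table(x, useNA = "ifany"),
             simplify = FALSE)
res
```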



Answer 4:


I think the problem is explained in apply's help page:

... If n equals 1, apply returns a vector if MARGIN has length 1 and an array of dimension dim(X)[MARGIN] otherwise ...
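A minimal illustration of that rule on a made-up 2 x 2 matrix (not from the question): each column holds one repeated value, so every call returns a one-cell table. With n equal to 1, apply() unlists the results and replaces their names with the matrix's column names, which do not exist here, so the value labels disappear just like in df.2.

```r
# Hypothetical example: each column contains a single repeated value
m <- matrix(c(0, 0, 1, 1), nrow = 2)

# Called one at a time, each table keeps its label ("0" or "1") and has length 1
lapply(1:2, function(j) table(m[, j]))

# apply() sees only length-1 results, unlists them and drops the labels
simplified <- apply(m, 2, function(x) table(x))
simplified
```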

The inconsistent return types of base R's apply family are the reason I switched completely to plyr's **ply functions. This works as desired:

library(plyr)
alply( df.2[ 4:8 ], 1, function(x) table( unlist(x), useNA = "ifany" ) )


Source: https://stackoverflow.com/questions/16825216/table-by-row-with-r
