Question
I have data that looks like:
Row 1 Row 2 Row 3 Row 4 Row 5 Row 6 Row7
abc89 abc62 67 abc513 abc512 abc81 abc10
abc6 pop abc11 abc4 giant 13 abc15
abc90 abc16 abc123 abc33 abc22 abc08 9
111 abc15 abc72 abc36 abc57 abc9 abc55
I would like to calculate the percentage of cells in the data frame that begin with "abc". For example, there are 28 total cells here, which can be obtained with prod(dim(df)). So I need the number of cells that start with "abc", divided by prod(dim(df)). Here the answer would be about 0.786 (22 of 28 cells). How can this be done in R?
Answer 1:
I would use:
> mean(grepl("^abc",unlist(dat)))
[1] 0.7857143
Using mean means you don't have to compute the numerator and denominator separately. grepl is the logical version of grep: it returns TRUE for every element where "^abc" (i.e., a string beginning with abc) matches, and FALSE otherwise. Recall that the average of a Bernoulli (TRUE/FALSE) vector is the proportion of successes.
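To make that equivalence concrete, here is a minimal sketch that rebuilds the sample data and checks that the mean of the logical vector equals the match count divided by the number of cells. The name is_abc is only illustrative, and stringsAsFactors = FALSE is an added assumption so the columns are read as character on older R versions too.

```r
# Sample data from the question, read as character columns
dat <- read.table(text = "Row1 Row2 Row3 Row4 Row5 Row6 Row7
abc89 abc62 67 abc513 abc512 abc81 abc10
abc6 pop abc11 abc4 giant 13 abc15
abc90 abc16 abc123 abc33 abc22 abc08 9
111 abc15 abc72 abc36 abc57 abc9 abc55", header = TRUE, stringsAsFactors = FALSE)

# One TRUE/FALSE per cell: does the cell start with "abc"?
is_abc <- grepl("^abc", unlist(dat))

sum(is_abc)     # numerator: number of matching cells
length(is_abc)  # denominator: total number of cells, same as prod(dim(dat))

# mean() of a logical vector is sum / length
all.equal(mean(is_abc), sum(is_abc) / length(is_abc))
```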
If you wanted to do this by row or by column you'd use apply, e.g. apply(dat, 1, function(x) mean(grepl("^abc", x))) to get the row-wise means.
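As a sketch of the row-wise and column-wise variants, the snippet below uses apply with MARGIN = 1 and MARGIN = 2, and then cross-checks against an alternative that builds the logical matrix once and calls rowMeans and colMeans. The variable names (row_prop, col_prop, is_abc) are hypothetical, and the alternative is one possible approach rather than anything the answer prescribes.

```r
dat <- read.table(text = "Row1 Row2 Row3 Row4 Row5 Row6 Row7
abc89 abc62 67 abc513 abc512 abc81 abc10
abc6 pop abc11 abc4 giant 13 abc15
abc90 abc16 abc123 abc33 abc22 abc08 9
111 abc15 abc72 abc36 abc57 abc9 abc55", header = TRUE, stringsAsFactors = FALSE)

# Proportion of "abc" cells per row (MARGIN = 1) and per column (MARGIN = 2)
row_prop <- apply(dat, 1, function(x) mean(grepl("^abc", x)))
col_prop <- apply(dat, 2, function(x) mean(grepl("^abc", x)))

# Alternative: one logical matrix (rows x columns), then rowMeans / colMeans
is_abc <- sapply(dat, function(col) grepl("^abc", col))

# Both approaches should agree
all.equal(unname(row_prop), unname(rowMeans(is_abc)))
all.equal(unname(col_prop), unname(colMeans(is_abc)))
```

Building the logical matrix once can be convenient when both the row and column summaries are needed, since the pattern is matched only a single time per cell.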
Answer 2:
You can use grep to search for the pattern of interest (a string starting with "abc"):
length(grep("^abc", as.character(unlist(dat)))) / prod(dim(dat))
# [1] 0.7857143
You can get row counts with:
(row.counts <- apply(dat, 1, function(x) length(grep("^abc", as.character(x)))))
# [1] 6 4 6 6
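When the prefix is literal text rather than a regular expression, base R's startsWith (available since R 3.3.0) could be used in place of grep. The following is a sketch of that variant, not part of the original answer; the names cells, n_regex and n_prefix are made up for illustration.

```r
dat <- read.table(text = "Row1 Row2 Row3 Row4 Row5 Row6 Row7
abc89 abc62 67 abc513 abc512 abc81 abc10
abc6 pop abc11 abc4 giant 13 abc15
abc90 abc16 abc123 abc33 abc22 abc08 9
111 abc15 abc72 abc36 abc57 abc9 abc55", header = TRUE, stringsAsFactors = FALSE)

# Flatten the data frame into one character vector of cells
cells <- as.character(unlist(dat))

# Count via grep (returns the indices of matching elements)
n_regex <- length(grep("^abc", cells))

# Count via startsWith (literal prefix, no regex; needs a character vector)
n_prefix <- sum(startsWith(cells, "abc"))

n_regex == n_prefix        # the two counts should agree on this data
n_prefix / length(cells)   # proportion of cells starting with "abc"
```

One difference to keep in mind: startsWith returns NA for missing values, whereas grepl returns FALSE, so data containing NA would need sum(..., na.rm = TRUE) or similar handling.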
Data:
dat = read.table(text="Row1 Row2 Row3 Row4 Row5 Row6 Row7
abc89 abc62 67 abc513 abc512 abc81 abc10
abc6 pop abc11 abc4 giant 13 abc15
abc90 abc16 abc123 abc33 abc22 abc08 9
111 abc15 abc72 abc36 abc57 abc9 abc55", header=TRUE)
Source: https://stackoverflow.com/questions/31775978/how-to-calculate-percentage-of-cells-in-data-frame-that-start-with-sequence-in-r