Count the number of rows matching a criterion


I am looking for a command in R that is equivalent to this SQL statement. I want a very simple, basic solution without using complex functions or dplyr type of pa

8 Answers

借酒劲吻你

2020-11-29 05:25
sum is used to add elements; nrow is used to count the number of rows in a rectangular array (typically a matrix or data.frame); length is used to count the number of elements in a vector. You need to apply these functions correctly.

Let's assume your data is a data frame named "dat". Correct solutions:
```
nrow(dat[dat$sCode == "CA",])
length(dat$sCode[dat$sCode == "CA"])
sum(dat$sCode == "CA")
```
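As a quick check, here is a minimal sketch showing that the three forms agree. The data frame contents and the id column are made up for illustration; only the column name sCode comes from the question. One thing to watch for: if the data frame has a single column, dat[cond, ] collapses to a vector and nrow returns NULL, so the sketch passes drop = FALSE.

```r
# Hypothetical sample data; only the column name sCode comes from the question.
dat <- data.frame(
  sCode = c("CA", "NY", "CA", "TX"),
  id    = 1:4,
  stringsAsFactors = FALSE
)

# drop = FALSE keeps the result a data frame even if dat had a single column;
# without it, a one-column subset collapses to a vector and nrow() returns NULL.
n_nrow   <- nrow(dat[dat$sCode == "CA", , drop = FALSE])
n_length <- length(dat$sCode[dat$sCode == "CA"])
n_sum    <- sum(dat$sCode == "CA")

# All three counts are the same for data without missing values.
print(c(nrow = n_nrow, length = n_length, sum = n_sum))
```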
情书的邮戳

2020-11-29 05:28
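Call nrow, passing the name of the dataset as the argument. Note that this returns the total number of rows, so filter the dataset first if you only want the rows matching a condition: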
```
nrow(dataset)
```
甜味超标

2020-11-29 05:29
Try using subset:
```
nrow(subset(data,condition))
```
Example:
```
nrow(subset(myData,sCode == "CA"))
```
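One reason subset can be convenient: it treats a missing value in the condition as FALSE, while plain bracket indexing keeps an all-NA row for it. A small sketch, assuming a made-up myData with one NA in sCode (the id column is also hypothetical):

```r
# Hypothetical data with one missing code.
myData <- data.frame(
  sCode = c("CA", NA, "CA", "NY"),
  id    = 1:4,
  stringsAsFactors = FALSE
)

# subset() drops rows where the condition evaluates to NA.
n_subset <- nrow(subset(myData, sCode == "CA"))

# Bracket indexing turns the NA in the condition into an extra all-NA row,
# so the count comes out one higher than the number of real matches.
n_bracket <- nrow(myData[myData$sCode == "CA", ])

print(c(subset = n_subset, bracket = n_bracket))
```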
失恋的感觉

2020-11-29 05:31
mydata$sCode == "CA" will return a logical vector, with a TRUE value everywhere that the condition is met. To illustrate:
```
> mydata = data.frame(sCode = c("CA", "CA", "AC"))
> mydata$sCode == "CA"
[1]  TRUE  TRUE FALSE
```
There are a couple of ways to deal with this:
1. sum(mydata$sCode == "CA"), as suggested in the comments; because TRUE is interpreted as 1 and FALSE as 0, this returns the number of TRUE values in your vector.
2. length(which(mydata$sCode == "CA")); the which() function returns a vector of the indices where the condition is met, the length of which is the count of "CA".
Edit to expand upon what's happening in #2:
```
> which(mydata$sCode == "CA")
[1] 1 2
```
which() returns a vector identifying each position where the condition is met (in this case, rows 1 and 2 of the data frame). The length() of this vector is the number of occurrences.
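The two approaches differ when the column contains missing values, which may matter for real data. A minimal sketch, using a made-up vector named codes in place of mydata$sCode:

```r
# Hypothetical vector standing in for mydata$sCode, with one missing value.
codes <- c("CA", NA, "CA", "AC")

# The comparison yields NA for the missing element, so a plain sum() is NA.
n_plain <- sum(codes == "CA")

# na.rm = TRUE ignores the NA and counts only the TRUE values.
n_narm <- sum(codes == "CA", na.rm = TRUE)

# which() only returns positions where the condition is TRUE, so NA is skipped.
n_which <- length(which(codes == "CA"))

print(c(plain = n_plain, na_rm = n_narm, which = n_which))
```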
情歌与酒

2020-11-29 05:31
With the dplyr package, use:
```
nrow(filter(mydata, sCode == "CA"))
```
All the other solutions provided here gave me the same error as multi-sam, but this one worked.
庸人自扰

2020-11-29 05:37
1. mydata$sCode is a vector, not a data frame, which is why nrow returns NULL.
2. mydata[mydata$sCode == 'CA',] returns a data frame of the rows where sCode == 'CA'. Its sCode column is of type character, which is why sum gives you an error.
3. In subset(mydata, sCode='CA', select=c(sCode)), you should use sCode=='CA' instead of sCode='CA'. subset then returns a one-column data frame of the rows where sCode equals CA, so count its rows with nrow:
  
  nrow(subset(na.omit(mydata), sCode=='CA', select=c(sCode)))
Or you can try this: sum(na.omit(mydata$sCode) == "CA")
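These pitfalls can be checked directly. The sketch below assumes a small made-up mydata (the id column is hypothetical) and records what each call gives, without claiming this is exactly what the asker ran:

```r
# Hypothetical data; only the column name sCode comes from the question.
mydata <- data.frame(
  sCode = c("CA", "CA", "AC"),
  id    = 1:3,
  stringsAsFactors = FALSE
)

# 1. nrow() of a plain vector is NULL.
vec_rows <- nrow(mydata$sCode)

# 2. sum() over a data frame that contains a character column raises an error.
sum_fails <- inherits(
  try(sum(mydata[mydata$sCode == "CA", ]), silent = TRUE),
  "try-error"
)

# 3. With a single "=", sCode = "CA" is treated as a named argument and
#    ignored, so no filtering happens and every row is returned.
all_rows <- nrow(subset(mydata, sCode = "CA"))

# subset() with select returns a data frame, so length() gives the number
# of columns while nrow() gives the number of matching rows.
n_cols <- length(subset(mydata, sCode == "CA", select = c(sCode)))
n_rows <- nrow(subset(mydata, sCode == "CA", select = c(sCode)))

print(list(vec_rows = vec_rows, sum_fails = sum_fails,
           all_rows = all_rows, n_cols = n_cols, n_rows = n_rows))
```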