Question
I have a data frame that is 200 rows by 6 columns. I want to count the number of times a value in Col A is less than a specific number. The number can be hard-coded. I do not know where to begin...
Answer 1:
For a slightly more complex problem, use which() to tell sum() which rows to sum. If DF is the data frame:
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    97     267  6.3   92     7   8
3    97     272  5.7   92     7   9
Example: sum the values of Solar.R (column 2) in the rows where Ozone (column 1) > 30 AND Temp (column 4) > 90:
sum(DF[which(DF[,1]>30 & DF[,4]>90),2])
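A self-contained version of this example, rebuilding DF from only the three rows shown above. Computing the row indices separately shows what which() hands to the indexing step before sum() runs. The last line is an equivalent form using column names instead of positions, which reads more clearly and keeps working if the columns are reordered.

```r
# Rebuild the three sample rows shown above
DF <- data.frame(
  Ozone   = c(41, 97, 97),
  Solar.R = c(190, 267, 272),
  Wind    = c(7.4, 6.3, 5.7),
  Temp    = c(67, 92, 92),
  Month   = c(5, 7, 7),
  Day     = c(1, 8, 9)
)

# Row positions where Ozone > 30 and Temp > 90
rows <- which(DF[, 1] > 30 & DF[, 4] > 90)
rows
# [1] 2 3

# Sum Solar.R (column 2) over those rows: 267 + 272
sum(DF[rows, 2])
# [1] 539

# Same result, selecting by column name instead of position
sum(DF$Solar.R[DF$Ozone > 30 & DF$Temp > 90])
# [1] 539
```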
Answer 2:
To count how many values are below some number, you can use sum() (see ?sum):
sum( df$columnA < NUMBER )
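A minimal sketch on data shaped like the question's (200 rows by 6 columns). The data frame is hypothetical: ColA simply holds 1 to 200 so the counts can be checked by hand, and the threshold of 50 is an arbitrary hard-coded value.

```r
# Hypothetical 200 x 6 data frame; matrix() fills column by column,
# so ColA holds 1..200
df <- as.data.frame(matrix(1:1200, nrow = 200, ncol = 6))
colnames(df) <- paste0("Col", LETTERS[1:6])

threshold <- 50  # hard-coded cut-off, as the question allows

# Number of ColA values strictly below the threshold (1..49)
sum(df$ColA < threshold)
# [1] 49

# Same count via the positions that satisfy the condition
length(which(df$ColA < threshold))
# [1] 49

# Use <= if a value equal to the threshold should also be counted
sum(df$ColA <= threshold)
# [1] 50
```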
Answer 3:
Just calling sum() on your condition will work. Logical values are coerced to 0 for FALSE and 1 for TRUE, so summing a logical vector tells you how many values are TRUE.
dat <- as.data.frame(matrix(1:36,6,6))
colnames(dat) <- paste0("Col", LETTERS[1:6])
dat$ColA
# [1] 1 2 3 4 5 6
dat$ColA < 3
# [1] TRUE TRUE FALSE FALSE FALSE FALSE
sum(dat$ColA < 3)
# [1] 2
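To make the coercion explicit, as.integer() shows the 0/1 values that sum() adds up, and mean() of the same logical vector gives the proportion of rows meeting the condition rather than the count. The dat from above is recreated here so the snippet runs on its own.

```r
dat <- as.data.frame(matrix(1:36, 6, 6))
colnames(dat) <- paste0("Col", LETTERS[1:6])

# The 0/1 values that sum() adds up
as.integer(dat$ColA < 3)
# [1] 1 1 0 0 0 0

# Since TRUE counts as 1, the mean is the share of TRUE values: 2 of 6
mean(dat$ColA < 3)
# [1] 0.3333333
```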
Answer 4:
While the answer sum( df$columnA < NUMBER ) is correct, it is worth expanding on it a little.
If you'd like to sum the values instead of counting them, you could use:
sum(df[df$columnA < Number,]$columnA)
If there are NA values, use either of:
sum(df[df$columnA < Number,]$columnA, na.rm=TRUE)
sum(df[(df$columnA < Number)&(!is.na(df$columnA)),]$columnA)
What happens here is that the condition creates a logical (TRUE/FALSE) vector with one element per value of columnA. That vector selects a subset of the rows of the original data frame, and columnA of that subset is then summed.
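One detail worth spelling out: if columnA contains NA, the condition is NA for that row, and indexing with an NA selects an NA (for a data frame, an all-NA row). That is why the plain sum() returns NA and why na.rm=TRUE or the !is.na() guard is needed. A minimal sketch on a plain vector with the same values as colA in the example that follows, and a hypothetical threshold of 3; the last line uses which(), as in Answer 1, because which() ignores NA entries.

```r
x <- c(1, 2, 3, 4, NA)  # same values as colA in the example
Number <- 3             # hypothetical hard-coded threshold

mask <- x < Number
mask
# [1] TRUE TRUE FALSE FALSE NA

# The NA in the mask selects an NA element, so a plain sum() is NA
x[mask]
# [1] 1 2 NA
sum(x[mask])
# [1] NA

# Either drop the NA while summing, or convert the mask to positions
sum(x[mask], na.rm = TRUE)
# [1] 3
sum(x[which(mask)])
# [1] 3
```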
Here's an example you can use to try it out:
df = data.frame(colA=c(1, 2, 3, 4, NA), colB=c('a', NA, 'c', 'd', 'e'))
# Plain sums: NA propagates unless na.rm=TRUE
sum(df$colA) # NA
sum(df$colA, na.rm=TRUE) # 10: a sum of the values, not a count, because colA was not turned into a logical vector
# Count
sum(df$colA > 0, na.rm=TRUE) # 4
sum(df$colA > 2, na.rm=TRUE) # 2
sum((df$colA > 2) & (df$colB == 'd'), na.rm=TRUE) # 1
# Sum of values
sum(df$colA, na.rm=TRUE) # 10
sum(df[df$colA > 0,]$colA, na.rm=TRUE) # 10
sum(df[df$colA > 2,]$colA, na.rm=TRUE) # 7
bn_vector = (df$colA > 2)&(df$colB=='d') # Boolean vector
sub_df = df[bn_vector,] # Subset of the dataframe. Leaving the second argument in [] empty uses all the columns
sub_df_colA = df[bn_vector, 'colA'] # Contents of column 'colA' for those rows, as a numeric vector
sum(sub_df$colA) # 4
sum(sub_df_colA) # 4
Answer 5:
Ozone<-c(41,97,97)
Solar.R<-c(190,267,272)
Wind<-c(7.4,6.3,5.7)
Temp<-c(67,92,92)
Month<-c(5,7,7)
Day<-c(1,8,9)
tbl<-data.frame(Ozone, Solar.R, Wind, Temp, Month, Day)
tbl
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    97     267  6.3   92     7   8
3    97     272  5.7   92     7   9
sum(tbl$Temp) / sum(!is.na(tbl$Temp))
[1] 83.66667
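This expression divides the sum of Temp by the number of non-missing Temp values, i.e. it computes the mean of Temp. A minimal sketch, assuming a hypothetical Temp vector with one missing value added, shows that the numerator also needs na.rm = TRUE once NAs are present, and that mean() gives the same result directly.

```r
Temp <- c(67, 92, 92, NA)  # hypothetical: the Temp values above plus one NA

# sum() propagates the NA, so the ratio is NA
sum(Temp) / sum(!is.na(Temp))
# [1] NA

# Drop the NA in the numerator; the denominator already counts non-NA values
sum(Temp, na.rm = TRUE) / sum(!is.na(Temp))
# [1] 83.66667

# Built-in equivalent
mean(Temp, na.rm = TRUE)
# [1] 83.66667
```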
Source: https://stackoverflow.com/questions/10827705/conditional-sum-in-r