问题
How do I find the minimum value from an R data table other than a particular value?
For example, there could be zeroes in the data table and the goal would be to find the minimum non zero value.
I tried using the
sapply
withmin
, but am not sure how to specify the extra criteria that we have so that the minimum is not equal to a certain value.More generally, How do we find the minimum from a data table not equal to any element from a list of possible values?
回答1:
If you want to find the minimum value from a vector while excluding certain values from that vector, then you can use %in%
:
v <- c(1:10) # values 1 .. 10
v.exclude <- c(1, 2) # exclude the values 1 and 2 from consideration
min.exclude <- min(v[!v %in% v.exclude])
The logic won't change much if you are using a column from a data table/frame. In this case you can just replace the vector v
with the apropriate column. If you have your excluded values in a list, then you can flatten it to produce your v.exclude
vector.
回答2:
This can be done with data.table
(as the OP mentioned about data table in the post) after setting the key
library(data.table)
setDT(df, key='a')[!.(exclude)]
# a b
#1: 4 40
#2: 5 50
#3: 6 60
If we need the min
value of 'a'
min(setDT(df, key='a')[!.(exclude)]$a)
#[1] 4
For finding the min
in all the columns (using the setkey
method), we loop over the columns of the dataset, set the key as each of the column, subset the dataset, get the min
value in a previously created list
object.
setDT(df)
MinVal <- vector('list', length(df))
for(j in seq_along(df)){
setkeyv(df, names(df)[j])
MinVal[[j]] <- min(df[!.(exclude)][[j]])
}
MinVal
#[[1]]
#[1] 4
#[[2]]
#[1] 10
data
df <- data.frame(a = c(0,2,3,2,1,2,3,4,5,6),
b = c(10,10,20,20,30,30,40,40,50,60))
exclude <- c(0,1,2,3)
回答3:
Assuming you are working with a data.frame
Data
df <- data.frame(a = c(0,2,3,2,1,2,3,4,5,6),
b = c(10,10,20,20,30,30,40,40,50,60))
Values to exlude from our minimum search
exclude <- c(0,1,2,3)
we can find the minimum value from column a
excluding our exclude
vector
## minimum from column a
min(df[!df$a %in% exclude,]$a)
# [1] 4
Or from b
exclude <- c(10, 20, 30, 40)
min(df[!df$b %in% exclude,]$b)
# [1] 50
To return the row that corresponds to the minimum value
df[df$b == min( df[ !df$b %in% exclude, ]$b ),]
# a b
# 9 5 50
Update
To find the minimum across multiple rows we can do it this way:
## values to exclude
exclude_a <- c(0,1)
exclude_b <- c(10)
## exclude rows/values from each column we don't want
df2 <- df[!(df$a %in% exclude_a) & !(df$b %in% exclude_b),]
## order the data
df3 <- df2[with(df2, order(a,b)),]
## take the first row
df3[1,]
# > df3[1,]
# a b
#4 2 20
Update 2
To select from multiple columns we can iterate over them as @akrun has shown, or alternatively we can construct our subsetting formula using an expression
and eval
uate it inside our [
operation
exclude <- c(0,1,2, 10)
## construct a formula/expression using the column names
n <- names(df)
expr <- paste0("(", paste0(" !(df$", n, " %in% exclude) ", collapse = "&") ,")")
# [1] "( !(df$a %in% exclude) & !(df$b %in% exclude) )"
expr <- parse(text=expr)
df2 <- df[eval(expr),]
## order and select first row as before
df2 <- df2[with(df2, order(a,b)),]
df2 <- df2[1,]
And if we wanted to use data.table
for this:
library(data.table)
setDT(df)[ eval(expr) ][order(a, b),][1,]
comparison of methods
library(microbenchmark)
fun_1 <- function(x){
df2 <- x[eval(expr),]
## order and select first row as before
df2 <- df2[with(df2, order(a,b)),]
df2 <- df2[1,]
return(df2)
}
fun_2 <- function(x){
df2 <- setDT(x)[ eval(expr) ][order(a, b),][1,]
return(df2)
}
## including @akrun's solution
fun_3 <- function(x){
setDT(df)
MinVal <- vector('list', length(df))
for(j in seq_along(df)){
setkeyv(df, names(df)[j])
MinVal[[j]] <- min(df[!.(exclude)][[j]])
}
return(MinVal)
}
microbenchmark(fun_1(df), fun_2(df), fun_3(df) , times=1000)
# Unit: microseconds
# expr min lq mean median uq max neval
# fun_1(df) 770.376 804.5715 866.3499 833.071 869.2195 2728.740 1000
# fun_2(df) 854.862 893.1220 952.1207 925.200 962.6820 3115.119 1000
# fun_3(df) 1108.316 1148.3340 1233.1268 1186.938 1234.3570 5400.544 1000
来源:https://stackoverflow.com/questions/35099845/r-minimum-value-from-datatable-not-equal-to-a-particular-value