R Minimum Value from Datatable Not Equal to a Particular Value

Posted by 一笑奈何 on 2019-12-02 07:54:02

Question


  1. How do I find the minimum value in an R data table, excluding a particular value?

    For example, the data table may contain zeroes, and the goal is to find the minimum non-zero value.

    I tried using sapply with min, but I am not sure how to specify the extra condition that the minimum must not equal a certain value.

  2. More generally, how do we find the minimum in a data table that is not equal to any element from a list of possible values?


Answer 1:


If you want to find the minimum value from a vector while excluding certain values from that vector, then you can use %in%:

v <- c(1:10)           # values 1 .. 10
v.exclude <- c(1, 2)   # exclude the values 1 and 2 from consideration
min.exclude <- min(v[!v %in% v.exclude])

The logic doesn't change much if you are using a column from a data table or data frame: just replace the vector v with the appropriate column. If your excluded values are in a list, you can flatten it (for example with unlist()) to produce the v.exclude vector.
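As a minimal sketch of the two cases just described, the example below applies the same %in% filter to a data frame column and flattens a list of excluded values with unlist(). The names df, exclude_list, min_nonzero and min_a are made up for illustration and are not from the original answer.

```r
# Illustrative data: a single numeric column in a data frame.
df <- data.frame(a = c(0, 2, 3, 2, 1, 2, 3, 4, 5, 6))

# Case 1: minimum that is not equal to one particular value (here, zero).
min_nonzero <- min(df$a[df$a != 0])

# Case 2: excluded values stored in a list; flatten to a plain vector first.
exclude_list <- list(0, c(1, 2), 3)
exclude <- unlist(exclude_list)

# Keep only the entries of the column that are not in the excluded set.
min_a <- min(df$a[!(df$a %in% exclude)])

print(min_nonzero)  # 1
print(min_a)        # 4
```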




Answer 2:


This can be done with data.table (the OP mentions a data table in the post) after setting the key:

library(data.table)
setDT(df, key='a')[!.(exclude)]
#   a  b
#1: 4 40
#2: 5 50
#3: 6 60

If we need the minimum value of 'a':

min(setDT(df, key='a')[!.(exclude)]$a)
#[1] 4

To find the min in every column (using the setkey method), we loop over the columns of the dataset: set each column as the key in turn, subset the dataset, and store the min value in a previously created list object.

setDT(df)
MinVal <- vector('list', length(df))
for(j in seq_along(df)){
 setkeyv(df, names(df)[j])
 MinVal[[j]] <- min(df[!.(exclude)][[j]])
}

MinVal
#[[1]]
#[1] 4

#[[2]]
#[1] 10

data

df <- data.frame(a = c(0,2,3,2,1,2,3,4,5,6),
             b = c(10,10,20,20,30,30,40,40,50,60))
exclude <- c(0,1,2,3)
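Since the question mentions sapply, the per-column loop in this answer can also be sketched in base R without setting keys. This is an alternative under stated assumptions, not the answer's own method: the helper min_excluding is a hypothetical name, and returning NA when every value is excluded is a design choice made here (plain min() on an empty vector would return Inf with a warning).

```r
df <- data.frame(a = c(0, 2, 3, 2, 1, 2, 3, 4, 5, 6),
                 b = c(10, 10, 20, 20, 30, 30, 40, 40, 50, 60))
exclude <- c(0, 1, 2, 3)

# Hypothetical helper: minimum of x after dropping any value found in `exclude`.
min_excluding <- function(x, exclude) {
  kept <- x[!(x %in% exclude)]
  if (length(kept) == 0) NA_real_ else min(kept)
}

# Apply the helper to every column; the result is a named numeric vector.
min_vals <- sapply(df, min_excluding, exclude = exclude)
print(min_vals)
# a  b
# 4 10
```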



Answer 3:


Assuming you are working with a data.frame

Data

df <- data.frame(a = c(0,2,3,2,1,2,3,4,5,6),
                 b = c(10,10,20,20,30,30,40,40,50,60))

Values to exclude from our minimum search

exclude <- c(0,1,2,3)

We can find the minimum value in column a, ignoring the values in our exclude vector:

## minimum from column a
min(df[!df$a %in% exclude,]$a)
# [1] 4

Or from b

exclude <- c(10, 20, 30, 40)
min(df[!df$b %in% exclude,]$b)
# [1] 50

To return the row that corresponds to the minimum value

df[df$b == min( df[ !df$b %in% exclude, ]$b ),]
#   a  b
# 9 5 50
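If several rows tie for the minimum, the equality comparison above returns all of them. A minimal sketch of a variant, assuming only the first matching row is wanted, filters the rows first and then uses which.min(). The names kept and min_row are introduced here for illustration.

```r
df <- data.frame(a = c(0, 2, 3, 2, 1, 2, 3, 4, 5, 6),
                 b = c(10, 10, 20, 20, 30, 30, 40, 40, 50, 60))
exclude <- c(10, 20, 30, 40)

# Drop the rows whose b value is excluded.
kept <- df[!(df$b %in% exclude), ]

# which.min() gives the index of the first smallest value, so exactly one row is returned.
min_row <- kept[which.min(kept$b), ]
print(min_row)
#   a  b
# 9 5 50
```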

Update

To find the minimum across multiple columns, with a separate exclusion list for each column, we can do it this way:

## values to exclude
exclude_a <- c(0,1)
exclude_b <- c(10)

## exclude rows/values from each column we don't want
df2 <- df[!(df$a %in% exclude_a) & !(df$b %in% exclude_b),]

## order the data 
df3 <- df2[with(df2, order(a,b)),]

## take the first row
df3[1,]
#   a  b
# 4 2 20

Update 2

To select from multiple columns we can iterate over them as @akrun has shown, or alternatively we can build the subsetting condition as an expression from the column names and evaluate it inside the [ operation:

exclude <- c(0,1,2, 10)

## construct a formula/expression using the column names
n <- names(df)
expr <- paste0("(", paste0(" !(df$", n, " %in% exclude) ", collapse = "&") ,")")
# [1] "( !(df$a %in% exclude) & !(df$b %in% exclude) )"
expr <- parse(text=expr)

df2 <- df[eval(expr),]

## order and select first row as before
df2 <- df2[with(df2, order(a,b)),]
df2 <- df2[1,]

And if we wanted to use data.table for this:

library(data.table)
setDT(df)[ eval(expr) ][order(a, b),][1,]
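The same row filter can also be built without parse() and eval(), by computing one logical vector per column and combining them with Reduce(). This is a sketch of an alternative, not the answer's own approach; the names keep_per_col, keep and first_row are assumptions made for the example.

```r
df <- data.frame(a = c(0, 2, 3, 2, 1, 2, 3, 4, 5, 6),
                 b = c(10, 10, 20, 20, 30, 30, 40, 40, 50, 60))
exclude <- c(0, 1, 2, 10)

# One logical vector per column: TRUE where the value is not excluded.
keep_per_col <- lapply(df, function(col) !(col %in% exclude))

# A row is kept only if it passes the test in every column.
keep <- Reduce(`&`, keep_per_col)
df2 <- df[keep, ]

# Order by a, then b, and take the first row as before.
df2 <- df2[order(df2$a, df2$b), ]
first_row <- df2[1, ]
print(first_row)
#   a  b
# 3 3 20
```

Because the mask is built from the data frame's own columns, it adapts to however many columns are present without pasting code strings together.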

Comparison of methods

library(microbenchmark)

fun_1 <- function(x){
  df2 <- x[eval(expr),]

  ## order and select first row as before
  df2 <- df2[with(df2, order(a,b)),]
  df2 <- df2[1,]
  return(df2)
}

fun_2 <- function(x){
  df2 <- setDT(x)[ eval(expr) ][order(a, b),][1,]
  return(df2)
}

## including @akrun's solution (returns the per-column minima, not a row)
fun_3 <- function(x){
  ## use the argument x rather than the global df
  setDT(x)
  MinVal <- vector('list', length(x))
  for(j in seq_along(x)){
    setkeyv(x, names(x)[j])
    MinVal[[j]] <- min(x[!.(exclude)][[j]])
  }
  return(MinVal)
}

microbenchmark(fun_1(df), fun_2(df), fun_3(df) , times=1000)
 # Unit: microseconds
 #     expr      min        lq      mean   median        uq      max neval
 # fun_1(df)  770.376  804.5715  866.3499  833.071  869.2195 2728.740  1000
 # fun_2(df)  854.862  893.1220  952.1207  925.200  962.6820 3115.119  1000
 # fun_3(df) 1108.316 1148.3340 1233.1268 1186.938 1234.3570 5400.544  1000


Source: https://stackoverflow.com/questions/35099845/r-minimum-value-from-datatable-not-equal-to-a-particular-value
