Select the first n percent of rows from randomly sampled data frames in a list in R

Question

I wrote a function that selects the first n percent of rows (the threshold) from a data frame, and it works on data frames in a list as well. The functions are given below:

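set.threshold.rand <- function(value, vector) {
  print(length(vector))
  n <- as.integer(length(vector) / 100 * value)  # cut-off position: value% of the vector length
  threshold <- vector[n]                         # the element at that position becomes the threshold
  return(threshold)
}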

sensitivity.rand <- function(vector, threshold) {
  thresh <- set.threshold.rand(threshold, vector)
  print(thresh)
  # label each value "H" if it is at or below the threshold, otherwise "L"
  score <- ifelse(vector <= thresh, "H", "L")
  return(score)
}

This function selects the first n percent of rows from each data frame in a list. For example, the code below labels the first 143 rows as "H", as expected.

vec.1 <- c(1:574)
vec.2 <- c(3001:3574)
df.1 <- data.frame(vec.1, vec.2)
df.2 <- data.frame(vec.2, vec.1)

my_list1 <- list(df.1, df.2)
my_list1 <- lapply(my_list1, function(x) {
  x[1] <- lapply(x[1], sensitivity.rand, threshold = 25)
  x
})

But it does not work on a list of data frames created by shuffling and replicating the rows (given below). For example:

my_list <- replicate(10, df.1[sample(nrow(df.1)),] , simplify = FALSE)

my_list <- lapply(my_list, function(x) {
  x[1] <- lapply(x[1], sensitivity.rand, threshold = 25)
  x
})

These label more than 300 rows as "H". How can I solve this?

Answer 1:

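Your function set.threshold.rand assumes that the input vector is sorted in ascending order: it takes the element at position n as the threshold, which is the n-th smallest value only when the vector is sorted.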

That's why it works with my_list1 and not with my_list, where you've shuffled the rows with sample().
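The failure can be reproduced on a single vector. This is a minimal sketch that reuses the question's threshold logic with the print() calls removed; the seed and the names `shuffled` and `thresh` are arbitrary choices for illustration.

```r
# The question's threshold logic, without the debugging print() calls
set.threshold.rand <- function(value, vector) {
  n <- as.integer(length(vector) / 100 * value)
  vector[n]  # n-th element in the CURRENT order, sorted or not
}

vec.1 <- 1:574
n <- as.integer(length(vec.1) / 100 * 25)  # cut-off position: 143

# Sorted input: position 143 holds the 143rd smallest value
sorted_thresh <- set.threshold.rand(25, vec.1)

# Shuffled input: position 143 holds an arbitrary value between 1 and 574
set.seed(123)
shuffled <- sample(vec.1)
thresh <- set.threshold.rand(25, shuffled)

# For the values 1:574, the number of values <= thresh equals thresh itself,
# so the count of "H" labels depends on the shuffle, not on the 25% target
print(c(sorted = sorted_thresh, shuffled = thresh))
print(sum(shuffled <= thresh))
```

Because the count of "H" labels equals whichever value happens to land at position 143, any shuffle that puts a value above 300 there produces the "more than 300 rows" result described in the question.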

Replace threshold <- vector[n] with threshold <- sort(vector)[n] in set.threshold.rand.
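Applied to the shuffled list from the question, the fix can be checked as follows. This is a sketch that drops the print() calls; the seed and the name `counts` are illustrative assumptions, not part of the original answer.

```r
# Answer 1's fix: sort before indexing, so the threshold is the
# n-th smallest value regardless of the row order
set.threshold.rand <- function(value, vector) {
  n <- as.integer(length(vector) / 100 * value)
  sort(vector)[n]
}

sensitivity.rand <- function(vector, threshold) {
  thresh <- set.threshold.rand(threshold, vector)
  ifelse(vector <= thresh, "H", "L")  # "H" at or below the threshold, else "L"
}

set.seed(1)
vec.1 <- 1:574
vec.2 <- 3001:3574
df.1 <- data.frame(vec.1, vec.2)

# Ten shuffled copies of df.1, labelled exactly as in the question
my_list <- replicate(10, df.1[sample(nrow(df.1)), ], simplify = FALSE)
my_list <- lapply(my_list, function(x) {
  x[1] <- lapply(x[1], sensitivity.rand, threshold = 25)
  x
})

# Number of "H" labels in each data frame
counts <- sapply(my_list, function(x) sum(x[[1]] == "H"))
print(counts)
```

Every data frame now gets 143 "H" labels, namely the rows holding the 143 smallest values of the first column, wherever those rows sit after shuffling. With tied values the count could exceed 143, since every value equal to the threshold is also labelled "H".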

Answer 2:

Adapted from the answer given by SirSaleh here:

sensitivity.rand <- function(vector, threshold) {
  l <- length(vector)
  num_to_thres <- floor(threshold * 0.01 * l)  # number of leading rows to label "H"
  score <- c(rep("H", num_to_thres), rep("L", l - num_to_thres))
  return(score)
}

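This version accepts any percentage threshold and does not depend on the values being sorted. Note that it labels rows by position rather than by value: the first n percent of rows in the current order become "H", whatever values they contain.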
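The positional behaviour can be seen on one shuffled data frame. This is a minimal sketch; the seed and the names `shuffled` and `label` are hypothetical choices for the demonstration.

```r
# Answer 2's function: label the first n percent of positions as "H"
sensitivity.rand <- function(vector, threshold) {
  l <- length(vector)
  num_to_thres <- floor(threshold * 0.01 * l)
  c(rep("H", num_to_thres), rep("L", l - num_to_thres))
}

set.seed(42)
df.1 <- data.frame(vec.1 = 1:574, vec.2 = 3001:3574)
shuffled <- df.1[sample(nrow(df.1)), ]

# floor(0.25 * 574) = 143 leading rows become "H"
shuffled$label <- sensitivity.rand(shuffled$vec.1, threshold = 25)
print(table(shuffled$label))

# The "H" rows are simply the first 143 in shuffled order; their vec.1
# values are not necessarily the 143 smallest
print(head(shuffled[shuffled$label == "H", ]))
```

Use this version when "first n percent" means the first rows of each sampled data frame, and Answer 1's sort-based threshold when it means the rows with the smallest n percent of values.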

Source: https://stackoverflow.com/questions/43175618/select-first-nth-percent-of-rows-from-random-sampled-dataframes-of-list-in-r

Tags: list, function, dataframe, lapply