Question
I wrote a function that selects the first n percent of rows (i.e., a threshold) from a data frame, and it also works on a list of data frames. The functions are given below:
set.threshold.rand <- function(value, vector) {
  print(length(vector))
  n <- as.integer(length(vector) / 100 * value)
  threshold <- vector[n]
  return(threshold)
}
sensitivity.rand <- function(vector, threshold) {
  thresh <- set.threshold.rand(threshold, vector)
  print(thresh)
  # label each value "H" if it is at or below the threshold, otherwise "L"
  score <- ifelse(vector <= thresh, "H", "L")
  return(score)
}
This function selects the first n percent of rows from each data frame in a list. For example, the code below marks the first 143 rows as "H", as expected.
vec.1 <- c(1:574)
vec.2 <- c(3001:3574)
df.1 <- data.frame(vec.1, vec.2)
df.2 <- data.frame(vec.2, vec.1)
my_list1 <- list(df.1, df.2)
my_list1 <- lapply(my_list1, function(x) {
  x[1] <- lapply(x[1], sensitivity.rand, threshold = 25)
  x
})
But this doesn't work on a list of sampled and replicated data frames (given below). For example:
my_list <- replicate(10, df.1[sample(nrow(df.1)),] , simplify = FALSE)
my_list <- lapply(my_list, function(x) {
  x[1] <- lapply(x[1], sensitivity.rand, threshold = 25)
  x
})
This marks more than 300 rows as "H" instead of 143. How can I solve this?
Answer 1:
Your function set.threshold.rand relies on the fact that the input vector is sorted. That's why it works with my_list1 and not with my_list, where you've shuffled the rows with sample().
Replace threshold <- vector[n] with threshold <- sort(vector)[n] in set.threshold.rand.
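A minimal sketch of that change, reusing the question's function names. The print() calls are dropped for brevity, and the seed and the `shuffled` variable are assumptions added here only to make the check reproducible:

```r
# The threshold is the value at the n-th position of the *sorted* vector,
# so the result no longer depends on the order of the rows.
set.threshold.rand <- function(value, vector) {
  n <- as.integer(length(vector) / 100 * value)
  sort(vector)[n]
}

sensitivity.rand <- function(vector, threshold) {
  thresh <- set.threshold.rand(threshold, vector)
  ifelse(vector <= thresh, "H", "L")
}

set.seed(1)  # arbitrary seed, only for reproducibility
df.1 <- data.frame(vec.1 = 1:574, vec.2 = 3001:3574)
shuffled <- df.1[sample(nrow(df.1)), ]  # rows in random order
shuffled$vec.1 <- sensitivity.rand(shuffled$vec.1, threshold = 25)
table(shuffled$vec.1)  # 143 "H" and 431 "L" for this data
```

Note that this labels rows by value, not by position: after shuffling, the "H" rows are the 143 smallest values wherever they sit in the data frame. If the vector contains ties at the threshold value, more than n rows can end up as "H".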
Answer 2:
Adapted from the answer given by @SirSaleh:
sensitivity.rand <- function(vector, threshold) {
  num_to_thres <- floor(threshold * 0.01 * length(vector))
  l <- length(vector)
  score <- c(rep("H", num_to_thres), rep("L", l - num_to_thres))
  return(score)
}
Now it accepts any threshold and labels the first threshold percent of rows as "H" purely by position, so it does not matter whether the values are sorted.
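As a quick check, here is a sketch that applies this positional version to the shuffled, replicated list from the question. The `h_counts` and `first_rows_are_h` helpers are hypothetical names introduced only for this illustration:

```r
# Positional labelling as in the answer above: the values of `vector`
# are ignored, only its length is used.
sensitivity.rand <- function(vector, threshold) {
  num_to_thres <- floor(threshold * 0.01 * length(vector))
  l <- length(vector)
  c(rep("H", num_to_thres), rep("L", l - num_to_thres))
}

df.1 <- data.frame(vec.1 = 1:574, vec.2 = 3001:3574)
my_list <- replicate(10, df.1[sample(nrow(df.1)), ], simplify = FALSE)
my_list <- lapply(my_list, function(x) {
  x[1] <- lapply(x[1], sensitivity.rand, threshold = 25)
  x
})

# Number of "H" rows in each replicate (hypothetical helper)
h_counts <- sapply(my_list, function(x) sum(x[[1]] == "H"))

# Whether the "H" labels occupy exactly the first 143 rows (hypothetical helper)
first_rows_are_h <- sapply(my_list, function(x) all(x[[1]][1:143] == "H"))
```

Because the labels depend only on row position, every replicate gets the same number of "H" rows regardless of how sample() ordered it. The trade-off is that the original values in the first column are overwritten without ever being consulted.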
Source: https://stackoverflow.com/questions/43175618/select-first-nth-percent-of-rows-from-random-sampled-dataframes-of-list-in-r