How to bootstrap a function with replacement and return the output

Submitted by 假如想象 on 2019-12-13 07:02:57

Question


I am trying to draw two random subsamples from a data frame, compute the mean of one column in each subsample, and calculate the difference between the two means. As far as I can tell, the function below, called through replicate inside do.call, should work, but I keep getting an error message.

Example data:

> dput(a)
structure(list(index = 1:30, val = c(14L, 22L, 1L, 25L, 3L, 34L, 
35L, 36L, 24L, 35L, 33L, 31L, 30L, 30L, 29L, 28L, 26L, 12L, 41L, 
36L, 32L, 37L, 56L, 34L, 23L, 24L, 28L, 22L, 10L, 19L), id = c(1L, 
2L, 2L, 3L, 3L, 4L, 5L, 6L, 7L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 
14L, 15L, 16L, 16L, 17L, 18L, 19L, 20L, 21L, 21L, 22L, 23L, 24L, 
25L)), .Names = c("index", "val", "id"), class = "data.frame", row.names = c(NA, 
-30L))

Code:

# Function to select only one row for each unique id in the data frame, 
# take 2 randomly drawn subsets of size 10 from this unique dataset, 
# calculate means of both subsets and determine the difference between the two means
extractDiff <- function(P){
  xA <- ddply(P, .(id), function(x) {x[sample(nrow(x), 1) ,] }) # selects only one row for each id in the data frame
  subA <- xA[sample(xA, 10, replace=TRUE), ] # takes a random sample of 10 rows
  subB <- xA[sample(xA, 10, replace=TRUE), ] # takes a second random sample of 10 rows
  meanA <- mean(subA$val)
  meanB <- mean(subB$val)
  diff <- abs(meanA-meanB)
  outdf <- c(mA = meanA, mB= meanB, diffAB = diff)
  return(outdf)
}

# To repeat the random selections and mean comparison X number of times...
fin <- do.call(rbind, replicate(10, extractDiff(a), simplify=FALSE))

Error message:

 Error in xj[i] : invalid subscript type 'list'

I think the error has something to do with the function output not being in a format that can be fed to rbind, but nothing I try seems to work (e.g. I have tried converting the outdf object to a data frame and to a matrix, and I still get the error message).

I am still learning R so would be grateful for any help. Thanks!


Answer 1:


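If you pass sample a list/data.frame as its first argument, it samples the elements of that list (for a data.frame, its columns) and returns a list/data.frame. You can't use a data.frame as a row subscript when subsetting a data.frame, which is what triggers the "invalid subscript type 'list'" error. Pass nrow(xA) instead, so that sample returns a vector of row numbers: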
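To make the difference concrete, here is a small sketch comparing the two calls. The data frame `df` is a made-up toy example rather than the asker's data, and the variable names `cols`, `rows` and `resampled` are only illustrative.

```r
# Hypothetical toy data frame, used only to illustrate how sample() behaves
df <- data.frame(index = 1:5, val = c(14L, 22L, 1L, 25L, 3L))

# sample() on a data frame treats it as a list of columns,
# so the result is again a data frame (built from resampled columns)
cols <- sample(df, 4, replace = TRUE)
is.data.frame(cols)   # TRUE
ncol(cols)            # 4

# sample() on a single number n draws from 1:n and returns an integer vector
rows <- sample(nrow(df), 4, replace = TRUE)
is.integer(rows)      # TRUE

# An integer vector is a valid row subscript; a data frame is not
resampled <- df[rows, ]
nrow(resampled)       # 4
```

Using `cols` in the row position, as in `df[cols, ]`, should reproduce the error from the question.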

library(plyr)
extractDiff <- function(P){
  xA <- ddply(P, .(id), function(x) {x[sample(nrow(x), 1) ,] }) # selects only one row for each id in the data frame
  subA <- xA[sample(nrow(xA), 10, replace=TRUE), ] # takes a random sample of 10 rows (with replacement)
  subB <- xA[sample(nrow(xA), 10, replace=TRUE), ] # takes a second random sample of 10 rows (with replacement)
  meanA <- mean(subA$val)
  meanB <- mean(subB$val)
  diff <- abs(meanA-meanB)
  outdf <- c(mA = meanA, mB= meanB, diffAB = diff)
  return(outdf)
}

set.seed(42)
fin <- do.call(rbind, replicate(10, extractDiff(a), simplify=FALSE))
#         mA   mB diffAB
#  [1,] 29.4 25.5    3.9
#  [2,] 25.8 23.0    2.8
#  [3,] 25.3 29.5    4.2
#  [4,] 29.0 31.2    2.2
#  [5,] 26.5 25.6    0.9
#  [6,] 26.8 27.2    0.4
#  [7,] 28.7 27.3    1.4
#  [8,] 22.7 28.7    6.0
#  [9,] 30.6 23.2    7.4
# [10,] 25.1 25.2    0.1
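As a variation on the function above, here is a minimal sketch that avoids the plyr dependency, assuming base R only. The name `extractDiffBase`, the `n` argument and the small `toy` data frame are illustrative and not part of the original post. It keeps one random row per id with `split()` and `lapply()`, then draws the two subsamples by row index in the same way as the corrected function.

```r
# Hypothetical toy data with repeated ids (not the asker's data)
toy <- data.frame(
  index = 1:8,
  val   = c(14L, 22L, 1L, 25L, 3L, 34L, 35L, 36L),
  id    = c(1L, 2L, 2L, 3L, 3L, 4L, 5L, 6L)
)

# Base-R variant of extractDiff(); n is the subsample size (an added argument)
extractDiffBase <- function(P, n = 10) {
  # Keep one randomly chosen row for each unique id
  pieces <- lapply(split(P, P$id), function(x) x[sample.int(nrow(x), 1), ])
  xA <- do.call(rbind, pieces)
  # Draw two sets of row indices, with replacement
  subA <- xA[sample.int(nrow(xA), n, replace = TRUE), ]
  subB <- xA[sample.int(nrow(xA), n, replace = TRUE), ]
  meanA <- mean(subA$val)
  meanB <- mean(subB$val)
  c(mA = meanA, mB = meanB, diffAB = abs(meanA - meanB))
}

set.seed(1)
# replicate() simplifies to a 3 x 5 matrix here; t() puts one repetition per row
fin <- t(replicate(5, extractDiffBase(toy)))
dim(fin)        # 5 3
colnames(fin)   # "mA" "mB" "diffAB"
```

Because the function returns a named numeric vector, `replicate()` can simplify the results to a matrix on its own, so `t(replicate(...))` is one alternative to `do.call(rbind, replicate(..., simplify=FALSE))`.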


Source: https://stackoverflow.com/questions/24058491/how-to-bootstrap-a-function-with-replacement-and-return-the-output
