Random Sample with multiple probabilities in R [duplicate]

Posted by 陌路散爱 on 2019-12-09 23:47:31

Question


I need to draw a sample of subjects from a list and assign them as a Control Group for a study, and the sample must have a composition of variables similar to that of the full list. I am trying to do this in R with the sample function, but I don't know how to specify the different probabilities for each variable. Let's say I have a table with the following headers:

ID Name Campaign Gender

I need a sample of 10 subjects with the following composition of Campaign attributes:

D2D --> 25%

F2F --> 38%

TM --> 17%

WW --> 21%

This means that in my data set 25% of subjects come from a Door to Door Campaign (D2D), 38% from a Face to Face Campaign (F2F), etc.

The gender composition is as follows:

Male --> 54%

Female --> 46%

When I get a random sample of 10 subjects I need it to have a similar composition.

I have been searching for hours, and the closest thing I could find was this answer: "taking data sample in R", but I need to assign more than one probability.

I am sure that this could help anyone who wants to get a representative sample from a Data Set.


Answer 1:


It sounds like you are interested in taking a random stratified sample. You could do this using the stratsample() function from the survey package.

In the example below, I create some fake data to mimic what you have, define a function that takes a proportional stratified random sample, and then apply that function to the fake data.

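# stratsample() comes from the survey package
library(survey)

# example data (sample() rescales prob, so the weights need not sum to exactly 1)
ndf <- 1000
df <- data.frame(ID=sample(ndf), Name=sample(ndf),
    Campaign=sample(c("D2D", "F2F", "TM", "WW"), ndf, prob=c(0.25, 0.38, 0.17, 0.21), replace=TRUE),
    Gender=sample(c("Male", "Female"), ndf, prob=c(0.54, 0.46), replace=TRUE))

# function to take a random proportional stratified sample of size n
rpss <- function(stratum, n) {
    # share of each stratum in the full data
    props <- table(stratum)/length(stratum)
    # target count per stratum; because of rounding the total may differ slightly from n
    nstrat <- as.vector(round(n*props))
    # make sure every stratum contributes at least one row
    nstrat[nstrat==0] <- 1
    names(nstrat) <- names(props)
    # returns the row indices of the selected units
    stratsample(stratum, nstrat)
    }

# take a random proportional stratified sample of size 10,
# using each Campaign x Gender combination as a stratum
selrows <- rpss(stratum=interaction(df$Campaign, df$Gender, drop=TRUE), n=10)
df[selrows, ]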
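
If installing the survey package is not an option, or the sample has to contain exactly n rows, the same idea can be sketched in base R. This is only an illustration under stated assumptions, not part of the original answer: the helpers allocate_counts() and sample_by_stratum() are hypothetical names, and the largest-remainder allocation replaces the round-then-bump-to-1 rule used above, so very small strata may receive zero rows.

```r
# Hypothetical helper: allocate n across strata by the largest-remainder method,
# so the per-stratum counts always sum to exactly n.
allocate_counts <- function(stratum, n) {
    props <- table(stratum) / length(stratum)
    exact <- as.vector(n * props)
    counts <- floor(exact)
    leftover <- n - sum(counts)
    if (leftover > 0) {
        # give the remaining slots to the strata with the largest fractional parts
        top <- order(exact - counts, decreasing = TRUE)[seq_len(leftover)]
        counts[top] <- counts[top] + 1
    }
    names(counts) <- names(props)
    counts
}

# Hypothetical helper: draw row indices without replacement within each stratum.
sample_by_stratum <- function(stratum, counts) {
    idx <- split(seq_along(stratum), stratum)
    picked <- lapply(names(counts), function(s) {
        rows <- idx[[s]]
        # sample positions, which avoids the sample() pitfall for length-1 vectors
        rows[sample.int(length(rows), counts[[s]])]
    })
    unlist(picked)
}

# Fake data with the same columns of interest as in the question
set.seed(1)
ndf <- 1000
df <- data.frame(
    ID = seq_len(ndf),
    Campaign = sample(c("D2D", "F2F", "TM", "WW"), ndf,
                      prob = c(0.25, 0.38, 0.17, 0.21), replace = TRUE),
    Gender = sample(c("Male", "Female"), ndf,
                    prob = c(0.54, 0.46), replace = TRUE)
)

strata <- interaction(df$Campaign, df$Gender, drop = TRUE)
counts <- allocate_counts(strata, n = 10)
selrows <- sample_by_stratum(strata, counts)
smp <- df[selrows, ]

# Compare the composition of the sample with that of the full data set
round(prop.table(table(df$Campaign)), 2)
round(prop.table(table(smp$Campaign)), 2)
round(prop.table(table(df$Gender)), 2)
round(prop.table(table(smp$Gender)), 2)
```

With only 10 subjects, each one accounts for 10 percentage points, so the sample shares can only roughly track the population shares; a larger n gives a closer match.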


Source: https://stackoverflow.com/questions/17002101/random-sample-with-multiple-probabilities-in-r
