How to create a stratified sample by state in R

长情又很酷 2021-01-01 03:27

How can I create a stratified sample in R using the "sampling" package? My dataset has 355,000 observations. The code works fine up to the last line. Below is the code I w…

2 Answers
  •  谎友^ (OP)
    2021-01-01 04:02

    I had to do something similar last year. If you do this often, you might want to wrap it in a function like the one below. This version takes the data frame you're sampling from, the column that defines the strata (by number or name), and the sample size; call set.seed() before it if you need reproducible draws. You can save the function in a file such as "stratified.R" and load it with source("stratified.R") when you need it. See http://news.mrdwab.com/2011/05/20/stratified-random-sampling-in-r-from-a-data-frame/

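    stratified = function(df, group, size) {
      #  USE: * Specify your data frame and grouping variable (as column
      #         number or column name) as the first two arguments.
      #       * Decide on your sample size. For a sample proportional to the
      #         population, enter "size" as a decimal. For an equal number
      #         of samples from each group, enter "size" as a whole number.
      #
      #  Example 1: Sample 10% of each group from a data frame named "z",
      #             where the grouping variable is the fourth variable, use:
      #
      #                 > stratified(z, 4, .1)
      #
      #  Example 2: Sample 5 observations from each group from a data frame
      #             named "z"; grouping variable is the third variable:
      #
      #                 > stratified(z, 3, 5)
      #
      library(sampling)
      # strata() expects the data sorted by the stratification variable, and
      # the size vector must follow that same stratum order. df[[group]] is a
      # plain vector; order() is meant for vectors, not one-column data frames.
      temp = df[order(df[[group]]), ]
      counts = table(temp[[group]])   # rows per stratum, in sorted order
      if (size < 1) {
        # Proportional allocation: round each stratum's share up
        size = as.vector(ceiling(counts * size))
      } else {
        # Equal allocation: the same number of rows from every stratum
        size = rep(size, times = length(counts))
      }
      # Simple random sampling without replacement within each stratum
      strat = strata(temp, stratanames = names(temp[group]),
                     size = size, method = "srswor")
      # Return the selected rows plus the sampling details from strata()
      # (returned without printing, since the sample can be very large)
      getdata(temp, strat)
    }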
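
    If you can't install the sampling package, or just want to see what the allocation step does, the same idea can be sketched in base R: split the row indices by stratum, draw a simple random sample without replacement with sample.int() inside each stratum, and bind the rows back together. This is a minimal sketch, not the answer's method: the helper name stratified_base() and the toy "state" data are hypothetical, and the draws will not match those from sampling::strata() even with the same seed.

```r
# Hypothetical base-R helper (no 'sampling' package needed).
#   df:    data frame to sample from
#   group: column name or number that defines the strata
#   size:  < 1  -> that fraction of each stratum, rounded up with ceiling()
#          >= 1 -> the same number of rows from every stratum
stratified_base <- function(df, group, size) {
  # Row indices for each stratum; drop = TRUE ignores unused factor levels
  idx_by_stratum <- split(seq_len(nrow(df)), df[[group]], drop = TRUE)
  picked <- lapply(idx_by_stratum, function(idx) {
    n <- if (size < 1) ceiling(length(idx) * size) else size
    if (n > length(idx)) {
      stop("a stratum has fewer rows than the requested sample size")
    }
    # Simple random sample without replacement within this stratum;
    # indexing via sample.int() also behaves when a stratum has one row
    idx[sample.int(length(idx), n)]
  })
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

# Toy data: 1,000 rows spread unevenly over four states
set.seed(42)
toy <- data.frame(
  id    = 1:1000,
  state = rep(c("CA", "NY", "TX", "WA"), times = c(500, 250, 200, 50))
)

s10 <- stratified_base(toy, "state", 0.1)  # 10% of each state
table(s10$state)                           # CA 50, NY 25, TX 20, WA 5

s5 <- stratified_base(toy, "state", 5)     # 5 rows per state
table(s5$state)                            # 5 rows in each state
```

    Rows come back grouped by stratum in sorted stratum order, and asking for more rows than a stratum contains stops with an error rather than silently sampling with replacement.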
    
