Between/within standard deviations

滥情空心 2021-01-03 01:02

When working on a hierarchical/multilevel/panel dataset, it may be very useful to adopt a package which returns the within- and between-group standard deviations of the avai

2 Answers
  •  不知归路
    2021-01-03 01:41

    I don't know what your Stata command is supposed to reproduce, but to answer the second part of the question, about the hierarchical structure: this is easy to do with a list. For example, you can define a structure like this:

    tree = list(
          "var1" = list(
             "panel" = list(type ='p',mean = 1,sd=0)
             ,"cluster" = list(type = 'c',value = c(5,8,10)))
          ,"var2" = list(
              "panel" = list(type ='p',mean = 2,sd=0.5)
             ,"cluster" = list(type="c",value =c(1,2)))
    )
    

    To create such a structure, lapply is convenient for working with lists:

    tree <- lapply(c('var1','var2'), function(x) {
      ## unquoted names (panel, cluster) are equivalent to the quoted ones used above
      list(panel   = list(type = 'p', mean = rnorm(1), sd = 0),
           cluster = list(type = 'c', value = rnorm(3)))
    })
    names(tree) <- c('var1','var2')
    

    You can view the structure with str:

    str(tree)
    List of 2
     $ var1:List of 2
      ..$ panel  :List of 3
      .. ..$ type: chr "p"
      .. ..$ mean: num 0.284
      .. ..$ sd  : num 0
      ..$ cluster:List of 2
      .. ..$ type : chr "c"
      .. ..$ value: num [1:3] 0.0722 -0.9413 0.6649
     $ var2:List of 2
      ..$ panel  :List of 3
      .. ..$ type: chr "p"
      .. ..$ mean: num -0.144
      .. ..$ sd  : num 0
      ..$ cluster:List of 2
      .. ..$ type : chr "c"
      .. ..$ value: num [1:3] -0.595 -1.795 -0.439
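
As a small illustration, individual pieces of such a nested list can be reached with $ or [[ ]], and sapply can pull one field out of every variable. The sketch below rebuilds the first tree definition from above; the names panel_means and cluster_sizes are made up for this example.

```r
# Same nested list as the first definition above.
tree <- list(
  var1 = list(panel   = list(type = "p", mean = 1, sd = 0),
              cluster = list(type = "c", value = c(5, 8, 10))),
  var2 = list(panel   = list(type = "p", mean = 2, sd = 0.5),
              cluster = list(type = "c", value = c(1, 2)))
)

# Drill down to a single element with $ or [[ ]].
var1_mean   <- tree$var1$panel$mean
var2_values <- tree[["var2"]][["cluster"]][["value"]]

# Extract the same field from every variable at once.
panel_means   <- sapply(tree, function(v) v$panel$mean)
cluster_sizes <- sapply(tree, function(v) length(v$cluster$value))
```

Because the result of sapply keeps the list names, panel_means and cluster_sizes are indexed by var1 and var2.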
    

    Edit after OP clarification

    I think the reshape2 package is what you want. Here is a demonstration.

    The idea is that a multilevel analysis becomes easier once the data are reshaped into long format.

    First, divide the variables into two groups, identifier variables and measured variables:

    library(reshape2)
    dat.m <- melt(dat, id.vars = c('son_id','mom_id'))  ## all other columns are treated as measured

    str(dat.m)
    'data.frame':   21 obs. of  4 variables:
     $ son_id  : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 1 2 1 2 3 ...
     $ mom_id  : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 3 3 1 1 1 ...
     $ variable: Factor w/ 3 levels "hispanic","mom_smoke",..: 1 1 1 1 1 1 1 2 2 2 ...
     $ value   : num  1 1 1 0 0 0 0 1 0 0 ...
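
To make the long ("molten") layout concrete, here is a rough base-R sketch of the same kind of reshaping. It is not how reshape2 implements melt, only an illustration of the resulting shape. The data frame and its values are invented for this example, and the names id_vars, measured and long are hypothetical.

```r
# Hypothetical wide data, one row per son (values invented for illustration).
dat <- data.frame(
  son_id    = factor(c(1, 2, 1)),
  mom_id    = factor(c(1, 1, 2)),
  hispanic  = c(1, 1, 0),
  mom_smoke = c(1, 0, 1)
)

id_vars  <- c("son_id", "mom_id")
measured <- setdiff(names(dat), id_vars)

# Stack each measured column under a copy of the identifier columns.
long <- do.call(rbind, lapply(measured, function(v) {
  data.frame(dat[id_vars],
             variable = factor(v, levels = measured),
             value    = dat[[v]])
}))
```

Each original row now appears once per measured variable, which is the layout that the cast functions expect.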
    

    Once you have the data in "molten" form, you can "cast" it into whatever shape you want:

    # per-mother means of every variable
     acast(dat.m,variable~mom_id,mean)
                               1    2      3
    hispanic           1.0000000    0    0.0
    mom_smoke          0.3333333    1    0.5
    son_birthweigth 3943.3333333 4160 2977.5
    # Within-mother sum of squared deviations for every variable
    # (divide by n - 1 within each mother to turn it into a variance)
    acast(dat.m,variable~mom_id,function(x) sum((x-mean(x))^2))
                               1    2    3
    hispanic           0.0000000    0  0.0
    mom_smoke          0.6666667    0  0.5
    son_birthweigth 5066.6666667 3200 12.5
    
    ## overall mean of each variable
    acast(dat.m,variable~.,mean)
    [,1]
    hispanic           0.4285714
    mom_smoke          0.5714286
    son_birthweigth 3729.2857143
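
Going from these building blocks to the between and within standard deviations the question asks about, one possible base-R sketch is shown below. It assumes a common convention: the between SD is the SD of the group means, and the within SD is the SD of each observation's deviation from its own group mean. Conventions differ across tools (for example in which divisor is used), so check the result against the Stata output you want to reproduce. The data frame and the birth weights are invented for illustration.

```r
# Hypothetical panel: 3 mothers, 7 sons (birth weights invented for illustration).
dat <- data.frame(
  mom_id      = factor(c(1, 1, 1, 2, 2, 3, 3)),
  birthweight = c(3900, 3950, 4000, 4100, 4200, 2950, 3000)
)

x <- dat$birthweight
g <- dat$mom_id

group_means <- tapply(x, g, mean)  # one mean per mother
grand_mean  <- mean(x)

# Between SD: spread of the group means across mothers.
sd_between <- sd(group_means)

# Within SD: spread of each observation around its own group mean.
# Adding the grand mean back keeps the values on the original scale
# and does not change the SD.
x_within  <- x - ave(x, g) + grand_mean
sd_within <- sd(x_within)

sd_overall <- sd(x)
round(c(overall = sd_overall, between = sd_between, within = sd_within), 2)
```

To cover several variables at once, the same steps can be wrapped in a function and applied to each measured column with sapply.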
    
