Dealing with repetitive tasks in R

春和景丽 2020-12-28 17:35

I often find myself having to perform repetitive tasks in R. It gets extremely frustrating having to constantly run the same function on one or more data structures over and over again.

3 Answers
  • 2020-12-28 18:11

    If the names are similar you could iterate over them using the pattern argument to ls:

    for (i in ls(pattern="df")){
      # store an NA-free copy of each data frame under a "t" prefix, e.g. tdf1
      assign(paste("t",i,sep=""),na.omit(get(i)))
    }
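A minimal, self-contained sketch of the loop above, run on hypothetical toy data (the df1 to df3 below are made up for illustration, each with one NA). The anchored pattern "^df[0-9]+$" is my assumption rather than part of the answer: it stops the loop from also matching other objects whose names merely contain "df", such as the tdf* copies on a second run.

```r
# Hypothetical toy data: three small data frames, each with one missing value
df1 <- data.frame(Region = c("Asia", "Africa", "Europe"), value = c(35, NA, 20))
df2 <- data.frame(Region = c("Asia", "Africa", "Europe"), value = c(55, 350, NA))
df3 <- data.frame(Region = c("Asia", "Africa", "Europe"), value = c(NA, 200, 200))

# ls() is evaluated once before the loop starts, so the t* objects created
# inside the loop are never iterated over themselves
for (i in ls(pattern = "^df[0-9]+$")) {
  # get() fetches the object by name; assign() stores the cleaned copy
  assign(paste0("t", i), na.omit(get(i)))
}

nrow(df1)   # 3: the original is left untouched
nrow(tdf1)  # 2: the row containing NA was dropped
```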
    

    However, a more "R" way of doing it seems to be to use a separate environment and eapply:

    # set up a separate environment
    env <- new.env()
    
    # copy the data frames across (using their common name pattern)
    for (i in ls(pattern="df")){
      assign(i,get(i),envir=env)
    }
    
    # apply function on environment
    eapply(env,na.omit)
    

    Which yields:

    $df3
         Region variable value
    1      Asia     2006   300
    2    Africa     2006   200
    3    Europe     2006   200
    4 N.America     2006   500
    5 S.America     2006   300
    
    $df2
         Region variable value
    1      Asia     2005    55
    2    Africa     2005   350
    3    Europe     2005    40
    4 N.America     2005    90
    5 S.America     2005    99
    
    $df1
         Region variable value
    1      Asia     2004    35
    2    Africa     2004    20
    3    Europe     2004    20
    4 N.America     2004    50
    5 S.America     2004    30
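Environments are unordered, so eapply() makes no promise about the order of its result (which is why df3 can come out first). The sketch below, using hypothetical toy data, also swaps the copy loop for base R's mget() and list2env(), an alternative not used in the answer: mget() gathers the matching objects into a named list and list2env() copies them into the environment in one call. Sorting by name then gives predictable output.

```r
# Hypothetical toy data
df1 <- data.frame(value = c(35, NA, 20))
df2 <- data.frame(value = c(NA, 350, 40))

env <- new.env()
# Copy every object named df<digits> into env in a single call
invisible(list2env(mget(ls(pattern = "^df[0-9]+$")), envir = env))

res <- eapply(env, na.omit)
# Sort by element name, since eapply() does not guarantee any particular order
res <- res[sort(names(res))]

names(res)         # "df1" "df2"
sapply(res, nrow)  # each data frame has 2 rows left
```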
    

    Unfortunately, the result is a single list, so getting the elements out as separate objects is a little tricky. Something along the lines of:

    lapply(eapply(env,na.omit),function(x) assign(paste("t",substitute(x),sep=""),x,envir=.GlobalEnv))
    

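    looks like it should work, but substitute(x) does not pick up the list element names: lapply passes each element to the function as an indexing expression (something like X[[1L]]) rather than by name, so the names never reach the function.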
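One way around the problem, sketched on hypothetical toy data and assuming that writing into the global environment is really what you want: skip substitute() entirely, rename the list elements, and let base R's list2env() create one object per element.

```r
# Hypothetical toy data
df1 <- data.frame(value = c(35, NA, 20))
df2 <- data.frame(value = c(NA, 350, 40))

env <- new.env()
assign("df1", df1, envir = env)
assign("df2", df2, envir = env)

cleaned <- eapply(env, na.omit)

# The names are already on the list, so there is no need for substitute()
names(cleaned) <- paste0("t", names(cleaned))

# Create one object per list element (tdf1, tdf2) in the global environment.
# An explicit loop does the same thing:
#   for (nm in names(cleaned)) assign(nm, cleaned[[nm]], envir = .GlobalEnv)
invisible(list2env(cleaned, envir = .GlobalEnv))

nrow(tdf1)  # 2
```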

  • 2020-12-28 18:12

    Besides @Hong Ooi's answer, I suggest looking into the plyr and reshape packages. In your case, the following might be useful:

    library(plyr)     # ddply()
    library(reshape)  # cast()
    
    df1$name <- "var1"
    df2$name <- "var2"
    df3$name <- "var3"
    df <- rbind(df1,df2,df3)
    df <- na.omit(df)
    
    ##Get various means:
    > ddply(df,~name,summarise,AvgName=mean(value))
      name AvgName
    1 var1    31.0
    2 var2   126.8
    3 var3   300.0
    
    > ddply(df,~Region,summarise,AvgRegion=mean(value)) 
         Region AvgRegion
    1    Africa 190.00000
    2      Asia 130.00000
    3    Europe  86.66667
    4 N.America 213.33333
    5 S.America 143.00000
    
    
    > ddply(df,~variable,summarise,AvgVar=mean(value))
      variable AvgVar
    1     2004   31.0
    2     2005  126.8
    3     2006  300.0
    
    ##Transform the data.frame into another format   
    > cast(Region+variable~name,data=df)
          Region variable var1 var2 var3
    1     Africa     2004   20   NA   NA
    2     Africa     2005   NA  350   NA
    3     Africa     2006   NA   NA  200
    4       Asia     2004   35   NA   NA
    5       Asia     2005   NA   55   NA
    6       Asia     2006   NA   NA  300
    7     Europe     2004   20   NA   NA
    8     Europe     2005   NA   40   NA
    9     Europe     2006   NA   NA  200
    10 N.America     2004   50   NA   NA
    11 N.America     2005   NA   90   NA
    12 N.America     2006   NA   NA  500
    13 S.America     2004   30   NA   NA
    14 S.America     2005   NA   99   NA
    15 S.America     2006   NA   NA  300
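If you would rather avoid the extra packages, base R can produce the same summaries. This is a swapped-in technique, not what the answer uses: aggregate() stands in for ddply(), and stats::reshape() with direction = "wide" stands in for cast(). The toy data rebuilds df1 to df3 from the values printed in the first answer; the question's original data (including its NAs) is not shown, so that part is an assumption. Note that reshape() names the wide columns value.var1 and so on, and orders rows by first appearance rather than by Region.

```r
# Rebuild the three data frames from the values printed in the first answer
regions <- c("Asia", "Africa", "Europe", "N.America", "S.America")
df1 <- data.frame(Region = regions, variable = 2004, value = c(35, 20, 20, 50, 30))
df2 <- data.frame(Region = regions, variable = 2005, value = c(55, 350, 40, 90, 99))
df3 <- data.frame(Region = regions, variable = 2006, value = c(300, 200, 200, 500, 300))

df1$name <- "var1"
df2$name <- "var2"
df3$name <- "var3"
df <- na.omit(rbind(df1, df2, df3))

# Group means, the base R counterpart of ddply(..., summarise, mean(value))
avg_name     <- aggregate(value ~ name,     data = df, FUN = mean)
avg_region   <- aggregate(value ~ Region,   data = df, FUN = mean)
avg_variable <- aggregate(value ~ variable, data = df, FUN = mean)

# Long-to-wide reshape, the base R counterpart of cast(Region + variable ~ name)
wide <- reshape(df, idvar = c("Region", "variable"), timevar = "name",
                direction = "wide")

avg_name  # var1 31.0, var2 126.8, var3 300.0
```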
    
  • 2020-12-28 18:14

    As a general guideline, if you have several objects that you want to apply the same operations to, you should collect them into one data structure. Then you can use loops, [sl]apply, etc. to do the operations in one go. In this case, instead of having separate data frames df1, df2, etc., you could put them into a list of data frames and then run na.omit on all of them:

    dflist <- list(df1, df2, <...>)
    dflist <- lapply(dflist, na.omit)
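If the data frames already exist as separate objects, base R's mget() (an addition, not part of this answer) can collect them into a named list in one call, assuming their names share a pattern. Keeping the names means each cleaned data frame stays reachable as dflist$df1 and so on. The toy data is hypothetical.

```r
# Hypothetical toy data
df1 <- data.frame(x = c(1, NA, 3))
df2 <- data.frame(x = c(NA, 5, 6))
df3 <- data.frame(x = c(7, 8, 9))

# Collect df1, df2, df3 into a named list; the anchored pattern skips "dflist"
dflist <- mget(ls(pattern = "^df[0-9]+$"))
dflist <- lapply(dflist, na.omit)

sapply(dflist, nrow)  # df1 df2 df3 -> 2 2 3
```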
    