Dealing with repetitive tasks in R

I often find myself having to perform repetitive tasks in R. It gets extremely frustrating having to constantly run the same function on one or more data structures over and over again.

3 answers

没有蜡笔的小新

2020-12-28 18:11

If the names follow a common pattern, you can iterate over them using the pattern argument to ls:

```
# for every object whose name matches "df", store an NA-free copy named t<name>
for (i in ls(pattern="df")){
  assign(paste("t",i,sep=""),na.omit(get(i)))
}
```
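One caveat with the loop above: pattern is a regular expression, so "df" matches any name that merely contains df, including the tdf1, tdf2, ... objects the loop itself creates if you run it a second time. A minimal sketch with an anchored pattern, assuming the data frames are named df followed by digits (the toy data below are hypothetical):

```r
# Hypothetical toy data standing in for the question's df1, df2
df1 <- data.frame(Region = c("Asia", "Africa"), value = c(35, NA))
df2 <- data.frame(Region = c("Asia", "Africa"), value = c(NA, 350))

# "^df[0-9]+$" matches df1, df2, ... but not tdf1 or mydf
for (nm in ls(pattern = "^df[0-9]+$")) {
  # get() fetches the object by name; assign() stores the cleaned copy as t<name>
  assign(paste0("t", nm), na.omit(get(nm)))
}

ls(pattern = "^t?df")
```

paste0() is simply paste() with sep = "", so the naming scheme is the same as in the original loop.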

However, a more idiomatic R approach is to use a separate environment and eapply:

```
# set up a new environment
env <- new.env()

# copy the data frames across (using their common name pattern)
for (i in ls(pattern="df")){
  assign(i, get(i), envir=env)
}

# apply na.omit to every object in the environment
eapply(env, na.omit)
```

Which yields:

$df3
     Region variable value
1      Asia     2006   300
2    Africa     2006   200
3    Europe     2006   200
4 N.America     2006   500
5 S.America     2006   300

$df2
     Region variable value
1      Asia     2005    55
2    Africa     2005   350
3    Europe     2005    40
4 N.America     2005    90
5 S.America     2005    99

$df1
     Region variable value
1      Asia     2004    35
2    Africa     2004    20
3    Europe     2004    20
4 N.America     2004    50
5 S.America     2004    30
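Note that the elements come back as df3, df2, df1. The R documentation for eapply states that the order of the components is arbitrary for hashed environments, and new.env() creates a hashed environment by default. If the order matters, a small sketch (with hypothetical toy data) is to sort the result by name afterwards:

```r
env <- new.env()
# Hypothetical toy data frames, assigned out of order on purpose
assign("df2", data.frame(value = c(55, NA)), envir = env)
assign("df1", data.frame(value = c(NA, 20)), envir = env)
assign("df3", data.frame(value = c(300, 200)), envir = env)

res <- eapply(env, na.omit)
# Reorder the list elements alphabetically by name
res <- res[sort(names(res))]
names(res)
```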

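Unfortunately, this returns one big list, so getting the elements out as separate objects is a little tricky. Something along the lines of:

```
lapply(eapply(env,na.omit),function(x) assign(paste("t",substitute(x),sep=""),x,envir=.GlobalEnv))
```

looks like it should work, but substitute(x) does not pick up the list element names: lapply passes only each element's value to the function, so substitute sees the internal indexing expression rather than a name.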
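One workaround, sketched here with hypothetical toy data and the "t" prefix from the question: loop over names() of the eapply result instead of over its values, or rename the list and hand it to list2env(), which creates one object per list element. Both options below produce the same objects; pick whichever reads better.

```r
env <- new.env()
env$df1 <- data.frame(value = c(35, NA, 20))
env$df2 <- data.frame(value = c(NA, 350, 40))

res <- eapply(env, na.omit)

# Option 1: iterate over the element names, which lapply() never exposes
for (nm in names(res)) {
  assign(paste0("t", nm), res[[nm]], envir = .GlobalEnv)
}

# Option 2 (equivalent): prefix the names, then copy every element into the global environment
invisible(list2env(setNames(res, paste0("t", names(res))), envir = .GlobalEnv))
```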

广开言路

2020-12-28 18:12

Besides @Hong Ooi's answer, I suggest looking into the plyr and reshape packages. In your case, the following might be useful:

```
library(plyr)     # ddply()
library(reshape)  # cast()

df1$name <- "var1"
df2$name <- "var2"
df3$name <- "var3"
df <- rbind(df1,df2,df3)
df <- na.omit(df)

## Get various means:
> ddply(df,~name,summarise,AvgName=mean(value))
  name AvgName
1 var1    31.0
2 var2   126.8
3 var3   300.0

> ddply(df,~Region,summarise,AvgRegion=mean(value))
     Region AvgRegion
1    Africa 190.00000
2      Asia 130.00000
3    Europe  86.66667
4 N.America 213.33333
5 S.America 143.00000

> ddply(df,~variable,summarise,AvgVar=mean(value))
  variable AvgVar
1     2004   31.0
2     2005  126.8
3     2006  300.0

## Transform the data.frame into another format
> cast(Region+variable~name,data=df)
      Region variable var1 var2 var3
1     Africa     2004   20   NA   NA
2     Africa     2005   NA  350   NA
3     Africa     2006   NA   NA  200
4       Asia     2004   35   NA   NA
5       Asia     2005   NA   55   NA
6       Asia     2006   NA   NA  300
7     Europe     2004   20   NA   NA
8     Europe     2005   NA   40   NA
9     Europe     2006   NA   NA  200
10 N.America     2004   50   NA   NA
11 N.America     2005   NA   90   NA
12 N.America     2006   NA   NA  500
13 S.America     2004   30   NA   NA
14 S.America     2005   NA   99   NA
15 S.America     2006   NA   NA  300
```
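If you would rather not add package dependencies, base R's aggregate() can compute the same kind of group means. A sketch, assuming df is the combined, NA-free data frame built in this answer; it is recreated here with a few hypothetical rows so the snippet runs on its own:

```r
# Hypothetical subset of the combined data frame (Region, variable, value, name)
df <- data.frame(
  Region   = c("Asia", "Africa", "Asia", "Africa"),
  variable = c(2004, 2004, 2005, 2005),
  value    = c(35, 20, 55, 350),
  name     = c("var1", "var1", "var2", "var2")
)

# Mean of value per name, analogous to ddply(df, ~name, summarise, AvgName = mean(value))
by_name <- aggregate(value ~ name, data = df, FUN = mean)

# Mean of value per Region
by_region <- aggregate(value ~ Region, data = df, FUN = mean)

by_name
by_region
```

Unlike the ddply calls, aggregate() keeps the original column name (value) for the means, so rename it if you want AvgName-style labels.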

南笙

2020-12-28 18:14
As a general guideline, if you have several objects that you want to apply the same operations to, you should collect them into one data structure. Then you can use loops, lapply/sapply, etc. to do the operations in one go. In this case, instead of having separate data frames df1, df2, etc., you could put them into a list of data frames and then run na.omit on all of them:
```
dflist <- list(df1, df2, <...>)
dflist <- lapply(dflist, na.omit)
```
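A runnable sketch of the same idea, assuming the data frames follow the df1, df2, ... naming used elsewhere in this thread (the toy data are hypothetical): mget() collects them into a named list in one call, and lapply keeps those names, so each cleaned data frame stays easy to identify.

```r
# Hypothetical toy data frames with missing values
df1 <- data.frame(Region = c("Asia", "Africa"), value = c(35, NA))
df2 <- data.frame(Region = c("Asia", "Africa"), value = c(55, 350))

# mget() returns a list named after the objects it fetched
dflist <- mget(ls(pattern = "^df[0-9]+$"))

# lapply() preserves the list names, so dflist$df1 is the cleaned df1
dflist <- lapply(dflist, na.omit)

sapply(dflist, nrow)
```

Keeping the data in one named list means later steps (summaries, plots, rbind) can also be done with a single lapply call instead of one line per data frame.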