R self reference

Submitted by て烟熏妆下的殇ゞ on 2019-11-26 08:08:40

Question


In R I find myself doing something like this a lot:

adataframe[adataframe$col==something]<-adataframe[adataframe$col==something]+1

This way is kind of long and tedious. Is there some way for me
to reference the object I am trying to change such as

adataframe[adataframe$col==something]<-$self+1 

?


Answer 1:


Try the data.table package and its := operator. It's very fast and very concise.

DT[col1==something, col2:=col3+1]

The first part, col1==something, is the subset. You can put anything here and use the column names as if they were variables, i.e., there is no need to use $. The second part, col2:=col3+1, assigns the RHS to the LHS within that subset, where the column names can be assigned to as if they were variables. := is assignment by reference: no copies of any object are taken, so it is faster than <-, =, within and transform.
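A minimal sketch of the subset-then-assign pattern, assuming the data.table package is installed. The table `DT` and its columns `a1` and `b` are made up for illustration; `address()` (exported by data.table) is used only to check that the object is updated in place rather than copied.

```r
library(data.table)

# Hypothetical example table (columns are illustrative only)
DT <- data.table(a1 = 1:10, b = rep(1:2, 5))

before <- address(DT)

# i: the row subset, using column names as bare variables (no DT$)
# j: assignment by reference; 1L keeps the RHS integer to match column a1
DT[a1 > 5, a1 := a1 + 1L]

# Same memory address: DT was modified in place, not copied
same_object <- identical(before, address(DT))
print(DT$a1)  # 1 2 3 4 5 7 8 9 10 11
```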

Also, soon to be implemented in v1.8.1: one end goal of allowing := in j like this is to combine it with by; see the question "When should I use the := operator in data.table?"

UPDATE: That was indeed released (:= by group) in July 2012.
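A short sketch of := combined with by, again assuming data.table is installed; the table and the column names `g`, `v`, `group_total` and `group_max` are hypothetical. When a new column is created on a subset only, data.table fills the rows outside the subset with NA.

```r
library(data.table)

# Hypothetical table: a grouping column g and a value column v
DT <- data.table(g = rep(c("x", "y"), each = 3), v = 1:6)

# := with by: add each group's total as a new column, by reference
DT[, group_total := sum(v), by = g]

# Subset and by together: within rows where v > 1, store each group's max;
# rows outside the subset (here only v == 1) get NA in the new column
DT[v > 1, group_max := max(v), by = g]
print(DT)
```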




Answer 2:


You should be paying more attention to Gabor Grothendieck (and not just in this instance). The cited inc function on Matt Asher's blog does all of what you are asking:

(And the obvious extension works as well.)

add <- function(x, inc=1) {
   eval.parent(substitute(x <- x + inc))
 }
# Testing the `inc` function behavior
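A quick check of `add()`, reproduced from the answer so the block runs on its own; the vector `v` is made up for illustration. The comments describe the `substitute()` / `eval.parent()` mechanics as I understand them, not anything stated in the original post.

```r
add <- function(x, inc = 1) {
  # Build the call `x <- x + inc` with the caller's expressions spliced in,
  # then evaluate it in the caller's frame so the caller's object changes
  eval.parent(substitute(x <- x + inc))
}

v <- c(10, 20, 30)
add(v, 5)            # v is now 15 25 35
add(v[v > 20], 100)  # only the elements greater than 20 change
print(v)             # 15 125 135
```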

EDIT: After my temporary annoyance at the lack of approval in the first comment, I took up the challenge of adding yet another function argument. Supplied with only one argument, a portion of a data frame, it still increments that range of values by one. So far it has only been very lightly tested with infix dyadic operators, but I see no reason it wouldn't work with any function that accepts exactly two arguments:

transfn <- function(x, func="+", inc=1) {
   eval.parent(substitute(x <- do.call(func, list(x , inc)))) }
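A usage sketch for `transfn()`, reproduced from the answer with comments added; the small data frame is hypothetical. With `func` and `inc` left at their defaults, `substitute()` splices in the default expressions `"+"` and `1`, so the call behaves like `inc()`.

```r
transfn <- function(x, func = "+", inc = 1) {
  # Splice the caller's expressions into `x <- do.call(func, list(x, inc))`
  # and evaluate that assignment in the caller's frame
  eval.parent(substitute(x <- do.call(func, list(x, inc))))
}

# Hypothetical data frame (all doubles, to keep column types unchanged)
df <- data.frame(a1 = c(1, 2, 3, 4), a2 = c(2, 4, 6, 8))

transfn(df$a2, "*", 10)            # multiply a whole column
transfn(df$a1[df$a1 > 2], "-", 1)  # subtract 1 from part of a column
transfn(df$a1)                     # defaults: add 1
print(df)
```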

(Guilty admission: this somehow "feels wrong" from the traditional R perspective of returning values for assignment.) The earlier testing of the inc function is below:

df <- data.frame(a1 =1:10, a2=21:30, b=1:2)
 inc <- function(x) {
   eval.parent(substitute(x <- x + 1))
 }

#---- examples===============>

> inc(df$a1)  # works on whole columns
> df
   a1 a2 b
1   2 21 1
2   3 22 2
3   4 23 1
4   5 24 2
5   6 25 1
6   7 26 2
7   8 27 1
8   9 28 2
9  10 29 1
10 11 30 2
> inc(df$a1[df$a1>5]) # testing on a restricted range of one column
> df
   a1 a2 b
1   2 21 1
2   3 22 2
3   4 23 1
4   5 24 2
5   7 25 1
6   8 26 2
7   9 27 1
8  10 28 2
9  11 29 1
10 12 30 2

> inc(df[ df$a1>5, ])  #testing on a range of rows for all columns being transformed
> df
   a1 a2 b
1   2 21 1
2   3 22 2
3   4 23 1
4   5 24 2
5   8 26 2
6   9 27 3
7  10 28 2
8  11 29 3
9  12 30 2
10 13 31 3
# and even in selected rows and grepped names of columns meeting a criterion
> inc(df[ df$a1 <= 3, grep("a", names(df)) ])
> df
   a1 a2 b
1   3 22 1
2   4 23 2
3   4 23 1
4   5 24 2
5   8 26 2
6   9 27 3
7  10 28 2
8  11 29 3
9  12 30 2
10 13 31 3
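To see what `inc()` actually evaluates, a hypothetical helper `show_inc()` (not part of the answer) can return the substituted call instead of running it. This only illustrates the `substitute()` step; it also makes visible that the subsetting expression appears on both sides of `<-`.

```r
# Hypothetical helper: same substitution as inc(), but returns the call
show_inc <- function(x) substitute(x <- x + 1)

# Nothing is evaluated here, so df does not even need to exist
expr <- show_inc(df$a1[df$a1 > 5])
print(expr)  # df$a1[df$a1 > 5] <- df$a1[df$a1 > 5] + 1
```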



Answer 3:


Here is what you can do. Say you have a data frame:

df = data.frame(x = 1:10, y = rnorm(10))

And you want to increment every value of y by 1. You can do this easily with transform:

df = transform(df, y = y + 1)
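A deterministic sketch of the same idea, with fixed values standing in for `rnorm(10)`. `transform()` returns a modified copy, so the result has to be assigned back. Restricting the update to a subset with `ifelse()` is my own addition, not part of the answer, and is shown only as one way to mimic the conditional update in the question.

```r
# Fixed values replace rnorm() so the result is reproducible
df <- data.frame(x = 1:6, y = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5))

# transform() returns a new data frame; assign it back to keep the change
df <- transform(df, y = y + 1)

# Assumption beyond the answer: ifelse() limits the increment to x > 3
df <- transform(df, y = ifelse(x > 3, y + 1, y))
print(df$y)  # 1.5 2.5 3.5 5.5 6.5 7.5
```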



Answer 4:


I'd be partial to this (presuming the subset is on rows):

ridx <- adataframe$col==something
adataframe[ridx,] <- adataframe[ridx,] + 1

which doesn't rely on any fancy or fragile parsing, is reasonably expressive about the operation being performed, and is not too verbose. It also tends to break lines into nicely human-parseable units, and there is something appealing about using standard idioms; R's vocabulary and idiosyncrasies are already large enough for my taste.
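A runnable version of this idiom on a small, made-up data frame. It assumes every column is numeric, since `adataframe[ridx, ] + 1` fails on character columns; `something` is a hypothetical value to match. The last update shows the same pattern restricted to a single column.

```r
# Hypothetical all-numeric data frame, so "+ 1" applies to every column
adataframe <- data.frame(col = c(1, 2, 1, 2), v = c(10, 20, 30, 40))
something <- 2

# Compute the row index once and reuse it on both sides of the assignment
ridx <- adataframe$col == something
adataframe[ridx, ] <- adataframe[ridx, ] + 1   # all columns in matched rows

# The same idiom applied to one column only
adataframe[ridx, "v"] <- adataframe[ridx, "v"] + 1
print(adataframe)
```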



Source: https://stackoverflow.com/questions/7768686/r-self-reference
