Question
I have a data frame whose column names are generated from parameters, so I don't know their exact values in advance. I want to pass these column names to ddply as parameters as well. I guess the answer is obvious, but can someone please turn the light on for me?
Below is an example using the iris data set that shows what I want to do, and the unintended result of my attempt. The result of the first example, iris1, is what I want to achieve, but when I pass the column name in as a parameter, as in my iris2 attempt, I don't get the intended result.
library(plyr)
iris1 <- ddply(iris, .(Species), transform, pw_first = Petal.Width[1],
               pw_last = Petal.Width[length(Petal.Width)])
myCol <- 'Petal.Width'
iris2 <- ddply(iris, .(Species), transform, pw_first = myCol[1],
               pw_last = myCol[length(myCol)])
head(iris1)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species pw_first pw_last
# 1          5.1         3.5          1.4         0.2  setosa      0.2     0.2
# 2          4.9         3.0          1.4         0.2  setosa      0.2     0.2
# 3          4.7         3.2          1.3         0.2  setosa      0.2     0.2
# 4          4.6         3.1          1.5         0.2  setosa      0.2     0.2
# 5          5.0         3.6          1.4         0.2  setosa      0.2     0.2
# 6          5.4         3.9          1.7         0.4  setosa      0.2     0.2
head(iris2)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species    pw_first     pw_last
# 1          5.1         3.5          1.4         0.2  setosa Petal.Width Petal.Width
# 2          4.9         3.0          1.4         0.2  setosa Petal.Width Petal.Width
# 3          4.7         3.2          1.3         0.2  setosa Petal.Width Petal.Width
# 4          4.6         3.1          1.5         0.2  setosa Petal.Width Petal.Width
# 5          5.0         3.6          1.4         0.2  setosa Petal.Width Petal.Width
# 6          5.4         3.9          1.7         0.4  setosa Petal.Width Petal.Width
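A quick check in plain R (no plyr needed) suggests why iris2 comes out that way: myCol holds only the column's name, as a length-one character vector, so myCol[1] and myCol[length(myCol)] are both that string, which transform then recycles down every row. A minimal sketch; the variable names other than myCol are just for illustration:

```r
myCol <- 'Petal.Width'

# myCol is a character vector of length 1 holding a *name*, not data
first_attempt <- myCol[1]               # the string "Petal.Width"
last_attempt  <- myCol[length(myCol)]   # length(myCol) is 1, so the same string

# Indexing the data frame by that name returns the column's values instead
pw <- iris[[myCol]]
first_value <- pw[1]                    # first Petal.Width value in iris

print(c(first_attempt, last_attempt))
print(first_value)
```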
Answer 1:
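Here you go. The idea in this solution is to use get(), which looks up a variable by name in the current environment. Because transform evaluates its arguments inside the data frame, get(myCol) will find the column whose name is stored in myCol (here Petal.Width) in the data frame being operated upon.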
myCol <- 'Petal.Width'
iris2 <- ddply(iris, .(Species), transform,
               pw_first = get(myCol)[1],
               pw_last = get(myCol)[length(get(myCol))]
)
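The lookup that get() performs can be tried outside ddply. In this sketch, base R's with() stands in for transform (both evaluate their expression in an environment built from the data frame's columns), and the setosa subset stands in for one of the pieces ddply would hand over; both substitutions are assumptions made only for illustration:

```r
myCol <- 'Petal.Width'

# with() evaluates the expression with the data frame's columns in scope,
# so get() resolves the string stored in myCol to the Petal.Width column
pw <- with(iris, get(myCol))

# One group, similar to what ddply passes to transform for each species
setosa <- iris[iris$Species == "setosa", ]
pw_first <- with(setosa, get(myCol)[1])
pw_last  <- with(setosa, get(myCol)[length(get(myCol))])

print(c(pw_first, pw_last))
```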
Another approach, which might be simpler to understand, is to give ddply an anonymous function that pulls the column out with [[:
iris2 <- ddply(iris, .(Species), function(df) {
  x <- df[[myCol]]
  transform(df, pw_first = x[1], pw_last = x[length(x)])
})
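Because this approach only needs the column name as a string, it generalizes to a small wrapper where both the grouping column and the source column are arguments. The helper name add_first_last and its argument names are hypothetical; the grouping column is given to ddply as a character string, which ddply accepts in place of .(Species):

```r
library(plyr)

# Hypothetical helper: for each level of `group`, add the first and last
# values of column `col`; both names are passed as character strings
add_first_last <- function(df, group, col) {
  ddply(df, group, function(piece) {
    x <- piece[[col]]   # [[ takes the column name as a string
    transform(piece, first = x[1], last = x[length(x)])
  })
}

iris2 <- add_first_last(iris, "Species", "Petal.Width")
head(iris2)
```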
Answer 2:
colName <- "Petal.Width"
iris1 <- ddply(iris, .(Species), function(x) {
  pw.first <- x[1, colName]
  pw.last <- x[nrow(x), colName]   # last row of this group
  result <- cbind(x, pw.first, pw.last)
  return(result)
})
unique(iris1$pw.first)
[1] 0.2 1.4 2.5
unique(iris1$pw.last)
[1] 0.2 1.3 1.8
If you only want Species, pw.first and pw.last, simply remove x from the cbind call.
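A sketch of that summary-only variant. It returns a one-row data.frame per group rather than using cbind(), so the output columns are named explicitly, and ddply adds the Species column itself. The name summary1 is just for illustration:

```r
library(plyr)

colName <- "Petal.Width"

# One row per species: the grouping column plus the two new values
summary1 <- ddply(iris, .(Species), function(x) {
  data.frame(pw.first = x[1, colName],
             pw.last  = x[nrow(x), colName])
})
summary1
```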
Answer 3:
I'm still learning R, but I find the function interface for ddply fits my brain best... Maybe this is close?
iris1 <- ddply(iris,
               .(Species),
               function(x, y) {
                 result <- data.frame(x$Petal.Width[1],
                                      x$Petal.Width[length(x$Petal.Width)])
                 names(result) <- y
                 return(result)
               },
               c('first', 'last'))
iris1
Result:
     Species first last
1     setosa   0.2  0.2
2 versicolor   1.4  1.3
3  virginica   2.5  1.8
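The y argument works because ddply forwards any extra arguments given after .fun to that function on every call. A sketch that uses the same mechanism to make the source column a parameter as well; first_last, col and out are hypothetical names, and Sepal.Length is used only to show that any column can be passed:

```r
library(plyr)

# Hypothetical helper: `col` names the source column, `out` the result names
first_last <- function(x, col, out) {
  v <- x[[col]]
  result <- data.frame(v[1], v[length(v)])
  names(result) <- out
  result
}

# col and out are not ddply's own arguments, so ddply passes them to first_last
sl <- ddply(iris, .(Species), first_last,
            col = "Sepal.Length", out = c("sl_first", "sl_last"))
sl
```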
Or perhaps this?
iris1 <- ddply(iris,
               .(Species),
               function(x, y) {
                 result <- cbind(x, x$Petal.Width[1], x$Petal.Width[length(x$Petal.Width)])
                 names(result) <- c(names(x), y)
                 return(result)
               },
               c('first', 'last'))
head(iris1)
Result:
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species first last
1          5.1         3.5          1.4         0.2  setosa   0.2  0.2
2          4.9         3.0          1.4         0.2  setosa   0.2  0.2
3          4.7         3.2          1.3         0.2  setosa   0.2  0.2
4          4.6         3.1          1.5         0.2  setosa   0.2  0.2
5          5.0         3.6          1.4         0.2  setosa   0.2  0.2
6          5.4         3.9          1.7         0.4  setosa   0.2  0.2
OK, that makes more sense now: you want to pass the name of an existing column of the data.frame as a parameter, and have ddply add two new columns to the data.frame that use that column as the source of a calculation. How about this:
iris1 <- ddply(iris,
               .(Species),
               function(x, y) {
                 len <- nrow(x)
                 first <- x[1, y]
                 last <- x[len, y]
                 result <- cbind(x, first, last)
                 names(result) <- c(names(x), 'first', 'last')
                 return(result)
               },
               'Petal.Width'
)
head(iris1)
Result:
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species first last
1          5.1         3.5          1.4         0.2  setosa   0.2  0.2
2          4.9         3.0          1.4         0.2  setosa   0.2  0.2
3          4.7         3.2          1.3         0.2  setosa   0.2  0.2
4          4.6         3.1          1.5         0.2  setosa   0.2  0.2
5          5.0         3.6          1.4         0.2  setosa   0.2  0.2
6          5.4         3.9          1.7         0.4  setosa   0.2  0.2
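The same function-interface pattern extends to several parameter columns at once. In this sketch the helper name add_first_last_cols and the _first/_last suffix convention are assumptions, not part of the answer above; new columns are named after their source with paste0() so they don't collide:

```r
library(plyr)

# Hypothetical helper: for every column named in `cols`, add
# <col>_first and <col>_last holding that group's first and last values
add_first_last_cols <- function(x, cols) {
  n <- nrow(x)
  for (col in cols) {
    x[[paste0(col, "_first")]] <- rep(x[[col]][1], n)
    x[[paste0(col, "_last")]]  <- rep(x[[col]][n], n)
  }
  x
}

# The character vector of column names is forwarded to the helper as `cols`
iris4 <- ddply(iris, .(Species), add_first_last_cols,
               c("Petal.Width", "Sepal.Width"))
head(iris4)
```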
I hope you're going to do something other than 'first' and 'last', like a mean or sd function. first and last depend on ddply handing the anonymous function its data in a known order, and I'm not sure whether it does. You might get different, unexpected answers.
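Order-independent summaries such as mean() and sd() sidestep that worry, since reordering the rows within a group cannot change them. A sketch, assuming the column name arrives as a string as in the question; the stats_by helper is hypothetical, and the row shuffle is there only to illustrate the point:

```r
library(plyr)

# Hypothetical helper: per-species mean and sd of the column named by `col`
stats_by <- function(df, col) {
  ddply(df, .(Species), function(piece) {
    v <- piece[[col]]
    data.frame(mean = mean(v), sd = sd(v))
  })
}

myCol <- "Petal.Width"
s1 <- stats_by(iris, myCol)

# Shuffle all rows; the per-group mean and sd should be unaffected
set.seed(1)
s2 <- stats_by(iris[sample(nrow(iris)), ], myCol)
all.equal(s1, s2)
```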
Source: https://stackoverflow.com/questions/22266468/ddply-how-do-i-pass-column-names-as-parameters