ddply: how do I pass column names as parameters?

Posted by ⅰ亾dé卋堺 on 2019-12-04 07:00:59

Question


I have a data frame whose column names are generated from parameters, so I don't know their exact values in advance. I want to pass these column names to ddply as parameters too. I guess the answer is obvious, but can someone please turn the light on for me?

The example below uses the iris data set to show what I want to do and the unintended result of my attempt. The first result, iris1, is what I want to achieve. My iris2 attempt passes the column name in as a parameter, but it does not give the intended result.

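library(plyr)

# Intended result: the column is referenced directly by name
iris1 <- ddply(iris, .(Species), transform, pw_first = Petal.Width[1], 
              pw_last = Petal.Width[length(Petal.Width)])
# Attempt with the column name held in a variable: myCol is only a string here
myCol <- 'Petal.Width'
iris2 <- ddply(iris, .(Species), transform, pw_first = myCol[1], 
               pw_last = myCol[length(myCol)])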

head(iris1)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species pw_first pw_last
# 1          5.1         3.5          1.4         0.2  setosa      0.2     0.2
# 2          4.9         3.0          1.4         0.2  setosa      0.2     0.2
# 3          4.7         3.2          1.3         0.2  setosa      0.2     0.2
# 4          4.6         3.1          1.5         0.2  setosa      0.2     0.2
# 5          5.0         3.6          1.4         0.2  setosa      0.2     0.2
# 6          5.4         3.9          1.7         0.4  setosa      0.2     0.2

head(iris2)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species    pw_first     pw_last
# 1          5.1         3.5          1.4         0.2  setosa Petal.Width Petal.Width
# 2          4.9         3.0          1.4         0.2  setosa Petal.Width Petal.Width
# 3          4.7         3.2          1.3         0.2  setosa Petal.Width Petal.Width
# 4          4.6         3.1          1.5         0.2  setosa Petal.Width Petal.Width
# 5          5.0         3.6          1.4         0.2  setosa Petal.Width Petal.Width
# 6          5.4         3.9          1.7         0.4  setosa Petal.Width Petal.Width

Answer 1:


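Here you go. This solution uses get, which looks up an object by the name given as a string. transform evaluates its arguments with the columns of the data frame visible, so get(myCol) returns the column whose name is stored in myCol (here Petal.Width) rather than the string itself.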

myCol <- 'Petal.Width'
iris2 <- ddply(iris, .(Species), transform, 
  pw_first = get(myCol)[1],
  pw_last = get(myCol)[length(get(myCol))]
)
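To see why get makes the difference, here is a minimal base R sketch without ddply. It assumes myCol is defined at the top level, as in the question, and uses with() as a stand-in for the way transform exposes the columns.

```r
# Base R only; the iris data set ships with R.
myCol <- "Petal.Width"

# On its own, myCol is just a length-1 character vector,
# which is why the iris2 attempt filled the new columns with text.
is.character(myCol[1])

# get() resolves the name stored in myCol. Inside with() the columns
# of iris are visible, so the actual column comes back.
by_name <- with(iris, get(myCol))

# [[ ]] indexing performs the same lookup without relying on scoping.
by_index <- iris[[myCol]]

identical(by_name, by_index)
```

Because get depends on where the name is looked up, the [[ ]] form is usually the easier one to reason about when the code is moved inside another function.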

Another approach, which might be simpler to understand:

iris2 <- ddply(iris, .(Species), function(df){
  x = df[[myCol]]
  transform(df, pw_first = x[1], pw_last = x[length(x)])
})
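If the column names really are only known at run time, the function-based approach can be wrapped so that both the grouping column and the value column arrive as strings. This is a sketch: add_first_last is a hypothetical helper name, not part of plyr, and it relies on ddply accepting a character vector for its grouping variables.

```r
library(plyr)

# Hypothetical helper (not a plyr function): adds the first and last
# value of value_col within each group defined by group_col.
add_first_last <- function(data, group_col, value_col) {
  ddply(data, group_col, function(df) {
    x <- df[[value_col]]          # look the column up by its name
    df$pw_first <- x[1]           # first value in this group
    df$pw_last <- x[length(x)]    # last value in this group
    df
  })
}

iris2 <- add_first_last(iris, "Species", "Petal.Width")
head(iris2)
```

Passing the names as arguments keeps the lookup inside the helper, so it does not depend on a variable such as myCol being visible from wherever ddply evaluates the function.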



Answer 2:


colName <- "Petal.Width"

iris1 <- ddply(iris, .(Species), function(x) {
  pw.first <- x[1, colName]        # first value in this group
  pw.last <- x[nrow(x), colName]   # last value in this group
  cbind(x, pw.first, pw.last)
})

unique(iris1$pw.first)
[1] 0.2 1.4 2.5

unique(iris1$pw.last)
[1] 0.2 1.3 1.8

If you only want the species, pw.first and pw.last, simply remove x from the cbind call.




Answer 3:


I'm still learning R, but I find that the function interface for ddply fits my brain. Maybe this is close?

iris1 <- ddply(iris, 
               .(Species), 
               function(x,y) {result = data.frame(x$Petal.Width[1],
                                                  x$Petal.Width[length(x$Petal.Width)])
                              names(result) <- y
                              return(result)},
               c('first','last'))
iris1

Result:

     Species first last
1     setosa   0.2  0.2
2 versicolor   1.4  1.3
3  virginica   2.5  1.8

Or perhaps this?

iris1 <- ddply(iris, 
               .(Species), 
               function(x,y) {
                 result = cbind(x,x$Petal.Width[1],x$Petal.Width[length(x$Petal.Width)])
                 names(result) = c(names(x),y)
                 return(result)
                 },
               c('first','last'))
head(iris1)

Result:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species first last
1          5.1         3.5          1.4         0.2  setosa   0.2  0.2
2          4.9         3.0          1.4         0.2  setosa   0.2  0.2
3          4.7         3.2          1.3         0.2  setosa   0.2  0.2
4          4.6         3.1          1.5         0.2  setosa   0.2  0.2
5          5.0         3.6          1.4         0.2  setosa   0.2  0.2
6          5.4         3.9          1.7         0.4  setosa   0.2  0.2

OK, this makes more sense now: you want to pass an existing column of the data frame as a parameter and add two columns computed from that column. How about this:

iris1 <- ddply(iris, 
               .(Species), 
               function(x,y) {
                 len <- length(x[,1])
                 first <- x[1,y]
                 last <- x[len,y]
                 result <- cbind(x,first,last)
                 names(result) <- c(names(x),'first','last')
                 return(result)
               },
               'Petal.Width'
)
head(iris1)

Result:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species first last
1          5.1         3.5          1.4         0.2  setosa   0.2  0.2
2          4.9         3.0          1.4         0.2  setosa   0.2  0.2
3          4.7         3.2          1.3         0.2  setosa   0.2  0.2
4          4.6         3.1          1.5         0.2  setosa   0.2  0.2
5          5.0         3.6          1.4         0.2  setosa   0.2  0.2
6          5.4         3.9          1.7         0.4  setosa   0.2  0.2

I hope you're going to compute something other than 'first' and 'last', such as a mean or sd. First and last depend on ddply handing the anonymous function its data in a known order, and I'm not sure whether it does. You might get different, unexpected answers.
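As a sketch of the order-independent case mentioned above, the same parameterised lookup can feed mean and sd. The output column names pw_mean and pw_sd are chosen here for illustration only.

```r
library(plyr)

myCol <- "Petal.Width"

# One row per species; mean() and sd() do not depend on row order.
iris_stats <- ddply(iris, "Species", function(df) {
  x <- df[[myCol]]   # column selected by the name held in myCol
  data.frame(pw_mean = mean(x), pw_sd = sd(x))
})
iris_stats
```

If first and last values are still needed, sorting each piece explicitly inside the function before taking x[1] and x[length(x)] would remove the dependence on the order in which the rows arrive.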



Source: https://stackoverflow.com/questions/22266468/ddply-how-do-i-pass-column-names-as-parameters
