plyr | 易学教程

Can't install package reshape2 for R 2.15.3 on Ubuntu 12.04.4


Question: I am having trouble installing the reshape2 package for R 2.15.3 on Ubuntu 12.04.4 LTS. I decided not to upgrade to R 3.x because many of the packages that I use have not been updated to support the new version. When I try to install reshape2, I get the following. > install.packages("reshape2") Installing package(s) into ‘/usr/local/lib/R/site-library’ (as ‘lib’ is unspecified) Warning in install.packages("reshape2") : 'lib = "/usr/local/lib/R/site

Operations on multi-dimensional arrays in R: apply vs data.table vs plyr (parallel)


Question: In my research work, I normally deal with big 4D arrays (20-200 million elements). I'm trying to improve the computational speed of my calculations, looking for an optimal trade-off between speed and simplicity. I've already made some progress thanks to SO (see here and here). Now I'm trying to exploit the latest packages like data.table and plyr. Let's start with something like: D = c(100, 1000, 8) #x,y,t d = array(rnorm(prod(D)), dim = D) I'd like to get for each x (first dimension)
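The excerpt is cut off before it says which per-x quantity is wanted, so the following is only a sketch under an assumption: it computes the mean over all y and t for each x, once with apply and once with the vectorised base helper rowMeans. The seed and the names m_apply and m_fast are illustrative and not from the original question.

```r
# Dimensions as in the excerpt: x, y, t
D <- c(100, 1000, 8)
set.seed(1)  # assumption: added only to make the sketch reproducible
d <- array(rnorm(prod(D)), dim = D)

# Assumed statistic: the mean over every y and t, computed per x.
# apply() hands each slice d[i, , ] to mean().
m_apply <- apply(d, 1, mean)

# rowMeans() with dims = 1 averages over all dimensions after the first,
# which gives the same numbers without an R-level loop.
m_fast <- rowMeans(d, dims = 1)

# Both approaches should agree up to floating-point error.
stopifnot(isTRUE(all.equal(m_apply, m_fast)))
```

For simple reductions such as sums and means, the base helpers usually avoid the per-slice overhead of apply; for arbitrary functions, apply remains the general tool.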

using multiple variables in plyr


Question: I am trying to use plyr but have difficulty using several variables. Here is an example. df <- read.table(header=TRUE, text=" Firm Foreign SME Turnover A1 N Y 200 A2 N N 1000 A3 Y Y 100 A1 N N 500 A2 Y Y 200 A3 Y Y 1000 A1 Y N 200 A2 N N 1000 A2 N Y 100 A2 N Y 200 ") I am trying to create a table which summarizes the Turnover by the two variables, basically combining the following code: t1 <- ddply(df, c('Firm', 'Foreign'), summarise, BudgetForeign = sum(Turnover, na.rm = TRUE)) t2 <- ddply
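The second ddply call is truncated, so the intended combined table is not known. One possible reading, sketched here as an assumption, is to group by all three variables in a single call; the column name Budget and the result name t_all are hypothetical.

```r
library(plyr)

# Data as given in the excerpt
df <- read.table(header = TRUE, text = "
Firm Foreign SME Turnover
A1 N Y 200
A2 N N 1000
A3 Y Y 100
A1 N N 500
A2 Y Y 200
A3 Y Y 1000
A1 Y N 200
A2 N N 1000
A2 N Y 100
A2 N Y 200
")

# Assumed goal: total Turnover for every Firm/Foreign/SME combination.
# ddply() accepts a character vector naming several grouping variables.
t_all <- ddply(df, c("Firm", "Foreign", "SME"), summarise,
               Budget = sum(Turnover, na.rm = TRUE))
print(t_all)
```

If separate per-variable totals are wanted side by side instead, the two summaries would need to be merged on Firm; that is a different layout from the one shown here.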

R data.frame: rowSums of selected columns by grouping vector


Question: I have a data frame with a sequence of numeric columns, surrounded on both sides by (irrelevant) columns of characters. I want to obtain a new data frame that keeps the position of the irrelevant columns, and adds the numeric columns to each other according to a grouping vector (or applies some other row-wise function to the data frame, by group). Example: sample = data.frame(cha1 = c("A","B"),num1=1:2,num2=3:4,num3=11:12,num4=13:14,cha2=c("C","D")) > sample cha1 num1 num2 num3 num4 cha2 1 A 1 3
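A minimal base-R sketch of the idea, assuming a grouping vector that pairs num1 with num2 and num3 with num4. The vector grp, the name sample_df, and the output column names grp1 and grp2 are illustrative, since the excerpt ends before the grouping is shown.

```r
# Data frame from the excerpt (renamed to avoid masking base::sample)
sample_df <- data.frame(cha1 = c("A", "B"),
                        num1 = 1:2, num2 = 3:4,
                        num3 = 11:12, num4 = 13:14,
                        cha2 = c("C", "D"))

num_cols <- c("num1", "num2", "num3", "num4")
grp <- c(1, 1, 2, 2)  # hypothetical grouping vector, one entry per numeric column

# Split the column names by group, then take row sums within each group.
sums <- sapply(split(num_cols, grp), function(cols) rowSums(sample_df[cols]))
colnames(sums) <- paste0("grp", colnames(sums))

# Reassemble with the character columns kept in their original positions.
result <- data.frame(sample_df["cha1"], sums, sample_df["cha2"])
print(result)
```

Swapping rowSums for another row-wise function, such as rowMeans, covers the "some other row-wise function" case mentioned in the question.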

find 75 percentile and replacing by median for each group in R


Question: This problem is similar to my earlier question, "calculation of 90 percentile and replacement of it by median by groups in R", with one distinction. In that question, the calculation was done on the 14 zeros preceding the one category of action, while the replacement by median was done for all zero-category rows of action, for each code+item group. Now I use all zeros rather than the 14 preceding ones, and I don't touch negative and zero values of return. By group variable (action- 0, 1) for Zero

Adding selected data frames together, from a list of data frames


Question: I ran into a big problem when trying to apply my small-scale solution at a larger scale. I want to write a function that will let me automate adding together all the values of specific data frames. First, I created a list of all the data frames: > lst $data001 A B C D E X 10 30 50 70 Y 20 40 60 80 $data002 A B C D E X 10 30 50 70 Y 20 40 60 80 $data003 A B C D E X 10 30 50 70 Y 20 40 60 80 Z 20 40 60 80 $data004 A B C D E X 10 30 50 70 Y 20 40 60 80 Z 20 40 60 80 V 20 40 60 80 $data005 A B C D E Q
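As a rough sketch of the "add selected data frames" step, Reduce with `+` sums data frames element by element. This assumes the selected frames are all numeric and share the same rows and columns, which is not true of every frame in the excerpt (data003 and data004 have extra rows), so frames of different shape would need aligning first. The helper make_df, the selection vector, and the reduced list below are hypothetical stand-ins.

```r
# Hypothetical stand-in for the two-row frames shown in the excerpt,
# with the label column treated as row names.
make_df <- function() {
  data.frame(B = c(10, 20), C = c(30, 40), D = c(50, 60), E = c(70, 80),
             row.names = c("X", "Y"))
}
lst <- list(data001 = make_df(), data002 = make_df(), data003 = make_df())

# Names of the frames to add together (illustrative choice).
selected <- c("data001", "data002")

# Reduce() applies `+` pairwise across the selected frames.
total <- Reduce(`+`, lst[selected])
print(total)
```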

Aggregate sum and mean in R with ddply


Question: My data frame has two columns that are used as a grouping key, 17 columns that need to be summed in each group, and one column that should be averaged instead. Let me illustrate this on a different data frame, diamonds from ggplot2. I know I could do it like this: ddply(diamonds, ~cut, summarise, x=sum(x), y=sum(y), z=sum(z), price=mean(price)) But while it is reasonable for 3 columns, it is unacceptable for 17 of them. When researching this, I found the colwise function, but the best I came
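One way this could be approached, shown as a sketch rather than the asker's eventual solution: give ddply a function that applies plyr's numcolwise(sum) to the columns to be summed and appends the mean of the remaining column. A small hypothetical data frame named toy stands in for diamonds so the example runs without ggplot2; sum_cols and out are also illustrative names.

```r
library(plyr)

# Hypothetical stand-in for diamonds, with the same column roles
toy <- data.frame(cut   = c("Fair", "Fair", "Good"),
                  x     = c(1, 2, 3),
                  y     = c(4, 5, 6),
                  z     = c(7, 8, 9),
                  price = c(100, 300, 50))

# Columns to sum; with 17 columns this vector is the only place they are listed.
sum_cols <- c("x", "y", "z")

out <- ddply(toy, ~cut, function(piece) {
  # numcolwise(sum) returns a function that sums each numeric column
  data.frame(numcolwise(sum)(piece[sum_cols]),
             price = mean(piece$price))
})
print(out)
```

The column list could also be computed, for example as every numeric column except price, instead of being typed out.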

Strange environment behavior in parallel plyr


Question: Recently, I created an object factor=1 in my workspace, not knowing that there is a function factor in the base package. What I intended to do was to use the variable factor within a parallel loop, e.g., library(plyr) library(foreach) library(doParallel) workers <- makeCluster(2) registerDoParallel(workers,cores=2) factor=1 llply( as.list(1:2), function(x) factor*x, .parallel = TRUE, .paropts=list(.export=c("factor")) ) This, however, results in an error that took me some time to
