plyr | 易学教程

Error bars on stacked bar ggplot2

Question: I'm struggling to place error bars correctly on a stacked bar chart. Following an earlier post, I used ddply to stack the error bars. That changed the stacking order, so I reordered the factor. Now the error bars appear correct on one set of bars but not the other. What I want is a graph that looks like the one below, just with the standard error shown as error bars. I'm listing the dput of the original data and the ddply data as well as the data set. Suz2$org …

Adding a row for the ratio of two variables

Question: For each DVID and FORM, I want to add a row for the Fed/Fasted ratio to my data frame.

dfin <-

    DVID FORM FED            median gmean CV
    1    A    fast           15     20    10
    1    A    Fed            30     40    15
    1    B    fast           40     60    20
    1    B    Fed            50     100   25

mydfout <-

    DVID FORM FED            median gmean CV
    1    A    fast           15     20    10
    1    A    Fed            30     40    15
    1    A    Fed/Fasted(%)  200    200   NA
    1    B    fast           40     60    20
    1    B    Fed            50     100   25
    1    B    Fed/Fasted(%)  125    166.6 NA

How can I do this in R?

Answer 1: We can use base R functions to do this: A=aggregate(cbind(median,gmean)~DVID+FORM,dat1,function …
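The answer excerpt above is cut off, so the following is not its method. It is a minimal base R sketch of one way to build the ratio rows, assuming the FED column only ever holds the labels "fast" and "Fed" and that every DVID/FORM pair has both. The helper names `fed`, `fast`, `both` and `ratio` are made up for illustration.

```r
# Input data as given in the question.
dfin <- data.frame(
  DVID = 1,
  FORM = c("A", "A", "B", "B"),
  FED = c("fast", "Fed", "fast", "Fed"),
  median = c(15, 30, 40, 50),
  gmean = c(20, 40, 60, 100),
  CV = c(10, 15, 20, 25),
  stringsAsFactors = FALSE
)

# Pair each Fed row with the fast row of the same DVID and FORM.
fed <- dfin[dfin$FED == "Fed", ]
fast <- dfin[dfin$FED == "fast", ]
both <- merge(fed, fast, by = c("DVID", "FORM"), suffixes = c(".fed", ".fast"))

# Build the ratio rows as percentages; CV is left as NA, as in the desired output.
ratio <- data.frame(
  DVID = both$DVID,
  FORM = both$FORM,
  FED = "Fed/Fasted(%)",
  median = 100 * both$median.fed / both$median.fast,
  gmean = 100 * both$gmean.fed / both$gmean.fast,
  CV = NA_real_,
  stringsAsFactors = FALSE
)

# Append and sort so that each ratio row follows its own group.
mydfout <- rbind(dfin, ratio)
mydfout <- mydfout[order(mydfout$DVID, mydfout$FORM), ]
rownames(mydfout) <- NULL
print(mydfout)
```

Because `order()` leaves ties in their original order, the appended ratio row ends up after the fast and Fed rows of its group.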

Remove NA columns in a list of dataframes

Question: I am having some trouble cleaning data that I imported from Excel with readxl. readxl created a large list of objects with classes = c('data.frame', tbl_df, tbl) (I would also like to know why and how an object has multiple classes assigned to it). Each of those objects is one of the sheets in the original Excel workbook. The problem is that each of those objects (sheets) may have many columns entirely filled with NAs. I have searched Stack Overflow, found some similar problems, and tried …
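A minimal base R sketch of dropping all-NA columns from every data frame in a list. The list `sheets` is made-up toy data standing in for the imported workbook, and `drop_na_cols` is a hypothetical helper name. The excerpt is cut off before it shows what the asker tried, so this is only one plausible approach.

```r
# Toy stand-in for a list of imported sheets (hypothetical data).
sheets <- list(
  a = data.frame(x = 1:3, y = NA, z = c("p", NA, "q")),
  b = data.frame(u = c(NA, NA), v = c(1.5, 2.5))
)

# Keep only the columns that contain at least one non-NA value.
drop_na_cols <- function(df) {
  keep <- colSums(!is.na(df)) > 0
  df[, keep, drop = FALSE]
}

cleaned <- lapply(sheets, drop_na_cols)
print(lapply(cleaned, names))
```

`drop = FALSE` keeps the result a data frame even when only one column survives.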

How to calculate time difference between datetimes, for each group (student-contract)?

Question: I have a specific problem. I have data in the following format:

    #   USER_ID SUBMISSION_DATE CONTRACT_REF
    1   1       20/6 1:00       W001
    2   1       20/6 2:00       W002
    3   1       20/6 3:30       W003
    4   4       20/6 4:00       W004
    5   5       20/6 5:00       W005
    6   5       20/6 6:00       W006
    7   7       20/6 7:00       W007
    8   7       20/6 8:00       W008
    9   7       20/6 9:00       W009
    10  7       20/6 10:00      W0010

Now I need to calculate the time difference between the different submissions (which are uniquely identifiable). In other words: I have a table of submissions, and in this table there are all submissions …
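The question is cut off, so the exact grouping it needs is not known. As a rough sketch, the gap between consecutive submissions of the same user could be computed in base R as below. The year 2015 and the UTC time zone are assumptions added only so the timestamps can be parsed, and the column name `MINUTES_SINCE_PREVIOUS` is made up. Rows are assumed to be already sorted by time within each user.

```r
# Data as shown in the question.
subs <- data.frame(
  USER_ID = c(1, 1, 1, 4, 5, 5, 7, 7, 7, 7),
  SUBMISSION_DATE = c("20/6 1:00", "20/6 2:00", "20/6 3:30", "20/6 4:00",
                      "20/6 5:00", "20/6 6:00", "20/6 7:00", "20/6 8:00",
                      "20/6 9:00", "20/6 10:00"),
  CONTRACT_REF = c("W001", "W002", "W003", "W004", "W005",
                   "W006", "W007", "W008", "W009", "W0010"),
  stringsAsFactors = FALSE
)

# Parse the day/month and time; the year and time zone are assumed.
when <- as.POSIXct(paste("2015", subs$SUBMISSION_DATE),
                   format = "%Y %d/%m %H:%M", tz = "UTC")

# Per user: minutes since that user's previous submission (NA for the first).
subs$MINUTES_SINCE_PREVIOUS <- ave(
  as.numeric(when) / 60,
  subs$USER_ID,
  FUN = function(x) c(NA, diff(x))
)
print(subs)
```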

R: ddply repeats yearly cumulative data

Question: This is related to an earlier question, but I decided to ask a separate one for the sake of clarity, as the 'new' question is not directly related to the original. Briefly, I am using ddply to cumulatively sum a value for each of three years. My code takes data from the first year and repeats it in the second- and third-year rows of the column. My guess is that each 1-year chunk is being copied to the whole of the column, but I don't understand why. Q. How can I get a cumulatively summed value for …

Transpose duplicated rows to column in R

Question: I have a large data.frame (20000+ entries) in this format:

    id D1   D2
    1  0.40 0.21
    1  0.00 0.00
    1  0.53 0.20
    2  0.17 0.17
    2  0.25 0.25
    2  0.55 0.43

Each id may be duplicated 3-20 times. I would like to merge the duplicated rows into new columns, so that my new data.frame looks like:

    id D1   D2   D3   D4   D5   D6
    1  0.40 0.21 0.00 0.00 0.53 0.20
    2  0.17 0.17 0.25 0.25 0.55 0.43

I've manipulated data.frames before with plyr, but I'm not sure how to approach this problem. Any help would be appreciated. Thanks.

Answer 1 …
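The answer is not included in the excerpt. One way to get the requested shape in base R is sketched below. It flattens each id's rows in row order and pads shorter ids with NA, since the question says the number of duplicates varies. The names `flat`, `width`, `wide` and `out` are made up for illustration, and converting the ids back with `as.numeric` assumes they are numeric, as in the sample.

```r
# Sample data from the question.
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  D1 = c(0.40, 0.00, 0.53, 0.17, 0.25, 0.55),
  D2 = c(0.21, 0.00, 0.20, 0.17, 0.25, 0.43)
)

# For each id, read its D1/D2 values row by row into one vector.
flat <- lapply(
  split(df[, c("D1", "D2")], df$id),
  function(g) as.vector(t(as.matrix(g)))
)

# Pad every vector to the longest length so ids with fewer rows get NA.
width <- max(lengths(flat))
wide <- t(vapply(
  flat,
  function(v) c(v, rep(NA_real_, width - length(v))),
  numeric(width)
))
colnames(wide) <- paste0("D", seq_len(width))

# Assumes numeric ids, as in the sample data.
out <- data.frame(id = as.numeric(names(flat)), wide, row.names = NULL)
print(out)
```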

Parallel *ply within functions

Question: I want to use the parallel functionality of the plyr package within functions. I would have thought that the proper way to export objects that have been created within the body of the function (in this example, the object is df_2) is as follows:

    # rm(list=ls())
    library(plyr)
    library(doParallel)
    workers=makeCluster(2)
    registerDoParallel(workers,core=2)
    plyr_test=function() {
      df_1=data.frame(type=c("a","b"),x=1:2)
      df_2=data.frame(type=c("a","b"),x=3:4)
      #export df_2 via .paropts
      ddply(df_1,"type …

Equivalent to ddply(…,transform,…) in data.table

Question: I have the following code using ddply from the plyr package:

    ddply(mtcars,.(cyl),transform,freq=length(cyl))

The data.table version of this is:

    DT<-data.table(mtcars)
    DT[,freq:=.N,by=cyl]

How can I extend this when I want to apply more than one function, as below?

    ddply(mtcars,.(cyl),transform,freq=length(cyl),sum=sum(mpg))
    DT[,list(freq=.N,sum=sum(mpg)),by=cyl]

But data.table gives me only three columns: cyl, freq, and sum. …
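The excerpt stops before any answer. For context, `list(...)` in `j` summarises to one row per group, which is why only three columns come back. A minimal sketch of adding several columns by reference instead, using data.table's functional form of `:=`, is shown below. It assumes the data.table package is installed.

```r
library(data.table)

DT <- as.data.table(mtcars)

# Add both columns by reference, keeping every original row and column.
DT[, `:=`(freq = .N, sum = sum(mpg)), by = cyl]

print(head(DT))
```

An equivalent spelling is `DT[, c("freq", "sum") := list(.N, sum(mpg)), by = cyl]`.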

Tabulate responses for multiple columns by grouping variable with dplyr

Question: Hi, I'm new to the plyr/dplyr family but enjoying it. I can see its massive utility for my own work, but I'm still trying to get my head around it. I have a data frame that looks like the one below. 1) How do I produce a table for each non-grouping variable that shows the distribution of responses within each value of the grouping variable? 2) Note: I do have some missing values and I would like to exclude them from the tabulation. I realize the summarize_each command will apply the function to each …

converting summary created using 'by' to data.frame

Question:

    df1=data.frame(c(2,1,2),c(1,2,3,4,5,6),seq(141,170)) #create data.frame
    names(df1) = c("gender","age","height") #column names
    df1$gender <- factor(df1$gender, levels=c(1,2), labels=c("female","male")) #gives levels and labels to gender
    df1$age <- factor(df1$age, levels=c(1,2,3,4,5,6), labels=c("16-24","25-34","35-44","45-54","55-64","65+")) # gives levels and labels to age groups

I am looking to produce a summary of the height values subsetted by gender and then age. Using the subset and by …
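The excerpt ends before showing the `by` call, so the sketch below does not convert that object. It shows an alternative that returns a data frame directly: base R `aggregate` with a formula. Using the mean as the summary statistic is an assumption, and `height_means` is a made-up name.

```r
# Data construction copied from the question.
df1 <- data.frame(c(2, 1, 2), c(1, 2, 3, 4, 5, 6), seq(141, 170))
names(df1) <- c("gender", "age", "height")
df1$gender <- factor(df1$gender, levels = c(1, 2), labels = c("female", "male"))
df1$age <- factor(df1$age, levels = 1:6,
                  labels = c("16-24", "25-34", "35-44", "45-54", "55-64", "65+"))

# One row per gender/age combination that occurs in the data.
# The mean is an assumed choice of summary statistic.
height_means <- aggregate(height ~ gender + age, data = df1, FUN = mean)
print(height_means)
```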