plyr | 易学教程

Group (factorial) data with multiple factors. error: incompatible size (0), expecting 1 (the group size) or 1

Question: This post is a follow-up to Changing line color in ggplot based on "several factors" slope. I would like to group the data (below) by "PQ"; however, I get the following error: "incompatible size (0), expecting 1 (the group size) or 1"
Data:
ID<-c("A_P1","A_P1","A_P1","A_P1","A_P1","A_P2","A_P2","A_P2","A_P2","A_P2","A_P2","B_P1","B_P1","B_P1","B_P1","B_P1","B_P1","B_P1","B_P1","B_P2","B_P2","B_P2","B_P2","B_P2","B_P2","B_P2","B_P2")
Q<-c("C1","C1","C2","C3","C3","C1","C1","C2","C2","C3","C3"

Rescaling with plyr (ddply) in R

Question: I've got this CSV table in which I need to rescale the data between 0 and 1 for each column. That is, the lowest value of any given column will be 0, the highest will be 1, and all other values will be linearly scaled accordingly. Here's my script:
tableau <- read.csv("/tableau.csv")
tableau.m <- melt(tableau)
tableau.m <- ddply(tableau.m, .(variable), transform,rescale = rescale(value))
(And here's the data: https://dl.dropboxusercontent.com/u/73950/tableau.csv) The issue is that I need the
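The per-column rescaling described above (lowest value to 0, highest to 1, linear in between) can be sketched in base R as follows. This is a minimal illustration rather than the poster's solution: the helper `rescale01` and the small `long` data frame are hypothetical stand-ins for the melted table, and `ave()` is used here in place of `ddply`.

```r
# Min-max rescaling: (x - min) / (max - min).
# rescale01 is a hypothetical helper, not a plyr or scales function.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(x)))  # constant column: avoid 0/0
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical long-format data standing in for the melted table
long <- data.frame(
  variable = rep(c("a", "b"), each = 3),
  value    = c(1, 2, 3, 10, 30, 20)
)

# ave() applies rescale01 separately within each level of `variable`
long$rescaled <- ave(long$value, long$variable, FUN = rescale01)
```

Mapping a constant column to 0 is an arbitrary choice made for this sketch; returning `NA` would be equally defensible.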

Allow a maximum number of entries when certain conditions apply

Question: I have a dataset with a lot of entries. Each of these entries belongs to a certain ID (belongID), the entries are unique (with uniqID), but multiple entries can come from the same source (sourceID). It is also possible that multiple entries from the same source have the same belongID. For the purposes of the research I need to do on the dataset, I have to get rid of the entries of a single sourceID that occur more than 5 times for 1 belongID. The maximum of 5 entries that need to be kept are
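One way to cap the number of entries per (sourceID, belongID) pair is sketched below in base R. The snippet above is cut off before it says which five entries should be kept, so this sketch simply assumes the first five in row order; the `entries` data frame is hypothetical.

```r
# Hypothetical data: source s1 contributes 7 entries to belongID b1
entries <- data.frame(
  uniqID   = 1:10,
  sourceID = c(rep("s1", 7), rep("s2", 3)),
  belongID = c(rep("b1", 7), "b1", "b1", "b2")
)

# Position of each row within its (sourceID, belongID) group
rank_in_group <- ave(seq_len(nrow(entries)),
                     entries$sourceID, entries$belongID,
                     FUN = seq_along)

# ASSUMPTION: keep the first 5 rows of each group, in row order
kept <- entries[rank_in_group <= 5, ]
```

If a different rule decides which entries survive (for example, the most recent ones), sorting the data frame accordingly before computing `rank_in_group` would adapt the sketch.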

Sampling small data frame from a big dataframe

Question: I am trying to sample a data frame from a given data frame such that there are enough samples from each of the levels of a variable. This can be achieved by separating the data frame by the levels and sampling from each of those. I thought ddply (data frame to data frame) would do it for me. Taking a minimal example:
set.seed(1)
data1 <-data.frame(a=sample(c('B0','B1','B2'),100,replace=TRUE),b=rnorm(100),c=runif(100))
> summary(data1$a)
B0 B1 B2
30 32 38
The following commands perform the
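The split-then-sample idea described in the question can be sketched in base R as below. This is only an illustration: `n_per_level` is a hypothetical parameter, and the sketch draws row indices per level instead of using `ddply`.

```r
set.seed(1)
data1 <- data.frame(a = sample(c("B0", "B1", "B2"), 100, replace = TRUE),
                    b = rnorm(100),
                    c = runif(100))

n_per_level <- 10  # hypothetical number of rows to draw per level

# Split the row indices by level of `a`, then sample within each group
idx <- unlist(
  lapply(split(seq_len(nrow(data1)), data1$a), function(i) {
    i[sample.int(length(i), min(n_per_level, length(i)))]
  }),
  use.names = FALSE
)

small <- data1[idx, ]
```

Sampling positions with `sample.int()` and indexing back into `i` avoids the surprise that `sample(i, ...)` treats a length-one `i` as the range `1:i`.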

Tag all duplicate rows in R as in Stata

Question: Following up from my question here, I am trying to replicate in R the functionality of the Stata command duplicates tag, which allows me to tag all the rows of a dataset that are duplicates in terms of a given set of variables:
clear *
set obs 16
g f1 = _n
expand 104
bys f1: g f2 = _n
expand 2
bys f1 f2: g f3 = _n
expand 41
bys f1 f2 f3: g f4 = _n
des // describe the dataset in memory
preserve
sample 10 // draw a 10% random sample
tempfile sampledata
save `sampledata', replace
restore
//
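A rough base R counterpart to tagging every row that has a duplicate on a set of key variables is sketched below. The data frame and the column names `is_dup` and `dup_count` are hypothetical; `dup_count` mimics Stata's convention of reporting the number of other copies, so unique rows get 0.

```r
# Hypothetical data with duplicates on the key columns f1 and f2
df <- data.frame(f1 = c(1, 1, 2, 3, 3, 3),
                 f2 = c("a", "a", "b", "c", "c", "d"))
keys <- df[c("f1", "f2")]

# duplicated() alone skips the first copy; scanning from both ends tags all copies
df$is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

# Number of other rows sharing the same key (0 for unique rows)
df$dup_count <- ave(seq_len(nrow(df)), df$f1, df$f2, FUN = length) - 1
```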

Combine a list of data frames into one preserving row names

Question: I know the basics of combining a list of data frames into one, as has been answered before. However, I am interested in smart ways to maintain row names. Suppose I have a list of data frames that are fairly similar, and I keep them in a named list.
library(plyr)
library(dplyr)
library(data.table)
a = data.frame(x=1:3, row.names = letters[1:3])
b = data.frame(x=4:6, row.names = letters[4:6])
c = data.frame(x=7:9, row.names = letters[7:9])
l = list(A=a, B=b, C=c)
When I use do.call , the
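The snippet is cut off before it states the exact problem, so the sketch below only illustrates one behavior of base R's `rbind`: with a named list, row names are prefixed by the list names, whereas dropping the list names keeps the original row names. The object names `df_a`, `df_b`, `df_c`, and `frames` are hypothetical.

```r
df_a <- data.frame(x = 1:3, row.names = letters[1:3])
df_b <- data.frame(x = 4:6, row.names = letters[4:6])
df_c <- data.frame(x = 7:9, row.names = letters[7:9])
frames <- list(A = df_a, B = df_b, C = df_c)

# Named list: row names are prefixed with the list names ("A.a", "A.b", ...)
prefixed <- do.call(rbind, frames)

# Unnamed list: the original row names are carried over unchanged
combined <- do.call(rbind, unname(frames))
```

This relies on the row names being unique across the data frames; with clashes, `rbind` would alter them to keep row names unique.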

How to use string variables to create variables list for ddply?

Question: Using R's built-in ToothGrowth example dataset, this works:
ddply(ToothGrowth, .(supp,dose), function(df) mean(df$len))
But I would like to have the subsetting factors be variables, something like
factor1 = 'supp'
factor2 = 'dose'
ddply(ToothGrowth, .(factor1,factor2), function(df) mean(df$len))
That doesn't work. How should this be done? I thought perhaps something like this:
factorCombo = paste('.(',factor1,',',factor2,')', sep='')
ddply(ToothGrowth, factorCombo, function(df) mean(df$len))
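One approach that may fit here: the plyr documentation describes the `.variables` argument as accepting quoted variables, a formula, or a character vector, so the column names held in string variables can be passed directly as a character vector. The sketch below assumes plyr is installed; `res` is a hypothetical name.

```r
library(plyr)

factor1 <- "supp"
factor2 <- "dose"

# Pass the grouping columns as a character vector instead of .()
res <- ddply(ToothGrowth, c(factor1, factor2), function(df) mean(df$len))
```

ToothGrowth has two supplement types and three dose levels, so `res` should contain one row per combination.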

ddply with fixed number of rows

Question: I want to break up my data by number of rows. That is, I want to send a fixed number of rows to my function, and when I reach the end of the data frame (the last chunk), I need to send that chunk whether it has the fixed number of rows or fewer. Something like this (pseudocode):
ddply(df, .(8 rows), .fun=somefunction)
Answer 1: If you want to use plyr you can add a category column:
df <- data.frame(x=rnorm(100), y=rnorm(100))
somefunction <- function(df) { data.frame(mean(df$x), mean(df$y)) }
df$category
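The idea of processing a data frame in fixed-size row chunks can be sketched in base R as below. This is not a completion of the truncated answer above; `chunk_size`, `chunk_id`, and the column names `mean_x` and `mean_y` are hypothetical, and `split()` with `lapply()` stands in for `ddply`.

```r
set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100))

chunk_size <- 8  # hypothetical fixed number of rows per chunk

# Integer division assigns rows 1-8 to chunk 0, rows 9-16 to chunk 1, and so on;
# the last chunk simply receives whatever rows remain
chunk_id <- (seq_len(nrow(df)) - 1) %/% chunk_size
chunks <- split(df, chunk_id)

somefunction <- function(d) data.frame(mean_x = mean(d$x), mean_y = mean(d$y))
res <- do.call(rbind, lapply(chunks, somefunction))
```

With 100 rows and a chunk size of 8, this yields 12 full chunks and a final chunk of 4 rows.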