plyr

Group (factorial) data with multiple factors. error: incompatible size (0), expecting 1 (the group size) or 1

隐身守侯 submitted on 2019-12-20 05:16:27
Question: This post is a follow-up to Changing line color in ggplot based on "several factors" slope. I would like to group the data (below) by "PQ"; however, I get the following error: "incompatible size (0), expecting 1 (the group size) or 1". Data: ID<-c("A_P1","A_P1","A_P1","A_P1","A_P1","A_P2","A_P2","A_P2","A_P2","A_P2","A_P2","B_P1","B_P1","B_P1","B_P1","B_P1","B_P1","B_P1","B_P1","B_P2","B_P2","B_P2","B_P2","B_P2","B_P2","B_P2","B_P2") Q<-c("C1","C1","C2","C3","C3","C1","C1","C2","C2","C3","C3"

Rescaling with plyr (ddply) in R

≡放荡痞女 submitted on 2019-12-20 04:22:28
Question: I've got this csv table for which I need to rescale the data between 0 and 1 for each column. That is, the lowest value of any given column will be 0, the highest will be 1, and all other values will be linearly scaled accordingly. Here's my script: tableau <- read.csv("/tableau.csv") tableau.m <- melt(tableau) tableau.m <- ddply(tableau.m, .(variable), transform,rescale = rescale(value)) (And here's the data: https://dl.dropboxusercontent.com/u/73950/tableau.csv) The issue is that I need the
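The stated goal (each column mapped linearly so its minimum becomes 0 and its maximum becomes 1) can be sketched in base R as follows. The helper name rescale01 and the small stand-in data frame are assumptions for illustration; the real tableau.csv is not reproduced here, and this is not necessarily how the original poster solved it.

```r
# Hypothetical helper: linearly map a numeric vector onto [0, 1].
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Stand-in data (assumption); replace with read.csv(...) on the real file.
tableau <- data.frame(a = c(1, 3, 5), b = c(10, 20, 40))

# Rescale every column independently of the others.
scaled <- as.data.frame(lapply(tableau, rescale01))
print(scaled)
```

Note that a constant column has zero range, so this helper would return NaN for it; that case would need separate handling.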

Allow a maximum number of entries when certain conditions apply

喜欢而已 submitted on 2019-12-20 03:09:20
Question: I have a dataset with a lot of entries. Each of these entries belongs to a certain ID (belongID), the entries are unique (with uniqID), but multiple entries can come from the same source (sourceID). It is also possible that multiple entries from the same source have the same belongID. For the purposes of the research I need to do on the dataset, I have to get rid of the entries of a single sourceID that occur more than 5 times for 1 belongID. The maximum of 5 entries that need to be kept are

Sampling small data frame from a big dataframe

和自甴很熟 submitted on 2019-12-20 02:08:50
Question: I am trying to sample a data frame from a given data frame such that there are enough samples from each of the levels of a variable. This can be achieved by separating the data frame by the levels and sampling from each of those. I thought ddply (data-frame to data-frame) would do it for me. Taking a minimal example: set.seed(1) data1 <-data.frame(a=sample(c('B0','B1','B2'),100,replace=TRUE),b=rnorm(100),c=runif(100)) > summary(data1$a) B0 B1 B2 30 32 38 The following commands perform the
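One way to draw the same number of rows from each level with ddply is sketched below. The per-level sample size n_per_level is an assumption (the question does not state one), and the exact level counts depend on the R version's random number generator, so they may differ from the summary quoted above.

```r
library(plyr)

set.seed(1)
data1 <- data.frame(a = sample(c("B0", "B1", "B2"), 100, replace = TRUE),
                    b = rnorm(100),
                    c = runif(100))

# Assumed sample size per level; capped at the group size if a level is small.
n_per_level <- 10

# ddply splits data1 by the levels of a, samples rows within each piece,
# and binds the pieces back into one data frame.
sampled <- ddply(data1, .(a), function(d) {
  d[sample(nrow(d), min(n_per_level, nrow(d))), , drop = FALSE]
})

print(table(sampled$a))
```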

Tag all duplicate rows in R as in Stata

三世轮回 submitted on 2019-12-20 01:39:09
Question: Following up from my question here, I am trying to replicate in R the functionality of the Stata command duplicates tag, which allows me to tag all the rows of a dataset that are duplicates in terms of a given set of variables: clear * set obs 16 g f1 = _n expand 104 bys f1: g f2 = _n expand 2 bys f1 f2: g f3 = _n expand 41 bys f1 f2 f3: g f4 = _n des // describe the dataset in memory preserve sample 10 // draw a 10% random sample tempfile sampledata save `sampledata', replace restore //
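A base R sketch of tagging every row that shares its key values with at least one other row, not just the later copies that duplicated() flags on its own. The toy data frame and the column names is_dup and dup_tag are assumptions; the count column is meant to mirror Stata's convention of recording the number of other matching rows.

```r
# Toy data (assumption): f1 and f2 are the key variables.
df <- data.frame(f1 = c(1, 1, 2, 3, 3, 3),
                 f2 = c("a", "a", "b", "c", "c", "d"))
keys <- c("f1", "f2")

# duplicated() alone marks only the second and later copies; scanning from
# both ends marks every member of a duplicated group.
df$is_dup <- duplicated(df[keys]) | duplicated(df[keys], fromLast = TRUE)

# Number of *other* rows with the same key (0 means the row is unique).
df$dup_tag <- ave(seq_len(nrow(df)), df$f1, df$f2, FUN = length) - 1

print(df)
```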

Combine a list of data frames into one preserving row names

孤街醉人 submitted on 2019-12-19 15:40:13
Question: I do know about the basics of combining a list of data frames into one, as has been answered before. However, I am interested in smart ways to maintain row names. Suppose I have a list of data frames that are fairly equal and I keep them in a named list. library(plyr) library(dplyr) library(data.table) a = data.frame(x=1:3, row.names = letters[1:3]) b = data.frame(x=4:6, row.names = letters[4:6]) c = data.frame(x=7:9, row.names = letters[7:9]) l = list(A=a, B=b, C=c) When I use do.call, the
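A base R sketch of one option, assuming the row names are unique across the whole list: dropping the list names before calling rbind keeps the original row names, and the list names can be stored in a separate column instead. The column name source is an assumption.

```r
a <- data.frame(x = 1:3, row.names = letters[1:3])
b <- data.frame(x = 4:6, row.names = letters[4:6])
c <- data.frame(x = 7:9, row.names = letters[7:9])
l <- list(A = a, B = b, C = c)

# unname() drops the list names so rbind has nothing to prefix onto the
# row names of the individual data frames.
combined <- do.call(rbind, unname(l))

# Keep track of where each row came from in an ordinary column.
combined$source <- rep(names(l), vapply(l, nrow, integer(1)))

print(combined)
```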

How to use string variables to create variables list for ddply?

可紊 submitted on 2019-12-19 14:05:15
Question: Using R's built-in ToothGrowth example dataset, this works: ddply(ToothGrowth, .(supp,dose), function(df) mean(df$len)) But I would like to have the subsetting factors be variables, something like factor1 = 'supp' factor2 = 'dose' ddply(ToothGrowth, .(factor1,factor2), function(df) mean(df$len)) That doesn't work. How should this be done? I thought perhaps something like this: factorCombo = paste('.(',factor1,',',factor2,')', sep='') ddply(ToothGrowth, factorCombo, function(df) mean(df$len))
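A minimal sketch of one way to do this: the .variables argument of ddply also accepts a character vector of column names, so the strings can be passed directly with c() rather than being pasted into a .( ) expression.

```r
library(plyr)

factor1 <- "supp"
factor2 <- "dose"

# A character vector of column names is accepted in place of .(supp, dose).
res <- ddply(ToothGrowth, c(factor1, factor2), function(df) mean(df$len))

print(res)
```

The result has one row per supp/dose combination, with the group means in the last column.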

ddply with fixed number of rows

邮差的信 submitted on 2019-12-19 11:06:39
Question: I want to break up my data by 'number of rows'. That is to say, I want to send a fixed number of rows to my function, and when I get to the end of the data frame (last chunk) I need to just send the chunk whether it has the fixed number of rows or fewer. Something like this: ddply(df, .(8 rows), .fun=somefunction) Answer 1: If you want to use plyr you can add a category column: df <- data.frame(x=rnorm(100), y=rnorm(100)) somefunction <- function(df) { data.frame(mean(df$x), mean(df$y)) } df$category
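The answer text is cut off at the category column, so the following is only a sketch of how such a column could be built, not the original answer's code. It assumes integer division of the zero-based row index by the chunk size; chunk_size, the output column names, and the added row count n are illustrative.

```r
library(plyr)

set.seed(42)
df <- data.frame(x = rnorm(100), y = rnorm(100))

# Per-chunk summary; n is added here only to show the chunk sizes.
somefunction <- function(d) {
  data.frame(mean_x = mean(d$x), mean_y = mean(d$y), n = nrow(d))
}

chunk_size <- 8

# Rows 1-8 get category 0, rows 9-16 get category 1, and so on; the last
# chunk simply holds whatever rows remain.
df$category <- (seq_len(nrow(df)) - 1) %/% chunk_size

res <- ddply(df, .(category), somefunction)
print(res)
```

With 100 rows and a chunk size of 8 this gives 13 chunks, the last one holding the remaining 4 rows.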
