reshape2

Fast melted data.table operations

我与影子孤独终老i submitted on 2019-11-30 19:36:51
I am looking for patterns for manipulating data.table objects whose structure resembles that of dataframes created with melt from the reshape2 package. I am dealing with data tables with millions of rows. Performance is critical. The generalized form of the question is whether there is a way to perform grouping based on a subset of values in a column and have the result of the grouping operation create one or more new columns. A specific form of the question could be how to use data.table to accomplish the equivalent of what dcast does in the following: input <- data.table( id=c(1, 1, 1, 2, 2,
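The general form of the question (group on subsets of a column's values and emit new columns) can be sketched as follows. The `input` table in the excerpt is cut off, so the `molten` table and its columns `id`, `variable`, and `value` are hypothetical stand-ins, and `sum` is only a placeholder aggregate.

```r
library(data.table)

# Hypothetical melted table; the question's own `input` is truncated.
molten <- data.table(
  id       = c(1, 1, 1, 2, 2, 2),
  variable = c("x", "y", "y", "x", "y", "y"),
  value    = c(1, 2, 3, 4, 5, 6)
)

# Option 1: group by id and build one new column per subset of `variable`.
wide_by <- molten[, .(
  x_sum = sum(value[variable == "x"]),
  y_sum = sum(value[variable == "y"])
), by = id]

# Option 2: data.table's own dcast method, aggregating duplicates with sum.
wide_dcast <- dcast(molten, id ~ variable,
                    value.var = "value", fun.aggregate = sum)
```

Option 1 keeps everything inside a single grouped `[` call, which may matter when only a few of the values in `variable` are needed; option 2 is shorter when every value should become a column.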

Error with custom aggregate function for a cast() call in R reshape2

泪湿孤枕 submitted on 2019-11-30 19:32:49
I want to use R to summarize numerical data in a table with non-unique rownames to a result table with unique row-names with values summarized using a custom function. The summarization logic is: use the mean of values if the ratio of the maximum to the minimum value is < 1.5, else use median. Because the table is very large, I am trying to use the melt() and cast() functions in the reshape2 package. # example table with non-unique row-names tab <- data.frame(gene=rep(letters[1:3], each=3), s1=runif(9), s2=runif(9)) # melt tab.melt <- melt(tab, id=1) # function to summarize with logic: mean if
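A minimal sketch of the summarization logic described above, using `dcast()` from reshape2. The excerpt does not show the actual error, so the guard for an empty input vector is an assumption about one plausible cause (the aggregate being called on a zero-length vector), not a confirmed diagnosis. The name `mean_or_median` is made up for this illustration.

```r
library(reshape2)

set.seed(1)  # only so the sketch is reproducible
tab <- data.frame(gene = rep(letters[1:3], each = 3),
                  s1 = runif(9), s2 = runif(9))
tab.melt <- melt(tab, id = 1)

# Mean if max/min < 1.5, otherwise median.
# Assumption: return NA for empty input so the function never
# evaluates max()/min() on a zero-length vector.
mean_or_median <- function(x) {
  if (length(x) == 0) return(NA_real_)
  if (max(x) / min(x) < 1.5) mean(x) else median(x)
}

res <- dcast(tab.melt, gene ~ variable,
             value.var = "value", fun.aggregate = mean_or_median)
```

The result has one row per gene and one column per sample (`s1`, `s2`).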

dcast without ID variables

感情迁移 submitted on 2019-11-30 09:05:34
Question: In "An Introduction to reshape2", Sean C. Anderson presents the following example. He uses the airquality data and converts the column names to lower case: names(airquality) <- tolower(names(airquality)) The data look like # ozone solar.r wind temp month day # 1 41 190 7.4 67 5 1 # 2 36 118 8.0 72 5 2 # 3 12 149 12.6 74 5 3 # 4 18 313 11.5 62 5 4 # 5 NA NA 14.3 56 5 5 # 6 28 NA 14.9 66 5 6 Then he melts them by aql <- melt(airquality, id.vars = c("month", "day")) to get # month day variable value

reshape2: multiple results of aggregation function?

*爱你&永不变心* submitted on 2019-11-30 07:14:23
From what I read, *cast operations in reshape2 lost their result_variable feature. Hadley hints at using plyr for this purpose (appending multiple result columns to the input data frame). How would I realize the documentation example ... aqm <- melt(airquality, id=c("month", "day"), na.rm=TRUE) cast(aqm, month ~ variable + result_variable, range) using reshape2 ( dcast ) and plyr ( ddply )? This question has multiple answers, due to the flexibility of the 'reshape2' and 'plyr' packages. I will show one of the easiest examples to understand here: library(reshape2) library(plyr) aqm <- melt
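The answer above is cut off, so the following is only one possible sketch of combining `ddply` and `dcast`, not necessarily the approach the answer goes on to show. It computes the two components of `range` per group with plyr, melts them into a helper column named `result_variable` (a name borrowed from the old `cast` formula), and then casts wide.

```r
library(reshape2)
library(plyr)

aq <- airquality
names(aq) <- tolower(names(aq))  # the documentation example uses lower-case names
aqm <- melt(aq, id = c("month", "day"), na.rm = TRUE)

# One row per month/variable with both ends of the range.
rng <- ddply(aqm, .(month, variable), summarise,
             min = min(value), max = max(value))

# Melt min/max into their own key column, then spread both keys into columns.
rng_long <- melt(rng, id = c("month", "variable"),
                 variable.name = "result_variable")
wide <- dcast(rng_long, month ~ variable + result_variable,
              value.var = "value")
```

`wide` has one row per month and columns such as `ozone_min` and `ozone_max`.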

How to “unmelt” data with reshape r

瘦欲@ submitted on 2019-11-30 06:39:51
I have a data frame that I melted using the reshape package and that I would like to "unmelt". Here is a toy example of the melted data (the real data frame is 500x100 or larger): variable<-c(rep("X1",3),rep("X2",3),rep("X3",3)) value<-c(rep(rnorm(1,.5,.2),3),rep(rnorm(1,.5,.2),3),rep(rnorm(1,.5,.2),3)) dat <-data.frame(variable,value) dat variable value 1 X1 0.5285376 2 X1 0.5285376 3 X1 0.5285376 4 X2 0.1694908 5 X2 0.1694908 6 X2 0.1694908 7 X3 0.7446906 8 X3 0.7446906 9 X3 0.7446906 Each variable (X1, X2,X3) has values estimated at 3 different times (which in this toy example happen to be the
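A minimal sketch of one way to cast such data back to wide form. The melted data have no column identifying the time point, so the sketch assumes that the rows of each variable are already in time order and adds a helper column, here called `time` (a hypothetical name), before calling `dcast()`.

```r
library(reshape2)

# Toy data with the values printed in the question.
dat <- data.frame(
  variable = rep(c("X1", "X2", "X3"), each = 3),
  value    = rep(c(0.5285376, 0.1694908, 0.7446906), each = 3)
)

# Assumption: within each variable, row order is the time order.
dat$time <- ave(seq_along(dat$variable), dat$variable, FUN = seq_along)

# One row per time point, one column per variable.
wide <- dcast(dat, time ~ variable, value.var = "value")
```

Without some identifier like `time`, `dcast()` has nothing to distinguish the repeated rows of a variable and would have to aggregate them.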

No non-missing arguments warning when using min or max in reshape2

旧城冷巷雨未停 submitted on 2019-11-30 05:43:04
I get the following warning when I use min or max in the dcast function from the reshape2 package. What is it telling me? I can't find anything that explains the warning message and I'm a bit confused about why I get it when I use max but not when I use mean or other aggregate functions. Warning message: In .fun(.value[0], ...) : no non-missing arguments to min; returning Inf Here's a reproducible example: data(iris) library(reshape2) molten.iris <- melt(iris,id.var="Species") summary(molten.iris) str(molten.iris) #------------------------------------------------------------ # Both return
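The warning text quotes the call `.fun(.value[0], ...)`, that is, the aggregate function applied to a zero-length vector. The base R sketch below shows why that would warn for `min` and `max` but not for `mean`; it does not call reshape2 at all, and why `dcast` makes such a call is not covered by the excerpt.

```r
empty <- numeric(0)

# mean() of an empty vector is NaN and raises no warning.
m <- mean(empty)

# min() and max() of an empty vector return Inf / -Inf and warn.
lo <- suppressWarnings(min(empty))  # Inf
hi <- suppressWarnings(max(empty))  # -Inf

# Capture the warning text instead of printing it.
msg <- tryCatch(min(empty), warning = function(w) conditionMessage(w))
```

In an English locale, `msg` holds the same "no non-missing arguments to min; returning Inf" text as the warning in the question.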

R: “Unary operator error” from multiline ggplot2 command

元气小坏坏 submitted on 2019-11-30 05:39:44
I'm using ggplot2 to do a boxplot comparison of two different species, as indicated by the third column shown below: > library(reshape2) > library(ggplot2) > melt.data = melt(actb.raw.data) > head(actb.raw.data) region expression species 1 CG -0.17686667 human 2 CG -0.06506667 human 3 DG 1.04590000 human 4 CA1 1.94093333 human 5 CA2 1.55023333 human 6 CA3 1.75800000 human > head(melt.data) region species variable value 1 CG human expression -0.17686667 2 CG human expression -0.06506667 3 DG human expression 1.04590000 4 CA1 human expression 1.94093333 5 CA2 human expression 1.55023333 6 CA3

melt a data.table with a column pattern

大兔子大兔子 submitted on 2019-11-30 05:24:37
Question: I have a data.table that looks like this: id A1g_hi A2g_hi A3g_hi A4g_hi 1 2 3 4 5 ... I would like to melt this table so that it looks like this: id time hi 1 1 2 1 2 3 1 3 4 1 4 5 ... I attempted something like this: melt(dtb, measure.vars = patterns("^A"), value.name = "hi", variable.name="time") which does not give me what I would like. Do I need to resort to string splitting here or are there native data.table functions that do this? Answer 1: I raise my glass to @rawr who apparently
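The answer is cut off above, so the following is only a sketch of the string-splitting route the question mentions, not the approach the answer describes. The first row of `dtb` comes from the question; the second row is made up so that the reshaping is visible across more than one `id`.

```r
library(data.table)

dtb <- data.table(
  id     = c(1, 2),   # second row is hypothetical
  A1g_hi = c(2, 6),
  A2g_hi = c(3, 7),
  A3g_hi = c(4, 8),
  A4g_hi = c(5, 9)
)

# Select the measure columns by name, melt, then pull the number out of the name.
measure_cols <- grep("^A", names(dtb), value = TRUE)
long <- melt(dtb, id.vars = "id", measure.vars = measure_cols,
             variable.name = "time", value.name = "hi")
long[, time := as.integer(sub("^A([0-9]+)g_hi$", "\\1", as.character(time)))]
setorder(long, id, time)
```

For `id` 1 this yields `time` 1 to 4 with `hi` 2 to 5, matching the layout requested in the question.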

Error with custom aggregate function for a cast() call in R reshape2

六月ゝ 毕业季﹏ submitted on 2019-11-30 03:55:15
Question: I want to use R to summarize numerical data in a table with non-unique rownames to a result table with unique row-names with values summarized using a custom function. The summarization logic is: use the mean of values if the ratio of the maximum to the minimum value is < 1.5, else use median. Because the table is very large, I am trying to use the melt() and cast() functions in the reshape2 package. # example table with non-unique row-names tab <- data.frame(gene=rep(letters[1:3], each=3),

Fast melted data.table operations

微笑、不失礼 submitted on 2019-11-30 03:32:41
Question: I am looking for patterns for manipulating data.table objects whose structure resembles that of dataframes created with melt from the reshape2 package. I am dealing with data tables with millions of rows. Performance is critical. The generalized form of the question is whether there is a way to perform grouping based on a subset of values in a column and have the result of the grouping operation create one or more new columns. A specific form of the question could be how to use data.table to