reshape2 | 易学教程

Fast melted data.table operations

I am looking for patterns for manipulating data.table objects whose structure resembles that of data frames created with melt from the reshape2 package. I am dealing with data tables with millions of rows. Performance is critical.

The generalized form of the question is whether there is a way to perform grouping based on a subset of values in a column and have the result of the grouping operation create one or more new columns. A specific form of the question could be how to use data.table to accomplish the equivalent of what dcast does in the following:

    input <- data.table( id=c(1, 1, 1, 2, 2,

Error with custom aggregate function for a cast() call in R reshape2

I want to use R to summarize numerical data in a table with non-unique row names into a result table with unique row names, with values summarized using a custom function. The summarization logic is: use the mean of the values if the ratio of the maximum to the minimum value is < 1.5, else use the median. Because the table is very large, I am trying to use the melt() and cast() functions in the reshape2 package.

    # example table with non-unique row-names
    tab <- data.frame(gene=rep(letters[1:3], each=3), s1=runif(9), s2=runif(9))
    # melt
    tab.melt <- melt(tab, id=1)
    # function to summarize with logic: mean if
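The summarization rule described in this excerpt (mean when max/min < 1.5, otherwise median) can be sketched as a standalone function. This is only an illustration, not the asker's code: the name `summarize_ratio` is hypothetical, the handling of empty input is an assumption, and base R's `aggregate()` is swapped in for melt()/cast() so the sketch runs without extra packages.

```r
# Hypothetical helper: mean if max/min < 1.5, else median.
# Assumes positive values, as produced by runif() in the excerpt.
summarize_ratio <- function(x) {
  x <- x[!is.na(x)]
  # Assumption: an empty group yields NA instead of an error
  if (length(x) == 0) return(NA_real_)
  if (max(x) / min(x) < 1.5) mean(x) else median(x)
}

# Example table with non-unique gene names, as in the excerpt
set.seed(1)
tab <- data.frame(gene = rep(letters[1:3], each = 3),
                  s1 = runif(9), s2 = runif(9))

# One row per gene; the rule is applied to each sample column separately
res <- aggregate(cbind(s1, s2) ~ gene, data = tab, FUN = summarize_ratio)
print(res)
```

Guarding against zero-length input is a deliberate choice here: an unguarded `if (max(x) / min(x) < 1.5)` fails on an empty vector because the condition evaluates to NA.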

dcast without ID variables

Question: In "An Introduction to reshape2", Sean C. Anderson presents the following example. He uses the airquality data and converts the column names to lower case:

    names(airquality) <- tolower(names(airquality))

The data look like

    #   ozone solar.r wind temp month day
    # 1    41     190  7.4   67     5   1
    # 2    36     118  8.0   72     5   2
    # 3    12     149 12.6   74     5   3
    # 4    18     313 11.5   62     5   4
    # 5    NA      NA 14.3   56     5   5
    # 6    28      NA 14.9   66     5   6

Then he melts them with

    aql <- melt(airquality, id.vars = c("month", "day"))

to get

    #   month day variable value

reshape2: multiple results of aggregation function?

From what I read, *cast operations in reshape2 lost their result_variable feature. Hadley hints at using plyr for this purpose (appending multiple result columns to the input data frame). How would I realize the documentation example ...

    aqm <- melt(airquality, id=c("month", "day"), na.rm=TRUE)
    cast(aqm, month ~ variable + result_variable, range)

using reshape2 (dcast) and plyr (ddply)?

This question has multiple answers, due to the flexibility of the 'reshape2' and 'plyr' packages. I will show one of the easiest examples to understand here:

    library(reshape2)
    library(plyr)
    aqm <- melt

How to “unmelt” data with reshape r

I have a data frame that I melted using the reshape package and that I would like to "unmelt". Here is a toy example of the melted data (the real data frame is 500x100 or larger):

    variable<-c(rep("X1",3),rep("X2",3),rep("X3",3))
    value<-c(rep(rnorm(1,.5,.2),3),rep(rnorm(1,.5,.2),3),rep(rnorm(1,.5,.2),3))
    dat <-data.frame(variable,value)
    dat
      variable     value
    1       X1 0.5285376
    2       X1 0.5285376
    3       X1 0.5285376
    4       X2 0.1694908
    5       X2 0.1694908
    6       X2 0.1694908
    7       X3 0.7446906
    8       X3 0.7446906
    9       X3 0.7446906

Each variable (X1, X2, X3) has values estimated at 3 different times (which in this toy example happen to be the

No non-missing arguments warning when using min or max in reshape2

I get the following warning when I use min or max in the dcast function from the reshape2 package. What is it telling me? I can't find anything that explains the warning message, and I'm a bit confused about why I get it when I use max but not when I use mean or other aggregate functions.

    Warning message:
    In .fun(.value[0], ...) : no non-missing arguments to min; returning Inf

Here's a reproducible example:

    data(iris)
    library(reshape2)
    molten.iris <- melt(iris,id.var="Species")
    summary(molten.iris)
    str(molten.iris)
    #------------------------------------------------------------
    # Both return
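The `.value[0]` in the warning text suggests that the aggregate function is being called on a zero-length vector. The sketch below shows only how base R treats that case, which is consistent with `min` and `max` warning while `mean` stays silent. It does not reproduce reshape2's internals, and the variable names are illustrative.

```r
empty <- numeric(0)

# TRUE if evaluating f on an empty vector raises a warning
warns_on_empty <- function(f) {
  tryCatch({ f(empty); FALSE }, warning = function(w) TRUE)
}

max_warns  <- warns_on_empty(max)   # max() warns on empty input
min_warns  <- warns_on_empty(min)   # min() warns on empty input
mean_warns <- warns_on_empty(mean)  # mean() does not warn

# The values returned in each case
max_empty  <- suppressWarnings(max(empty))  # -Inf
min_empty  <- suppressWarnings(min(empty))  # Inf
mean_empty <- mean(empty)                   # NaN

print(c(max = max_warns, min = min_warns, mean = mean_warns))
```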

R: “Unary operator error” from multiline ggplot2 command

I'm using ggplot2 to do a boxplot comparison of two different species, as indicated by the third column shown below:

    > library(reshape2)
    > library(ggplot2)
    > melt.data = melt(actb.raw.data)
    > head(actb.raw.data)
      region  expression species
    1     CG -0.17686667   human
    2     CG -0.06506667   human
    3     DG  1.04590000   human
    4    CA1  1.94093333   human
    5    CA2  1.55023333   human
    6    CA3  1.75800000   human
    > head(melt.data)
      region species   variable       value
    1     CG   human expression -0.17686667
    2     CG   human expression -0.06506667
    3     DG   human expression  1.04590000
    4    CA1   human expression  1.94093333
    5    CA2   human expression  1.55023333
    6    CA3

melt a data.table with a column pattern

Question: I have a data.table that looks like this:

    id A1g_hi A2g_hi A3g_hi A4g_hi
    1  2      3      4      5
    ...

I would like to melt this table so that it looks like this:

    id time hi
    1  1    2
    1  2    3
    1  3    4
    1  4    5
    ...

I attempted something like this:

    melt(dtb, measure.vars = patterns("^A"), value.name = "hi", variable.name="time")

which does not give me what I would like. Do I need to resort to string splitting here or are there native data.table functions that do this?

Answer 1: I raise my glass to @rawr who apparently
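One way to reach the id/time/hi layout requested in this excerpt is to melt on the column names and then pull the number out of each name with a regular expression. This is a sketch under assumptions, not necessarily the approach the truncated answer takes: the second row of `dtb` and the names `measure_cols` and `long` are made up for illustration, and the regex assumes every measure column is named `A<digits>g_hi`.

```r
library(data.table)

# Toy table with the column layout from the question (second row is invented)
dtb <- data.table(id = 1:2,
                  A1g_hi = c(2, 6), A2g_hi = c(3, 7),
                  A3g_hi = c(4, 8), A4g_hi = c(5, 9))

# Select the measure columns by name pattern
measure_cols <- grep("^A", names(dtb), value = TRUE)

# After melting, "time" is a factor holding the original column names
long <- melt(dtb, id.vars = "id", measure.vars = measure_cols,
             variable.name = "time", value.name = "hi")

# Extract the digits between "A" and "g_hi" and store them as integers
long[, time := as.integer(sub("^A([0-9]+)g_hi$", "\\1", as.character(time)))]
setorder(long, id, time)
print(long)
```

The string step is a single vectorized `sub()` call on the melted column, so it avoids splitting strings row by row.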
