reshape2

R SparkR - equivalent to melt function

两盒软妹~` submitted on 2019-12-22 00:13:16
Question: Is there a function similar to melt in the SparkR library? Transform data with 1 row and 50 columns to 50 rows and 3 columns?

Answer 1: There is no built-in function that provides similar functionality in SparkR. You can build your own with explode:

    library(magrittr)

    df <- createDataFrame(data.frame(
      A = c('a', 'b', 'c'),
      B = c(1, 3, 5),
      C = c(2, 4, 6)
    ))

    melt <- function(df, id.vars, measure.vars,
                     variable.name = "key", value.name = "value") {
      measure.vars.exploded <- purrr::map(
        measure.vars,

Must one `melt` a dataframe before having it `cast`?

孤者浪人 submitted on 2019-12-21 21:29:25
Question: Must one melt a data frame prior to having it cast? From ?melt: data: molten data frame, see melt. In other words, is it absolutely necessary to have a data frame molten prior to any acast or dcast operation? Consider the following:

    library("reshape2")
    library("MASS")

    xb <- dcast(Cars93, Manufacturer ~ Type, mean, value.var="Price")

    m.Cars93 <- melt(Cars93, id.vars=c("Manufacturer", "Type"), measure.vars="Price")
    xc <- dcast(m.Cars93, Manufacturer ~ Type, mean, value.var="value")

Then: >
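A minimal sketch of the comparison the question is making, on made-up data rather than Cars93. The data frame `toy` and its columns `g`, `t`, and `v` are hypothetical; the only assumption about reshape2 is that `dcast` accepts an unmolten data frame when `value.var` names an existing column, which is how the question itself calls it.

```r
library(reshape2)

# Hypothetical toy data: one value per (g, t) combination.
toy <- data.frame(
  g = c("a", "a", "b"),
  t = c("x", "y", "x"),
  v = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Cast directly, pointing value.var at the existing column.
direct <- dcast(toy, g ~ t, value.var = "v")

# Melt first, then cast the molten frame on its "value" column.
molten <- melt(toy, id.vars = c("g", "t"), measure.vars = "v")
via_melt <- dcast(molten, g ~ t, value.var = "value")

# Compare the two wide results column by column.
same_names <- identical(names(direct), names(via_melt))
same_x <- isTRUE(all.equal(direct$x, via_melt$x))
same_y <- isTRUE(all.equal(direct$y, via_melt$y))
```

Both routes should produce the same wide table here, since melting a single measure column only renames it to `value`.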

In R: dcast in function, pass column names (again!)

*爱你&永不变心* submitted on 2019-12-21 17:36:09
Question: Consider a df in semi-long format with id variables a and b and measured data in columns m1 and m2. The type of data is specified by the variable v (values var1 and var2).

    set.seed(8)
    df_l <- data.frame(
      a = rep(sample(LETTERS,5),2),
      b = rep(sample(letters,5),2),
      v = c(rep("var1",5),rep("var2",5)),
      m1 = sample(1:10,10,F),
      m2 = sample(20:40,10,F))

It looks like this:

      a b    v m1 m2
    1 W r var1  3 40
    2 N l var1  6 32
    3 R a var1  9 28
    4 F g var1  5 21
    5 E u var1  4 38
    6 W r var2  1 35
    7 N l var2  8 33
    8 R a var2 10 29

Fill area between two lines, with high/low and dates

风流意气都作罢 submitted on 2019-12-20 12:46:12
Question: Foreword: I provide a reasonably satisfactory answer to my own question. I understand this is acceptable practice. Naturally, my hope is to invite suggestions and improvements. My purpose is to plot two time series (stored in a data frame with dates stored as class 'Date') and to fill the area between the data points with two different colors according to whether one is above the other. For instance, to plot an index of Bonds and an index of Stocks, and to fill the area in red when the Stock

Reshape messy longitudinal survey data containing multiple different variables, wide to long

天涯浪子 submitted on 2019-12-20 05:31:36
Question: I hope that I'm not reinventing the wheel, and I do not think that the following can be answered using reshape. I have messy longitudinal survey data that I want to convert from wide to long format. By messy I mean:

- I have a mixture of variable types (numeric, factor, logical).
- Not all variables have been collected at every timepoint.

For example:

    data <- read.table(header=T, text='
     id inlove.1 inlove.2 income.2 income.3 mood.1 mood.3 random
     1 TRUE FALSE 87717.76 82281.25 happy happy filler
     2

melt + strsplit, or opposite to aggregate

萝らか妹 submitted on 2019-12-20 04:34:44
Question: I have a little question that seems so easy in concept, but I cannot find a way to do it... Say I have a data.frame df2 with a column listing car brands and another column with all the models per brand separated by ','. I obtained df2 by aggregating another data.frame named df1, whose primary key is the model. How should I proceed to do the opposite task (i.e., from df2 to df1)? My guess is something like melt(df2, id=unlist(strsplit('models',','))) ... Many thanks! Here is a
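One way the reverse step described above could be sketched in base R, without melt. The column names `brand` and `models` and the sample values are assumptions made for illustration, since the poster's example data is cut off.

```r
# Hypothetical aggregated data: one row per brand, models joined by commas.
df2 <- data.frame(
  brand  = c("A", "B"),
  models = c("a1,a2", "b1"),
  stringsAsFactors = FALSE
)

# Split each models string into a character vector of individual models.
parts <- strsplit(df2$models, ",", fixed = TRUE)

# Repeat each brand once per model and flatten the list of models.
df1 <- data.frame(
  brand = rep(df2$brand, lengths(parts)),
  model = unlist(parts),
  stringsAsFactors = FALSE
)
```

This gives one row per model, with the brand repeated alongside it. It relies only on `strsplit`, `lengths`, `rep`, and `unlist` from base R.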

from wide format to long format with results in multiple columns [duplicate]

丶灬走出姿态 submitted on 2019-12-20 02:47:20
Question: This question already has answers here: Combine Multiple Columns Into Tidy Data [duplicate] (3 answers); Reshaping multiple sets of measurement columns (wide format) into single columns (long format) (7 answers). Closed 2 years ago.

I have data that looks like the following data frame, but every combo has about ten fields, starting with name1, adress1, city1, etc.

      id name1  adress1 name2  adress2  name3  adress3
    1  1  John street a  Burt street d  chris street 1
    2  2  Jack street b   Ben street e connor

Balancing (creating same number of rows for each individual) data

大城市里の小女人 submitted on 2019-12-18 08:55:38
Question: Given a data.table as follows, id1 is a subject-level ID, id2 is a within-subject repeated-measure ID, and X are data variables, of which there are many. I want to balance the data such that every individual has the same number of rows (repeated measures), namely max(DT[,.N,by=id1][,N]), with id1 and id2 adjusted as necessary and the X data values set to NA for these new rows. The following:

    DT = data.table(
      id1 = c(1,1,2,2,2,3,3,3,3),
      id2 = c(1,2,1,2,3,1,2,3,4),
      X1 =

How do I subset column variables in DF1 based on the important variables I got in DF2?

被刻印的时光 ゝ submitted on 2019-12-17 21:14:05
Question: I have 2 df's like this:

    ID = c('x1','x2','x5')
    df1 <- data.frame(ID)

    x1 = c(1,2,3,4,5)
    x2 = c(11,12,13,14,15)
    x3 = c(21,22,23,24,25)
    x4 = c(31,32,33,34,35)
    x5 = c(41,42,43,44,45)
    df2 <- data.frame(x1,x2,x3,x4,x5)

Desired output:

      x1 x2 x5
    1  1 11 41
    2  2 12 42
    3  3 13 43
    4  4 14 44
    5  5 15 45

I would like my new dataset to contain only those variables that are identified in df1 as important (i.e., x1, x2, x5), with the values from df2. In this simple dataset, I know I could do this by just removing x3
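A minimal sketch of one way to get the desired output above, assuming the names stored in df1$ID all exist as columns of df2. It reuses the question's own df1 and df2; the names `keep` and `result` are introduced here for illustration.

```r
# Data as given in the question.
df1 <- data.frame(ID = c("x1", "x2", "x5"), stringsAsFactors = FALSE)
df2 <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(11, 12, 13, 14, 15),
  x3 = c(21, 22, 23, 24, 25),
  x4 = c(31, 32, 33, 34, 35),
  x5 = c(41, 42, 43, 44, 45)
)

# as.character() guards against ID having been stored as a factor.
keep <- as.character(df1$ID)

# Select columns by name; drop = FALSE keeps a data frame even for one column.
result <- df2[, keep, drop = FALSE]
```

Indexing by name rather than by position means the selection still works if the important variables change or df2 gains columns.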

From long to wide data with multiple columns

倾然丶 夕夏残阳落幕 submitted on 2019-12-17 20:12:49
Question: Any suggestions for how to smoothly get from foo to foo2 (preferably with the tidyr or reshape2 packages)? This is kind of like this question, but not exactly, I think, because I don't want to auto-number columns, just widen multiple columns. It's also kind of like this question, but again, I don't think I want the columns to vary with a row value as in that answer. Or, a valid answer to this question is to convince me it's exactly like one of the others. The solution in the second question of "two