plyr

How to strsplit different number of strings in certain column by do function

橙三吉。 submitted on 2019-11-26 04:54:01
Question: I have a problem splitting a column's values when the elements contain different numbers of strings. I can do it in plyr, e.g.:

    library(plyr)
    column <- c("jake", "jane jane", "john john john")
    df <- data.frame(1:3, name = column)
    df$name <- as.character(df$name)
    df2 <- ldply(strsplit(df$name, " "), rbind)
    View(df2)

As a result, we get a data frame whose number of columns equals the maximum number of strings in any element. When I tried to do it in dplyr, I used the do function:

    library(dplyr)
    df2 <
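For comparison, here is a minimal base R sketch of the same split, not taken from the original thread. It assumes every element should be padded with NA up to the longest split; the names `parts`, `width` and `padded` are illustrative only.

```r
column <- c("jake", "jane jane", "john john john")

# Split each element on spaces; the pieces have lengths 1, 2 and 3
parts <- strsplit(column, " ")

# Widest split determines the number of columns
width <- max(lengths(parts))

# `length<-` pads each piece with NA up to the common width;
# sapply() returns one column per element, so transpose to get one row each
padded <- t(sapply(parts, `length<-`, width))

df2 <- as.data.frame(padded, stringsAsFactors = FALSE)
df2
```

Note that `sapply()` only returns a matrix here because the padded pieces all have the same length greater than one; a column where no element contains a space would need separate handling.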

Idiomatic R code for partitioning a vector by an index and performing an operation on that partition

萝らか妹 submitted on 2019-11-26 04:36:34
I'm trying to find the idiomatic way in R to partition a numeric vector by some index vector, find the sum of all numbers in each partition, and then divide each individual entry by its partition sum. In other words, if I start with this:

    df <- data.frame(x = c(1,2,3,4,5,6), index = c('a', 'a', 'b', 'b', 'c', 'c'))

I want the output to be a vector (let's call it z):

    c(1/(1+2), 2/(1+2), 3/(3+4), 4/(3+4), 5/(5+6), 6/(5+6))

If I were doing this in SQL and could use window functions, I would do this:

    select x / sum(x) over (partition by index) as z from df

and if I were using plyr, I would
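One way to express the window-function idea in base R is `ave()`, which returns the group-wise result aligned with the original vector. This is a sketch of one option and not necessarily what the thread settled on.

```r
df <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                 index = c("a", "a", "b", "b", "c", "c"))

# ave() applies FUN within each level of `index` and returns a vector
# the same length as x, so the division is element by element
z <- df$x / ave(df$x, df$index, FUN = sum)
z
```

Because `ave()` preserves the input order, no sorting or re-joining step is needed afterwards.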

How to create a lag variable within each group?

白昼怎懂夜的黑 submitted on 2019-11-26 03:14:35
Question: I have a data.table:

    set.seed(1)
    data <- data.table(time = c(1:3, 1:4), groups = c(rep(c("b", "a"), c(3, 4))), value = rnorm(7))
    data
    #    groups time      value
    # 1:      b    1 -0.6264538
    # 2:      b    2  0.1836433
    # 3:      b    3 -0.8356286
    # 4:      a    1  1.5952808
    # 5:      a    2  0.3295078
    # 6:      a    3 -0.8204684
    # 7:      a    4  0.4874291

I want to compute a lagged version of the "value" column, within each level of "groups". The result should look like

    #   groups time     value lag.value
    # 1      a    1 1.5952808        NA
    # 2      a    2 0.3295078 1.5952808
    # 3
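A within-group lag can be sketched in base R with `ave()`. This uses a plain data frame instead of a data.table (an assumption made to keep the example dependency-free), and it relies on rows already being ordered by time within each group.

```r
set.seed(1)
data <- data.frame(time = c(1:3, 1:4),
                   groups = rep(c("b", "a"), c(3, 4)),
                   value = rnorm(7))

# Within each group, shift the values down by one position:
# the first row of a group gets NA, every later row gets the previous value
data$lag.value <- ave(data$value, data$groups,
                      FUN = function(v) c(NA, head(v, -1)))
data
```

The result keeps the original row order (group "b" first); sort by `groups` afterwards if the layout in the question is wanted.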

Aggregate a data frame based on unordered pairs of columns

北城余情 submitted on 2019-11-26 02:58:21
Question: I have a data set that looks something like this:

       id1  id2 size
    1 5400 5505    7
    2 5033 5458    1
    3 5452 2873   24
    4 5452 5213    2
    5 5452 4242   26
    6 4823 4823    4
    7 5505 5400   11

where id1 and id2 are unique nodes in a graph, and size is a value assigned to the directed edge connecting them from id1 to id2. This data set is fairly large (a little over 2 million rows). What I would like to do is sum the size column, grouped by unordered node pairs of id1 and id2. For example, in the first row, we have id1
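A common trick for unordered pairs is to normalise each pair so the smaller id always comes first, then aggregate on the normalised columns. The sketch below uses base R; the column names `lo` and `hi` are illustrative, and no claim is made about its speed on 2 million rows.

```r
edges <- data.frame(
  id1  = c(5400, 5033, 5452, 5452, 5452, 4823, 5505),
  id2  = c(5505, 5458, 2873, 5213, 4242, 4823, 5400),
  size = c(7, 1, 24, 2, 26, 4, 11)
)

# pmin()/pmax() work element by element, so (5400, 5505) and (5505, 5400)
# both become lo = 5400, hi = 5505
edges$lo <- pmin(edges$id1, edges$id2)
edges$hi <- pmax(edges$id1, edges$id2)

# Sum size over each normalised pair
totals <- aggregate(size ~ lo + hi, data = edges, FUN = sum)
totals
```

With the sample rows, the two edges between 5400 and 5505 collapse into a single row with size 18.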

Aggregate a dataframe on a given column and display another column

巧了我就是萌 submitted on 2019-11-26 02:51:51
Question: I have a data frame in R of the following form:

    > head(data)
      Group Score Info
    1     1     1    a
    2     1     2    b
    3     1     3    c
    4     2     4    d
    5     2     3    e
    6     2     1    f

I would like to aggregate the Score column by group using the max function:

    > aggregate(data$Score, list(data$Group), max)
      Group.1 x
    1       1 3
    2       2 4

But I would also like to display the Info value associated with the maximum Score for each group. I have no idea how to do this. My desired output would be:

      Group.1 x y
    1       1 3 c
    2       2 4 d

Any hints?

Answer 1: First

dplyr summarise: Equivalent of “.drop=FALSE” to keep groups with zero length in output

梦想与她 submitted on 2019-11-26 02:30:08
Question: When using summarise with plyr's ddply function, empty categories are dropped by default. You can change this behavior by adding .drop = FALSE. However, this doesn't work when using summarise with dplyr. Is there another way to keep empty categories in the result? Here's an example with fake data.

    library(dplyr)
    df = data.frame(a=rep(1:3,4), b=rep(1:2,6))
    # Now add an extra level to df$b that has no corresponding value in df$a
    df$b = factor(df$b, levels=1:3)
    # Summarise with plyr,
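In dplyr 0.8.0 and later, `group_by()` accepts a `.drop` argument, which postdates the original question. The following sketch assumes such a version is installed; the column name `count_a` is illustrative.

```r
library(dplyr)

df <- data.frame(a = rep(1:3, 4), b = rep(1:2, 6))
# Level 3 of b has no rows in the data
df$b <- factor(df$b, levels = 1:3)

# .drop = FALSE keeps groups for factor levels that never occur,
# so the empty level appears with a count of zero
counts <- df %>%
  group_by(b, .drop = FALSE) %>%
  summarise(count_a = n())
counts
```

This only applies to grouping columns that are factors; character columns carry no information about unobserved values.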

Applying a function to every row of a table using dplyr?

喜你入骨 submitted on 2019-11-26 02:27:58
Question: When working with plyr I often found it useful to use adply for scalar functions that I have to apply to each and every row, e.g.

    data(iris)
    library(plyr)
    head(
      adply(iris, 1, transform, Max.Len = max(Sepal.Length, Petal.Length))
    )
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species Max.Len
    1          5.1         3.5          1.4         0.2  setosa     5.1
    2          4.9         3.0          1.4         0.2  setosa     4.9
    3          4.7         3.2          1.3         0.2  setosa     4.7
    4          4.6         3.1          1.5         0.2  setosa     4.6
    5          5.0         3.6          1.4         0.2  setosa     5.0
    6          5.4         3.9          1.7         0.4  setosa     5.4

Now I'm using dplyr more,
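A rough dplyr counterpart to the `adply(..., 1, ...)` call is `rowwise()`, which makes each row its own group so that a scalar function such as `max()` sees one row at a time. This is a sketch of one option, not necessarily the answer given in the thread.

```r
library(dplyr)

res <- iris %>%
  rowwise() %>%                                         # treat each row as a group
  mutate(Max.Len = max(Sepal.Length, Petal.Length)) %>% # max() now runs per row
  ungroup()                                             # drop the row-wise grouping

head(res)
```

For this particular function the vectorised `pmax(Sepal.Length, Petal.Length)` gives the same column without `rowwise()` and is usually faster; `rowwise()` is the general fallback when no vectorised form exists.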

Fastest way to add rows for missing time steps?

流过昼夜 submitted on 2019-11-26 01:43:29
Question: I have a column in my datasets where time periods (Time) are integers ranging from a to b. Sometimes there might be missing time periods for any given group. I'd like to fill in those rows with NA. Below is example data for one (of several thousand) groups.

    structure(list(Id = c(1, 1, 1, 1), Time = c(1, 2, 4, 5), Value = c(0.568780482159894, -0.7207749516298, 1.24258192959273, 0.682123081696789)), .Names = c("Id", "Time", "Value"), row.names = c(NA, 4L), class = "data.frame")
      Id Time
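One straightforward (not necessarily the fastest) way to add the missing rows is to build the full Id-by-Time grid per group and left-join the data onto it. The sketch below uses base R `merge()`; the names `dat`, `full` and `filled` are illustrative.

```r
dat <- data.frame(
  Id    = c(1, 1, 1, 1),
  Time  = c(1, 2, 4, 5),
  Value = c(0.568780482159894, -0.7207749516298,
            1.24258192959273, 0.682123081696789)
)

# For each Id, list every time step between its first and last observation
full <- do.call(rbind, lapply(split(dat, dat$Id), function(g) {
  data.frame(Id = g$Id[1], Time = seq(min(g$Time), max(g$Time)))
}))

# Left join: time steps with no observation get NA in Value
filled <- merge(full, dat, by = c("Id", "Time"), all.x = TRUE)
filled
```

With several thousand groups, the `split()`/`rbind` step is likely to dominate the run time, so a keyed join in data.table or a similar tool may be worth benchmarking.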

How to select the rows with maximum values in each group with dplyr? [duplicate]

不羁的心 submitted on 2019-11-25 22:35:38
Question: This question already has answers here: How to select the row with the maximum value in each group (10 answers). Closed 7 months ago.

I would like to select the row with the maximum value in each group with dplyr. First I generate some random data to illustrate my question:

    set.seed(1)
    df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5))
    df$value <- runif(nrow(df))

In plyr, I could use a custom function to select this row.

    library(plyr)
    ddply(df, .(A, B), function(x) x[which.max(x$value),])

In dplyr, I
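A direct dplyr translation of the `ddply()` call is to group and then `slice()` at the position returned by `which.max()`. This is a sketch of one option among several (filtering on `value == max(value)` is another, and it keeps ties).

```r
library(dplyr)

set.seed(1)
df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5))
df$value <- runif(nrow(df))

top <- df %>%
  group_by(A, B) %>%
  slice(which.max(value)) %>%  # keep the single row with the largest value
  ungroup()

top
```

Like the plyr version, `which.max()` returns only the first maximum, so each (A, B) group contributes exactly one row.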

Convert data from long format to wide format with multiple measure columns

試著忘記壹切 submitted on 2019-11-25 22:34:37
Question: This question was migrated from Cross Validated because it can be answered on Stack Overflow. Migrated 7 years ago.

I am having trouble figuring out the most elegant and flexible way to switch data from long format to wide format when I have more than one measure variable I want to bring along. For example, here's a simple data frame in long format. ID is the subject, TIME is a time variable, and X and Y are measurements made of ID at TIME:

    > my.df <- data.frame(ID=rep(c("A","B","C")
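Base R's `reshape()` can widen several measure columns at once. Because the question's data frame is cut off above, the sketch below uses a small hypothetical stand-in with the same column names (ID, TIME, X, Y); the values are invented for illustration.

```r
# Hypothetical long-format data: 3 subjects, 2 time points, 2 measures
my.df <- data.frame(ID   = rep(c("A", "B", "C"), each = 2),
                    TIME = rep(1:2, times = 3),
                    X    = 1:6,
                    Y    = 7:12)

# Every column that is not idvar or timevar is treated as a measure
# and spread into one column per time point (X.1, Y.1, X.2, Y.2)
wide <- reshape(my.df, idvar = "ID", timevar = "TIME", direction = "wide")
wide
```

The result has one row per ID, with column names formed from the measure name and the TIME value joined by a dot.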