tidyverse

How to preserve the list of data frame form after using parallel apply

女生的网名这么多〃 submitted on 2019-12-14 02:33:12
Question: I have the following function my_func, which takes parameters stored in a data frame params and one extra parameter, a separate data frame indf:

    library(tidyverse)

    my_func <- function(x = NULL, y = NULL, z = NULL, indf = NULL) {
      out <- (x * y * z)
      out * indf
    }

    params <- tribble(
      ~x, ~y, ~z,
      5, 1, 1,
      10, 5, 3,
      -3, 10, 5
    )

    indf <- tribble(
      ~A, ~B, ~C,
      100, 10, 1,
      1000, 300, 3,
      20, 10, 5
    )

    params %>% pmap(my_func, indf = indf)

It produces the following list of data frames:

    #> [[1]]
    #> A B C
    #> 1 500 50

How to summarize `Number of days since first date` and `Number of days seen` by ID and for a large data frame

扶醉桌前 submitted on 2019-12-13 22:30:28
Question: The data frame df1 summarizes detections of individuals (ID) over time (Date). As a short example:

    df1 <- data.frame(ID = c(1,2,1,2,1,2,1,2,1,2),
                      Date = ymd(c("2016-08-21","2016-08-24","2016-08-23","2016-08-29","2016-08-27",
                                   "2016-09-02","2016-09-01","2016-09-09","2016-09-01","2016-09-10")))
    df1
       ID       Date
    1   1 2016-08-21
    2   2 2016-08-24
    3   1 2016-08-23
    4   2 2016-08-29
    5   1 2016-08-27
    6   2 2016-09-02
    7   1 2016-09-01
    8   2 2016-09-09
    9   1 2016-09-01
    10  2 2016-09-10

I want to summarize either the

tidyr::spread() with multiple keys and values

拜拜、爱过 submitted on 2019-12-13 17:28:25
Question: I assume this has been asked multiple times, but I couldn't find the right words to search for a workable solution. How can I spread() a data frame based on multiple keys for multiple values? A simplified version of the data I'm working with (I have many more columns to spread, but only two keys: id and the time point of a given measurement) looks like this:

    df <- data.frame(id = rep(seq(1:10),3), time = rep(1:3, each=10), x = rnorm(n=30), y = rnorm(n=30))
    > head(df)
      id time           x          y
    1  1    1 -2.62671241 0.01669755
    2  2    1
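One possible way to approach this kind of reshape, sketched here with tidyr::pivot_wider() instead of the spread() the question asks about (pivot_wider() needs tidyr 1.0.0 or later). The data frame follows the snippet above; set.seed() and the name wide are assumptions added for illustration only.

```r
library(tidyr)

# Same shape as the question's example; the seed is an assumption for reproducibility
set.seed(1)
df <- data.frame(id = rep(1:10, 3),
                 time = rep(1:3, each = 10),
                 x = rnorm(30),
                 y = rnorm(30))

# Widen both value columns at once: one row per id,
# one column per (value, time) pair, named x_1, x_2, x_3, y_1, y_2, y_3
wide <- pivot_wider(df,
                    id_cols = id,
                    names_from = time,
                    values_from = c(x, y))

names(wide)
```

Passing several columns to values_from avoids the gather/unite/spread round trip that spread() alone would require, since spread() handles only one value column per call.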

How to specify multiple columns with gather() function to tidy data

﹥>﹥吖頭↗ submitted on 2019-12-13 14:16:37
Question: I want to tidy my data with the gather function, but how do I specify multiple columns at once? Say this is my data:

       Country Country.Code Year X0tot4 X5tot9 X10tot14 X15tot19 X20tot24
    1 Viet Nam          704 1955   4606   2924     2389     2340     2502
    2 Viet Nam          704 1960   5842   4410     2860     2356     2318
    3 Viet Nam          704 1965   6571   5646     4328     2823     2335
    4 Viet Nam          704 1970   7065   6391     5548     4271     2797
    5 Viet Nam          704 1975   7658   6862     6237     5437     4208
    6 Viet Nam          704 1980   7991   7473     6754     6113     5266
    7 Viet Nam          704 1985   8630   7855     7375     6657     6027
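A minimal sketch of one way to select several columns in a single gather() call, using a column range. The first two rows are copied from the data above; the names pop, long, age_group and population are hypothetical and chosen only for this example.

```r
library(tidyr)
library(tibble)

# First two rows of the data shown in the question
pop <- tribble(
  ~Country,   ~Country.Code, ~Year, ~X0tot4, ~X5tot9, ~X10tot14, ~X15tot19, ~X20tot24,
  "Viet Nam", 704,           1955,  4606,    2924,    2389,      2340,      2502,
  "Viet Nam", 704,           1960,  5842,    4410,    2860,      2356,      2318
)

# A range of adjacent columns can be given as first:last;
# Country, Country.Code and Year are left as identifier columns
long <- gather(pop, key = "age_group", value = "population", X0tot4:X20tot24)

long
```

If the columns were not adjacent, a selection helper such as starts_with("X") in place of the range should select the same five columns here.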

How to order the ties in data with previously observed value appearing first with multiple sorting columns

不打扰是莪最后的温柔 submitted on 2019-12-13 10:29:26
Question: Disclaimer: Please note that this is an extension of, not a duplicate of, this topic: How to order the ties in data so that the previously observed value appears first. The difference is that I now have not one but many sorting columns. I need to sort the attached data by min, then by sec, then by timestamp. Additionally, if there are any ties in the order, I would like to order those ties so that the same values of subgroup are adjacent, i.e. if two observations have the same min, sec and timestamp

Multiple column spread

天大地大妈咪最大 submitted on 2019-12-13 10:06:15
Question: I need to do what tidyr::spread() does, but for multiple value columns. If I have a data set like this:

    te <- structure(list(Syllable = c("[pa]", "[ta]", "[ka]", "[pa]", "[ta]", "[ka]", "[pa]", "[ta]", "[ka]", "[pa]"), PA = c(15.9252335141423, 2.17504491982172, 5.26727958979289, 4.48590068583509, 2.1316282072803e-13, 14.1415335887116, 3.51720477328246, 0.839953301362556, 5.74712643678048, 7.01396701583887), transient_mean = c(4.43699436235785, 4.8733556527069, 5

Combining rows by index in R [duplicate]

时光毁灭记忆、已成空白 submitted on 2019-12-13 07:48:59
Question: This question already has answers here: Combining pivoted rows in R by common value (4 answers). Closed last year. EDIT: I am aware there is a similar question that has been answered, but its solution does not work for me on the dataset I have provided below. The above data frame is the result of my using the spread function. I am still not sure how to consolidate it. EDIT2: I realized that the group_by function, which I had previously used on the data, is what was preventing the spread function from

Apply a vector of filters based on a string (or vector of strings) in dplyr

霸气de小男生 submitted on 2019-12-13 04:17:38
Question: R and the tidyverse have some extremely powerful but equally mysterious methods for turning strings into actionable expressions. I feel like one needs to be an expert to really understand how to use them. NOTE: this question differs from this one in that I specifically ask about a vector of (that is, multiple) filter conditions. I demonstrate a solution for single filters that fails when I try multiple ways of extending it to multiple filters. I want to do something along the lines of: df = data

How to most efficiently filter a dataframe conditionally on values in another one, in the tidyverse framework?

孤街醉人 submitted on 2019-12-13 04:12:01
Question: I have a data frame df1 with an ID column and a lubridate time interval column, and I want to filter (subsample) a data frame df2, which has ID and DateTime columns, so that only df2 rows with a DateTime falling within the corresponding ID interval in df1 are kept. I want to do so in a tidyverse framework. It can easily be done using a join (see example below), but I would like to know whether there is a more direct solution (maybe purrr-based) that would avoid joining and then removing the time

Flag rows with interval overlap in r

牧云@^-^@ submitted on 2019-12-13 03:57:42
Question: I have a data frame df containing TV viewing data, and I would like to run a QC check for overlapping viewing. Say that for the same day, the same household, and each individual, each minute should be credited to one station or channel only. For example, I would like to flag lines 8 and 9, because it seems impossible that an individual in a single household watched two TV stations (62, 67) at the same time (start_hour_minute). Is there a way to flag these rows? A sort of min by min view by