tidyverse | 易学教程

How to preserve the list of data frame form after using parallel apply

Question: I have the following function my_func, which takes parameters stored in a data frame params and, independently, one extra parameter indf that is another data frame:

library(tidyverse)
my_func <- function(x = NULL, y = NULL, z = NULL, indf = NULL) {
  out <- (x * y * z)
  out * indf
}
params <- tribble(
  ~x, ~y, ~z,
   5,  1,  1,
  10,  5,  3,
  -3, 10,  5
)
indf <- tribble(
  ~A,   ~B,  ~C,
  100,  10,  1,
  1000, 300, 3,
  20,   10,  5
)
params %>% pmap(my_func, indf = indf)

It produces the following list of data frames:

#> [[1]]
#>     A  B C
#> 1 500 50
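A minimal sketch, assuming a Unix-like system where parallel::mclapply() can fork worker processes (forking is unavailable on Windows, so the code falls back to one core there). Iterating over row indices instead of over the data frame lets each worker return exactly one data frame, so the result keeps the same list-of-data-frames shape that pmap() produces. The names seq_res, par_res and n_cores are illustrative.

```r
library(tibble)
library(purrr)
library(parallel)

my_func <- function(x = NULL, y = NULL, z = NULL, indf = NULL) {
  out <- x * y * z
  out * indf
}

params <- tribble(
  ~x, ~y, ~z,
   5,  1,  1,
  10,  5,  3,
  -3, 10,  5
)

indf <- tribble(
  ~A,   ~B,  ~C,
  100,  10,  1,
  1000, 300, 3,
  20,   10,  5
)

# Sequential reference: a list with one data frame per row of params
seq_res <- pmap(params, my_func, indf = indf)

# mclapply() relies on forking, which Windows does not support
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Parallel version: each worker handles one row index and returns a data frame,
# so the overall result is again a plain (unnamed) list of data frames
par_res <- mclapply(seq_len(nrow(params)), function(i) {
  my_func(params$x[i], params$y[i], params$z[i], indf = indf)
}, mc.cores = n_cores)

print(par_res[[1]])
```

If the furrr package is available, furrr::future_pmap() is another option that keeps pmap()'s calling convention while running on a future backend.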

How to summarize `Number of days since first date` and `Number of days seen` by ID and for a large data frame

Question: The data frame df1 summarizes detections of individuals (ID) through time (Date). As a short example:

df1 <- data.frame(ID = c(1,2,1,2,1,2,1,2,1,2),
                  Date = ymd(c("2016-08-21","2016-08-24","2016-08-23","2016-08-29","2016-08-27",
                               "2016-09-02","2016-09-01","2016-09-09","2016-09-01","2016-09-10")))
df1
   ID       Date
1   1 2016-08-21
2   2 2016-08-24
3   1 2016-08-23
4   2 2016-08-29
5   1 2016-08-27
6   2 2016-09-02
7   1 2016-09-01
8   2 2016-09-09
9   1 2016-09-01
10  2 2016-09-10

I want to summarize either the
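A minimal sketch, assuming "number of days since first date" means the span between each ID's first and last detection and "number of days seen" means the count of distinct detection dates (the question text is truncated, so both definitions are guesses). It uses base as.Date() instead of lubridate::ymd(), which parses these ISO-formatted strings to the same Date values. A single group_by() plus summarise() pass scales to large data frames; the output column names days_since_first and days_seen are illustrative.

```r
library(dplyr)

df1 <- data.frame(
  ID = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  Date = as.Date(c("2016-08-21", "2016-08-24", "2016-08-23", "2016-08-29",
                   "2016-08-27", "2016-09-02", "2016-09-01", "2016-09-09",
                   "2016-09-01", "2016-09-10"))
)

summary_df <- df1 %>%
  group_by(ID) %>%
  summarise(
    # Days between the first and the last detection of each individual
    days_since_first = as.numeric(max(Date) - min(Date)),
    # Distinct calendar days on which the individual was detected
    days_seen = n_distinct(Date),
    .groups = "drop"
  )

print(summary_df)
```

Note that ID 1 is detected twice on 2016-09-01, which is why n_distinct() rather than n() is used for the days-seen count.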

tidyr::spread() with multiple keys and values

Question: I assume this has been asked multiple times, but I couldn't find the right words to search for a workable solution. How can I spread() a data frame based on multiple keys for multiple values? A simplified version of the data I'm working with (I have many more columns to spread, but on only two keys: id and the time point of a given measurement) looks like this:

df <- data.frame(id = rep(seq(1:10), 3), time = rep(1:3, each = 10), x = rnorm(n = 30), y = rnorm(n = 30))
> head(df)
  id time           x          y
1  1    1 -2.62671241 0.01669755
2  2    1
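Current tidyr supersedes spread() with pivot_wider(), whose values_from argument accepts several value columns at once. A sketch under that substitution: x and y are both spread over time, producing columns x_1 to x_3 and y_1 to y_3 (pivot_wider() joins the value-column name and the key with names_sep, which defaults to "_"). set.seed() is added only to make the random example reproducible, and rep(1:10, 3) is equivalent to the question's rep(seq(1:10), 3).

```r
library(tidyr)

set.seed(1)
df <- data.frame(
  id = rep(1:10, 3),
  time = rep(1:3, each = 10),
  x = rnorm(n = 30),
  y = rnorm(n = 30)
)

# One row per id; one column per (value column, time point) combination
wide <- pivot_wider(df, names_from = time, values_from = c(x, y))

print(head(wide))
```

With more value columns, they can all be listed in values_from (tidyselect helpers such as everything() minus the keys also work), so the call does not grow with the number of keys.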

How to specify multiple columns with gather() function to tidy data

Question: I want to tidy my data with the gather() function, but how do I specify multiple columns at once? Say this is my data:

   Country Country.Code Year X0tot4 X5tot9 X10tot14 X15tot19 X20tot24
1 Viet Nam          704 1955   4606   2924     2389     2340     2502
2 Viet Nam          704 1960   5842   4410     2860     2356     2318
3 Viet Nam          704 1965   6571   5646     4328     2823     2335
4 Viet Nam          704 1970   7065   6391     5548     4271     2797
5 Viet Nam          704 1975   7658   6862     6237     5437     4208
6 Viet Nam          704 1980   7991   7473     6754     6113     5266
7 Viet Nam          704 1985   8630   7855     7375     6657     6027
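gather() accepts the same column-selection syntax as dplyr::select(), so a range such as X0tot4:X20tot24 picks up all five age-group columns at once; pivot_longer() is the current replacement and takes the same selection through its cols argument. A sketch using the first three rows of the data above; the output column names AgeGroup and Population are illustrative.

```r
library(tidyr)

pop <- data.frame(
  Country = "Viet Nam",
  Country.Code = 704,
  Year = c(1955, 1960, 1965),
  X0tot4 = c(4606, 5842, 6571),
  X5tot9 = c(2924, 4410, 5646),
  X10tot14 = c(2389, 2860, 4328),
  X15tot19 = c(2340, 2356, 2823),
  X20tot24 = c(2502, 2318, 2335)
)

# gather(): name the new key/value columns, then select the columns to stack
long_gather <- gather(pop, key = "AgeGroup", value = "Population", X0tot4:X20tot24)

# pivot_longer(): the superseding function, using the same column range
long_pivot <- pivot_longer(pop, cols = X0tot4:X20tot24,
                           names_to = "AgeGroup", values_to = "Population")

print(long_pivot)
```

The two results hold the same rows in a different order: gather() stacks column by column, while pivot_longer() keeps each original row's values together. Helpers like starts_with("X") select the same columns if they are not contiguous.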

How to order the ties in data with previously observed value appearing first with multiple sorting columns

Question: Disclaimer: please note that this is an extension of, not a duplicate of, this topic: How to order the ties in data so that the previously observed value appears first. The difference is that now I have not one but many sorting columns. I need to sort the attached data by min, then by sec, then by timestamp. Additionally, if there are any ties in the order, I would like to order those ties so that the same values of subgroup are adjacent, i.e. if two observations have the same min, sec and timestamp

Multiple column spread

Question: I need to do what tidyr::spread() does, but for multiple value columns. If I have a data set like this:

te <- structure(list(Syllable = c("[pa]", "[ta]", "[ka]", "[pa]", "[ta]", "[ka]", "[pa]", "[ta]", "[ka]", "[pa]"), PA = c(15.9252335141423, 2.17504491982172, 5.26727958979289, 4.48590068583509, 2.1316282072803e-13, 14.1415335887116, 3.51720477328246, 0.839953301362556, 5.74712643678048, 7.01396701583887), transient_mean = c(4.43699436235785, 4.8733556527069, 5

Combining rows by index in R [duplicate]

Question: This question already has answers here: Combining pivoted rows in R by common value (4 answers). Closed last year. EDIT: I am aware there is a similar question that has been answered, but its solution does not work for me on the dataset I have provided below. The above data frame is the result of my using the spread function, and I am still not sure how to consolidate it. EDIT2: I realized that the group_by function, which I had previously used on the data, is what was preventing the spread function from

Apply a vector of filters based on a string (or vector of strings) in dplyr

Question: R and the tidyverse have some extremely powerful but equally mysterious methods for turning strings into actionable expressions. I feel like one needs to be an expert to really understand how to use them. NOTE: this question differs from this one in that I specifically ask about a vector of (that is, multiple) filter conditions. I demonstrate a solution for a single filter that fails when I try several ways of extending it to multiple filters. I want to do something along the lines of: df = data
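One common approach, sketched under assumptions: rlang::parse_exprs() turns a character vector into a list of expressions, and splicing that list into filter() with !!! passes each one as a separate condition, which filter() combines with a logical AND. The data frame and condition strings below are hypothetical, since the question's own example is truncated. Parsing strings evaluates whatever code they contain, so this is only appropriate for trusted input.

```r
library(dplyr)

df <- data.frame(x = 1:10, y = letters[1:10])

# Hypothetical filter conditions stored as strings
conds <- c("x > 3", "x < 8", "y != 'e'")

# parse_exprs() returns a list of expressions; !!! splices them into filter()
result <- df %>% filter(!!!rlang::parse_exprs(conds))

print(result)
```

To combine the conditions with OR instead, one option is to build a single string with paste(conds, collapse = " | ") and parse it with rlang::parse_expr(), then inject it with !!.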

How to most efficiently filter a data frame conditionally on values in another one, in the tidyverse framework?

Question: I have a data frame df1 with an ID column and a lubridate time interval column, and I want to filter (subsample) a data frame df2, which has ID and DateTime columns, so that only the df2 rows whose DateTime falls within the corresponding ID's interval in df1 are kept. I want to do so in a tidyverse framework. It can easily be done using a join (see example below), but I would like to know whether there is a more direct solution (maybe purrr-based) that would avoid joining and then removing the time
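A join-free sketch, assuming each ID appears at most once in df1: match() looks up the df1 row for each df2 row's ID, and lubridate's int_start() and int_end() supply the interval bounds to compare against. The data values and the helper column i are hypothetical, since the question's own example is truncated. IDs absent from df1 get NA from match(), the comparisons become NA, and filter() drops those rows.

```r
library(dplyr)
library(lubridate)

df1 <- data.frame(ID = c(1, 2))
df1$interval <- interval(ymd_hms(c("2020-01-01 00:00:00", "2020-01-05 00:00:00")),
                         ymd_hms(c("2020-01-02 00:00:00", "2020-01-06 00:00:00")))

df2 <- data.frame(
  ID = c(1, 1, 2, 2, 3),
  DateTime = ymd_hms(c("2020-01-01 12:00:00", "2020-01-03 00:00:00",
                       "2020-01-05 06:00:00", "2020-01-07 00:00:00",
                       "2020-01-01 12:00:00"))
)

# Interval bounds as plain POSIXct vectors, aligned with the rows of df1
starts <- int_start(df1$interval)
ends <- int_end(df1$interval)

kept <- df2 %>%
  mutate(i = match(ID, df1$ID)) %>%                       # df1 row for this ID, NA if none
  filter(DateTime >= starts[i], DateTime <= ends[i]) %>%  # rows with NA comparisons are dropped
  select(-i)

print(kept)
```

This is a vectorised lookup rather than a join, so df2 never gains extra columns that have to be removed afterwards; whether it is faster than a join on real data would need to be measured.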

Flag rows with interval overlap in R

Question: I have a data frame containing TV viewing data, and I would like to run a QC check for overlapping viewing. Let's say that for the same day and the same household, each minute of each individual's viewing should be credited to one station or channel only. For example, I would like to flag lines 8 and 9, because it seems impossible that an individual in a single household watched two TV stations (62, 67) at the same time (start_hour_minute). Is there a way to flag these rows? A sort of min by min view by
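A minimal sketch, assuming columns named day, household, individual, station and start_hour_minute (all hypothetical apart from start_hour_minute, since the question's data is not shown): grouping by everything that should identify a single viewing minute and counting distinct stations within each group flags every row whose minute is credited to more than one station. It only checks identical start minutes; overlaps between longer viewing spans would need start and end times per row.

```r
library(dplyr)

tv <- data.frame(
  day = "2020-01-01",
  household = c(1, 1, 1, 1, 2),
  individual = c(1, 1, 2, 1, 1),
  station = c(62, 62, 62, 67, 67),
  start_hour_minute = c("20:00", "20:01", "20:00", "20:00", "20:00")
)

flagged <- tv %>%
  # One group per day, household, individual and minute
  group_by(day, household, individual, start_hour_minute) %>%
  # TRUE when that minute is credited to more than one station
  mutate(overlap = n_distinct(station) > 1) %>%
  ungroup()

print(flagged)
```

Here rows 1 and 4 are flagged: the same individual in household 1 is credited to stations 62 and 67 at 20:00, while the other rows each have a single station for their minute.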