tidyverse

R: Calculate measurement time-points for separate samples

孤街醉人 submitted on 2019-12-11 17:43:38
Question: I have measured concentrations of N2O for different samples (Series) during 10-minute intervals. Each sample was measured twice a day for 9 days. The N2O analyzer saved a concentration reading every second! My data now looks like this:

DATE Series V A TIME Concentration
1: 2017-10-18T00:00:00Z O11 0.004022 0.02011 10:16:00.746 0.3512232
2: 2017-10-18T00:00:00Z O11 0.004022 0.02011 10:16:01.382 0.3498687
3: 2017-10-18T00:00:00Z O11 0.004022 0.02011 10:16:02.124 0.3482681
4: 2017-10-18T00:00:00Z O11 0
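The question is truncated above, but the usual first step for this kind of continuous-logger data is to split each sample's one-second readings into measurement periods by looking for gaps in the timestamps. A minimal sketch with dplyr; the toy data and the 60-second gap threshold are assumptions, not from the question:

```r
library(dplyr)

# Toy stand-in for the analyzer output: two 3-second bursts, 2 hours apart
readings <- data.frame(
  Series = "O11",
  TIME = as.POSIXct("2017-10-18 10:16:00", tz = "UTC") + c(0:2, 7200 + 0:2),
  Concentration = c(0.351, 0.350, 0.348, 0.402, 0.401, 0.400)
)

periods <- readings %>%
  group_by(Series) %>%
  arrange(TIME, .by_group = TRUE) %>%
  mutate(
    gap = as.numeric(TIME - lag(TIME), units = "secs"),
    # a gap larger than 60 s starts a new measurement period
    period = cumsum(is.na(gap) | gap > 60)
  ) %>%
  ungroup()
```

Averaging Concentration by Series and period would then yield one value per 10-minute measurement window.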

Detect a pattern in a column with R

白昼怎懂夜的黑 submitted on 2019-12-11 17:39:34
Question: I am trying to calculate how many times a person moved from one job to another. A move can be counted every time the Job column shows the pattern 1 -> 0 -> 1. In this example, one rotation occurred:

Person Job
A 1
A 0
A 1
A 1

In this second example, person B had one rotation as well:

Person Job
A 1
A 0
A 1
A 1
B 1
B 0
B 0
B 1

What would be a good approach to record this pattern in a new column 'Rotation', by person?

Person Job Rotation
A 1 0
A 0 0
A 1 1
A 1 1
B 1 0
B 0 0
B 0 0
B 1 1
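One way to build the Rotation column with dplyr: a rotation completes whenever Job returns from 0 to 1 within a person. This sketch assumes each person's history effectively starts in employment (lag default of 1), matching the examples above:

```r
library(dplyr)

jobs <- data.frame(
  Person = rep(c("A", "B"), each = 4),
  Job    = c(1, 0, 1, 1,  1, 0, 0, 1)
)

rotations <- jobs %>%
  group_by(Person) %>%
  # a rotation completes at each 0 -> 1 transition within a person
  mutate(Rotation = cumsum(Job == 1 & lag(Job, default = 1) == 0)) %>%
  ungroup()
```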

R: convert integers in a character vector (json) to multiple boolean columns

心不动则不痛 submitted on 2019-12-11 17:35:34
Question: I have a data frame with 2000 rows (different days); each row contains a character "vector" with binary info on 30 different skills. If a skill has been used, its number appears in the vector. To simplify: say I have a data frame with 3 observations (3 days) of 10 different skills, named "S_total": S_total = [1,3,7,8,9,10], [5,9], [], and a variable Day = 1,2,3. I'd like to construct a data frame with 3 rows and 12 columns, the columns being: Day, S_total, s1, s2, s3, s4, s5, s6, s7
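The question is cut off, but the core transformation can be sketched in base R: test each skill index for membership in the per-day vector. The column names s1…s10 follow the question's naming; keeping the original S_total list-column alongside would supply the twelfth column:

```r
# One entry per day; an empty vector means no skill was used that day
S_total  <- list(c(1, 3, 7, 8, 9, 10), c(5, 9), integer(0))
n_skills <- 10

# One logical row per day: is skill k present in that day's vector?
flags <- t(vapply(S_total,
                  function(s) seq_len(n_skills) %in% s,
                  logical(n_skills)))
colnames(flags) <- paste0("s", seq_len(n_skills))

result <- cbind(data.frame(Day = 1:3), flags)
```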

Return an average of last or first two rows from a different group (indicated by a variable)

橙三吉。 submitted on 2019-12-11 17:31:17
Question: This is a follow-up to this question. With data like below: data <- structure(list(seq = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L),
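The question body is truncated, but the operation named in the title (average the first two or last two rows of a group) can be sketched with dplyr on toy data; the column names value, first2_avg, and last2_avg are invented for illustration:

```r
library(dplyr)

dat <- data.frame(seq = rep(1:3, each = 4), value = 1:12)

group_ends <- dat %>%
  group_by(seq) %>%
  summarise(first2_avg = mean(head(value, 2)),
            last2_avg  = mean(tail(value, 2)))
```

Joining group_ends back onto rows whose indicator variable points at another group's seq would then return the cross-group averages the title describes.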

dplyr group by multiple variables summarise by multiple variables

拜拜、爱过 submitted on 2019-12-11 17:05:39
Question: New to R. Using dplyr, I am trying to group_by multiple variables and summarise by multiple variables with multiple functions. This works as expected: mtcars %>% group_by(cyl,hp) %>% summarise(min_mpg = min(mpg), min_disp = min(disp), max_mpg = max(mpg), max_disp = max(disp)) But when I try to replicate it with my df: vmp %>% group_by(Priority,LOS) %>% summarise(inv_total = sum(Inv_Total), sr_count = count(SR_Nmbr)) I receive this error: Error in summarise_impl(.data, dots) : Evaluation error: no
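The truncated error is the classic symptom of calling count() inside summarise(): count() is a data-frame verb, while the per-group tallies inside summarise() are n() and n_distinct(). A sketch on mtcars:

```r
library(dplyr)

out <- mtcars %>%
  group_by(cyl) %>%
  summarise(min_mpg = min(mpg),
            max_mpg = max(mpg),
            n_cars  = n())   # n(), not count(), inside summarise()
```

For the question's data the analogue would be sr_count = n(), or n_distinct(SR_Nmbr) if SR_Nmbr values can repeat within a group.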

dplyr: left_join where df A value lies between df B values

霸气de小男生 submitted on 2019-12-11 16:48:55
Question: I'd like to know if it is possible to achieve the following using dplyr, or some tidyverse package... Context: I am having trouble getting my data into a structure that will allow the use of geom_rect. See this SO question for the motivation. library(tis) # Prepare NBER recession start/end dates. recessions <- data.frame(start = as.Date(as.character(nberDates()[,"Start"]),"%Y%m%d"), end = as.Date(as.character(nberDates()[,"End"]),"%Y%m%d")) dt <- tibble(date=c(as.Date('1983-01-01'),as.Date(
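The question predates it, but modern dplyr (>= 1.1.0) supports exactly this kind of "value lies between" join via join_by(); before that, fuzzyjoin::fuzzy_left_join was the usual tidyverse answer. A sketch with made-up recession dates standing in for the nberDates() output:

```r
library(dplyr)

recessions <- data.frame(
  start = as.Date(c("1990-07-01", "2001-03-01")),
  end   = as.Date(c("1991-03-01", "2001-11-01"))
)
dt <- data.frame(date = as.Date(c("1990-09-15", "1995-06-01", "2001-05-20")))

# Keep every date; attach the recession whose [start, end] contains it
joined <- left_join(dt, recessions, by = join_by(between(date, start, end)))
```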

Apply timeseries decomposition (and anomaly detection) over a sliding/tiled window

依然范特西╮ submitted on 2019-12-11 15:39:27
Question: The anomaly detection methods published and since abandoned by Twitter have been separately forked and maintained in the anomalize package and the hrbrmstr/AnomalyDetection fork. Both have implemented 'tidy' interfaces. A working static version: tidyverse_cran_downloads %>% filter(package == "tidyr") %>% ungroup() %>% select(-package) -> one_package_only one_package_only %>% anomalize::time_decompose(count, merge = TRUE, method = "twitter", frequency = "7 days") -> one_package_only_decomp one
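The question is truncated before the windowing part, but the tiling step itself can be sketched independently of anomalize: bin the series into fixed-width date blocks and apply the per-window computation by group. The 30-day width and the per-window mean (standing in for anomalize::time_decompose) are placeholders:

```r
library(dplyr)

set.seed(1)
daily <- data.frame(
  date  = seq(as.Date("2017-01-01"), by = "day", length.out = 60),
  count = rpois(60, 100)
)

# Tiled windows: assign each day to a 30-day block, then run the
# per-window computation (the real decomposition, stubbed here as a
# mean) within each block.
windowed <- daily %>%
  mutate(window = as.integer(date - min(date)) %/% 30 + 1) %>%
  group_by(window) %>%
  mutate(window_mean = mean(count)) %>%
  ungroup()
```

For overlapping (sliding) windows, slider::slide_period or iterating over window start dates with purrr is the usual pattern.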

Using tidyr's gather_

﹥>﹥吖頭↗ submitted on 2019-12-11 15:35:54
Question: Probably an easy one: I'd like to use tidyr's gather_ on this data.frame: set.seed(1) df <- data.frame(a=rnorm(10),b=rnorm(10),d=rnorm(10),id=paste0("id",1:10)) First, using gather: df %>% tidyr::gather(key=name,value=val,-id) gives me the desired outcome. However, trying to match that with gather_ like this: df %>% tidyr::gather_(key_col="name",value_col="val",gather_cols="id") doesn't give me what the gather usage does. Any idea? Answer 1: I think you want: df %>% tidyr::gather_(key_col="name
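The likely bug in the gather_ call above is that gather_cols names the columns to stack, not the id column to keep; it should be c("a", "b", "d"). Since gather_/gather have been superseded, a sketch of the modern equivalent:

```r
library(tidyr)

set.seed(1)
df <- data.frame(a = rnorm(10), b = rnorm(10), d = rnorm(10),
                 id = paste0("id", 1:10))

# gather_ needed gather_cols = c("a", "b", "d"); pivot_longer is the
# current replacement for both gather() and gather_()
long <- pivot_longer(df, cols = c(a, b, d),
                     names_to = "name", values_to = "val")
```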

Select all rows which are duplicates except for one column

倾然丶 夕夏残阳落幕 submitted on 2019-12-11 15:26:36
Question: I want to find rows in a dataset where the values in all columns except one match. After much messing around trying unsuccessfully to get duplicated() to return all instances of the duplicate rows (not just the first instance), I figured out a way to do it (below). For example, I want to identify all rows in the iris dataset that are equal except for Petal.Width. require(tidyverse) x = iris %>% select(-Petal.Width) dups = x[x %>% duplicated(),] answer = iris %>% semi_join(dups) > answer Sepal
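A common refinement of the approach above is to run duplicated() from both directions, so the first instance of each duplicate set is caught too; semi_join then pulls the full rows back out of iris. Passing an explicit by = avoids the join message:

```r
library(dplyr)

# Drop the ignored column, then flag every member of a duplicate set
x <- select(iris, -Petal.Width)
all_dups <- x[duplicated(x) | duplicated(x, fromLast = TRUE), ]

# Recover the full rows (Petal.Width included) from the original data
answer <- semi_join(iris, all_dups, by = names(all_dups))
```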

Group-wise subsetting where feasible

 ̄綄美尐妖づ submitted on 2019-12-11 14:30:59
Question: I would like to subset rows of my data

library(data.table); set.seed(333); n <- 100
dat <- data.table(id=1:n, group=rep(1:2,each=n/2), x=runif(n,100,120), y=runif(n,200,220), z=runif(n,300,320))

> head(dat)
   id group        x        y        z
1:  1     1 109.3400 208.6732 308.7595
2:  2     1 101.6920 201.0989 310.1080
3:  3     1 119.4697 217.8550 313.9384
4:  4     1 111.4261 205.2945 317.3651
5:  5     1 100.4024 212.2826 305.1375
6:  6     1 114.4711 203.6988 319.4913

in several stages within each group. I need to automate this and it
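The question is cut off before the staged subsetting rules, but the general data.table pattern (apply a group-wise subsetting function that falls back when the subset is infeasible) can be sketched like this; the x-above-median rule and the 10-row minimum are invented placeholders:

```r
library(data.table)

set.seed(333)
n <- 100
dat <- data.table(id = 1:n, group = rep(1:2, each = n/2),
                  x = runif(n, 100, 120), y = runif(n, 200, 220),
                  z = runif(n, 300, 320))

# Keep rows with x above the group median, but only when that leaves
# at least min_rows rows; otherwise keep the whole group unchanged.
subset_if_feasible <- function(d, min_rows = 10) {
  keep <- d[x > median(x)]
  if (nrow(keep) >= min_rows) keep else d
}

result <- dat[, subset_if_feasible(.SD), by = group]
```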