tidyverse

R: Calculate measurement time-points for separate samples

孤街醉人 submitted on 2019-12-11 17:43:38
Question: I have measured concentrations of N2O for different samples (Series) during 10-minute intervals. Each sample was measured twice a day for 9 days. The N2O analyzer saved a concentration reading every second! My data now looks like this:

DATE Series V A TIME Concentration
1: 2017-10-18T00:00:00Z O11 0.004022 0.02011 10:16:00.746 0.3512232
2: 2017-10-18T00:00:00Z O11 0.004022 0.02011 10:16:01.382 0.3498687
3: 2017-10-18T00:00:00Z O11 0.004022 0.02011 10:16:02.124 0.3482681
4: 2017-10-18T00:00:00Z O11 0
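The question is truncated above, but the usual first step for this kind of continuous-logger data is to split each sample's one-second readings into measurement periods by looking for gaps in the timestamps. A minimal sketch with dplyr; the toy data and the 60-second gap threshold are assumptions, not from the question:

```r
library(dplyr)

# Toy stand-in for the analyzer output: two 3-second bursts, 2 hours apart
readings <- data.frame(
  Series = "O11",
  TIME = as.POSIXct("2017-10-18 10:16:00", tz = "UTC") + c(0:2, 7200 + 0:2),
  Concentration = c(0.351, 0.350, 0.348, 0.402, 0.401, 0.400)
)

periods <- readings %>%
  group_by(Series) %>%
  arrange(TIME, .by_group = TRUE) %>%
  mutate(
    gap = as.numeric(TIME - lag(TIME), units = "secs"),
    # a gap larger than 60 s starts a new measurement period
    period = cumsum(is.na(gap) | gap > 60)
  ) %>%
  ungroup()
```

Averaging Concentration by Series and period would then yield one value per 10-minute measurement window.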

Detect a pattern in a column with R

白昼怎懂夜的黑 submitted on 2019-12-11 17:39:34
Question: I am trying to calculate how many times a person moved from one job to another. A move can be counted every time the Job column shows the pattern 1 -> 0 -> 1. In this example, one rotation occurred:

Person Job
A 1
A 0
A 1
A 1

In this second example, person B had one rotation as well:

Person Job
A 1
A 0
A 1
A 1
B 1
B 0
B 0
B 1

What would be a good approach to record this pattern in a new column 'Rotation', by person?

Person Job Rotation
A 1 0
A 0 0
A 1 1
A 1 1
B 1 0
B 0 0
B 0 0
B 1 1
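One way to build the Rotation column with dplyr: a rotation completes whenever Job returns from 0 to 1 within a person. This sketch assumes each person's history effectively starts in employment (lag default of 1), matching the examples above:

```r
library(dplyr)

jobs <- data.frame(
  Person = rep(c("A", "B"), each = 4),
  Job    = c(1, 0, 1, 1,  1, 0, 0, 1)
)

rotations <- jobs %>%
  group_by(Person) %>%
  # a rotation completes at each 0 -> 1 transition within a person
  mutate(Rotation = cumsum(Job == 1 & lag(Job, default = 1) == 0)) %>%
  ungroup()
```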

R: convert integers in a character vector (json) to multiple boolean columns

心不动则不痛 submitted on 2019-12-11 17:35:34
Question: I have a data frame with 2000 rows (different days); each row contains a character "vector" with binary info on 30 different skills. If a skill has been used, its number appears in the vector. To simplify: say I have a data frame with 3 observations (3 days) of 10 different skills, named "S_total": S_total = [1,3,7,8,9,10], [5,9], [], and a variable Day = 1,2,3. I'd like to construct a data frame with 3 rows and 12 columns, the columns being: Day, S_total, s1, s2, s3, s4, s5, s6, s7
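The question is cut off, but the core transformation can be sketched in base R: test each skill index for membership in the per-day vector. The column names s1…s10 follow the question's naming; keeping the original S_total list-column alongside would supply the twelfth column:

```r
# One entry per day; an empty vector means no skill was used that day
S_total  <- list(c(1, 3, 7, 8, 9, 10), c(5, 9), integer(0))
n_skills <- 10

# One logical row per day: is skill k present in that day's vector?
flags <- t(vapply(S_total,
                  function(s) seq_len(n_skills) %in% s,
                  logical(n_skills)))
colnames(flags) <- paste0("s", seq_len(n_skills))

result <- cbind(data.frame(Day = 1:3), flags)
```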

Return an average of last or first two rows from a different group (indicated by a variable)

橙三吉。 submitted on 2019-12-11 17:31:17
Question: This is a follow-up to this question. With data like below: data <- structure(list(seq = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L),
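The question body is truncated, but the operation named in the title (average the first two or last two rows of a group) can be sketched with dplyr on toy data; the column names value, first2_avg, and last2_avg are invented for illustration:

```r
library(dplyr)

dat <- data.frame(seq = rep(1:3, each = 4), value = 1:12)

group_ends <- dat %>%
  group_by(seq) %>%
  summarise(first2_avg = mean(head(value, 2)),
            last2_avg  = mean(tail(value, 2)))
```

Joining group_ends back onto rows whose indicator variable points at another group's seq would then return the cross-group averages the title describes.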

dplyr group by multiple variables summarise by multiple variables

拜拜、爱过 submitted on 2019-12-11 17:05:39
Question: New to R. Using dplyr, I am trying to group_by multiple variables and summarise by multiple variables with multiple functions. This works as expected: mtcars %>% group_by(cyl,hp) %>% summarise(min_mpg = min(mpg), min_disp = min(disp), max_mpg = max(mpg), max_disp = max(disp)) But when I try to replicate it with my df: vmp %>% group_by(Priority,LOS) %>% summarise(inv_total = sum(Inv_Total), sr_count = count(SR_Nmbr)) I receive this error: Error in summarise_impl(.data, dots) : Evaluation error: no
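The truncated error is the classic symptom of calling count() inside summarise(): count() is a data-frame verb, while the per-group tallies inside summarise() are n() and n_distinct(). A sketch on mtcars:

```r
library(dplyr)

out <- mtcars %>%
  group_by(cyl) %>%
  summarise(min_mpg = min(mpg),
            max_mpg = max(mpg),
            n_cars  = n())   # n(), not count(), inside summarise()
```

For the question's data the analogue would be sr_count = n(), or n_distinct(SR_Nmbr) if SR_Nmbr values can repeat within a group.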

dplyr: left_join where df A value lies between df B values

霸气de小男生 submitted on 2019-12-11 16:48:55
Question: I'd like to know if it is possible to achieve the following using dplyr, or some tidyverse package... Context: I am having trouble getting my data into a structure that will allow the use of geom_rect. See this SO question for the motivation. library(tis) # Prepare NBER recession start/end dates. recessions <- data.frame(start = as.Date(as.character(nberDates()[,"Start"]),"%Y%m%d"), end = as.Date(as.character(nberDates()[,"End"]),"%Y%m%d")) dt <- tibble(date=c(as.Date('1983-01-01'),as.Date(
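The question predates it, but modern dplyr (>= 1.1.0) supports exactly this kind of "value lies between" join via join_by(); before that, fuzzyjoin::fuzzy_left_join was the usual tidyverse answer. A sketch with made-up recession dates standing in for the nberDates() output:

```r
library(dplyr)

recessions <- data.frame(
  start = as.Date(c("1990-07-01", "2001-03-01")),
  end   = as.Date(c("1991-03-01", "2001-11-01"))
)
dt <- data.frame(date = as.Date(c("1990-09-15", "1995-06-01", "2001-05-20")))

# Keep every date; attach the recession whose [start, end] contains it
joined <- left_join(dt, recessions, by = join_by(between(date, start, end)))
```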

Apply timeseries decomposition (and anomaly detection) over a sliding/tiled window

依然范特西╮ submitted on 2019-12-11 15:39:27
Question: The anomaly detection methods published and since abandoned by Twitter have been separately forked and maintained in the anomalize package and the hrbrmstr/AnomalyDetection fork. Both have implemented 'tidy' interfaces. A working static version: tidyverse_cran_downloads %>% filter(package == "tidyr") %>% ungroup() %>% select(-package) -> one_package_only one_package_only %>% anomalize::time_decompose(count, merge = TRUE, method = "twitter", frequency = "7 days") -> one_package_only_decomp one
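The question is truncated before the windowing part, but the tiling step itself can be sketched independently of anomalize: bin the series into fixed-width date blocks and apply the per-window computation by group. The 30-day width and the per-window mean (standing in for anomalize::time_decompose) are placeholders:

```r
library(dplyr)

set.seed(1)
daily <- data.frame(
  date  = seq(as.Date("2017-01-01"), by = "day", length.out = 60),
  count = rpois(60, 100)
)

# Tiled windows: assign each day to a 30-day block, then run the
# per-window computation (the real decomposition, stubbed here as a
# mean) within each block.
windowed <- daily %>%
  mutate(window = as.integer(date - min(date)) %/% 30 + 1) %>%
  group_by(window) %>%
  mutate(window_mean = mean(count)) %>%
  ungroup()
```

For overlapping (sliding) windows, slider::slide_period or iterating over window start dates with purrr is the usual pattern.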

Using tidyr's gather_

﹥>﹥吖頭↗ submitted on 2019-12-11 15:35:54
Question: Probably an easy one: I'd like to use tidyr's gather_ on this data.frame: set.seed(1) df <- data.frame(a=rnorm(10),b=rnorm(10),d=rnorm(10),id=paste0("id",1:10)) First, using gather: df %>% tidyr::gather(key=name,value=val,-id) gives me the desired outcome. However, trying to match that with gather_ like this: df %>% tidyr::gather_(key_col="name",value_col="val",gather_cols="id") doesn't give me what the gather usage does. Any idea? Answer 1: I think you want: df %>% tidyr::gather_(key_col="name
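The likely bug in the gather_ call above is that gather_cols names the columns to stack, not the id column to keep; it should be c("a", "b", "d"). Since gather_/gather have been superseded, a sketch of the modern equivalent:

```r
library(tidyr)

set.seed(1)
df <- data.frame(a = rnorm(10), b = rnorm(10), d = rnorm(10),
                 id = paste0("id", 1:10))

# gather_ needed gather_cols = c("a", "b", "d"); pivot_longer is the
# current replacement for both gather() and gather_()
long <- pivot_longer(df, cols = c(a, b, d),
                     names_to = "name", values_to = "val")
```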

Select all rows which are duplicates except for one column

倾然丶 夕夏残阳落幕 submitted on 2019-12-11 15:26:36
Question: I want to find rows in a dataset where the values in all columns except one match. After much messing around trying unsuccessfully to get duplicated() to return all instances of the duplicate rows (not just the first instance), I figured out a way to do it (below). For example, I want to identify all rows in the iris dataset that are equal except for Petal.Width. require(tidyverse) x = iris %>% select(-Petal.Width) dups = x[x %>% duplicated(),] answer = iris %>% semi_join(dups) > answer Sepal
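A common refinement of the approach above is to run duplicated() from both directions, so the first instance of each duplicate set is caught too; semi_join then pulls the full rows back out of iris. Passing an explicit by = avoids the join message:

```r
library(dplyr)

# Drop the ignored column, then flag every member of a duplicate set
x <- select(iris, -Petal.Width)
all_dups <- x[duplicated(x) | duplicated(x, fromLast = TRUE), ]

# Recover the full rows (Petal.Width included) from the original data
answer <- semi_join(iris, all_dups, by = names(all_dups))
```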

Group-wise subsetting where feasible

 ̄綄美尐妖づ submitted on 2019-12-11 14:30:59
Question: I would like to subset rows of my data

library(data.table); set.seed(333); n <- 100
dat <- data.table(id=1:n, group=rep(1:2,each=n/2), x=runif(n,100,120), y=runif(n,200,220), z=runif(n,300,320))

> head(dat)
   id group        x        y        z
1:  1     1 109.3400 208.6732 308.7595
2:  2     1 101.6920 201.0989 310.1080
3:  3     1 119.4697 217.8550 313.9384
4:  4     1 111.4261 205.2945 317.3651
5:  5     1 100.4024 212.2826 305.1375
6:  6     1 114.4711 203.6988 319.4913

in several stages within each group. I need to automate this and it
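The question is cut off before the staged subsetting rules, but the general data.table pattern (apply a group-wise subsetting function that falls back when the subset is infeasible) can be sketched like this; the x-above-median rule and the 10-row minimum are invented placeholders:

```r
library(data.table)

set.seed(333)
n <- 100
dat <- data.table(id = 1:n, group = rep(1:2, each = n/2),
                  x = runif(n, 100, 120), y = runif(n, 200, 220),
                  z = runif(n, 300, 320))

# Keep rows with x above the group median, but only when that leaves
# at least min_rows rows; otherwise keep the whole group unchanged.
subset_if_feasible <- function(d, min_rows = 10) {
  keep <- d[x > median(x)]
  if (nrow(keep) >= min_rows) keep else d
}

result <- dat[, subset_if_feasible(.SD), by = group]
```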