tidyr | 易学教程

Complete column with group_by and complete

Question: I've got a little problem using the dplyr group_by function. After doing this:

    datasetALL %>% group_by(YEAR, Region) %>% summarise(count_number = n())

here is the result:

       YEAR Region count_number
      <int>  <int>        <int>
    1  1946      1            2
    2  1946      2            3
    3  1946      3            1
    4  1946      5            1
    5  1947      3            1
    6  1947      4            1

I would like something like:

        YEAR Region count_number
       <int>  <int>        <int>
     1  1946      1            2
     2  1946      2            3
     3  1946      3            1
     4  1946      5            1
     5  1946      4            0   # order is not important
     6  1947      1            0
     7  1947      2            0
     8  1947      3            1
     9  1947      4            1
    10  1947      5            0

I try to use
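One way to get the zero-count rows is sketched below. It assumes datasetALL has one row per observation with integer YEAR and Region columns; the toy data is a hypothetical stand-in, not the asker's. The idea is to ungroup the summary and then pass it to tidyr::complete(), because complete() works within groups on grouped data and so cannot expand a grouping column.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for datasetALL: one row per observation
datasetALL <- tibble(
  YEAR   = c(1946L, 1946L, 1946L, 1946L, 1946L, 1946L, 1946L, 1947L, 1947L),
  Region = c(1L, 1L, 2L, 2L, 2L, 3L, 5L, 3L, 4L)
)

counts <- datasetALL %>%
  group_by(YEAR, Region) %>%
  summarise(count_number = n()) %>%
  # drop the remaining YEAR grouping so complete() can expand both columns
  ungroup() %>%
  # add every YEAR x Region pair seen in the data; absent pairs get a count of 0
  complete(YEAR, Region, fill = list(count_number = 0L))

print(counts)
```

Only combinations of values that occur somewhere in the data are generated. A region that never appears in any year would have to be supplied explicitly, for example as `Region = 1:5`.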

gather with tidyr: position must be between 0 and n error

Question: I have some data like below:

    x.row10 <- setNames(data.frame(letters[1:3],1:3,2:4,3:5,4:6,5:7,6:8,7:9),
                        c("names",2004:2009,2012))
    #   names 2004 2005 2006 2007 2008 2009 2012
    # 1     a    1    2    3    4    5    6    7
    # 2     b    2    3    4    5    6    7    8
    # 3     c    3    4    5    6    7    8    9

Now I can make them long with gather() from the tidyr package by writing:

    x.row10 %>% gather(Year, Val, -names)

But when I use

    x.row10 %>% gather(Year, Val, c(2004:2009,2012))

which is my intuitive choice, I get the error message

    Error: Position must be between 0 and
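A likely explanation is that bare numbers in a column selection are read as column positions, so 2004 is looked up as the 2004th column rather than the column named "2004". A minimal sketch of one workaround, quoting the numeric names with backticks so they are treated as names:

```r
library(tidyr)

x.row10 <- setNames(
  data.frame(letters[1:3], 1:3, 2:4, 3:5, 4:6, 5:7, 6:8, 7:9),
  c("names", 2004:2009, 2012)
)

# Backticks make `2004` a column name instead of a position
long <- gather(x.row10, Year, Val, `2004`:`2009`, `2012`)

print(head(long))
```

The Year column comes back as character, so it may need converting with as.integer() before it is used numerically.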

Fill missing values in data.frame using dplyr complete within groups

Question: I'm trying to fill missing values in my data frame, but I do not want all possible combinations of variables. I only want to fill based on a grouping of three variables: coursecode, year, and week. I've looked into complete() in the tidyr library but I can't get it to work, even after looking at "Using tidyr::complete with group_by" and https://blog.rstudio.org/2015/09/13/tidyr-0-3-0/ I have observers that collect data on given weeks of the year at different courses. For example, data might be

Opposite of unnest_tokens

Question: This is most likely a stupid question, but I've googled and googled and can't find a solution. I think it's because I don't know the right way to word my question to search. I have a data frame that I have converted to tidy text format in R to get rid of stop words. I would now like to 'untidy' that data frame back to its original format. What's the opposite / inverse command of unnest_tokens? Edit: here is what the data I'm working with look like. I'm trying to replicate analyses from Silge
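A common way to reverse the one-token-per-row layout is to group by whatever column identifies the original document and paste the tokens back together. The sketch below assumes hypothetical column names `line` and `word`, and builds the tidy table by hand instead of calling unnest_tokens():

```r
library(dplyr)

# Hypothetical tidy-text table: one word per row, keyed by line number
tidy_words <- tibble(
  line = c(1L, 1L, 1L, 2L, 2L),
  word = c("tidy", "text", "mining", "stop", "words")
)

untidied <- tidy_words %>%
  group_by(line) %>%
  # collapse each line's words, in row order, into a single string
  summarise(text = paste(word, collapse = " "))

print(untidied)
```

This is not a true inverse: punctuation, original casing, and any stop words that were removed are not recovered.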

how to create categories conditionally using other variables values and sequence

I would appreciate any help to create a function that allows me to create categories of one variable using the order of a set of other variables' values. Specifically, I want a function that:

- creates category E1 of the variable "variable" the first time that each combination of values of the variables A, B, and ID appears in the dataset.
- creates category E2 of the variable "variable" the second time that each combination of values of the variables A, B, and ID appears in the dataset.
- creates category E3 of the variable "variable" the third time that each combination of values of the variables A ,
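The part of the request that is visible here (label the n-th occurrence of each A, B, ID combination as E1, E2, E3, ...) can be sketched with a grouped row counter. The data frame below is hypothetical, and any further conditions in the truncated question are not covered:

```r
library(dplyr)

# Hypothetical data: ID, A, and B as named in the question
df <- tibble(
  ID = c(1, 1, 1, 2, 2),
  A  = c("x", "x", "x", "y", "y"),
  B  = c("p", "p", "q", "p", "p")
)

labelled <- df %>%
  group_by(ID, A, B) %>%
  # row_number() counts occurrences within each combination, in row order
  mutate(variable = paste0("E", row_number())) %>%
  ungroup()

print(labelled)
```

The labels depend on the current row order, so the data should be sorted first if "first time" refers to a date or sequence column.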

How to pair rows in a data frame with many columns using dplyr in R?

I have a dataframe containing multiple observations from the control and the experimental cohorts with replicates for each subject. Here is an example of my dataframe:

    subject cohort  replicate val1 val2
    A       control 1         10   0.1
    A       control 2         15   0.3
    A       experim 1         40   0.7
    A       experim 2         45   0.9
    B       control 1         5    0.3
    B       experim 1         30   0.0
    C       control 1         50   0.5
    C       experim 1         NA   1.0

I'd like to pair each control observation with its corresponding experimental one for each value to calculate the ratio between the pairs. The desired output would look something like this:

    subject replicate ratio_val1 ratio_val2
    A       1         4          7
    A       2         3          3
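One possible approach, sketched on the example data above, is to split the table by cohort, join the two halves on subject and replicate, and divide the experimental value by the control value (the direction is inferred from the desired output, where 40 / 10 = 4):

```r
library(dplyr)

df <- tibble(
  subject   = c("A", "A", "A", "A", "B", "B", "C", "C"),
  cohort    = c("control", "control", "experim", "experim",
                "control", "experim", "control", "experim"),
  replicate = c(1, 2, 1, 2, 1, 1, 1, 1),
  val1      = c(10, 15, 40, 45, 5, 30, 50, NA),
  val2      = c(0.1, 0.3, 0.7, 0.9, 0.3, 0.0, 0.5, 1.0)
)

control <- df %>% filter(cohort == "control") %>% select(-cohort)
experim <- df %>% filter(cohort == "experim") %>% select(-cohort)

ratios <- inner_join(control, experim,
                     by = c("subject", "replicate"),
                     suffix = c("_control", "_experim")) %>%
  # keep only the keys and the experimental / control ratios
  transmute(subject, replicate,
            ratio_val1 = val1_experim / val1_control,
            ratio_val2 = val2_experim / val2_control)

print(ratios)
```

With many value columns, writing each ratio by hand does not scale. Reshaping to long form first and computing a single ratio column would be one alternative.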

dplyr - sum of multiple columns using regular expressions

For the dataset mtcars2

    mtcars2 = mtcars
    mtcars2 = mtcars2 %>% mutate(cyl9=cyl, disp9=disp, gear2=gear)

I want to get a new column which is the sum of multiple columns, by using regular expressions to capture the pattern. This is a solution, however this is done by hard-coding:

    select(mtcars2, cyl9) + select(mtcars2, disp9) + select(mtcars2, gear2)

I tried something like this, but it gives me a number instead of a vector:

    mtcars2 %>% select(matches("[0-9]")) %>% sum

Please dplyr solutions only, since I need to apply these functions to a SQL table later on. Thanks! Update.. I need the solution to
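The attempt returns a single number because sum() adds every cell of the selected columns. For a local data frame, a row-wise total can be sketched with rowSums() over the same selection. This is an in-memory sketch only: rowSums() is base R and should not be assumed to translate to SQL, which the question says it eventually needs.

```r
library(dplyr)

mtcars2 <- mtcars %>% mutate(cyl9 = cyl, disp9 = disp, gear2 = gear)

# rowSums() yields one total per row, unlike sum(), which collapses everything
mtcars2 <- mtcars2 %>%
  mutate(total = rowSums(select(mtcars2, matches("[0-9]"))))

print(head(mtcars2$total))
```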

Using gather from tidyr changes my regression results

Question: When I run the code below, everything works as expected

    # install.packages("dynlm")
    # install.packages("tidyr")
    require(dynlm)
    require(tidyr)
    Time <- 1950:1993
    Y <- c(5820, 5843, 5917, 6054, 6099, 6365, 6440, 6465, 6449, 6658, 6698,
           6740, 6931, 7089, 7384, 7703, 8005, 8163, 8506, 8737, 8842, 9022,
           9425, 9752, 9602, 9711, 10121, 10425, 10744, 10876, 10746, 10770,
           10782, 11179, 11617, 12015, 12336, 12568, 12903, 13029, 13093,
           12899, 13110, 13391)
    X <- c(6284, 6390, 6476, 6640, 6628, 6879, 7080,

extracting values from column using tidyr

Question: I have data.frame annot defined as:

    annot <- structure(list(
      Name = c("dd_1", "dd_2", "dd_3","dd_4", "dd_5", "dd_6","dd_7"),
      GOs = c("C:extracellular space; C:cell body; P:cell migration process; P:NF/ß pathway",
              "C:Signal transduction; C:nucleus; F:positive regulation; P:single organism; P:positive(+) regulation",
              "C:cardiomyceltes; C:intracellular pace; F:putative; F:magnesium ion binding; F:calcium ion binding; P:visual perception; P:blood coagulation",
              "F:poly(A) RNA binding; P:DNA

Unable to use tidyselect `everything()` in combination with `group_by()` and `fill()`

    library(tidyverse)
    df <- tibble(x1 = c("A", "A", "A", "B", "B", "B"),
                 x2 = c(NA, 8, NA, NA, NA, 5),
                 x3 = c(3, NA, 5, 9, 1, 9))
    #> # A tibble: 6 x 3
    #>   x1       x2    x3
    #>   <chr> <dbl> <dbl>
    #> 1 A        NA     3
    #> 2 A         8    NA
    #> 3 A        NA     5
    #> 4 B        NA     9
    #> 5 B        NA     1
    #> 6 B         5     9

I have groups 'A' and 'B' shown in column x1. I need the 'NA' values in columns x2 and x3 to populate only from values within the same group, in the updown direction. That's simple enough, here's the code:

    df %>% group_by(x1) %>% fill(c(x2, x3), .direction = "updown")
    #> # A tibble: 6 x 3
    #>   x1       x2    x3
    #>   <chr> <dbl> <dbl>
    #> 1 A         8     3
    #> 2 A         8     5
    #>