tidyverse | 易学教程

Get `chisq.test()$p.value` for several groups using `dplyr::group_by()`

Question: I'm trying to run a chi-squared test on several groups within the dplyr framework. The problem is that `group_by() %>% summarise()` doesn't seem to do the trick. Simulated data (same structure as the problematic data, but random, so the p-values should be high): `set.seed(1)` `data.frame(partido=sample(c("PRI", "PAN"), 100, 0.6), genero=sample(c("H", "M"), 100, 0.7), GM=sample(c("Bajo", "Muy bajo"), 100, 0.8)) -> foo` I want to compare several groups defined by `GM` to see if there are changes in the p-values for the…
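A minimal sketch of one common approach, assuming the goal is one chi-squared p-value for `partido` versus `genero` within each `GM` group: call `chisq.test()` on the two columns directly inside `summarise()`, so it runs once per group and `$p.value` extracts a single number per group. The third positional argument of `sample()` is `replace`, not `prob`, so the `0.6` in the simulation is read as `replace = TRUE`; the sketch spells that out instead.

```r
library(dplyr)

# Simulated data as in the question, with replace = TRUE made explicit
set.seed(1)
foo <- data.frame(
  partido = sample(c("PRI", "PAN"), 100, replace = TRUE),
  genero  = sample(c("H", "M"), 100, replace = TRUE),
  GM      = sample(c("Bajo", "Muy bajo"), 100, replace = TRUE)
)

# One row per GM group: chisq.test() builds the partido x genero
# contingency table from the two vectors, and $p.value pulls out the p-value
pvals <- foo %>%
  group_by(GM) %>%
  summarise(p.value = chisq.test(partido, genero)$p.value, .groups = "drop")

pvals
```

If a group ever contains only one level of `partido` or `genero`, `chisq.test()` will stop with an error for that group, so very small groups may need filtering first.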

Use R to Download an individual shared file from a Shared Google Drive directory

Question: I wanted to download, in R, a file from a shared Google Drive directory: a directory I do not own but have been given access to. This turned out to be more complicated than I had expected. My goal was to use the shared data file from the other user's directory by putting its shared URL into an R script, so that the file can be downloaded and manipulated in R. Note: surprisingly, I was able to solve my own problem, so it felt only fair to share both…
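The excerpt cuts off before the asker's own solution. As a hedged sketch of one way to do it, assuming the file is shared with "anyone with the link" and that Google Drive's long-standing `https://drive.google.com/uc?export=download&id=<FILE_ID>` endpoint still serves it (large files may trigger an extra confirmation page that this does not handle), the file ID can be extracted from the share URL and the resulting link passed to base R's `download.file()`. The helpers `gd_file_id()` and `gd_direct_url()` are hypothetical names.

```r
# Hypothetical helper: extract the file ID from a Google Drive share link,
# handling the ".../file/d/<ID>/view" and "...?id=<ID>" URL shapes
gd_file_id <- function(share_url) {
  patterns <- c("/d/([^/?#]+)", "[?&]id=([^&#]+)")
  for (p in patterns) {
    m <- regmatches(share_url, regexec(p, share_url))[[1]]
    if (length(m) == 2) return(m[2])  # m[1] is the full match, m[2] the ID
  }
  stop("no file ID found in: ", share_url)
}

# Hypothetical helper: build the direct-download URL (assumed endpoint)
gd_direct_url <- function(share_url) {
  paste0("https://drive.google.com/uc?export=download&id=", gd_file_id(share_url))
}

# Usage (needs network access; <FILE_ID> is a placeholder):
# url <- gd_direct_url("https://drive.google.com/file/d/<FILE_ID>/view?usp=sharing")
# download.file(url, destfile = "shared_data.csv", mode = "wb")
# dat <- read.csv("shared_data.csv")
```

For files that are not link-shared, an authenticated client such as the googledrive package is the more robust route.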

Reorganizing dataframe with multiple header types following “tidy” approach in R

Question: I have a data frame that looks somewhat like this:

| Age | A1U_sweet | A2F_dip | A3U_bbq | C1U_sweet | C2F_dip | C3U_bbq | Comments |
|---|---|---|---|---|---|---|---|
| 23 | 1 | 2 | 1 | NA | NA | NA | Good |
| 54 | NA | NA | NA | 4 | 1 | 2 | ABCD |
| 43 | 2 | 4 | 7 | NA | NA | NA | HiHi |

I am trying to reorganize it in the way shown below to make it more "tidy". Is there a way to do this that also incorporates the Age and Comments columns in the same style as the other variables below? How would you suggest incorporating them? One idea is shown below, but I am open to other…
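The target layout is missing from this excerpt, so this is only one possible tidy shape, assuming each measurement header splits into a letter, a number, a second letter and a suffix after the underscore (e.g. `A1U_sweet` → `A`, `1`, `U`, `sweet`). `tidyr::pivot_longer()` with `names_pattern` does the split, `Age` and `Comments` stay as per-respondent identifier columns, and `values_drop_na = TRUE` drops the empty `NA` cells. The output column names (`letter`, `number`, `code`, `item`, `value`) are made up for illustration.

```r
library(tidyr)

# The example data from the question
df <- data.frame(
  Age       = c(23, 54, 43),
  A1U_sweet = c(1, NA, 2),
  A2F_dip   = c(2, NA, 4),
  A3U_bbq   = c(1, NA, 7),
  C1U_sweet = c(NA, 4, NA),
  C2F_dip   = c(NA, 1, NA),
  C3U_bbq   = c(NA, 2, NA),
  Comments  = c("Good", "ABCD", "HiHi")
)

# Split each measurement header into four parts via regex capture groups;
# Age and Comments are repeated on every row belonging to their respondent
long <- pivot_longer(
  df,
  cols = -c(Age, Comments),
  names_to = c("letter", "number", "code", "item"),
  names_pattern = "^([A-Z])(\\d+)([A-Z])_(.+)$",
  values_to = "value",
  values_drop_na = TRUE
)

long
```

Keeping `Age` and `Comments` as identifier columns (rather than pivoting them too) avoids mixing numeric, text and score values in one `value` column.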

counting specific words across multiple columns in R

Question: I have a data frame like this: `df <- data.frame(id=c(1, 2, 3, 4, 5), staple_1=c("potato", "cassava","rice","fruit","coffee"), staple_2=c("cassava","beer","peanuts","rice","yams"), staple_3=c("rice","peanuts","fruit","fruit","rice"))` I also have a character vector like this: `staples<-c("potato","cassava","rice","yams")` I would like to create a new variable that, for each row, counts how many of the cells contain any of the words in the `staples` character vector. The outcome should look like this: df…
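A minimal sketch, assuming dplyr 1.0 or later for `across()`: test every `staple_*` column against the `staples` vector with `%in%`, which yields one logical column per staple column, then let `rowSums()` count the `TRUE` values in each row. The new column name `n_staples` is an assumption.

```r
library(dplyr)

df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  staple_1 = c("potato", "cassava", "rice", "fruit", "coffee"),
  staple_2 = c("cassava", "beer", "peanuts", "rice", "yams"),
  staple_3 = c("rice", "peanuts", "fruit", "fruit", "rice")
)
staples <- c("potato", "cassava", "rice", "yams")

# across() returns a data frame of TRUE/FALSE (is this cell a staple?);
# rowSums() treats TRUE as 1, giving the per-row count
df <- df %>%
  mutate(n_staples = rowSums(across(starts_with("staple_"), ~ .x %in% staples)))

df$n_staples
# 3 1 1 1 2
```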

Using purrr (tidyverse) to map distance function across all columns of dataframe

Question: I have a distance function that takes two numeric vectors and calculates the distance between them. For a given data frame (`mtcars_raw` in the example below) and a fixed input vector (`test_vec`), I would like to apply the distance function to each column paired with `test_vec` and return the vector of distances, whose length should equal the number of columns. Please see the reproducible example: `library(datasets)` `# The raw dataframe containing only…`
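The original distance function and `mtcars_raw` are cut off in this excerpt, so this sketch substitutes a hypothetical Euclidean `dist_fn()`, uses the built-in `mtcars` in place of `mtcars_raw`, and builds a made-up `test_vec` of matching length. Because a data frame is a list of columns, `purrr::map_dbl()` iterates over the columns directly and returns a named numeric vector with one distance per column.

```r
library(purrr)

# Hypothetical distance function: Euclidean distance between two numeric vectors
dist_fn <- function(a, b) sqrt(sum((a - b)^2))

# Stand-ins for the question's mtcars_raw and test_vec
mtcars_raw <- datasets::mtcars
test_vec <- seq_len(nrow(mtcars_raw))

# map_dbl() walks the columns of the data frame; names are the column names
distances <- map_dbl(mtcars_raw, ~ dist_fn(.x, test_vec))

distances
```

Any two-argument function returning a single number can be dropped in for `dist_fn()`; `map_dbl()` will stop with an error if it returns anything else.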

Unexpected values while applying custom function in dplyr::mutate

Question: My data looks like this: `library(tidyverse)` `df <- tribble( ~y_val, ~z_val, 2, 4, 5, 3, 8, 2, 1, 1, 9, 3)` I have a custom function `fun_b()` that I would like to apply to the data frame with a `dplyr::mutate()` call. However, `fun_b()` uses the function `fun_a()`, which has a loop inside it: `fun_a <- function(x, y, z, times = 1) { df <- data.frame() for (i in 1:times) { x <- x * 2 + i * x y <- y / 3 + i * y z <- z + 1 + z * i d <- data.frame(x, y, z) df <- rbind(df, d) } return(df) }` `fun_b <- function(x, y,…`
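`fun_b()` is cut off in this excerpt, so this sketch uses a hypothetical `fun_b_sketch()` that calls the question's `fun_a()` (with an assumed `x = 1`) and returns the final `y` after `times` iterations. It illustrates one common cause of unexpected values: `mutate()` calls the function once with whole columns, so a loop that `rbind()`s rows and then picks a single element yields one value that is recycled down every row. Mapping over the rows with `purrr::map2_dbl()` calls the function once per row instead.

```r
library(dplyr)
library(purrr)

df <- tibble::tribble(
  ~y_val, ~z_val,
  2, 4,
  5, 3,
  8, 2,
  1, 1,
  9, 3
)

# fun_a() as given in the question
fun_a <- function(x, y, z, times = 1) {
  df <- data.frame()
  for (i in 1:times) {
    x <- x * 2 + i * x
    y <- y / 3 + i * y
    z <- z + 1 + z * i
    d <- data.frame(x, y, z)
    df <- rbind(df, d)
  }
  return(df)
}

# Hypothetical stand-in for the truncated fun_b(): last y after `times` steps
fun_b_sketch <- function(y, z, times = 2) {
  res <- fun_a(x = 1, y = y, z = z, times = times)
  res$y[nrow(res)]
}

out <- df %>%
  mutate(
    # Whole columns in, one number out: that single value is recycled
    whole_column = fun_b_sketch(y_val, z_val),
    # One call per row: each row gets its own result
    per_row = map2_dbl(y_val, z_val, fun_b_sketch)
  )

out
```

`rowwise()` followed by `mutate()` is an alternative way to get one call per row.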

Scatter plot in ggplot, one numeric variable across two groups

Question: I would like to create a scatter plot in ggplot2 that displays male `test_scores` on the x-axis and female `test_scores` on the y-axis, using the dataset below. I can easily create a `geom_line` plot that splits male and female and puts the date (`dts`) on the x-axis. `library(tidyverse)` `# create data` `dts <- c("2011-01-02","2011-01-02","2011-01-03","2011-01-04","2011-01-05", "2011-01-02","2011-01-02","2011-01-03","2011-01-04","2011-01-05")` `sex <- c("M","F","M","F","M","F","M","F","M","F")` `test <-…`
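The `test` scores are cut off in this excerpt, so the scores below are invented purely for illustration. A scatter plot of male against female scores needs one row per date with a male column and a female column, which `tidyr::pivot_wider()` produces. Because `2011-01-02` appears twice for each sex in `dts`, this sketch assumes that averaging those duplicates with `values_fn = mean` is acceptable; the `score_` column prefix is also an assumption.

```r
library(tidyr)
library(ggplot2)

dts <- c("2011-01-02","2011-01-02","2011-01-03","2011-01-04","2011-01-05",
         "2011-01-02","2011-01-02","2011-01-03","2011-01-04","2011-01-05")
sex <- c("M","F","M","F","M","F","M","F","M","F")
test <- c(71, 68, 75, 80, 62, 74, 77, 69, 83, 66)  # made-up scores

scores <- data.frame(dts = as.Date(dts), sex = sex, test = test)

# One row per date: mean male score in score_M, mean female score in score_F
wide <- pivot_wider(scores, names_from = sex, values_from = test,
                    names_prefix = "score_", values_fn = mean)

p <- ggplot(wide, aes(x = score_M, y = score_F)) +
  geom_point() +
  labs(x = "Male test score", y = "Female test score")

p  # draws the scatter plot
```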

Tidyverse conflicts with automatic manifest maker

Question: I'm trying to get an R script to work; the instructions I followed to set up and install the packages are here: https://forum.qiime2.org/t/automatic-manifest-maker-in-r/2921 If you'd like to try the script, add `#!/usr/bin/env Rscript` as the first line of the script and deactivate and reactivate R-Env (credit: Duckmayr; see "syntax errors (in R-script)"). The new issue is that the script now returns a different error: `(R-Env) qiime2@qiime2core2018-8:~/Desktop/MACE_DEMUX_FASTQs/barcode08$ ~/Taxonomy.R` `── Attaching…`

R tidyverse way to change cell content (string) based on a keyword condition with content of another cell plus text

Question: I'm just starting to learn tidyverse approaches, and I'm currently looking at how to solve the following problem. In the following data frame, I'm looking to replace all the "sensitivity level" cells with the name of the channel found one row up in the 2nd column, combined with the word "sensitivity". After successfully doing so, I want to run the result through the other tidyverse solution to the other part of my problem, which is posted here. This is the code to generate an exact replica of my data frame: df…
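The asker's data frame is not shown in this excerpt, so this sketch uses a small hypothetical one in which both the channel names and the "sensitivity level" cells sit in the second column (the column names `V1`, `V2` and the channel names are assumptions). `dplyr::lag()` looks one row up, and `if_else()` swaps only the matching cells.

```r
library(dplyr)

# Hypothetical stand-in for the question's data frame
df <- data.frame(
  V1 = 1:4,
  V2 = c("FL1-A", "sensitivity level", "FL2-A", "sensitivity level"),
  stringsAsFactors = FALSE
)

# lag(V2) is the value one row up in the unmodified column;
# rows that are not "sensitivity level" keep their original value
df <- df %>%
  mutate(V2 = if_else(V2 == "sensitivity level",
                      paste(lag(V2), "sensitivity"),
                      V2))

df$V2
```

If the "sensitivity level" cells live in a different column from the channel names, the same pattern works with `lag()` applied to the channel column instead.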

geom_raster to visualize missing values with additional colorcode

Question: This question is a follow-up to my previous question, "Adding color code (fill) to vis_miss plot". I would like to visualize the missing values in a data frame using `geom_raster` from ggplot2 in R, while also highlighting some additional data structure with color coding. Solution attempt: `library(tidyverse)` `x11()` `airquality %>% mutate(id = row_number()) %>% gather(-c(id,Month), key = "key", value = "val") %>% mutate(isna = is.na(val)) %>% mutate(Month=as.factor(ifelse(isna==TRUE,NA,Month))) %>% …`