tidyverse | 易学教程

Function for Tidy chisq.test Output for Visualizing or Filtering P-Values

Question: For data: library(productplots), library(ggmosaic). For code: library(tidyverse), library(broom). I'm trying to create tidy chisq.test output so that I can easily filter or visualize p-values. I'm using the "happy" dataset (which is included with either of the data packages listed above). For this example, if I wanted to condition the "happy" variable on all other variables, I would isolate the categorical variables (I'm not going to create factor groupings out of age, year, etc., for this example),
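A minimal sketch of the general idea, under stated assumptions: it uses the built-in mtcars data as a stand-in for "happy", and the choice of `am` as the outcome and `cyl`, `gear`, `vs` as predictors is purely illustrative. Each test is tidied with broom::tidy() and the rows are stacked into one table.

```r
library(dplyr)
library(purrr)
library(broom)

# Assumption: mtcars stands in for the "happy" data; these columns are
# treated as categorical only for the sake of illustration.
outcome <- "am"
predictors <- c("cyl", "gear", "vs")

# One chi-squared test per predictor, each tidied into a one-row tibble.
# suppressWarnings() hides the small-expected-count warning this toy data triggers.
chisq_results <- predictors %>%
  map(function(v) {
    test <- suppressWarnings(chisq.test(table(mtcars[[outcome]], mtcars[[v]])))
    tidy(test) %>% mutate(variable = v)
  }) %>%
  bind_rows()

# The p-values are now an ordinary column that can be filtered or plotted.
filter(chisq_results, p.value < 0.05)
```

Keeping the variable name next to each p-value is what makes later filtering or plotting straightforward.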

How to separate a column in dplyr based on regex

I have the following data frame:

df <- structure(list(X2 = c("BB_137.HVMSC", "BB_138.combined.HVMSC", "BB_139.combined.HVMSC", "BB_140.combined.HVMSC", "BB_141.HVMSC", "BB_142.combined.HMSC-bm")), .Names = "X2", row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

Which looks like this:

> df
# A tibble: 6 x 1
  X2
  <chr>
1 BB_137.HVMSC
2 BB_138.combined.HVMSC
3 BB_139.combined.HVMSC
4 BB_140.combined.HVMSC
5 BB_141.HVMSC
6 BB_142.combined.HMSC-bm

What I want to do is to separate into two columns (with . as separator), by keeping the last field as second column:

col1 col2
BB_137 HVMSC
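One possible approach, sketched here with tidyr::extract() and a regular expression that splits on the last dot only. The shortened three-row data frame is an assumption made for brevity, and this is not necessarily the answer given in the original thread.

```r
library(tibble)
library(tidyr)

# Shortened version of the question's data (three of the six rows).
df <- tibble(X2 = c("BB_137.HVMSC",
                    "BB_138.combined.HVMSC",
                    "BB_142.combined.HMSC-bm"))

# The greedy (.*) consumes everything up to the last dot;
# ([^.]+)$ captures the final dot-free field.
result <- extract(df, X2, into = c("col1", "col2"),
                  regex = "^(.*)\\.([^.]+)$")
result
```

Everything before the last dot ends up in col1, so a middle field such as "combined" stays attached to the first column.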

How to feed a list of unquoted column names into `lapply` (so that I can use it with a `dplyr` function)

Question: I am trying to write a function in tidyverse/dplyr that I want to eventually use with lapply (or map). (I had been working on it to answer this question, but came upon an interesting result/dead-end. Please don't mark this as a duplicate - this question is an extension/departure from the answers that you see there.) Is there 1) a way to get a list of quoted variables to work inside a dplyr function (and not use the deprecated SE_ functions) or is there 2) some way to feed a list of unquoted
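A hedged sketch of the first option (quoted names): column names are passed as strings and converted to symbols with rlang::sym(), then unquoted with `!!`. The helper `count_by` and the use of mtcars are hypothetical, chosen only to make the example runnable.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: counts rows per level of a column given by name.
count_by <- function(data, var) {
  data %>%
    group_by(!!sym(var)) %>%   # turn the string into a symbol, then unquote it
    summarise(n = n())
}

# Column names are plain strings, so they fit naturally into lapply().
results <- lapply(c("cyl", "gear"), count_by, data = mtcars)
results[[1]]
```

Because the names travel as a character vector, no deprecated underscore-suffixed verbs are needed.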

Find variable combinations that makes Primary Key in R

Here is my toy dataframe.

df <- tibble::tribble(
  ~var1, ~var2, ~var3, ~var4, ~var5, ~var6, ~var7,
  "A", "C", 1L, 5L, "AA", "AB", 1L,
  "A", "C", 2L, 5L, "BB", "AC", 2L,
  "A", "D", 1L, 7L, "AA", "BC", 2L,
  "A", "D", 2L, 3L, "BB", "CC", 1L,
  "B", "C", 1L, 8L, "AA", "AB", 1L,
  "B", "C", 2L, 6L, "BB", "AC", 2L,
  "B", "D", 1L, 9L, "AA", "BC", 2L,
  "B", "D", 2L, 6L, "BB", "CC", 1L)

How can I get the combination of a minimum number of variables that uniquely identify the observations in the dataframe, i.e., which variables together can make the primary key? The way I approached this problem is to find the
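A brute-force sketch of one way to search for minimal keys, assuming the data frame is small enough to try every column combination. The helper name `find_minimal_keys` is hypothetical, and this is not necessarily the approach the asker went on to describe.

```r
df <- tibble::tribble(
  ~var1, ~var2, ~var3, ~var4, ~var5, ~var6, ~var7,
  "A", "C", 1L, 5L, "AA", "AB", 1L,
  "A", "C", 2L, 5L, "BB", "AC", 2L,
  "A", "D", 1L, 7L, "AA", "BC", 2L,
  "A", "D", 2L, 3L, "BB", "CC", 1L,
  "B", "C", 1L, 8L, "AA", "AB", 1L,
  "B", "C", 2L, 6L, "BB", "AC", 2L,
  "B", "D", 1L, 9L, "AA", "BC", 2L,
  "B", "D", 2L, 6L, "BB", "CC", 1L)

# Hypothetical helper: try combinations of 1, 2, ... columns and return
# every combination of the smallest size that has no duplicated rows.
find_minimal_keys <- function(data) {
  data <- as.data.frame(data)
  vars <- names(data)
  for (k in seq_along(vars)) {
    combos <- combn(vars, k, simplify = FALSE)
    is_key <- vapply(
      combos,
      function(cols) anyDuplicated(data[, cols, drop = FALSE]) == 0L,
      logical(1)
    )
    if (any(is_key)) return(combos[is_key])
  }
  list()  # no combination is unique (the data contains fully duplicated rows)
}

keys <- find_minimal_keys(df)
keys
```

The search is exponential in the number of columns, so it is only practical for narrow tables.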

Correct usage of dplyr::select in dplyr 0.7.0+, selecting columns using character vector

Suppose we have a character vector cols_to_select containing some columns we want to select from a dataframe df, e.g.

df <- tibble::data_frame(a=1:3, b=1:3, c=1:3, d=1:3, e=1:3)
cols_to_select <- c("b", "d")

Suppose also we want to use dplyr::select because it's part of an operation that uses %>%, so using select makes the code easy to read. There seem to be a number of ways in which this can be achieved, but some are more robust than others. Please could you let me know which is the 'correct' version and why? Or perhaps there is another, better way?

dplyr::select(df, cols_to_select) #Fails if
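For readers on newer releases, a minimal sketch using all_of(). This assumes dplyr 1.0.0 or later (all_of() comes from tidyselect 1.0.0), which is newer than the 0.7.x versions the question targets; there, one_of(cols_to_select) played a similar role.

```r
library(dplyr)
library(tibble)

df <- tibble(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3)
cols_to_select <- c("b", "d")

# all_of() states explicitly that cols_to_select is an external character
# vector, so a column in df that happened to be called "cols_to_select"
# would not be picked up by mistake. It errors if a listed column is missing.
selected <- df %>% select(all_of(cols_to_select))
selected
```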

Use filter() (and other dplyr functions) inside nested data frames with map()

I'm trying to use map() from the purrr package to apply the filter() function to the data stored in a nested data frame. "Why wouldn't you filter first, and then nest?" - you might ask. That will work (and I'll show my desired outcome using such a process), but I'm looking for ways to do it with purrr. I want to have just one data frame, with two list-columns, both being nested data frames - one full and one filtered. I can achieve it now by performing nest() twice: once on all data, and a second time on filtered data:

library(tidyverse)
df <- tibble( a = sample(x = rep(c('x','y'),5), size = 10), b = sample(c(1
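A minimal sketch of filtering inside a list-column with map(). The toy data and the condition `b > 5` are assumptions, since the question's own example is cut off above.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Assumed toy data: a grouping column and a numeric column.
df <- tibble(a = rep(c("x", "y"), 5), b = 1:10)

nested <- df %>%
  group_by(a) %>%
  nest() %>%                                          # one row per group, full data in `data`
  ungroup() %>%
  mutate(filtered = map(data, ~ filter(.x, b > 5)))   # filter each nested tibble

nested
```

Both list-columns live in the same data frame: `data` holds the full rows and `filtered` holds the subset.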

How does ggplot scale_continuous expand argument work?

Question: I am trying to figure out how the expand argument of scale_continuous() works. According to the scale_continuous documentation: A numeric vector of length two giving multiplicative and additive expansion constants. These constants ensure that the data is placed some distance away from the axes. The defaults are c(0.05, 0) for continuous variables, and c(0, 0.6) for discrete variables. Since they are "expansion constants", they are not actual units. Is there any way to convert them to some actual
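As a rough worked example of the arithmetic, assuming the padding on each side equals the data range times the multiplicative constant plus the additive constant. The helper `expanded_range` is hypothetical and written in base R; it is not a ggplot2 function.

```r
# Hypothetical helper illustrating the assumed rule:
# padding = (range width) * mult + add, applied on both sides.
expanded_range <- function(limits, mult = 0.05, add = 0) {
  pad <- diff(limits) * mult + add
  c(limits[1] - pad, limits[2] + pad)
}

# Continuous default c(0.05, 0): data from 0 to 100 is padded by 5 on each side.
continuous_limits <- expanded_range(c(0, 100))

# Discrete default c(0, 0.6): three categories sit at positions 1 to 3
# and are padded by 0.6 on each side.
discrete_limits <- expanded_range(c(1, 3), mult = 0, add = 0.6)
```

Under this reading, the multiplicative constant is a fraction of the data range and the additive constant is expressed in the units of the axis itself.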

Evaluation Error when tidyverse is loaded after Hmisc

I am using R 3.3.3, dplyr 0.7.4, and Hmisc 4.1-1. I noticed that the order in which I load packages affects whether or not a dplyr::summarise function would work. I understand that loading packages in a different order would mask certain functions, but I am using the package::function() syntax to avoid that issue. The exact issue revolves around labeled variables. I know that there have been issues in the past with tidyverse and variable labels, but none seem to address why this particular situation is occurring. First example that works - I load only Hmisc then dplyr and I am able to summarise the

how to compute rowsums using tidyverse

I did mtcars %>% by_row(sum) but got the message: by_row() is deprecated; please use a combination of: tidyr::nest(); dplyr::mutate(); purrr::map()

My naive approach is this:

mtcars %>% group_by(id = row_number()) %>% nest(-id) %>% mutate(hi = map_dbl(data, sum))

Is there a way to do it without creating an "id" column?

Answer: Is this what you are looking for?

mtcars %>% mutate(rowsum = rowSums(.))

Output:

   mpg cyl  disp  hp drat    wt  qsec vs am gear carb  rowsum
1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 328.980
2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 329.795
3 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
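A variant of the same idea that avoids the `.` placeholder, sketched under the assumption that dplyr 1.0.0 or later is available (across() does not exist in earlier versions). Restricting to numeric columns is an added safeguard, not something the original answer does.

```r
library(dplyr)

# across() hands rowSums() a data frame of the selected columns,
# so no helper "id" column or nesting is required.
with_sums <- mtcars %>%
  mutate(rowsum = rowSums(across(where(is.numeric))))

head(with_sums, 3)
```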

use dplyr mutate() in programming

I am trying to assign a column name to a variable using mutate.

df <- data.frame(x = sample(1:100, 50), y = rnorm(50))
new <- function(name){ df %>% mutate(name = ifelse(x < 50, "small", "big")) }

When I run new(name = "newVar") it doesn't work. I know mutate_() could help but I'm struggling in using it together with ifelse. Any help would be appreciated.

Answer: Using dplyr 0.7.1 and its advances in NSE, you have to UQ the argument to mutate and then use := when assigning. There is lots of info on programming with dplyr and NSE here: https://cran.r-project.org/web/packages/dplyr/vignettes/programming
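The advice above can be sketched as follows, using `!!` (the operator form of UQ) on the left-hand side of `:=`. Passing the data frame as an explicit `data` argument is an assumption made here so the function does not rely on a global `df`.

```r
library(dplyr)

df <- data.frame(x = sample(1:100, 50), y = rnorm(50))

# `name` is a string; !!name := ... uses its value as the new column's name.
new <- function(data, name) {
  data %>%
    mutate(!!name := ifelse(x < 50, "small", "big"))
}

out <- new(df, name = "newVar")
head(out)
```

A plain `name = ...` inside mutate() would create a column literally called "name", which is why the original attempt did not behave as expected.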