tidyverse

Function for Tidy chisq.test Output for Visualizing or Filtering P-Values

雨燕双飞 posted on 2019-12-01 11:39:48
For data...
    library(productplots)
    library(ggmosaic)
For code...
    library(tidyverse)
    library(broom)
I'm trying to create tidy chisq.test output so that I can easily filter or visualize p-values. I'm using the "happy" dataset (which is included with either of the data packages listed above). For this example, if I wanted to condition the "happy" variable on all other variables, I would isolate the categorical variables (I'm not going to create factor groupings out of age, year, etc., for this example) and then run a simple function.
    df <- happy %>% select(-year, -age, -wtssall)
    lapply(df,function(x)chisq.test

Sum multiple variables by group and create new column with their sum

牧云@^-^@ posted on 2019-12-01 10:59:39
Question: I have a data frame with a grouping variable and I want to sum the other variables by group. It's easy with dplyr.
    library(dplyr)
    library(magrittr)
    data <- data.frame(group = c("a", "a", "b", "c", "c"), n1 = 1:5, n2 = 2:6)
    data %>% group_by(group) %>% summarise_all(sum)
    # A tibble: 3 x 3
       group    n1    n2
      <fctr> <int> <int>
    1      a     3     5
    2      b     3     4
    3      c     9    11
But now I want a new column, total, with the sum of n1 and n2 by group. Like this:
    # A tibble: 3 x 3
       group    n1    n2   ttl
      <fctr> <int> <int> <int>
    1      a     3     5     8
    2      b     3     4     7
    3      c     9    11
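One possible way to get the extra column, offered as a minimal sketch rather than the accepted answer: summarise each variable by group first, then add the row-wise total with mutate(). The result name `totals` is an assumption introduced here for illustration; the column name `ttl` follows the desired output shown in the question.

```r
library(dplyr)

data <- data.frame(group = c("a", "a", "b", "c", "c"), n1 = 1:5, n2 = 2:6)

# Sum n1 and n2 within each group, then add their total as a new column.
# `totals` is a hypothetical name used only in this sketch.
totals <- data %>%
  group_by(group) %>%
  summarise(n1 = sum(n1), n2 = sum(n2)) %>%
  mutate(ttl = n1 + n2)

print(totals)
```

Because mutate() runs after summarise(), `ttl` is computed from the already aggregated columns, so there is one total per group.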

counting the number of times a value appears in a column in relation to other columns in r

扶醉桌前 posted on 2019-12-01 10:24:58
Question: I am new to R and I have a data frame very close to the one below. I would love to find a general way that tells me how many times, plus 1, the number "0" appears for each country (Intro4) and id.
        Intro4 number  id
    221    TAN      0  19
    222    TAN      0  73
    223    TAN      0  73
    224    TOG      0  37
    225    TOG      0  58
    226    UGA      0  96
    227    UGA      0 112
    228    UGA      0  96
    229    ZAM      0  40
    230    ZAM      0  99
    231    ZAM      0 139
I can do it by hand, but it is a big data frame and that would take forever. count() gives me the frequency but doesn't divide it between

How to apply same operation to multiple data frames in dplyr-R?

左心房为你撑大大i posted on 2019-12-01 10:22:29
Question: I would like to apply the same operation to multiple data frames in R but cannot work out how to do it. This is an example of a pipe operation in dplyr:
    library(dplyr)
    iris %>%
      mutate(Sepal = rowSums(select(., starts_with("Sepal"))),
             Length = rowSums(select(., ends_with("Length"))),
             Width = rowSums(select(., ends_with("Width"))))
    iris2 <- iris
    iris3 <- iris
Could you suggest how to apply the same pipe function to iris, iris2 and iris3? I need to use dplyr piping operations. I suppose map
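A hedged sketch of one common approach: wrap the operation in a function and apply it to a named list of data frames. The function name `add_sums` and the list name `results` are assumptions made up for this example; base lapply() is used here, and purrr::map() could be substituted in the same position.

```r
library(dplyr)

iris2 <- iris
iris3 <- iris

# Hypothetical helper: adds the three row-sum columns from the question.
# select() is given the original data frame, so the new columns do not
# feed into each other.
add_sums <- function(df) {
  df %>%
    mutate(
      Sepal  = rowSums(select(df, starts_with("Sepal"))),
      Length = rowSums(select(df, ends_with("Length"))),
      Width  = rowSums(select(df, ends_with("Width")))
    )
}

# Apply the same operation to every data frame in a named list.
results <- lapply(list(iris = iris, iris2 = iris2, iris3 = iris3), add_sums)

names(results)
```

Keeping the data frames in a list avoids repeating the pipeline three times, and each result can be retrieved by name, for example `results$iris2`.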

Filter by multiple patterns with filter() and str_detect()

孤街浪徒 posted on 2019-12-01 08:11:37
I would like to filter a data frame using filter() and str_detect(), matching multiple patterns without multiple str_detect() function calls. In the example below I would like to filter the data frame df to show only rows containing the letters a, f and o.
    df <- data.frame(numbers = 1:52, letters = letters)
    df %>%
      filter(
        str_detect(.$letters, "a") |
        str_detect(.$letters, "f") |
        str_detect(.$letters, "o")
      )
    #   numbers letters
    #1        1       a
    #2        6       f
    #3       15       o
    #4       27       a
    #5       32       f
    #6       41       o
I have attempted the following:
    df %>%
      filter(
        str_detect(.$letters, c("a", "f", "o"))
      )
    #   numbers letters
    #1        1       a
    #2       15       o
    #3       32
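A minimal sketch of one way to match several patterns in a single call, assuming a regular expression is acceptable: join the alternatives with `|` inside one pattern string. A vector of patterns is matched element by element against the rows rather than as alternatives, which is consistent with the partial output of the attempt above. The result name `matches` is an assumption used only for illustration.

```r
library(dplyr)
library(stringr)

df <- data.frame(numbers = 1:52, letters = letters)

# One regular expression with alternation covers all three letters.
# Inside filter(), `letters` refers to the column of df.
matches <- df %>%
  filter(str_detect(letters, "a|f|o"))

print(matches)
```

When the patterns are stored in a vector, the same pattern string can be built with `paste(c("a", "f", "o"), collapse = "|")`.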

readr::read_csv issue: Chinese Character becomes messy codes

房东的猫 posted on 2019-12-01 08:02:28
I'm trying to import a dataset into RStudio, but I am stuck on Chinese characters, which come out garbled. Here is the code:
    library(tidyverse)
    df <- read_csv("中文,英文\n英文,德文")
    df
    # A tibble: 1 x 2
    `\xd6\xd0\xce\xc4` `Ӣ\xce\xc4`
    <chr> <chr>
    1 "<U+04E2>\xce\xc4" "<U+00B5>\xc2\xce\xc4"
When I use the base function read.csv, it works well. I guess I must be doing something wrong with the encoding. But there is no encoding option in read_csv, so how can I do this?
This is because the characters are marked as UTF-8 whereas the actual encoding is the system default (you can get by stringi::stri_enc

Print data frame dimensions at each step of filtering

旧街凉风 posted on 2019-12-01 06:22:04
Question: I am using the tidyverse to filter a data frame and would like to print the dimensions (or number of rows) of the intermediate objects at each step. I thought I could simply use the tee pipe operator from magrittr, but it doesn't work. I think I understand the concept behind the tee pipe but can't figure out what is wrong. I searched extensively but didn't find many resources about the tee pipe. I built a simple example with the mtcars dataset. Printing the intermediate objects works but not if I
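The question is cut off before its failing code, so the following is only a sketch of a pattern that is known to work with magrittr's tee pipe `%T>%`, not a diagnosis of the original attempt. The tee pipe evaluates its right-hand side for the side effect and passes the left-hand side along unchanged, so the right-hand side has to print explicitly. The filter conditions and the name `result` are assumptions chosen for illustration.

```r
library(dplyr)
library(magrittr)

# After each filter, print the dimensions, then keep piping the data.
# The braces make `.` refer to the intermediate data frame.
result <- mtcars %>%
  filter(cyl > 4) %T>%
  { print(dim(.)) } %>%
  filter(am == 0) %T>%
  { print(dim(.)) }
```

Calling `dim()` directly after `%T>%` computes the dimensions but discards them without printing, which is one reason a tee step can appear to do nothing.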

Fit a different model for each row of a list-columns data frame

核能气质少年 posted on 2019-12-01 05:48:15
What is the best way to fit model formulae that vary by row of a data frame with the list-columns data structure in the tidyverse? In R for Data Science, Hadley presents a terrific example of how to use the list-columns data structure to fit many models easily ( http://r4ds.had.co.nz/many-models.html#gapminder ). I am trying to find a way to fit many models with slightly different formulae. In the example below, adapted from his original example, what is the best way to fit a different model for each continent?
    library(gapminder)
    library(dplyr)
    library(tidyr)
    library(purrr)
    library

Add row in each group using dplyr and add_row()

旧巷老猫 posted on 2019-12-01 03:21:05
If I add a new row to the iris dataset with:
    iris <- as_tibble(iris)
    > iris %>% add_row(.before=0)
    # A tibble: 151 × 5
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
             <dbl>       <dbl>        <dbl>       <dbl>   <chr>
    1           NA          NA           NA          NA    <NA>   <--- Good!
    2          5.1         3.5          1.4         0.2  setosa
    3          4.9         3.0          1.4         0.2  setosa
It works. So, why can't I add a new row on top of each "subset" with:
    iris %>% group_by(Species) %>% add_row(.before=0)
    Error: is.data.frame(df) is not TRUE
If you want to use a grouped operation, you need to do as JasonWang described in his comment, as other functions like mutate or summarise expect a result

Get `chisq.test()$p.value` for several groups using `dplyr::group_by()`

杀马特。学长 韩版系。学妹 posted on 2019-12-01 00:29:43
I'm trying to conduct a chi-square test on several groups within the dplyr framework. The problem is that group_by() %>% summarise() doesn't seem to do the trick. Simulated data (same structure as the problematic data, but random, so the p-values should be high):
    set.seed(1)
    data.frame(partido=sample(c("PRI", "PAN"), 100, 0.6),
               genero=sample(c("H", "M"), 100, 0.7),
               GM=sample(c("Bajo", "Muy bajo"), 100, 0.8)) -> foo
I want to compare several groups defined by GM to see if there are changes in the p-values for the crosstab of partido and genero, conditional on GM. The obvious dplyr way should be:
    foo %>% group_by