tidyverse | 易学教程

pandas or python equivalent of tidyr complete

Question: I have data that looks like this:

    library("tidyverse")

    df <- tibble(user = c(1, 1, 2, 3, 3, 3),
                 x = c("a", "b", "a", "a", "c", "d"),
                 y = 1)
    df
    #   user x y
    # 1    1 a 1
    # 2    1 b 1
    # 3    2 a 1
    # 4    3 a 1
    # 5    3 c 1
    # 6    3 d 1

Python format:

    import pandas as pd

    df = pd.DataFrame({'user': [1, 1, 2, 3, 3, 3],
                       'x': ['a', 'b', 'a', 'a', 'c', 'd'],
                       'y': 1})

I'd like to "complete" the data frame so that every user has a record for every possible x, with the default y fill set to 0. This is somewhat trivial in R
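For reference, one way the "complete" step described in this question could be approached in pandas is to reindex onto the full grid of (user, x) pairs. This is only a minimal sketch, not the answer from the original thread. It assumes the (user, x) pairs in the data are unique and that "every possible x" means every x value observed anywhere in the frame. The names `full_index` and `completed` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 2, 3, 3, 3],
                   'x': ['a', 'b', 'a', 'a', 'c', 'd'],
                   'y': 1})

# Build every (user, x) pair from the values observed in each column.
full_index = pd.MultiIndex.from_product(
    [df['user'].unique(), df['x'].unique()],
    names=['user', 'x'],
)

# Reindex onto the full grid; pairs absent from the data get y = 0.
# Assumption: (user, x) pairs are unique, otherwise reindex raises an error.
completed = (
    df.set_index(['user', 'x'])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(completed)
```

With the sample data this gives one row per user for each of the four x values, keeping y = 1 for the original records.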

mutate_at() throwing an error with to_label() as its function in R

Question: I'm following this data cleaning instruction, but as of this line (also shown below), I get the following error: Error: Problem with mutate() input l5cathol. Am I missing something?

    library(tidyverse)
    library(haven)
    library(sjmisc)
    library(googledrive)

    googledrive::drive_download('https://drive.google.com/file/d/124WOY4iBXxv_9eBXsoHJVUzX98x2sxYy/view?usp=sharing', 'test.por', overwrite = T)
    dta <- haven::read_por('test.por')
    names(dta) <- tolower(names(dta))
    # Convert variables of interest to

Tidyverse approach to binding unnamed list of unnamed vectors by row - do.call(rbind,x) equivalent

Question: I often find questions where people have somehow ended up with an unnamed list of unnamed character vectors and they want to bind them row-wise into a data.frame. Here is an example:

    library(magrittr)

    data <- cbind(LETTERS[1:3], 1:3, 4:6, 7:9, c(12, 15, 18)) %>%
      split(1:3) %>%
      unname
    data
    #[[1]]
    #[1] "A"  "1"  "4"  "7"  "12"
    #
    #[[2]]
    #[1] "B"  "2"  "5"  "8"  "15"
    #
    #[[3]]
    #[1] "C"  "3"  "6"  "9"  "18"

One typical approach is with do.call from base R.

    do.call(rbind, data) %>% as.data.frame
    #  V1 V2 V3 V4 V5
    #1

Remove rows where all variables are NA using dplyr

Question: I'm having some issues with a seemingly simple task: to remove all rows where all variables are NA using dplyr. I know it can be done using base R ("Remove rows in R matrix where all data is NA" and "Removing empty rows of a data file in R"), but I'm curious to know if there is a simple way of doing it using dplyr. Example:

    library(tidyverse)

    dat <- tibble(a = c(1, 2, NA), b = c(1, NA, NA), c = c(2, NA, NA))
    filter(dat, !is.na(a) | !is.na(b) | !is.na(c))

The filter call above does what I want but

How to calculate normalized ratios in all possible combinations efficiently for a large matrix in R?

Question: I want to calculate normalised ratios in all possible combinations efficiently for a large matrix in R. I asked a similar question earlier here with a small dataset, and the solutions provided there worked fine. But when I try to apply the same solution to a large dataset (400 x 2151), my system hangs. My system has 16 GB of RAM and an Intel i7 processor. Here is the code with data:

    df <- matrix(rexp(860400), nrow = 400, ncol = 2151)

Solution provided by @Ronak Shah

    cols

Re-value selection of columns by group

Question: I have data such as this:

    data_in <- read_table2("condition Q11_1 Q11_2 Q11_3 Q11_4 Q11_5 Q11_6 Q11_7 Q11_8 Q11_9 Q11_10 Q11_11 Q11_12 Q11_13
    1 2 5 5 2 5 5 5 5 2 5 5 2 2
    0 1 1 1 2 2 5 5 5 1 2 2 2 1
    1 6 5 6 6 6 6 5 6 5 6 6 6 6
    0 6 6 6 6 6 6 6 6 6 6 6 6 6
    1 5 6 6 6 6 6 6 6 6 6 6 6 5
    1 6 6 6 6 6 6 6 6 5 5 6 6 5
    ")

I have a series of variables (from a survey) with the values (1, 2, 5, 6). I want to change the values of a specific set of variables from 5 to 3 and from 6 to 4. In this example I've only

Making tidyeval function inside case_when

Question: I have a data set in which I'd like to impute one value among others based on the probability distribution of those values. Let me make a reproducible example first:

    library(tidyverse)
    library(janitor)

    dummy1 <- runif(5000, 0, 1)
    dummy11 <- case_when(
      dummy1 < 0.776 ~ 1,
      dummy1 < 0.776 + 0.124 ~ 2,
      TRUE ~ 5)
    df1 <- tibble(q1 = dummy11)

Here is the output:

    df1 %>% tabyl(q1)
     q1    n percent
      1 3888  0.7776
      2  605  0.1210
      5  507  0.1014

I used mutate and sample to share the value 5 among values 1 and 2 like this:

    df1 %
