tidyverse

pandas or python equivalent of tidyr complete

Submitted by 白昼怎懂夜的黑 on 2020-08-06 07:52:08
Question: I have data that looks like this:

    library("tidyverse")
    df <- tibble(user = c(1, 1, 2, 3, 3, 3),
                 x = c("a", "b", "a", "a", "c", "d"),
                 y = 1)
    df
    #   user x     y
    # 1    1 a     1
    # 2    1 b     1
    # 3    2 a     1
    # 4    3 a     1
    # 5    3 c     1
    # 6    3 d     1

In Python format:

    import pandas as pd
    df = pd.DataFrame({'user': [1, 1, 2, 3, 3, 3],
                       'x': ['a', 'b', 'a', 'a', 'c', 'd'],
                       'y': 1})

I'd like to "complete" the data frame so that every user has a record for every possible x, with the default y fill set to 0. This is somewhat trivial in R
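In tidyr the operation is `complete(user, x, fill = list(y = 0))`. Below is a minimal pandas sketch of the same idea. It builds the full user × x grid with `MultiIndex.from_product` and reindexes onto it, filling missing rows with 0. It assumes each (user, x) pair occurs at most once, because `reindex` needs a unique index.

```python
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 2, 3, 3, 3],
                   'x': ['a', 'b', 'a', 'a', 'c', 'd'],
                   'y': 1})

# Every (user, x) combination of the values present, like tidyr::complete(user, x).
full_index = pd.MultiIndex.from_product(
    [df['user'].unique(), df['x'].unique()], names=['user', 'x'])

# Reindex onto the full grid; combinations that were absent get y = 0.
completed = (df.set_index(['user', 'x'])
               .reindex(full_index, fill_value=0)
               .reset_index())
print(completed)
```

The result has 12 rows (3 users × 4 values of x). Only the six original rows have y = 1.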

mutate_at() throwing an error with to_label() as its function in R

Submitted by ε祈祈猫儿з on 2020-08-05 09:37:42
Question: I'm following these data-cleaning instructions, but at this line (also shown below) I get the following error: Error: Problem with mutate() input l5cathol. Am I missing something?

    library(tidyverse)
    library(haven)
    library(sjmisc)
    library(googledrive)
    googledrive::drive_download('https://drive.google.com/file/d/124WOY4iBXxv_9eBXsoHJVUzX98x2sxYy/view?usp=sharing',
                                'test.por', overwrite = TRUE)
    dta <- haven::read_por('test.por')
    names(dta) <- tolower(names(dta))
    # Convert variables of interest to

Tidyverse approach to binding unnamed list of unnamed vectors by row - do.call(rbind,x) equivalent

Submitted by 左心房为你撑大大i on 2020-07-28 14:16:26
Question: I often see questions where people have somehow ended up with an unnamed list of unnamed character vectors, and they want to bind them row-wise into a data.frame. Here is an example:

    library(magrittr)
    data <- cbind(LETTERS[1:3], 1:3, 4:6, 7:9, c(12, 15, 18)) %>%
      split(1:3) %>%
      unname
    data
    #[[1]]
    #[1] "A" "1" "4" "7" "12"
    #
    #[[2]]
    #[1] "B" "2" "5" "8" "15"
    #
    #[[3]]
    #[1] "C" "3" "6" "9" "18"

One typical approach is do.call from base R:

    do.call(rbind, data) %>% as.data.frame
    #  V1 V2 V3 V4 V5
    #1
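For comparison, pandas has a direct analogue of `do.call(rbind, data)` for a list of equal-length sequences. The `DataFrame` constructor accepts a list of lists and treats each inner list as one row. In the minimal sketch below, the names V1 to V5 are chosen only to mirror R's defaults. As in R, every value starts out as a string, so the numeric columns are converted afterwards.

```python
import pandas as pd

# The three unnamed character vectors from the question, as plain lists.
data = [["A", "1", "4", "7", "12"],
        ["B", "2", "5", "8", "15"],
        ["C", "3", "6", "9", "18"]]

# Each inner list becomes one row, like do.call(rbind, data).
df = pd.DataFrame(data)
# Mirror the V1..V5 names that as.data.frame() produces in R.
df.columns = [f"V{i + 1}" for i in range(df.shape[1])]

# All values are strings (as in the R character matrix); convert the numeric ones.
num_cols = ["V2", "V3", "V4", "V5"]
df[num_cols] = df[num_cols].apply(pd.to_numeric)
print(df)
```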

Remove rows where all variables are NA using dplyr

Submitted by 二次信任 on 2020-07-28 06:18:19
Question: I'm having some issues with a seemingly simple task: removing all rows where all variables are NA using dplyr. I know it can be done in base R (Remove rows in R matrix where all data is NA and Removing empty rows of a data file in R), but I'm curious whether there is a simple way of doing it with dplyr. Example:

    library(tidyverse)
    dat <- tibble(a = c(1, 2, NA), b = c(1, NA, NA), c = c(2, NA, NA))
    filter(dat, !is.na(a) | !is.na(b) | !is.na(c))

The filter call above does what I want but
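The chained `!is.na(a) | !is.na(b) | !is.na(c)` condition generalizes to "keep a row if any column is non-missing". In dplyr 1.0.4 and later this can be written as `filter(dat, if_any(everything(), ~ !is.na(.x)))`. The minimal pandas sketch below shows the same idea two ways: an explicit row-wise `any`, and the built-in `dropna(how='all')`.

```python
import numpy as np
import pandas as pd

dat = pd.DataFrame({'a': [1, 2, np.nan],
                    'b': [1, np.nan, np.nan],
                    'c': [2, np.nan, np.nan]})

# Keep rows where at least one column is non-missing, for any number of columns.
kept = dat[dat.notna().any(axis=1)]

# Built-in shortcut: drop a row only when all of its values are missing.
kept2 = dat.dropna(how='all')
print(kept)
```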

How to calculate normalized ratios in all possible combinations efficiently for a large matrix in R?

Submitted by 有些话、适合烂在心里 on 2020-07-22 05:51:05
Question: I want to calculate normalised ratios for all possible column combinations efficiently for a large matrix in R. I asked a similar question earlier here with a small dataset, and the solutions provided there worked fine. But when I try to apply the same solution to a large dataset (400 x 2151), my system hangs. My system has 16 GB of RAM and an Intel i7 processor. Here is the code with data:

    df <- matrix(rexp(860400), nrow = 400, ncol = 2151)

Solution provided by @Ronak Shah:

    cols
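The excerpt does not define "normalised ratio". This sketch assumes the normalised difference (a - b) / (a + b) between every unordered pair of columns, which is an assumption, not the asker's stated formula. Under that assumption, simple arithmetic explains the hang. With 2151 columns there are 2,312,325 pairs, so a float64 result for 400 rows needs about 7.4 GB before any intermediate copies. The hypothetical helper `normalized_ratio_chunks` below is written in NumPy rather than R. It avoids building the full matrix by yielding one block of pairs at a time, so each block can be summarized or written out before the next is computed.

```python
from math import comb

import numpy as np

n_rows, n_cols = 400, 2151
n_pairs = comb(n_cols, 2)              # unordered column pairs (i, j) with i < j
bytes_needed = n_rows * n_pairs * 8    # one float64 per row per pair
print(n_pairs, bytes_needed / 1e9)     # about 2.3 million pairs, about 7.4 GB


def normalized_ratio_chunks(m, chunk_cols=100):
    """Hypothetical helper: yield (i, js, block) with
    block[:, k] = (m[:, i] - m[:, js[k]]) / (m[:, i] + m[:, js[k]]),
    covering every pair i < j without materializing all pairs at once."""
    n = m.shape[1]
    for i in range(n - 1):
        a = m[:, [i]]                  # shape (rows, 1); broadcasts over js
        for start in range(i + 1, n, chunk_cols):
            js = np.arange(start, min(start + chunk_cols, n))
            b = m[:, js]
            yield i, js, (a - b) / (a + b)
```

Usage on a small matrix: `for i, js, block in normalized_ratio_chunks(m): ...`. Each `block` holds only `rows × chunk_cols` values, so peak memory is set by `chunk_cols` rather than by the total number of pairs.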

Re-value selection of columns by group

Submitted by 喜你入骨 on 2020-07-22 05:19:32
Question: I have data such as this:

    data_in <- read_table2("condition Q11_1 Q11_2 Q11_3 Q11_4 Q11_5 Q11_6 Q11_7 Q11_8 Q11_9 Q11_10 Q11_11 Q11_12 Q11_13
    1 2 5 5 2 5 5 5 5 2 5 5 2 2
    0 1 1 1 2 2 5 5 5 1 2 2 2 1
    1 6 5 6 6 6 6 5 6 5 6 6 6 6
    0 6 6 6 6 6 6 6 6 6 6 6 6 6
    1 5 6 6 6 6 6 6 6 6 6 6 6 5
    1 6 6 6 6 6 6 6 6 5 5 6 6 5
    ")

I have a series of variables (from a survey) with the values 1, 2, 5 and 6. I want to change the values of a specific set of variables from 5 to 3 and from 6 to 4. In this example I've only
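The core operation is a value mapping (5 → 3, 6 → 4) applied to a subset of columns while the other columns are left alone. A minimal pandas sketch follows. The question is truncated before it says which variables to recode, so the selection `Q11_1` to `Q11_5` is purely hypothetical. The whitespace-separated data from the question is parsed with `read_csv(sep=r"\s+")`.

```python
import io

import pandas as pd

raw = """condition Q11_1 Q11_2 Q11_3 Q11_4 Q11_5 Q11_6 Q11_7 Q11_8 Q11_9 Q11_10 Q11_11 Q11_12 Q11_13
1 2 5 5 2 5 5 5 5 2 5 5 2 2
0 1 1 1 2 2 5 5 5 1 2 2 2 1
1 6 5 6 6 6 6 5 6 5 6 6 6 6
0 6 6 6 6 6 6 6 6 6 6 6 6 6
1 5 6 6 6 6 6 6 6 6 6 6 6 5
1 6 6 6 6 6 6 6 6 5 5 6 6 5
"""
data_in = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Hypothetical selection: the question does not say which Q11 items to recode.
to_recode = [f"Q11_{i}" for i in range(1, 6)]

# Map 5 -> 3 and 6 -> 4 in the selected columns only; 1 and 2 are unchanged.
data_in[to_recode] = data_in[to_recode].replace({5: 3, 6: 4})
print(data_in)
```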

Making tidyeval function inside case_when

Submitted by 不羁岁月 on 2020-07-21 03:06:48
Question: I have a dataset in which I'd like to impute one value among the others based on the probability distribution of those values. Let's make a reproducible example first:

    library(tidyverse)
    library(janitor)
    dummy1 <- runif(5000, 0, 1)
    dummy11 <- case_when(
      dummy1 < 0.776 ~ 1,
      dummy1 < 0.776 + 0.124 ~ 2,
      TRUE ~ 5)
    df1 <- tibble(q1 = dummy11)

Here is the output:

    df1 %>% tabyl(q1)
     q1    n percent
      1 3888  0.7776
      2  605  0.1210
      5  507  0.1014

I used mutate and sample to redistribute the value 5 among the values 1 and 2 like this:

    df1 %
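The tidyeval part of the question is R-specific. The sketch below covers only the sampling step the asker describes: each 5 is replaced by a draw from {1, 2}, weighted by how often 1 and 2 occur among the other rows. It is written in NumPy/pandas and is not necessarily the asker's approach. `np.select` evaluates its conditions in order, as `case_when` does. The seed 42 is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Recreate the dummy data: about 77.6% 1s, 12.4% 2s, and the rest 5s.
u = rng.uniform(0, 1, 5000)
q1 = np.select([u < 0.776, u < 0.776 + 0.124], [1, 2], default=5)
df1 = pd.DataFrame({'q1': q1})

# Relative frequencies of the "real" answers, ignoring the 5s.
counts = df1.loc[df1['q1'] != 5, 'q1'].value_counts()
probs = counts / counts.sum()

# Replace each 5 with a draw from {1, 2} weighted by those frequencies.
is5 = df1['q1'] == 5
df1.loc[is5, 'q1'] = rng.choice(probs.index.to_numpy(), size=int(is5.sum()),
                                p=probs.to_numpy())
print(df1['q1'].value_counts(normalize=True))
```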