tidyr

using tidyr unnest with NULL values

允我心安 submitted on 2019-11-29 20:21:03
Question: I converted a JSON file into a data.frame with a nested list structure, which I would like to unnest and flatten. Some of the values in the list are NULL, which unnest does not accept. If I replace the NULL values with a data.frame structure that has only NA values, I get the desired result. Below is a simplified example of my problem. I have tried to replace the NULL values with the NA data.frame but did not manage because of the nested structure. How can I achieve the desired result?
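One possible approach, sketched on made-up data because the question's own example is not included in this excerpt: assuming tidyr 1.0.0 or later, `unnest()` has a `keep_empty` argument that turns NULL elements into a row of missing values, so the manual NA data.frame replacement may not be needed. The tibble `nested` and its columns `id`, `info`, `a`, and `b` are hypothetical.

```r
library(tidyr)
library(tibble)

# Hypothetical nested data: the second element of the list-column is NULL.
nested <- tibble(
  id = 1:3,
  info = list(
    data.frame(a = 1, b = "x", stringsAsFactors = FALSE),
    NULL,
    data.frame(a = 3, b = "z", stringsAsFactors = FALSE)
  )
)

# keep_empty = TRUE keeps the row for id 2 and fills a and b with NA
# (assumes tidyr >= 1.0.0; without it the NULL row is dropped).
flat <- unnest(nested, info, keep_empty = TRUE)
```

For deeper nesting the same call would probably have to be repeated per level, which this sketch does not cover.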

Using dplyr window functions to calculate percentiles

独自空忆成欢 submitted on 2019-11-29 20:15:14
I have a working solution but am looking for a cleaner, more readable solution that perhaps takes advantage of some of the newer dplyr window functions. Using the mtcars dataset, if I want to look at the 25th, 50th, and 75th percentiles and the mean and count of miles per gallon ("mpg") by the number of cylinders ("cyl"), I use the following code:
    library(dplyr)
    library(tidyr)
    # load data
    data("mtcars")
    # Percentiles used in calculation
    p <- c(.25,.5,.75)
    # old dplyr solution
    mtcars %>%
      group_by(cyl) %>%
      do(data.frame(p=p, stats=quantile(.$mpg, probs=p), n = length(.$mpg), avg = mean(.$mpg))) %>%
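A minimal sketch of one cleaner alternative, using plain `summarise()` rather than window functions. The output column names `q25`, `q50`, and `q75` are my own choice, and the result is already wide (one row per `cyl`), so no reshaping step is needed afterwards.

```r
library(dplyr)

# One summary row per cylinder group: quartiles, mean, and count of mpg.
stats_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    q25 = quantile(mpg, 0.25),
    q50 = quantile(mpg, 0.50),
    q75 = quantile(mpg, 0.75),
    avg = mean(mpg),
    n   = n()
  )
```

Writing each percentile out by hand is fine for three values; for a long vector of probabilities the `do()` version above may still be the more compact form.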

Splitting column by separator from right to left in R

人走茶凉 submitted on 2019-11-29 16:42:09
I'm working on a dataset where one column ( Place ) consists of a location sentence.
    library(tidyverse)
    example <- tibble(Datum = c("October 1st 2017", "October 2st 2017", "October 3rd 2017"),
                      Place = c("Tabiyyah Jazeera village, 20km south east of Deir Ezzor, Deir Ezzor Governorate, Syria",
                                "Abu Kamal, Deir Ezzor Governorate, Syria",
                                "شارع القطار al Qitar [train] street, al-Tawassiya area, north of Raqqah city centre, Raqqah governorate, Syria"))
I would like to split the Place column by the comma separator, and I would prefer a solution with the tidyverse package . Because the values of Place have
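A hedged sketch of one way to align the pieces from the right: `tidyr::separate()` accepts `fill = "left"`, which pads short rows with NA on the left so the last piece always lands in the last column. The column names `detail`, `locality`, `governorate`, and `country` are hypothetical, and only the first two example rows are used; a row with more pieces than names would need more names in `into` or an explicit `extra` setting.

```r
library(tidyr)
library(tibble)

places <- tibble(
  Place = c(
    "Tabiyyah Jazeera village, 20km south east of Deir Ezzor, Deir Ezzor Governorate, Syria",
    "Abu Kamal, Deir Ezzor Governorate, Syria"
  )
)

# Split on commas; rows with fewer pieces are padded with NA on the left,
# so the country is always in the rightmost column.
split_places <- separate(
  places, Place,
  into = c("detail", "locality", "governorate", "country"),  # hypothetical names
  sep = ",\\s*",
  fill = "left"
)
```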

SparklyR separate one Spark DataFrame column into two columns

坚强是说给别人听的谎言 submitted on 2019-11-29 16:15:32
I have a dataframe containing a column named COL which is structured in this way: VALUE1###VALUE2. The following code works:
    library(sparklyr)
    library(tidyr)
    library(dplyr)
    mParams <- collect(filter(input_DF, TYPE == ('MIN')))
    mParams <- separate(mParams, COL, c('col1','col2'), '\\###', remove=FALSE)
If I remove the collect , I get this error:
    Error in UseMethod("separate_") : no applicable method for 'separate_' applied to an object of class "c('tbl_spark', 'tbl_sql', 'tbl_lazy', 'tbl')"
Is there any alternative that achieves what I want without collecting everything on my Spark driver?

Using gather() to gather two (or more) groups of columns into two (or more) key-value pairs [duplicate]

拟墨画扇 submitted on 2019-11-29 16:06:46
This question already has an answer here: Reshaping multiple sets of measurement columns (wide format) into single columns (long format) (7 answers)
I want to gather two separate groups of columns into two key-value pairs. Here's some example data:
    library(dplyr)
    library(tidyr)
    ID = c(1:5)
    measure1 = c(1:5)
    measure2 = c(6:10)
    letter1 = c("a", "b", "c", "d", "e")
    letter2 = c("f", "g", "h", "i", "j")
    df = data.frame(ID, measure1, measure2, letter1, letter2)
    df = tbl_df(df)
    df$letter1 <- as.character(df$letter1)
    df$letter2 <- as.character(df$letter2)
I want the values of the two measure columns
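A sketch of one possible answer, assuming tidyr 1.0.0 or later, where `pivot_longer()` supersedes `gather()`. The special name `".value"` tells it to keep the text part of each column name (`measure`, `letter`) as an output column, while the trailing digit goes into a new column that I have called `set` (a hypothetical name).

```r
library(tidyr)

df <- data.frame(
  ID = 1:5,
  measure1 = 1:5, measure2 = 6:10,
  letter1 = c("a", "b", "c", "d", "e"),
  letter2 = c("f", "g", "h", "i", "j"),
  stringsAsFactors = FALSE
)

# "measure1" is read as .value = "measure", set = "1", and so on,
# giving one row per ID and set with columns measure and letter.
paired_long <- pivot_longer(
  df,
  cols = -ID,
  names_to = c(".value", "set"),
  names_pattern = "([a-z]+)(\\d)"
)
```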

In R: get multiple rows by splitting a column using tidyr and reshape2 [duplicate]

落爺英雄遲暮 submitted on 2019-11-29 15:42:30
This question already has an answer here: Split comma-separated strings in a column into separate rows (5 answers)
What is the simplest way using tidyr or reshape2 to turn this data:
    data <- data.frame( A=c(1,2,3), B=c("b,g","g","b,g,q"))
into the following (i.e. make a row for each comma-separated value in variable B ):
      A B
    1 1 b
    2 1 g
    3 2 g
    4 3 b
    5 3 g
    6 3 q
Try
    library(splitstackshape)
    cSplit(data, 'B', ',', 'long')
Or using base R
    lst <- setNames(strsplit(as.character(data$B), ','), data$A)
    stack(lst)
Or
    library(tidyr)
    unnest(lst,A)
Source: https://stackoverflow.com/questions/30818840/in-r-get-multiple
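Another option that is not part of the quoted answer: `tidyr::separate_rows()` does this in one call. A minimal sketch on the question's data, with `stringsAsFactors = FALSE` added by me so that `B` is a character column on older R versions too:

```r
library(tidyr)

data <- data.frame(
  A = c(1, 2, 3),
  B = c("b,g", "g", "b,g,q"),
  stringsAsFactors = FALSE
)

# Each comma-separated value in B becomes its own row; A is repeated.
rows <- separate_rows(data, B, sep = ",")
```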

How to split column into two in R using separate [duplicate]

删除回忆录丶 submitted on 2019-11-29 15:06:28
This question already has an answer here: Split data frame string column into multiple columns (14 answers)
I have a dataset with a column of locations like this (41.797634883, -87.708426986). I'm trying to split it into latitude and longitude. I tried using the separate method from the tidyr package
    library(dplyr)
    library(tidyr)
    df <- data.frame(x = c('(4, 9)', '(9, 10)', '(20, 100)', '(100, 200)'))
    df %>% separate(x, c('Latitude', 'Longitude'))
but I'm getting this error
    Error: Values not split into 2 pieces at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
What am I doing wrong? Specify
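A hedged sketch of one fix. By default `separate()` splits on any run of non-alphanumeric characters, so the parentheses (and, in real coordinates, the decimal point and minus sign) also act as split points. Stripping the parentheses first and passing an explicit `sep` avoids that; the `gsub()` step is my own addition.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  x = c("(4, 9)", "(9, 10)", "(20, 100)", "(100, 200)"),
  stringsAsFactors = FALSE
)

coords <- df %>%
  mutate(x = gsub("[()]", "", x)) %>%        # drop the surrounding parentheses
  separate(x, c("Latitude", "Longitude"),
           sep = ",\\s*",                    # split only on the comma
           convert = TRUE)                   # turn the pieces into numbers
```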

Changing Values from Wide to Long: 1) Group_By, 2) Spread/Dcast [duplicate]

十年热恋 submitted on 2019-11-29 14:55:23
This question already has an answer here: Transpose / reshape dataframe without “timevar” from long to wide format (6 answers)
I've got a list of names and phone numbers, which I want to group by name and bring from a long format to a wide one, with the phone numbers filling across the columns:
    Name      Phone_Number
    John Doe  0123456
    John Doe  0123457
    John Doe  0123458
    Jim Doe   0123459
    Jim Doe   0123450
    Jane Doe  0123451
    Jill Doe  0123457
into:
    Name      Phone_Number1  Phone_Number2  Phone_Number3
    John Doe  0123456        0123457        0123458
    Jim Doe   0123459        0123450        NA
    Jane Doe  0123451        NA             NA
    Jill Doe  NA             NA             NA
    library(dplyr)
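A minimal sketch of the group-then-widen idea from the title, assuming tidyr 1.0.0 or later for `pivot_wider()` (`spread()` would do the same job in older versions). The helper column `slot` is a hypothetical name. With the input as listed, this gives Jill Doe 0123457 in `Phone_Number1`.

```r
library(dplyr)
library(tidyr)

phones <- data.frame(
  Name = c("John Doe", "John Doe", "John Doe", "Jim Doe", "Jim Doe",
           "Jane Doe", "Jill Doe"),
  Phone_Number = c("0123456", "0123457", "0123458", "0123459", "0123450",
                   "0123451", "0123457"),
  stringsAsFactors = FALSE
)

wide_phones <- phones %>%
  group_by(Name) %>%
  # Number each person's phones 1, 2, 3, ... to build the new column names.
  mutate(slot = paste0("Phone_Number", row_number())) %>%
  ungroup() %>%
  pivot_wider(names_from = slot, values_from = Phone_Number)
```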

How do I use tidyr to fill in completed rows within each value of a grouping variable?

孤者浪人 submitted on 2019-11-29 13:58:07
Say I have data on people who choose between several options. I have one row per person, and I want to have one row per person and choice option. So, if I have 10 people who have 3 choices, right now I have 10 rows, and I want to have 30. All of the other variables should be copied to each of the new rows. So, for example, if I have a variable for gender, that should be constant within ID. (I am setting my data up this way to analyze with mnlogit .) This seems like the situation that two tidyr functions, complete and fill , were designed for. To use a simple example:
    library(lubridate)
    library
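A sketch of how `complete()` and `fill()` could be combined for this, on made-up data since the question's example is cut off here. The data frame `people`, its columns, and the `chosen` flag are all hypothetical, and `.direction = "downup"` assumes tidyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: one row per person, recording the option they chose.
people <- data.frame(
  id = 1:3,
  gender = c("f", "m", "f"),
  choice = c(1L, 3L, 2L),
  stringsAsFactors = FALSE
)

person_options <- people %>%
  mutate(chosen = TRUE) %>%
  # Add the missing id/choice combinations; new rows get chosen = FALSE.
  complete(id, choice = 1:3, fill = list(chosen = FALSE)) %>%
  group_by(id) %>%
  # Copy each person's gender into their newly created rows.
  fill(gender, .direction = "downup") %>%
  ungroup()
```

With 3 people and 3 options this yields 9 rows, matching the 10 people to 30 rows pattern described in the question.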

Gather multiple date/value columns using tidyr

↘锁芯ラ submitted on 2019-11-29 11:30:13
I have a data set containing (amongst others) multiple columns with dates and corresponding values (repeated measurements). Is there a way to turn this into a long data set containing (the others and) only two columns - one for dates and one for values - using tidyr ? The following code produces an example data frame:
    df <- data.frame(
      id = 1:10,
      age = sample(100, 10),
      date1 = as.Date('2015-09-22') - sample(100, 10),
      value1 = sample(100, 10),
      date2 = as.Date('2015-09-22') - sample(100, 10),
      value2 = sample(100, 10),
      date3 = as.Date('2015-09-22') - sample(100, 10),
      value3 = sample(100, 10))
The
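One way this could look with `pivot_longer()`, assuming tidyr 1.0.0 or later. The `".value"` placeholder keeps `date` and `value` as separate output columns, so the dates stay of class Date instead of being mixed with the numeric values. The column name `measurement` and the `set.seed()` call are my additions.

```r
library(tidyr)

set.seed(1)  # added so the random example data is reproducible
df <- data.frame(
  id = 1:10,
  age = sample(100, 10),
  date1 = as.Date("2015-09-22") - sample(100, 10),
  value1 = sample(100, 10),
  date2 = as.Date("2015-09-22") - sample(100, 10),
  value2 = sample(100, 10),
  date3 = as.Date("2015-09-22") - sample(100, 10),
  value3 = sample(100, 10)
)

# "date2" is read as .value = "date", measurement = "2", and so on.
long_dates <- pivot_longer(
  df,
  cols = -c(id, age),
  names_to = c(".value", "measurement"),
  names_pattern = "(date|value)(\\d)"
)
```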