tidyr | 易学教程

Stop gather function from dropping factor labels

I'm trying to use the gather function in tidyr, but it is stripping out the labels from factored data. My data looks something like this:

    > require(tidyr)
    > messy = data.frame(x=rep(seq(0,2),2),y=runif(6),z=runif(6),source=c('good','bad'))
    > messy
      x          y         z source
    1 0 0.37627685 0.9108316   good
    2 1 0.77593147 0.9944256    bad
    3 2 0.01105364 0.1183923   good
    4 0 0.37755463 0.6761343    bad
    5 1 0.86333114 0.7312482   good
    6 2 0.69085345 0.8288506    bad
    > tidy = gather(messy,coordinate,value,y:z)
    > tidy
      x source coordinate      value
    1 0      2          y 0.37627685
    2 1      1          y 0.77593147
    3 2      2          y 0.01105364
    4 0      1          y 0.37755463
    5 1      2          y 0
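The excerpt above is cut off before any answer, so what follows is only a sketch of one possible workaround, not the accepted fix. It assumes a current tidyr release in which pivot_longer() is available, and it builds `source` explicitly as a factor; the seed is arbitrary.

```r
library(tidyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
messy <- data.frame(
  x = rep(seq(0, 2), 2),
  y = runif(6),
  z = runif(6),
  source = factor(rep(c("good", "bad"), 3))
)

# Reshape y and z into key/value pairs; x and source are carried along unchanged.
tidy <- pivot_longer(messy, cols = y:z,
                     names_to = "coordinate", values_to = "value")

# The factor column should still be a factor with its original labels.
str(tidy$source)
```

If an older tidyr has to be used, converting `source` with as.character() before calling gather() is another option worth trying.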

Pivot wider produces nested object

This is regarding the latest tidyr release. I am trying the pivot_wider and pivot_longer functions from library(tidyr) (update 1.0.0). I expected to get the normal iris dataset back when I run the code below, but instead I get a nested 3 x 5 tibble. I'm not sure what's happening (I read https://tidyr.tidyverse.org/articles/pivot.html) and I still don't know how to avoid this:

    library(tidyr)
    iris %>%
      pivot_longer(-Species, values_to = "count") %>%
      pivot_wider(names_from = name, values_from = count)

Expected output: the normal iris dataset (150 x 5).

Edit: I read below that if I wrap around unnest() I get
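One common way to avoid the nested result is to give every observation its own identifier before reshaping, so that pivot_wider() can tell the rows apart. The sketch below assumes tidyr 1.0.0 or later; the `row` column is a helper name introduced here for illustration and does not appear in the question.

```r
library(tidyr)

iris_id <- iris
iris_id$row <- seq_len(nrow(iris_id))  # hypothetical helper column: one id per observation

wide <- iris_id %>%
  pivot_longer(-c(Species, row), values_to = "count") %>%
  pivot_wider(names_from = name, values_from = count)

dim(wide)  # one row per original observation, plus the helper column
```

Without the identifier, Species is the only column left to identify a row, so all 50 values for each species end up in a single list cell.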

tidyr separate column values into character and numeric using regex

Question: I'd like to separate column values using tidyr::separate and a regular expression, but I am new to regular expressions.

    df <- data.frame(A=c("enc0","enc10","enc25","enc100","harab0","harab25","harab100","requi0","requi25","requi100"), stringsAsFactors=F)

This is what I've tried:

    library(tidyr)
    df %>% separate(A, c("name","value"), sep="[a-z]+")

Bad output:

      name value
    1          0
    2         10
    3         25
    4        100
    5          0
    # etc

How do I save the name column as well?

Answer 1: You may use a (?<=[a-z])(?=[0-9]) lookaround based regex with

Grouping linked unique ID pairs using R [duplicate]

This question already has an answer here: Make a group_indices based on several columns (1 answer).

I'm trying to link together pairs of unique IDs using R. Given the example below, I have two IDs (here ID1 and ID2) that indicate linkage. I'm trying to create groups of rows that are linked. In this example, A is linked to B, which is linked to D, which is linked to E. Because these are all connected, I want to group them together. Next, there is also X, which is linked to both Y and Z. Because these are also connected, I want to assign them to a single group as well. How can I tackle this using R
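The grouping described above amounts to finding connected components. The example data is not shown in the excerpt, so the `pairs` data frame below is reconstructed from the description (A-B, B-D, D-E, X-Y, X-Z) and should be read as an assumption. This is a minimal base R sketch using repeated label propagation, not the linked answer's method.

```r
# Assumed example data, rebuilt from the prose description of the question.
pairs <- data.frame(
  ID1 = c("A", "B", "D", "X", "X"),
  ID2 = c("B", "D", "E", "Y", "Z"),
  stringsAsFactors = FALSE
)

# Start with a distinct label per ID, then let each pair share its smaller label
# until a full pass over the pairs changes nothing.
ids <- unique(c(pairs$ID1, pairs$ID2))
label <- setNames(seq_along(ids), ids)

repeat {
  changed <- FALSE
  for (i in seq_len(nrow(pairs))) {
    a <- pairs$ID1[i]
    b <- pairs$ID2[i]
    m <- min(label[a], label[b])
    if (label[a] != m || label[b] != m) {
      label[c(a, b)] <- m
      changed <- TRUE
    }
  }
  if (!changed) break
}

# Renumber the surviving labels as consecutive group ids.
pairs$group <- as.integer(factor(label[pairs$ID1]))
pairs
```

For large inputs a graph library is usually a better fit than this loop, which can need many passes on long chains.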

Splitting rows with uneven string length into columns in R using tidyr [duplicate]

This question already has an answer here: Split data frame string column into multiple columns (14 answers).

Edit: This was marked as a duplicate. It is not. The question here is not only about splitting a single column into multiple ones, as my separate code would have worked. The main point of my question is splitting the column when the row strings produce varying numbers of output columns.

I'm trying to turn this:

    data <- c("Place1-Place2-Place2-Place4-Place2-Place3-Place5", "Place7-Place7-Place7-Place7-Place7-Place7-Place7-Place7", "Place1-Place1-Place1-Place1-Place3-Place5", "Place1-Place4
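Since the desired output is cut off, the following is only a sketch of one way to handle rows that split into different numbers of pieces. It uses the first three strings from the question (the fourth is truncated), and the column names `places` and `V1`, `V2`, ... are placeholders chosen for illustration.

```r
library(tidyr)

# First three strings from the question; the remaining input is not available.
df <- data.frame(
  places = c(
    "Place1-Place2-Place2-Place4-Place2-Place3-Place5",
    "Place7-Place7-Place7-Place7-Place7-Place7-Place7-Place7",
    "Place1-Place1-Place1-Place1-Place3-Place5"
  ),
  stringsAsFactors = FALSE
)

# Size the output by the longest row, then pad shorter rows with NA on the right.
n_cols <- max(lengths(strsplit(df$places, "-", fixed = TRUE)))
out <- separate(df, places, into = paste0("V", seq_len(n_cols)),
                sep = "-", fill = "right")
out
```

Setting fill = "right" is what silences the warning about rows with too few pieces.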

Convert Rows into Columns by matching string in R

I have a number of rows in a list like this:

     [1,] "Home"
     [2,] "A"
     [3,] "B"
     [4,] "C"
     [5,] "Home"
     [6,] "D"
     [7,] "E"
     [8,] "Home"
     [9,] "F"
    [10,] "G"
    [11,] "H"
    [12,] "I"

These rows arrive dynamically: after "Home" there can be two, three, four, five or more subcategories, so binding is not working. I have more than 5000 rows, and "Home" is common at the start of every set of subcategories. I want it to look like this:

         [,1]   [,2] [,3] [,4] [,5]
    [1,] "Home" "A"  "B"  "C"
    [2,] "Home" "D"  "E"
    [3,] "Home" "F"  "G"  "H"  "I"

Or: I have also used transpose to convert all rows into columns, and on using transpose I got.
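A minimal base R sketch of the reshaping described above, assuming the values are held in a plain character vector `x` (the question shows a one-column matrix, which as.vector() would convert). Each "Home" starts a new group, and shorter groups are padded with NA.

```r
x <- c("Home", "A", "B", "C", "Home", "D", "E", "Home", "F", "G", "H", "I")

# cumsum() over the "Home" markers gives a running group number for every element.
groups <- split(x, cumsum(x == "Home"))

# Pad every group to the width of the longest one and stack the groups as rows.
width <- max(lengths(groups))
result <- t(sapply(groups, function(g) c(g, rep(NA, width - length(g)))))
result
```

The padded cells come out as NA rather than blank, which keeps the result a rectangular character matrix.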

tidyr separate column values into character and numeric using regex

I'd like to separate column values using tidyr::separate and a regular expression, but I am new to regular expressions.

    df <- data.frame(A=c("enc0","enc10","enc25","enc100","harab0","harab25","harab100","requi0","requi25","requi100"), stringsAsFactors=F)

This is what I've tried:

    library(tidyr)
    df %>% separate(A, c("name","value"), sep="[a-z]+")

Bad output:

      name value
    1          0
    2         10
    3         25
    4        100
    5          0
    # etc

How do I save the name column as well?

You may use a (?<=[a-z])(?=[0-9]) lookaround based regex with tidyr::separate:

    > tidyr::separate(df, A, into = c("name", "value"), "(?<=[a-z])(?=[0-9])")
      name value
    1  enc
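The call from the answer can be written out as a complete script. The pattern matches the empty position between a lowercase letter and a digit, so nothing is consumed by the split. The convert = TRUE argument is an addition made here, on the assumption that the value column should end up numeric as the title suggests.

```r
library(tidyr)

df <- data.frame(
  A = c("enc0", "enc10", "enc25", "enc100", "harab0", "harab25",
        "harab100", "requi0", "requi25", "requi100"),
  stringsAsFactors = FALSE
)

# Split at the zero-width boundary: a letter behind, a digit ahead.
out <- separate(df, A, into = c("name", "value"),
                sep = "(?<=[a-z])(?=[0-9])", convert = TRUE)
str(out)
```

The original attempt failed because sep = "[a-z]+" treats the letters themselves as the separator and discards them.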

De-aggregate / reverse-summarise / expand a dataset in R

My data looks like this:

    data("Titanic")
    df <- as.data.frame(Titanic)

How can I de-aggregate or reverse-summarise the count/freq and expand the data set back to its original non-count observation state? For instance, I want 3rd, Male, Child, No repeated 35 times and 1st, Female, Adult, Yes repeated 140 times, and so on, in the data frame. Thanks in advance.

Without packages we can repeat each row according to the frequencies given:

    df2 <- df[rep(1:nrow(df), df[,5]),-5]

You can do this with list columns and a few dplyr / tidyr / purrr verbs. It's not as compact as other base R solutions may be, but
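The base R one-liner above can be checked against the row counts given in the question. The sketch also shows tidyr::uncount() as a shorter alternative; that function is not the list-column approach the second answer goes on to describe, and is mentioned here only as another option.

```r
library(tidyr)

data("Titanic")
df <- as.data.frame(Titanic)

# Base R: repeat each row index Freq times, then drop the Freq column (column 5).
df2 <- df[rep(seq_len(nrow(df)), df$Freq), -5]

# tidyr alternative: uncount() duplicates rows by a weight column and removes it.
df3 <- uncount(df, Freq)

# Both results should have one row per passenger.
nrow(df2) == sum(df$Freq)
nrow(df3) == sum(df$Freq)
```

Rows with a frequency of zero disappear in both versions, which is the desired behaviour when rebuilding individual observations.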

Separate a column into 2 columns at the last underscore in R

I have a dataframe like this:

    id <- c("1","2","3")
    col <- c("CHB_len_SCM_max","CHB_brf_SCM_min","CHB_PROC_S_SV_mean")
    df <- data.frame(id,col)

I want to create 2 columns by separating "col" into the measurement and the stat. The stat is the text after the last underscore (max, min, mean, etc.). My desired output is:

    id Measurement   stat
    1  CHB_len_SCM   max
    2  CHB_brf_SCM   min
    3  CHB_PROC_S_SV mean

I tried it this way, but the stat column is empty. I am not sure if I am pointing to the last underscore.

    library(tidyverse)
    df1 <- df %>%
      # Separate the sensors and the summary statistic
      separate(col, into
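The attempted call is truncated, so the following is a sketch of one pattern that targets the last underscore rather than a reconstruction of the asker's code. The lookahead requires that only non-underscore characters follow until the end of the string. stringsAsFactors = FALSE is added here as a precaution for older R versions.

```r
library(tidyr)

id <- c("1", "2", "3")
col <- c("CHB_len_SCM_max", "CHB_brf_SCM_min", "CHB_PROC_S_SV_mean")
df <- data.frame(id, col, stringsAsFactors = FALSE)

# "_(?=[^_]+$)" matches an underscore only when no other underscore comes after it.
df1 <- separate(df, col, into = c("Measurement", "stat"), sep = "_(?=[^_]+$)")
df1
```

Because the pattern matches exactly once per string, the earlier underscores stay inside the Measurement column.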

Separate contents of field

I'm sure this is very simple, and I think it's a case of using separate and gather. I have a single field in a dataframe, authorlist, an edited export of a PubMed search. It contains the authors of the publications, which can be either a single author or a collaboration of authors. For example, this is just a selection of the options available:

    Author
    Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P.

What I'd like to do is create a single list of ALL authors, so that I'd have something like:

    Author
    Drijgers RL
    Verhey FR
    Leentjens AF
    Kahler S
    Aalten P

How do I do that? I
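One way to get one author per row is tidyr::separate_rows(), which splits a delimited cell into several rows. This is a sketch under assumptions: the data frame name `pubs` is made up for illustration, the column is taken to be `authorlist` as named in the question, and the trailing period is stripped first because the desired output shows none.

```r
library(tidyr)

# Hypothetical single-row example built from the author string in the question.
pubs <- data.frame(
  authorlist = "Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P.",
  stringsAsFactors = FALSE
)

# Drop the final period, then split on commas followed by optional whitespace.
pubs$authorlist <- sub("\\.$", "", pubs$authorlist)
authors <- separate_rows(pubs, authorlist, sep = ",\\s*")
authors
```

Any other columns in the data frame are repeated for each author, so the link back to the publication is preserved.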