tidyr | 易学教程

How to ungroup list columns in data.table?

Question: tidyr provides the unnest function, which helps expand list columns. It is similar to the much faster (20x) ungroup function in kdb. I am looking for a similar (but much faster) function that, given a data.table containing several list columns, each with the same number of elements on each row, would expand the data.table. This is an extension of this post.

    library(data.table)
    library(tidyr)
    t = Sys.time()
    DT = data.table(a=c(1,2,3), b=c('q','w','e'), c=list(rep(t,2),rep(t+1,3),rep(t,0)),
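One commonly used data.table idiom for this kind of expansion is to unlist every list column within groups defined by the non-list columns. The sketch below is an illustration only, with no claim about speed. It assumes that (a, b) uniquely identify each row, and it uses a hypothetical second list column d and plain integer/character vectors instead of the timestamps in the question, because unlist() drops the POSIXct class.

```r
library(data.table)

# Hypothetical data: two list columns with matching lengths on every row.
DT <- data.table(
  a = c(1, 2, 3),
  b = c("q", "w", "e"),
  c = list(1:2, 3:5, integer(0)),
  d = list(c("x", "y"), c("u", "v", "w"), character(0))
)

# Unlist each list column within each (a, b) group.
# Assumption: (a, b) identifies a single row, so each group holds one list cell.
flat <- DT[, lapply(.SD, unlist), by = .(a, b), .SDcols = c("c", "d")]
print(flat)
```

Rows whose list cells are empty (the third row here) contribute no rows to the result, which differs from an unnest that keeps empty rows as NA.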

Reorganizing dataframe with multiple header types following “tidy” approach in R

Question: I have a dataframe that looks somewhat like this:

    Age  A1U_sweet  A2F_dip  A3U_bbq  C1U_sweet  C2F_dip  C3U_bbq  Comments
    23   1          2        1        NA         NA       NA       Good
    54   NA         NA       NA       4          1        2        ABCD
    43   2          4        7        NA         NA       NA       HiHi

I am trying to reorganize it in the way shown below to make it more "tidy". Is there a way to do this that also incorporates the Age and Comments columns in the same style as shown for the other variables below? How would you suggest incorporating them? One idea is shown below, but I am open to other

In R: tidyr split and swing value into column name using regex

Question: I'm trying to get accustomed to the tidyr package, and am struggling with a variable that is a concatenation of several variables. In the minimal example below, I would like to split variable v2 into its constituent variables v3 and v4 and then swing these so I end up with the four variables v1 - v4.

    require(plyr)
    require(dplyr)
    require(stringr)
    require(tidyr)
    data <- data.frame( v1=c(1,2), v2=c("v3 cheese; v4 200", "v3 ham; v4 150")) %>% tbl_df()

If I split v2 into a
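One possible way to approach the split described above is to break v2 into one row per "name value" pair, separate the name from the value, and then spread the names back into columns. This is a minimal sketch rather than the answer from the original thread. It assumes every pair is delimited by "; ", that name and value are separated by a single space, and that a tidyr version providing pivot_wider is installed.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  v1 = c(1, 2),
  v2 = c("v3 cheese; v4 200", "v3 ham; v4 150"),
  stringsAsFactors = FALSE
)

res <- data %>%
  separate_rows(v2, sep = "; ") %>%                       # one row per "name value" pair
  separate(v2, into = c("key", "value"), sep = " ") %>%   # split name from value
  pivot_wider(names_from = key, values_from = value)      # names become columns v3, v4
print(res)
```

The value columns come back as character; a later type conversion would be needed if v4 should be numeric.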

Creating a Similarity Matrix from Raw Card-Sort Data

Question: I have a data set from an online card sorting activity. Participants were presented with a random subset of Cards (from a larger set) and asked to create Groups of Cards they felt were similar to one another. Participants were able to create as many Groups as they liked and name the Groups whatever they wanted. An example data set is something like this:

    Data <- structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,

Finding maximum values of a score for a given subject over multiple timepoints

Question: To start, here's the example data I'm working with:

    ID  BaselineScore  MidScore  Final Score
    1   x              NA        NA
    1   NA             y         NA
    1   NA             NA        z
    2   a              NA        NA
    2   NA             b         NA
    2   NA             NA        c

What I'd like to accomplish is, for a given ID (ID==1, ID==2, etc.), to determine which of the three scores (baseline, mid, or final) is greatest (i.e. max(x,y,z), max(a,b,c), etc.). The reason I have NAs is that I used the spread function from tidyr (the score variables at a certain time point were originally rows under a more general score
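One way to find the greatest score per ID in data shaped like this is to go back to long form, dropping the NAs introduced by spread, and keep the row with the maximum score in each group. The sketch below is only an illustration. The numeric values stand in for the placeholders x, y, z, a, b, c, and the column name FinalScore (without a space) is an assumption.

```r
library(dplyr)
library(tidyr)

# Hypothetical numbers in place of x, y, z, a, b, c from the question.
scores <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2),
  BaselineScore = c(10, NA, NA, 7, NA, NA),
  MidScore = c(NA, 12, NA, NA, 9, NA),
  FinalScore = c(NA, NA, 11, NA, NA, 8)
)

best <- scores %>%
  pivot_longer(-ID, names_to = "timepoint", values_to = "score",
               values_drop_na = TRUE) %>%   # one row per observed score
  group_by(ID) %>%
  filter(score == max(score)) %>%           # keep the greatest score per ID
  ungroup()
print(best)
```

Ties are kept as multiple rows with this filter; if the original long data (before spread) is still available, grouping it directly avoids the round trip.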

summarizing data in cross-table with grouped_by variable in columns

Question: I am trying to summarize data across two variables, and the output with summarize is very chunky (at least in the R notebook output, where the table breaks over multiple pages). I'd like to have one variable as the rows of the summary output and the other as the columns, with each cell of the table holding the mean for that combination of row and column. Some example data:

    dat1 <- data.frame(
      category = rep(c("catA", "catB", "catC"), each=4),
      age = sample(1:2,size=4,replace=T),
      value = rnorm(12)
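A layout like the one described, with one variable as rows and the other as columns, can be sketched by summarising per combination and then widening the result. This is a minimal sketch under stated assumptions: the example data is completed with a fixed age pattern and a seed (both added here for reproducibility, not part of the question), and it relies on pivot_wider from tidyr plus the .groups argument of summarise in recent dplyr versions.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: seed added so the sketch is reproducible
dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = rep(1:2, times = 6),   # assumption: fixed pattern instead of sample()
  value = rnorm(12)
)

wide <- dat1 %>%
  group_by(category, age) %>%
  summarise(mean_value = mean(value), .groups = "drop") %>%  # mean per cell
  pivot_wider(names_from = age, values_from = mean_value,
              names_prefix = "age_")                          # ages become columns
print(wide)
```

In base R, tapply(dat1$value, list(dat1$category, dat1$age), mean) gives a comparable matrix without extra packages.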

R: DPLYR package: bind_rows failing when calls a custom function

Question: Using dplyr and tidyr, I'm trying to create a tidy version of a dataset where rows can be missing depending on the data in certain columns. I created a function that returns the missing rows (by creating them with default data) in a new tbl_df(data.frame). I unit-tested it and it works with specific data. However, when calling it from 'bind_rows', I get the following error: Error in data.frame(a, b, c,...: Object 'A' not found. For example, my data looks like this: A B C D E ... a1 b1

How to take minimum value out of several ask quotes and maximum value out of several bid quotes in two columns from a single column?

Question: I have a data set containing bid and ask quotes for a stock over 3 days. The following is a portion of the data set. I have also given a link to the sample data set to illustrate the peculiarity of the matter.

    > dput(head(q,30))
    structure(list(Date = structure(c(1471424400, 1471424400, 1471424400, 1471424401, 1471424401, 1471424406, 1471424407, 1471424415, 1471424417, 1471424514, 1471424527, 1471424567, 1471424576, 1471424606, 1471424607, 1471424621, 1471424621, 1471424621, 1471424641, 1471424642,

Regular expression on separate function of Tidyr

Question: I need to separate a column into two columns with tidyr. The column has text like: I am Sam. The text always has exactly two white spaces, and it can contain any other symbols: [a-z][0-9][!\ºª, etc...]. The problem is that I need to split it into two columns: column one I am, and column two Sam. I can't find a regular expression to separate at the second blank space. Could you help me please?

Answer 1: We can use extract from tidyr. We match zero or more characters and place them in a capture group ( (.*) )
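The answer excerpt is cut off, so the following is only a sketch of how extract with two capture groups could do this split, not the answerer's exact regex. It assumes the column is called text (a hypothetical name) and that the last space in each string is the second one, as the question states.

```r
library(tidyr)

# Hypothetical data: every string contains exactly two spaces.
df <- data.frame(
  text = c("I am Sam", "You are 9!?"),
  stringsAsFactors = FALSE
)

# The greedy (.*) runs up to the last whitespace; (\\S+) captures the final token.
res <- extract(df, text, into = c("first", "second"),
               regex = "^(.*)\\s+(\\S+)$")
print(res)
```

Because (.*) is greedy, the split always happens at the last space, which is the second space only when the two-space assumption holds.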

Find lowest date in each group [duplicate]

Question: This question already has answers here: Subset data based on Minimum Value (2 answers). Closed 3 years ago.

Hi there: I'm trying to find the lowest date in each group. The purpose is to find which date is common to each of several time series. Currently the data look like this.

    library(tidyr)
    library(dplyr)
    grouping_variable<-sample(c('a', 'b', 'c'), 500, replace=TRUE)
    date<-sample(seq(as.Date('1999/01/01'), as.Date('2015/01/01'), by="day"), 500)
    numeric_variable<-rnorm(500, 50, sd=2)
    df<
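A grouped minimum is one straightforward way to get the lowest date per group. The sketch below assumes the truncated df assignment combines the three vectors above into a data frame; that construction and the seed are assumptions added here for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: seed added so the sketch is reproducible
# Assumption: df simply combines the three vectors from the question.
df <- data.frame(
  grouping_variable = sample(c("a", "b", "c"), 500, replace = TRUE),
  date = sample(seq(as.Date("1999/01/01"), as.Date("2015/01/01"), by = "day"), 500),
  numeric_variable = rnorm(500, 50, sd = 2)
)

earliest <- df %>%
  group_by(grouping_variable) %>%
  summarise(first_date = min(date))  # lowest date within each group
print(earliest)
```

If the full row holding the earliest date is needed rather than just the date, filter(date == min(date)) after group_by keeps those rows instead of summarising.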