tidyr | 易学教程

Tidyr how to spread into count of occurrence [duplicate]

This question already has answers here: "How do I get a contingency table?" (6 answers) and "Faster ways to calculate frequencies and cast from long to wide" (4 answers).

I have a data frame like this:

    other = data.frame(name = c("a", "b", "a", "c", "d"), result = c("Y", "N", "Y", "Y", "N"))

How can I use the spread function in tidyr, or some other function, to get the counts of result Y and N as column headers, like this?

    name  Y  N
    a     2  0
    b     0  1

Thanks.

These are a few of the many ways to go about it:

1) With the dplyr library, you can simply group and count into the format needed:

    library(dplyr)
    other %>% group_by(name) %>% summarise(N =
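The excerpt above is cut off, so the following is only a minimal sketch of one count-then-widen approach, not the original answer. It assumes tidyr 1.1 or later and swaps spread() for pivot_wider(); the name `counts` is illustrative.

```r
library(dplyr)
library(tidyr)

other <- data.frame(
  name   = c("a", "b", "a", "c", "d"),
  result = c("Y", "N", "Y", "Y", "N")
)

# Count each (name, result) pair, then turn the result values into columns.
# Pairs that never occur are filled with 0 instead of NA.
counts <- other %>%
  count(name, result) %>%
  pivot_wider(names_from = result, values_from = n, values_fill = 0)

print(counts)
```

In base R, `table(other$name, other$result)` gives the same counts as a contingency table.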

tidyr separate only first n instances [duplicate]

This question already has an answer here: "How to strsplit different number of strings in certain column by do function" (1 answer).

I have a data.frame in R which, for simplicity, has one column that I want to separate. It looks like this:

    V1
    Value_is_the_best_one
    This_is_the_prettiest_thing_I've_ever_seen
    Here_is_the_next_example_of_what_I_want

My real data is very large (millions of rows), so I'd like to use tidyr's separate function (because it's amazingly fast) to separate out JUST the first
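The question is truncated before it says how many pieces are wanted, so this sketch assumes the first two. It relies on the `extra = "merge"` argument of tidyr::separate(); the column names `first`, `second`, and `rest` are made up for illustration.

```r
library(tidyr)

df <- data.frame(
  V1 = c("Value_is_the_best_one",
         "This_is_the_prettiest_thing_I've_ever_seen",
         "Here_is_the_next_example_of_what_I_want"),
  stringsAsFactors = FALSE
)

# Split on the first two underscores only; extra = "merge" keeps the
# remainder of the string, underscores included, in the last column.
out <- separate(df, V1, into = c("first", "second", "rest"),
                sep = "_", extra = "merge")

print(out)
```

Adding or removing names in `into` changes how many leading pieces are split off.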

data.table equivalent of tidyr::complete()

tidyr::complete() adds rows to a data.frame for combinations of column values that are missing from the data. Example:

    library(dplyr)
    library(tidyr)
    df <- data.frame(person = c(1,2,2), observation_id = c(1,1,2), value = c(1,1,1))
    df %>% tidyr::complete(person, observation_id, fill = list(value=0))

yields

    # A tibble: 4 × 3
      person observation_id value
       <dbl>          <dbl> <dbl>
    1      1              1     1
    2      1              2     0
    3      2              1     1
    4      2              2     1

where the combination person == 1 and observation_id == 2, which is missing in df, has been filled in with a value of 0. What would be the equivalent of this in data.table? I reckon that
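One commonly used data.table idiom for this is a join against the cross join of the key columns. The sketch below is an assumption about what an answer could look like, not the answer from the original thread; `dt` and `res` are illustrative names.

```r
library(data.table)

dt <- data.table(person = c(1, 2, 2),
                 observation_id = c(1, 1, 2),
                 value = c(1, 1, 1))

# CJ() builds every combination of the unique key values; joining dt onto it
# leaves NA in `value` for combinations that were absent from dt.
res <- dt[CJ(person = person, observation_id = observation_id, unique = TRUE),
          on = .(person, observation_id)]

# Replace the NAs created by the join with the default value 0.
res[is.na(value), value := 0]

print(res)
```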

Transposing data frames

Happy weekend. I've been trying to replicate the results from this blog post in R. I am looking for a method of transposing the data without using t, preferably using tidyr or reshape. In the example below, metadata is obtained by transposing data.

    metadata <- data.frame(colnames(data), t(data[1:4, ]))
    colnames(metadata) <- t(metadata[1,])
    metadata <- metadata[-1,]
    metadata$Multiplier <- as.numeric(metadata$Multiplier)

Though it achieves what I want, I find it a little clumsy. Is there a more efficient workflow to transpose the data frame?

dput of data:

    data <- structure(list(Series.Description
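Since the dput of the real data is cut off, the following sketch uses a small made-up data frame (`toy`, with columns `field`, `s1`, and `s2`, all hypothetical) to show one tidyr way of transposing without t(): lengthen, then widen on the other key. It assumes tidyr 1.0 or later.

```r
library(tidyr)

# Hypothetical stand-in for the truncated `data` object.
toy <- data.frame(
  field = c("a", "b"),
  s1    = c(1, 2),
  s2    = c(3, 4)
)

# Stack the series columns into (field, series, value) rows, then spread
# the former row labels out as the new column names.
transposed <- toy %>%
  pivot_longer(-field, names_to = "series") %>%
  pivot_wider(names_from = field, values_from = value)

print(transposed)
```

This only works cleanly when the columns being stacked share a type; mixed types would need converting first.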

Comparing gather (tidyr) to melt (reshape2)

I love the reshape2 package because it made life so doggone easy. Typically Hadley has made improvements in his previous packages that enable streamlined, faster-running code. I figured I'd give tidyr a whirl, and from what I read I thought gather was very similar to melt from reshape2. But after reading the documentation I can't get gather to do the same task that melt does.

Data View

Here's a view of the data (actual data in dput form at end of post):

      teacher yr1.baseline      pd yr1.lesson1 yr1.lesson2 yr2.lesson1 yr2.lesson2 yr2.lesson3
    1       3      1/13/09  2/5/09      3/6/09     4/27/09     10/7/09    11/18/09      3/4/10

Split or separate uneven/unequal strings with no delimiter

Given the dataframe df:

    x <- c("X1", "X2", "X3", "X4", "X5")
    y <- c("00L0", "0", "00012L", "0123L0", "0D0")
    df <- data.frame(x, y)

How can I leverage tidyr::separate to put each character of the y strings into a separate column (one column per string position)? Desired output:

    x <- c("X1", "X2", "X3", "X4", "X5")
    m1 <- c(0, 0, 0, 0, 0)
    m2 <- c(0, NA, 0, 1, "D")
    m3 <- c("L", NA, 0, 2, 0)
    mN <- c(NA, NA, NA, NA, NA)
    df <- data.frame(x, m1, m2, m3, mN)

Where mN could theoretically go up to m100 (100 columns), or higher.

This works. It fills with blanks rather than NAs, but you can change that
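The answer above is cut off before its code, so here is a separate minimal sketch in base R rather than tidyr::separate. It splits each string into characters and pads short strings with NA; the names `chars`, `n`, `mat`, and `out` are illustrative.

```r
x <- c("X1", "X2", "X3", "X4", "X5")
y <- c("00L0", "0", "00012L", "0123L0", "0D0")
df <- data.frame(x, y, stringsAsFactors = FALSE)

# One character vector per row of y.
chars <- strsplit(df$y, "")

# Width of the widest string decides how many columns are needed.
n <- max(lengths(chars))

# Pad each vector with NA up to n, then stack the rows into a matrix.
mat <- do.call(rbind, lapply(chars, function(ch) {
  c(ch, rep(NA_character_, n - length(ch)))
}))
colnames(mat) <- paste0("m", seq_len(n))

out <- data.frame(x = df$x, mat, stringsAsFactors = FALSE)
print(out)
```

All position columns come back as character, so digits would need converting if numeric values are wanted.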

Changing Million/Billion abbreviations into actual numbers? ie. 5.12M -> 5,120,000 [duplicate]

This question already has an answer here: "Convert from billion to million and vice versa" (6 answers).

As the title suggests, I'm looking for a way to transform shorthand abbreviated 'character' text to numerical data. For example, I'd like to make these changes within my dataframe:

    84.06M  -> 84,060,000
    30.12B  -> 30,120,000,000
    9.78B   -> 9,780,000,000
    251.29M -> 251,290,000

Here's an example of some of the dataframe I'm working with:

          Index    Market Cap  Income   Sales    Book/sh
    ZX    -        84.06M      -1.50M   359.50M  7.42
    ZTS   S&P 500  30.13B      878.00M  5.02B    3.49
    ZTR   -        -           -        -        -
    ZTO   -        9.78B       288.30M  1.47B    4.28
    ZPIN  -        1.02B       27
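A minimal base R sketch of the conversion follows. The helper name `abbr_to_num` is hypothetical, and treating "-" as a missing value is an assumption based on the sample table; only the M and B suffixes from the question are handled.

```r
# Hypothetical helper: convert strings such as "84.06M" or "30.12B" to numbers.
abbr_to_num <- function(x) {
  mult <- c(M = 1e6, B = 1e9)
  x <- trimws(as.character(x))

  # The last character is the unit if it is a known suffix.
  unit <- substr(x, nchar(x), nchar(x))
  has_unit <- unit %in% names(mult)

  # Strip the suffix where present; anything non-numeric (e.g. "-") becomes NA.
  digits <- ifelse(has_unit, substr(x, 1, nchar(x) - 1), x)
  num <- suppressWarnings(as.numeric(digits))

  num * ifelse(has_unit, unname(mult[unit]), 1)
}

print(abbr_to_num(c("84.06M", "30.12B", "-1.50M", "7.42", "-")))
```

Applied to a data frame, it could be run column by column, for example with `lapply()` over the columns that hold abbreviated values.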

adding default values to item x group pairs that don't have a value (df %>% spread %>% gather seems strange)

Short version

How can I do the operation

    df1 %>% spread(groupid, value, fill = 0) %>% gather(groupid, value, one, two)

in a more natural way?

Long version

Given a data frame

    df1 <- data.frame(groupid = c("one","one","one","two","two","two", "one"), value = c(3,2,1,2,3,1,22), itemid = c(1:6, 6))

we have a value for many itemid and groupid pairs, but for some itemids there are groupids with no value. I want to add a default value for those cases. E.g. for itemid 1 and groupid "two" there is no value, so I want to add a row where this pair gets a default value. The following tidyr code achieves
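One candidate for the "more natural way" is tidyr::complete(), which fills in missing combinations directly. This is a sketch under that assumption, not necessarily the answer given in the original thread; `filled` is an illustrative name.

```r
library(tidyr)

df1 <- data.frame(
  groupid = c("one", "one", "one", "two", "two", "two", "one"),
  value   = c(3, 2, 1, 2, 3, 1, 22),
  itemid  = c(1:6, 6)
)

# Add a row for every itemid x groupid pair that is absent from df1,
# giving the new rows the default value 0.
filled <- complete(df1, itemid, groupid, fill = list(value = 0))

print(filled)
```

With 6 item ids and 2 group ids, the result has 12 rows, 5 of which carry the default value.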

Reshape Data Long to Wide - understanding reshape parameters

I have a long-format dataframe dogs that I'm trying to reformat to wide using the reshape() function. It currently looks like so:

    dogid  month  year  trainingtype  home  school  timeincomp
    12345      1  2014             1     1       1         340
    12345      2  2014             1     1       1         360
    31323     12  2015             2     7       3         440
    31323      1  2014             1     7       3         500
    31323      2  2014             1     7       3         520

The dogid column is a bunch of ids, one for each dog. The month column ranges from 1 to 12 for the 12 months, and year from 2014 to 2015. Trainingtype ranges from 1 to 2. Each dog has a timeincomp value for every month-year-trainingtype combination, so 48 entries per dog. Home and school vary from 1-8
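The question is truncated before it states the target layout, so the sketch below assumes one plausible goal: one column of timeincomp per month, with every other column identifying the row. It uses only the five rows shown above, and `wide` is an illustrative name.

```r
dogs <- data.frame(
  dogid        = c(12345, 12345, 31323, 31323, 31323),
  month        = c(1, 2, 12, 1, 2),
  year         = c(2014, 2014, 2015, 2014, 2014),
  trainingtype = c(1, 1, 2, 1, 1),
  home         = c(1, 1, 7, 7, 7),
  school       = c(1, 1, 3, 3, 3),
  timeincomp   = c(340, 360, 440, 500, 520)
)

# idvar:   columns that identify one row of the wide result
# timevar: column whose values become the suffixes of the new columns
# v.names: the measured column that is spread across those new columns
wide <- reshape(dogs,
                idvar     = c("dogid", "year", "trainingtype", "home", "school"),
                timevar   = "month",
                v.names   = "timeincomp",
                direction = "wide")

print(wide)
```

Months that a given id combination never has are left as NA in the corresponding timeincomp column.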

Removing NA observations with dplyr::filter()

My data looks like this:

    library(tidyverse)
    df <- tribble(
      ~a, ~b, ~c,
      1, 2, 3,
      1, NA, 3,
      NA, 2, 3
    )

I can remove all NA observations with drop_na():

    df %>% drop_na()

Or remove all NA observations in a single column (a, for example):

    df %>% drop_na(a)

Why can't I just use a regular != filter pipe?

    df %>% filter(a != NA)

Why do we have to use a special function from tidyr to remove NAs?

JeffZheng: For example, you can use

    df %>% filter(!is.na(a))

to remove the NA in column a.

emehex: From @Ben Bolker: [T]his has nothing specifically to do with dplyr::filter() From @Marat Talipov: [A]ny
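The reason the `!=` filter fails can be seen in a short sketch: comparing anything with NA yields NA, and filter() keeps only rows where the condition is TRUE. The data frame is rebuilt with data.frame() here to keep the example to a single package; `bad` and `good` are illustrative names.

```r
library(dplyr)

df <- data.frame(a = c(1, 1, NA), b = c(2, NA, 2), c = c(3, 3, 3))

# Any comparison with NA is itself NA, never TRUE or FALSE.
print(1 != NA)
print(NA != NA)

# Every row's condition is NA, so filter() drops all of them.
bad <- df %>% filter(a != NA)

# is.na() returns TRUE/FALSE, so this keeps the rows where a is present.
good <- df %>% filter(!is.na(a))

print(nrow(bad))
print(nrow(good))
```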