tidyr | 易学教程

Changing Million/Billion abbreviations into actual numbers? ie. 5.12M -> 5,120,000 [duplicate]

Question: This question already has answers here: Convert from billion to million and vice versa (6 answers). Closed 2 years ago.

As the title suggests, I'm looking for a way to transform shorthand abbreviated 'character' text to numerical data. For example, I'd like to make these changes within my dataframe:

    84.06M -> 84,060,000
    30.12B -> 30,120,000,000
    9.78B -> 9,780,000,000
    251.29M -> 251,290,000

Here's an example of some of the dataframe I'm working with: Index Market Cap Income Sales Book/sh ZX -
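One possible way to do the conversion described in the question is sketched below in base R. The helper name `abbr_to_number` and the set of suffixes it handles (K, M, B) are assumptions made for illustration; they do not come from the original question or its answers.

```r
# Hypothetical helper (not from the original question): convert strings such
# as "84.06M" or "30.12B" to plain numeric values using base R only.
abbr_to_number <- function(x) {
  x <- trimws(x)
  # Assumed suffix set; extend if the data uses others (e.g. "T").
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  last <- toupper(substring(x, nchar(x)))          # final character of each string
  has_suffix <- last %in% names(multipliers)
  digits <- ifelse(has_suffix, substring(x, 1, nchar(x) - 1), x)
  scale <- ifelse(has_suffix, multipliers[last], 1)
  # Entries that are not numeric (for example "-") become NA.
  suppressWarnings(as.numeric(digits)) * scale
}

caps <- abbr_to_number(c("84.06M", "30.12B", "9.78B", "251.29M", "-"))
# Display with thousands separators, without scientific notation.
format(caps, big.mark = ",", scientific = FALSE)
```

Placeholder entries such as "-" are returned as NA rather than raising an error, which is a design choice of this sketch and may not suit every dataset.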

Spreading a two column data frame with tidyr

Question: I have a data frame that looks like this:

      a b
    1 x 8
    2 x 6
    3 y 3
    4 y 4
    5 z 5
    6 z 6

and I want to turn it into this:

      x y z
    1 8 3 5
    2 6 4 6

But calling

    library(tidyr)
    df <- data.frame(
      a = c("x", "x", "y", "y", "z", "z"),
      b = c(8, 6, 3, 4, 5, 6)
    )
    df %>% spread(a, b)

returns

       x  y  z
    1  8 NA NA
    2  6 NA NA
    3 NA  3 NA
    4 NA  4 NA
    5 NA NA  5
    6 NA NA  6

What am I doing wrong?

Answer 1: While I'm aware you're after tidyr, base has a solution in this case:

    unstack(df, b~a)

It's also a little bit faster: Unit:
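For readers who want to stay within tidyr, a commonly used workaround is to add an explicit row identifier within each group before spreading, so that `spread()` knows which values belong on the same output row. The sketch below assumes dplyr is available; the column name `id` and the result name `wide` are illustrative and not part of the original question.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  a = c("x", "x", "y", "y", "z", "z"),
  b = c(8, 6, 3, 4, 5, 6)
)

wide <- df %>%
  group_by(a) %>%
  mutate(id = row_number()) %>%  # 1, 2 within each value of a
  ungroup() %>%
  spread(a, b) %>%               # one output row per id
  select(-id)                    # drop the helper column

wide
```

The helper column gives each pair of values a shared key, which is the information missing from the original two-column data frame.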

Proper idiom for adding zero count rows in tidyr/dplyr

Question: Suppose I have some count data that looks like this:

    library(tidyr)
    library(dplyr)
    X.raw <- data.frame(
      x = as.factor(c("A", "A", "A", "B", "B", "B")),
      y = as.factor(c("i", "ii", "ii", "i", "i", "i")),
      z = 1:6)
    X.raw
    #   x  y z
    # 1 A  i 1
    # 2 A ii 2
    # 3 A ii 3
    # 4 B  i 4
    # 5 B  i 5
    # 6 B  i 6

I'd like to tidy and summarise like this:

    X.tidy <- X.raw %>% group_by(x,y) %>% summarise(count=sum(z))
    X.tidy
    # Source: local data frame [3 x 3]
    # Groups: x
    #
    #   x  y count
    # 1 A  i     1
    # 2 A ii     5
    # 3 B  i    15

I know
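The excerpt is cut off before any answer, so the following is only a sketch of one approach that fits the title: `tidyr::complete()` can add the missing combinations of `x` and `y` and fill their counts with zero. The result name `X.full` is an assumption, and the `.groups` argument requires dplyr 1.0 or later.

```r
library(dplyr)
library(tidyr)

X.raw <- data.frame(
  x = as.factor(c("A", "A", "A", "B", "B", "B")),
  y = as.factor(c("i", "ii", "ii", "i", "i", "i")),
  z = 1:6
)

X.full <- X.raw %>%
  group_by(x, y) %>%
  summarise(count = sum(z), .groups = "drop") %>%  # ungroup before completing
  complete(x, y, fill = list(count = 0L))          # add absent (x, y) pairs as 0

X.full
```

Here the pair (B, ii), which never occurs in the raw data, appears in the result with a count of zero.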

Using spread with duplicate identifiers for rows

Question: I have a long-form dataframe that has multiple entries for the same date and person.

    jj <- data.frame(month=rep(1:3,4),
                     student=rep(c("Amy", "Bob"), each=6),
                     A=c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                     B=c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

I want to convert it to wide form and make it like this: month Amy.A Bob.A Amy.B Bob.B 1 2 3 1 2 3 1 2 3 1 2 3

My question is very similar to this. I have used the code given in the answer:

    kk <- jj %>% gather(variable, value, -(month:student)) %>% unite(temp

counting values after and before change in value, within groups, generating new variables for each unique shift

Question: I am looking for a way to count, within id groups, unique occurrences of value shifts in TF in the data tbl. I want to count both forwards and backwards from the point where TF changes between 1 and 0 or 0 and 1. The counts are to be stored in new variables PM##, so that the PM## variables hold each unique shift in TF, in both plus and minus. The MWE below leads to an outcome with 7 PM variables, but my production data can have 15 or more shifts. If a TF value does not change between NAs I want to mark it 0

How can I spread a data frame (from long to wide) and preserve two fields' data?

Question: I have a data frame: df <- structure(list(date = structure(c(17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17565, 17565, 17565, 17565, 17565, 17565, 17565, 17565, 17565, 17565, 17565, 17565, 17565, 17565,

Combination of purrr::map and dplyr give inconsistent result with a plain statistical test

Question: I'm comparing two vectors (data_A_score, data_B_score) with another vector K1 using ks.test(), for which I get this result:

    score_ref_k1 <- c(0.09651, 0.09543, 0.09122, 0.09458, 0.09382, 0.10158, 0.10339, 0.13594, 0.09458, 0.09296)
    data_A_score_src <- c(0.09293, 0.09838, 0.09866, 0.10866, 0.09726, 0.10731, 0.09866, 0.09398, 0.10007, 0.10408)
    data_B_score_src <- c(0.04741, 0.0621, 0.09606, 0.08851, 0.05063, 0.39775, 0.05509, 0.10784, 0.0468, 0.04782)
    ks.test(data_A_score_src, score_ref_k1, exact

Reshaping data in R with “login” “logout” times

Question: I'm new to R and am working on a side project for my own purposes. I have this data (a reproducible dput of this is at the end of the question):

      X            datetime  user  state
    1 1 2016-02-19 19:13:26 User1 joined
    2 2 2016-02-19 19:21:18 User2 joined
    3 3 2016-02-19 19:21:33 User1 joined
    4 4 2016-02-19 19:35:38 User1 joined
    5 5 2016-02-19 19:44:15 User1 joined
    6 6 2016-02-19 19:48:55 User1 joined
    7 7 2016-02-19 19:52:40 User1 joined
    8 8 2016-02-19 19:53:15 User3 joined
    9 9 2016-02-19 20:02:34 User3

formatting multi-row data into single row in R

Question: I have a strangely formatted Excel or CSV file that I want to import into R as a data frame. The problem is that some columns have multiple rows per record. For example, in the data below there are three columns and two records, but the Tools column spans multiple rows. Is there a way I can format the data so that each record is a single row with multiple tools (say tool1, tool2, etc.)?

    Task             Location  Tools
    Raising ticket   Alabama   sharepoint
                               word
                               oracle
    Changing ticket  Seattle   word
                               oracle
    Final

Transpose dplyr::tbl object

Question: I am using src_postgres to connect and the dplyr::tbl function to fetch data from a Redshift database. I have applied some filters and a top function to it using dplyr itself. Now my data looks as below:

       riid day          hour
      <dbl> <chr>        <chr>
    1 5542. "THURSDAY "  12
    2 5862. "FRIDAY "    15
    3 5982. "TUESDAY "   15
    4 6022. WEDNESDAY    16

My final output should be as below:

    riid  MON  TUES  WED  THUR  FRI  SAT  SUN
    5542                  12
    5862                        15
    5982       15
    6022             16

I have tried spread. It throws the below error because of the class type: