tidyr

Changing Million/Billion abbreviations into actual numbers? ie. 5.12M -> 5,120,000 [duplicate]

删除回忆录丶 submitted on 2019-12-17 14:53:57
Question: This question already has answers here: Convert from billion to million and vice versa (6 answers). Closed 2 years ago.

As the title suggests, I'm looking for a way to transform shorthand abbreviated 'character' text into numerical data. For example, I'd like to make these changes within my dataframe:

84.06M -> 84,060,000
30.12B -> 30,120,000,000
9.78B -> 9,780,000,000
251.29M -> 251,290,000

Here's an example of some of the dataframe I'm working with: Index Market Cap Income Sales Book/sh ZX -
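A minimal sketch of one way to do this conversion in base R, assuming every value is a number optionally followed by a K, M, or B suffix. The helper name abbr_to_number and the K multiplier are illustrative assumptions, not part of the original question; entries that are not numeric (such as a bare "-") come back as NA.

```r
# Hypothetical helper: convert strings such as "84.06M" or "30.12B" to numbers.
abbr_to_number <- function(x) {
  x <- trimws(x)
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

  # Last character of each string, upper-cased, is the candidate suffix.
  suffix <- toupper(substr(x, nchar(x), nchar(x)))
  has_suffix <- suffix %in% names(multipliers)

  # Drop the suffix where there is one, then parse the numeric part.
  digits <- ifelse(has_suffix, substr(x, 1, nchar(x) - 1), x)
  value <- suppressWarnings(as.numeric(digits))

  # Scale by the matching multiplier; values without a suffix are left as-is.
  scale <- ifelse(has_suffix, multipliers[suffix], 1)
  unname(value * scale)
}

converted <- abbr_to_number(c("84.06M", "30.12B", "9.78B", "251.29M"))
```

The result is a plain numeric vector; format(converted, big.mark = ",") can be used if the thousands separators are wanted for display only.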

Spreading a two column data frame with tidyr

折月煮酒 submitted on 2019-12-17 10:01:58
Question: I have a data frame that looks like this:

  a b
1 x 8
2 x 6
3 y 3
4 y 4
5 z 5
6 z 6

and I want to turn it into this:

  x y z
1 8 3 5
2 6 4 6

But calling

library(tidyr)
df <- data.frame(
  a = c("x", "x", "y", "y", "z", "z"),
  b = c(8, 6, 3, 4, 5, 6)
)
df %>% spread(a, b)

returns

   x  y  z
1  8 NA NA
2  6 NA NA
3 NA  3 NA
4 NA  4 NA
5 NA NA  5
6 NA NA  6

What am I doing wrong?

Answer 1: While I'm aware you're after tidyr, base has a solution in this case: unstack(df, b~a) It's also a little bit faster: Unit:
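For the tidyr route the question asks about, one commonly used fix is to give spread() an explicit row identifier within each level of a, so that values sharing a position end up on the same output row. The sketch below assumes dplyr is available; the column name row and the result name wide are illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  a = c("x", "x", "y", "y", "z", "z"),
  b = c(8, 6, 3, 4, 5, 6)
)

wide <- df %>%
  group_by(a) %>%
  mutate(row = row_number()) %>%  # position of each value within its group
  ungroup() %>%
  spread(a, b) %>%                # one output row per value of `row`
  select(-row)                    # drop the helper column again
```

Without the helper column, spread() has nothing that ties the first x value to the first y value, which is why each value lands on its own row padded with NA. In tidyr 1.0.0 and later, pivot_wider(names_from = a, values_from = b) can take the place of spread(a, b) in the same pipeline.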

Proper idiom for adding zero count rows in tidyr/dplyr

╄→гoц情女王★ submitted on 2019-12-17 07:16:10
Question: Suppose I have some count data that looks like this:

library(tidyr)
library(dplyr)
X.raw <- data.frame(
  x = as.factor(c("A", "A", "A", "B", "B", "B")),
  y = as.factor(c("i", "ii", "ii", "i", "i", "i")),
  z = 1:6)
X.raw
#   x  y z
# 1 A  i 1
# 2 A ii 2
# 3 A ii 3
# 4 B  i 4
# 5 B  i 5
# 6 B  i 6

I'd like to tidy and summarise like this:

X.tidy <- X.raw %>% group_by(x,y) %>% summarise(count=sum(z))
X.tidy
# Source: local data frame [3 x 3]
# Groups: x
#
#   x  y count
# 1 A  i     1
# 2 A ii     5
# 3 B  i    15

I know
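One idiom often used for adding the missing zero-count rows is tidyr's complete(), which expands the data to every combination of the listed columns and fills the new rows with a chosen value. This is a sketch under the assumption that a count of 0 is the right fill for combinations that never occur (here B with ii); it is not necessarily the answer given in the original thread.

```r
library(dplyr)
library(tidyr)

X.raw <- data.frame(
  x = as.factor(c("A", "A", "A", "B", "B", "B")),
  y = as.factor(c("i", "ii", "ii", "i", "i", "i")),
  z = 1:6
)

X.tidy <- X.raw %>%
  group_by(x, y) %>%
  summarise(count = sum(z)) %>%
  ungroup() %>%                             # complete() should see all x/y pairs
  complete(x, y, fill = list(count = 0L))   # add absent pairs with count 0
```

Calling ungroup() first matters: on a grouped data frame, complete() works within each group, so a combination that is absent from a whole group may not be added.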

Using spread with duplicate identifiers for rows

假装没事ソ submitted on 2019-12-17 02:38:07
Question: I have a long-form dataframe that has multiple entries for the same date and person.

jj <- data.frame(month=rep(1:3,4),
                 student=rep(c("Amy", "Bob"), each=6),
                 A=c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                 B=c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

I want to convert it to wide form and make it like this: month Amy.A Bob.A Amy.B Bob.B 1 2 3 1 2 3 1 2 3 1 2 3

My question is very similar to this. I have used the code given in the answer: kk <- jj %>% gather(variable, value, -(month:student)) %>% unite(temp

counting values after and before change in value, within groups, generating new variables for each unique shift

假装没事ソ submitted on 2019-12-14 03:39:27
Question: I am looking for a way to count, within id groups, unique occurrences of value shifts in TF in the data tbl. I want to count both forward and backward from the point where TF changes between 1 and 0 or 0 and 1. The counts are to be stored in new variables PM##, so that the PM## variables hold each unique shift in TF, in both plus and minus. The MWE below leads to an outcome with 7 PM variables, but my production data can have 15 or more shifts. If a TF value does not change between NAs, I want to mark it 0

How can I spread a data frame (from long to wide) and preserve two fields' data?

本小妞迷上赌 submitted on 2019-12-14 03:25:09
Question: I have a data frame: df <- structure(list(date = structure(c(17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17563, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17564, 17565, 17565, 17565, 17565, 17565, 17565, 17565, 17565, 17565, 17565, 17565, 17565, 17565, 17565,

Combination of purrr::map and dplyr give inconsistent result with a plain statistical test

一曲冷凌霜 submitted on 2019-12-14 02:08:53
Question: I'm comparing two vectors (data_A_score, data_B_score) with another vector K1 using ks.test(), for which I get this result: score_ref_k1 <- c(0.09651, 0.09543, 0.09122, 0.09458, 0.09382, 0.10158, 0.10339, 0.13594, 0.09458, 0.09296) data_A_score_src <- c(0.09293, 0.09838, 0.09866, 0.10866, 0.09726, 0.10731, 0.09866, 0.09398, 0.10007, 0.10408) data_B_score_src <- c(0.04741, 0.0621, 0.09606, 0.08851, 0.05063, 0.39775, 0.05509, 0.10784, 0.0468, 0.04782) ks.test(data_A_score_src, score_ref_k1, exact

Reshaping data in R with “login” “logout” times

末鹿安然 submitted on 2019-12-14 00:16:32
Question: I'm new to R and am working on a side project for my own purposes. I have this data (a reproducible dput of it is at the end of the question):

  X            datetime  user  state
1 1 2016-02-19 19:13:26 User1 joined
2 2 2016-02-19 19:21:18 User2 joined
3 3 2016-02-19 19:21:33 User1 joined
4 4 2016-02-19 19:35:38 User1 joined
5 5 2016-02-19 19:44:15 User1 joined
6 6 2016-02-19 19:48:55 User1 joined
7 7 2016-02-19 19:52:40 User1 joined
8 8 2016-02-19 19:53:15 User3 joined
9 9 2016-02-19 20:02:34 User3

formatting multi-row data into single row in R

我只是一个虾纸丫 submitted on 2019-12-13 20:15:51
Question: I have a strangely formatted Excel or CSV file that I want to import into R as a data frame. The problem is that some columns spread a single record over multiple rows. For example, in the data below there are three columns and two records, but the Tools column has multiple rows per record. Is there a way to format the data so that each record is a single row with multiple tool columns (say tool1, tool2, etc.)?

Task             Location  Tools
Raising ticket   Alabama   sharepoint
                           word
                           oracle
Changing ticket  Seattle   word
                           oracle
Final

Transpose dplyr::tbl object

99封情书 submitted on 2019-12-13 17:52:32
Question: I am using src_postgres to connect and the dplyr::tbl function to fetch data from a Redshift database. I have applied some filters and the top function to it using dplyr itself. Now my data looks as below:

   riid day          hour
  <dbl> <chr>        <chr>
1 5542. "THURSDAY "  12
2 5862. "FRIDAY "    15
3 5982. "TUESDAY "   15
4 6022. WEDNESDAY    16

My final output should be as below:

riid  MON  TUES  WED  THUR  FRI  SAT  SUN
5542                  12
5862                        15
5988       15
6022             16

I have tried spread. It throws the below error because of the class type: