tidyr

tidyr::unite across column patterns

a 夏天 submitted on 2019-12-10 04:28:59
Question: I have a dataset that looks something like this:

    site <- c("A", "B", "C", "D", "E")
    D01_1 <- c(1, 0, 0, 0, 1)
    D01_2 <- c(1, 1, 0, 1, 1)
    D02_1 <- c(1, 0, 1, 0, 1)
    D02_2 <- c(0, 1, 0, 0, 1)
    D03_1 <- c(1, 1, 0, 0, 0)
    D03_2 <- c(0, 1, 0, 0, 1)
    df <- data.frame(site, D01_1, D01_2, D02_1, D02_2, D03_1, D03_2)

I am trying to unite the D0x_1 and D0x_2 columns so that the values in the columns are separated by a slash. I can do this with the following code and it works just fine:

    library(dplyr)
    library
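The question's own code is cut off in this excerpt, so the following is only a sketch of one possible way to unite each D0x_1/D0x_2 pair without naming every column by hand. It assumes the paired columns share a prefix before the final underscore; the names `prefixes` and `p` are illustrative and do not come from the question.

```r
library(tidyr)

df <- data.frame(
  site  = c("A", "B", "C", "D", "E"),
  D01_1 = c(1, 0, 0, 0, 1), D01_2 = c(1, 1, 0, 1, 1),
  D02_1 = c(1, 0, 1, 0, 1), D02_2 = c(0, 1, 0, 0, 1),
  D03_1 = c(1, 1, 0, 0, 0), D03_2 = c(0, 1, 0, 0, 1)
)

# Derive the shared prefixes ("D01", "D02", "D03") from the column names
# (assumption: every column except `site` is named <prefix>_<number>).
prefixes <- unique(sub("_[0-9]+$", "", setdiff(names(df), "site")))

# Unite each group of columns in turn; `!!p` passes the prefix string
# as the name of the new column.
for (p in prefixes) {
  df <- unite(df, !!p, starts_with(paste0(p, "_")), sep = "/")
}

print(df)
```

A loop keeps the sketch easy to follow; whether it is preferable to a reshape-based approach depends on how many column groups there are.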

Trouble pivoting in pandas (spread in R)

北城余情 submitted on 2019-12-10 04:10:23
Question: I'm having some issues with the pd.pivot() or pivot_table() functions in pandas. I have this:

    df = pd.DataFrame({
        'site_id': {0: 'a', 1: 'a', 2: 'b', 3: 'b', 4: 'c', 5: 'c', 6: 'a', 7: 'a', 8: 'b', 9: 'b', 10: 'c', 11: 'c'},
        'dt': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 2, 10: 2, 11: 2},
        'eu': {0: 'FGE', 1: 'WSH', 2: 'FGE', 3: 'WSH', 4: 'FGE', 5: 'WSH', 6: 'FGE', 7: 'WSH', 8: 'FGE', 9: 'WSH', 10: 'FGE', 11: 'WSH'},
        'kw': {0: '8', 1: '5', 2: '3', 3: '7', 4: '1', 5: '5', 6: '2', 7:

R split string at last whitespace chars using tidyr::separate

痴心易碎 submitted on 2019-12-10 03:14:24
Question: Suppose I have a dataframe like this:

    df <- data.frame(a = c("AA", "BB"), b = c("short string", "this is the longer string"))

I would like to split each string using a regex based on the last space occurring. I tried:

    library(dplyr)
    library(tidyr)
    df %>% separate(b, c("partA", "partB"), sep = " [^ ]*$")

But this omits the second part of the string in the output. My desired output would look like this:

       a              partA  partB
    1 AA              short string
    2 BB this is the longer string

How do I do this? Would be nice if I could
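The attempted pattern `" [^ ]*$"` matches the last space together with the final word, so that word is consumed as the separator and never reaches `partB`. A minimal sketch of one way around this, assuming the regex engine behind `separate()` supports lookahead, is to match only the space and merely assert that no further space follows:

```r
library(tidyr)

df <- data.frame(
  a = c("AA", "BB"),
  b = c("short string", "this is the longer string")
)

# " (?=[^ ]+$)" matches a single space that is followed only by
# non-space characters up to the end of the string, i.e. the last space.
# The lookahead does not consume the final word, so it is kept as partB.
out <- separate(df, b, c("partA", "partB"), sep = " (?=[^ ]+$)")

print(out)
```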

how to compute rowsums using tidyverse

南笙酒味 submitted on 2019-12-09 06:31:51
Question: I did

    mtcars %>% by_row(sum)

but got the message:

    by_row() is deprecated; please use a combination of: tidyr::nest(); dplyr::mutate(); purrr::map()

My naive approach is this:

    mtcars %>%
      group_by(id = row_number()) %>%
      nest(-id) %>%
      mutate(hi = map_dbl(data, sum))

Is there a way to do it without creating an "id" column?

Answer 1: Is this what you are looking for?

    mtcars %>% mutate(rowsum = rowSums(.))

Output:

       mpg cyl  disp  hp drat    wt  qsec vs am gear carb rowsum
    1 21.0   6 160.0 110 3.90 2.620 16.46  0  1

R: spread function on data frame with duplicates

Deadly submitted on 2019-12-09 03:43:01
Question: I have a data frame that I need to pivot, but the data frame has duplicate identifiers, so the spread function gives an error:

    Error: Duplicate identifiers for rows (5, 6)

    Dimension = c("A","A","B","B","A","A")
    Date = c("Mon","Tue","Mon","Wed","Fri","Fri")
    Metric = c(23,25,7,9,7,8)
    df = data.frame(Dimension,Date,Metric)
    df

      Dimension Date Metric
    1         A  Mon     23
    2         A  Tue     25
    3         B  Mon      7
    4         B  Wed      9
    5         A  Fri      7
    6         A  Fri      8

    library(tidyr)
    df1 = spread(df, Date, Metric, fill = " ")

    Error: Duplicate identifiers for
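The error arises because rows 5 and 6 share the same Dimension and Date, so `spread()` cannot decide which Metric belongs in the single A/Fri cell. One common workaround, sketched here under the assumption that both duplicate values should be kept, is to number the repeats within each Dimension/Date pair so that every row becomes uniquely identified. The column name `occurrence` is a hypothetical helper, not part of the question.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Dimension = c("A", "A", "B", "B", "A", "A"),
  Date      = c("Mon", "Tue", "Mon", "Wed", "Fri", "Fri"),
  Metric    = c(23, 25, 7, 9, 7, 8)
)

wide <- df %>%
  group_by(Dimension, Date) %>%
  mutate(occurrence = row_number()) %>%  # 1 for the first A/Fri row, 2 for the second
  ungroup() %>%
  spread(Date, Metric)                   # one output row per Dimension/occurrence pair

print(wide)
```

If the duplicates should instead be collapsed (for example summed), aggregating with `summarise()` before spreading would be the alternative design.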

Sparklyr: how to explode a list column into their own columns in Spark table?

那年仲夏 submitted on 2019-12-09 01:38:19
Question: My question is similar to the one here, but I'm having problems implementing the answer, and I cannot comment in that thread. I have a big CSV file that contains nested data: 2 columns separated by whitespace (say the first column is Y, the second column is X). Column X itself is also a comma-separated value.

    21.66 2.643227,1.2698358,2.6338573,1.8812188,3.8708665,...
    35.15 3.422151,-0.59515584,2.4994135,-0.19701914,4.0771823,...
    15.22 2.8302398,1.9080592,-0.68780196,3

reshape a dataframe with tidyr or reshape2 [duplicate]

不想你离开。 submitted on 2019-12-08 11:21:57
Question: This question already has answers here: Reshaping multiple sets of measurement columns (wide format) into single columns (long format) (7 answers). Closed 3 years ago.

I would like to transform this dataset:

      ID  v1  v2  v3  c1  c2 c3
    1  1  -3 -11  -2  -6  -1 -1
    2  2 -10  -4 -12 -11   4  6
    3  3   4  -4  15   5   1 -3
    4  4  -6   0  -6   5  -1  8
    5  5  -7  12   6 -12 -11 11

    input<-structure(list(ID = 1:5, v1 = c(-3, -10, 4, -6, -7), v2 = c(-11, -4, -4, 0, 12), v3 = c(-2, -12, 15, -6, 6), c1 = c(-6, -11, 5, 5, -12), c2 = c(-1, 4,

Separate a String using Tidyr's “separate” into Multiple Columns and then Create a New Column with Counts

十年热恋 submitted on 2019-12-08 10:06:55
Question: I have the basic dataframe below, which contains long strings separated by commas. I used tidyr's "separate" to create new columns. How do I add another new column with counts of how many of the new columns contain an answer for each person (no NAs)? I suppose the columns can be counted after being separated, or before, by counting how many comma-separated elements each string has. Any help would be appreciated. I would like to stay within the tidyverse and dplyr.
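The question's dataframe is not included in this excerpt, so the sketch below uses made-up data. The names `person`, `answers` and `n_answers` are hypothetical. It illustrates the "count before separating" idea: the number of comma-separated elements is the number of commas plus one.

```r
library(dplyr)
library(stringr)

# Hypothetical data: one comma-separated string of answers per person.
df <- data.frame(
  person  = c("p1", "p2", "p3"),
  answers = c("red,green,blue", "red", "green,blue"),
  stringsAsFactors = FALSE
)

# Count the elements before splitting: commas + 1.
# A missing string stays NA, because str_count() returns NA for NA input.
df <- df %>%
  mutate(n_answers = str_count(answers, ",") + 1)

print(df)
```

The count column can be added first and `separate()` applied afterwards, so the count does not depend on how many output columns were requested.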

how to convert character within each column as sub-column without duplication

﹥>﹥吖頭↗ submitted on 2019-12-08 06:40:54
Question: I have a data.frame like this:

input:

     1 200 444 444
     2 310  NA 444
     3 310  NA 444
     4  NA 444 444
     5 200 444 444
     6 200  NA 444
     7 310 444 444
     8 310 876 444
     9 310 876 444
    10  NA 876 444

I want to convert each distinct value within each column into a sub-column, and put either 1 or 0 in each row to indicate whether that value was observed in that specific row or not:

Output data.frame:

      c1.200 c1.310 c2.444 c2.876 c3.444
    1      1      0      1      0      1
    2      0      1      0      0      1
    3      0      1      0      0      1
    4      0      0      1      0      1
    5      1      0      1      0

Can spread() in tidyr spread across multiple value?

妖精的绣舞 submitted on 2019-12-08 06:09:48
Question: I am using the iris data set. First, I did some manipulation on that data set to get it into the following form:

    D1 = iris[,c(1,2,5)]
    D2 = iris[,c(3,4,5)]
    colnames(D1)[1:2] = c('Length','Width')
    colnames(D2)[1:2] = c('Length','Width')
    D1 = D1 %>% mutate(Part = 'Sepal')
    D2 = D2 %>% mutate(Part = 'Petal')
    D = rbind(D2,D1)

which looks like:

      Species  Part Length Width
    1  setosa Petal    1.4   0.2
    2  setosa Petal    1.4   0.2
    3  setosa Petal    1.3   0.2
    4  setosa Petal    1.5   0.2
    5  setosa Petal    1.4   0.2
    6  setosa Petal