dplyr

Remove duplicate values across a few columns but keep rows

冷眼眸甩不掉的悲伤 submitted on 2021-02-19 05:14:58
Question: I have a dataframe that looks like this:

    dat <- data.frame(id=1:6,
                      z_1=c(100,290,38,129,0,290),
                      z_2=c(20,0,0,0,0,290),
                      z_3=c(0,0,38,0,0,98),
                      z_4=c(0,0,38,127,38,78),
                      z_5=c(23,0,25,0,0,98),
                      z_6=c(100,0,25,127,0,9))
    dat
      id z_1 z_2 z_3 z_4 z_5 z_6
    1  1 100  20   0   0  23 100
    2  2 290   0   0   0   0   0
    3  3  38   0  38  38  25  25
    4  4 129   0   0 127   0 127
    5  5   0   0   0  38   0   0
    6  6 290 290  98  78  98   9

I want to remove duplicate values of z_x across each row, replacing any duplicates with either a 0 or NA, but leaving the rows &

R dplyr. Filter a dataframe that contains a column of numeric vectors

丶灬走出姿态 submitted on 2021-02-19 04:22:07
Question: I have a dataframe in which one column contains numeric vectors. I want to filter rows based on a condition involving that column. This is a simplified example.

    df <- data.frame(id = LETTERS[1:3], name=c("Alice", "Bob", "Carol"))
    mylist=list(c(1,2,3), c(4,5), c(1,3,4))
    df$numvecs <- mylist
    df
    #   id  name numvecs
    # 1  A Alice 1, 2, 3
    # 2  B   Bob    4, 5
    # 3  C Carol 1, 3, 4

I can use something like mapply, e.g.

    mapply(function(x,y) x=="B" & 4 %in% y, df$id, df$numvecs)

which correctly returns TRUE for
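
The excerpt is cut off at this point, so the asker's full requirement is unknown. As a minimal sketch only, and assuming the goal is to apply the same per-row test inside dplyr, the condition from the mapply call can be written in filter() with vapply() over the list column. The name `res` is introduced here purely for illustration.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(id = LETTERS[1:3], name = c("Alice", "Bob", "Carol"))
df$numvecs <- list(c(1, 2, 3), c(4, 5), c(1, 3, 4))

# Keep rows where id is "B" and the row's numeric vector contains 4.
# vapply() returns one logical per list element, which is what filter() needs.
res <- df %>%
  filter(id == "B",
         vapply(numvecs, function(v) 4 %in% v, logical(1)))

res
```

vapply() is used rather than sapply() so that the result is guaranteed to be a logical vector of the same length as the data frame.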

Replace NA when last and next non-NA values are equal

非 Y 不嫁゛ submitted on 2021-02-19 02:42:48
Question: I have a sample table with some but not all NA values that need to be replaced.

    > dat
       id message index
    1   1    <NA>     1
    2   1     foo     2
    3   1     foo     3
    4   1    <NA>     4
    5   1     foo     5
    6   1    <NA>     6
    7   2    <NA>     1
    8   2     baz     2
    9   2    <NA>     3
    10  2     baz     4
    11  2     baz     5
    12  2     baz     6
    13  3     bar     1
    14  3    <NA>     2
    15  3    <NA>     3
    16  3     bar     4
    17  3    <NA>     5
    18  3     bar     6
    19  3    <NA>     7
    20  3     qux     8

My objective is to replace the NA values that are surrounded by the same "message" using the first appearance of the message (the least index value) and the last appearance

Combine select and mutate

倖福魔咒の submitted on 2021-02-19 02:19:42
Question: Quite often, I find myself manually combining select() and mutate() functions within dplyr. This is usually because I'm tidying up a dataframe, want to create new columns based on the old columns, and only want to keep the new columns. For example, if I had data about heights and widths but only wanted to use them to calculate and keep the area, then I would use:

    library(dplyr)
    df <- data.frame(height = 1:3, width = 10:12)
    df %>%
      mutate(area = height * width) %>%
      select(area)

When there are a lot
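
The excerpt ends mid-sentence, so what follows is only a sketch of one way to do what the question describes, not the answer given in the original thread. dplyr provides transmute(), which creates new columns and drops the ones it was not asked to create, so the mutate() and select() pair above collapses into one call. The name `result` is introduced here for illustration.

```r
library(dplyr)

df <- data.frame(height = 1:3, width = 10:12)

# transmute() computes the new column and keeps only that column,
# so no separate select() step is needed.
result <- df %>%
  transmute(area = height * width)

result
```

In dplyr 1.0.0 and later, mutate(area = height * width, .keep = "none") is an alternative spelling of the same idea.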

Why does dplyr error in this nested if_else, when logical condition means output should not be evaluated?

老子叫甜甜 submitted on 2021-02-19 02:18:05
Question: I have a nested if_else statement inside mutate. In my example data frame:

    tmp_df2 <- data.frame(a = c(1,1,2), b = c(T,F,T), c = c(1,2,3))
      a     b c
    1 1  TRUE 1
    2 1 FALSE 2
    3 2  TRUE 3

I wish to group by a and then perform operations based on whether a group has one or two rows. I would have thought this nested if_else would suffice:

    tmp_df2 %>%
      group_by(a) %>%
      mutate(tmp_check = n() == 1) %>%
      mutate(d = if_else(tmp_check, # check for number of entries in group
                         0,
                         if_else(b, sum(c)/c[b == T], sum
