dplyr

How to filter rows with multiple conditions

Submitted by 余生长醉 on 2021-02-10 14:29:41
Question: I am new to R. I'm trying to filter rows from a data.frame (df) based on multiple conditions. An example of my data frame (posted as an image in the original):

    SNPA SNPB value    block1            block2              score_T
    A1   A22  0.379927 A1|A2|A3|A4|A5|A6 A22|A23|A24|A25     12
    A2   A23  0.449074 A1|A2|A3|A4|A5|A6 A22|A23|A24|A25     25
    A3   A24  0.464135 A1|A2|A3|A4|A5|A6 A22|A23|A24|A25     584
    A4   A22  0.328866 A1|A2|A3|A4|A5|A6 A22|A23|A24|A25     51
    A5   A22  0.326026 A1|A2|A3|A4|A5|A6 A22|A23|A24|A25     64
    A22  A27  0.57169  A22|A23|A24|A25   A27|A28|A29|A30|A31 …
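The question text is cut off before the actual conditions are stated, so the sketch below only illustrates the general pattern: dplyr's filter() accepts any number of logical conditions, combined with & (all must hold) or | (any may hold). The data frame is rebuilt from the table above and the thresholds are illustrative, not taken from the question.

```r
library(dplyr)

# Reconstruction of the question's data frame (block columns omitted)
df <- data.frame(
  SNPA    = c("A1", "A2", "A3", "A4", "A5"),
  SNPB    = c("A22", "A23", "A24", "A22", "A22"),
  value   = c(0.379927, 0.449074, 0.464135, 0.328866, 0.326026),
  score_T = c(12, 25, 584, 51, 64)
)

# Keep rows satisfying both conditions (thresholds are made up here)
df %>% filter(value > 0.4 & score_T >= 25)
#>   SNPA SNPB    value score_T
#> 1   A2  A23 0.449074      25
#> 2   A3  A24 0.464135     584
```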

R: Is there a way to sort messy data where it pivots from long to wide, and as it moves across variables, into one logical key:value column?

Submitted by 微笑、不失礼 on 2021-02-10 14:24:06
Question: I have extremely messy data. A portion of it looks like the following example.

    x1_01 = c("bearing_coordinates", "bearing_coordinates", "bearing_coordinates", "roadkill")
    x1_02 = c(146, 122, 68, 1)
    x2_01 = c("tree_density", "animals_on_road", "animals_on_road", "tree_density")
    x2_02 = c(13, 2, 5, 11)
    x3_01 = c("animals_on_road", "tree_density", "roadkill", "bearing_coordinates")
    x3_02 = c(3, 10, 1, 1000)
    x4_01 = c("roadkill", "roadkill", "tree_density", "animals_on_road")
    x4_02 = c(1, 1, 12, 6)
    testframe = data.frame(x1_01 …
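The question is cut off, so the goal is assumed from the data shape: each *_01 column holds a variable name and the paired *_02 column holds its value. A common tidyr approach is to pivot the column pairs long with a ".value" spec, then spread the recovered keys back out into one column per variable. The column names tidy/key/value below are my own.

```r
library(dplyr)
library(tidyr)

testframe <- data.frame(
  x1_01 = c("bearing_coordinates", "bearing_coordinates", "bearing_coordinates", "roadkill"),
  x1_02 = c(146, 122, 68, 1),
  x2_01 = c("tree_density", "animals_on_road", "animals_on_road", "tree_density"),
  x2_02 = c(13, 2, 5, 11),
  x3_01 = c("animals_on_road", "tree_density", "roadkill", "bearing_coordinates"),
  x3_02 = c(3, 10, 1, 1000),
  x4_01 = c("roadkill", "roadkill", "tree_density", "animals_on_road"),
  x4_02 = c(1, 1, 12, 6)
)

tidy <- testframe %>%
  mutate(row = row_number()) %>%
  # "x1_01" splits into set = "x1" and the suffix; .value keeps the
  # suffixes "01" (key) and "02" (value) as separate columns
  pivot_longer(-row, names_to = c("set", ".value"), names_sep = "_") %>%
  rename(key = `01`, value = `02`) %>%
  select(-set) %>%
  # one column per recovered variable name
  pivot_wider(names_from = key, values_from = value)
```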

Unused argument in n() inside summarise() in R

Submitted by 廉价感情. on 2021-02-10 14:14:58
Question: I am trying to run the following code:

    DF2 %>% group_by(doy, yearadded) %>% summarise(n_entries = n(doy, yearadded))

which gives me the error:

    Error in n(doy, yearadded) : unused arguments (doy, yearadded)

My yearadded field is a character and doy is numeric; is that why it's not working, or is there some other reason?

Answer 1: n() doesn't take any arguments. It would be

    library(dplyr)
    DF2 %>% group_by(doy, yearadded) %>% summarise(n_entries = n())

Or, more compactly, count(DF2, doy, …
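The answer's fix can be made runnable with a toy stand-in for DF2 (the real data is not shown in the question; the values below are invented). n() counts rows in the current group and takes no arguments, and count() is the shortcut for the same group-and-tally pattern.

```r
library(dplyr)

# Invented stand-in for the question's DF2
DF2 <- data.frame(
  doy = c(1, 1, 2, 2, 2),
  yearadded = c("2019", "2019", "2019", "2020", "2020")
)

# n() takes no arguments; the grouping is supplied by group_by()
DF2 %>%
  group_by(doy, yearadded) %>%
  summarise(n_entries = n(), .groups = "drop")
# → three groups, with counts 2, 1, 2

# Equivalent one-liner
count(DF2, doy, yearadded)
```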

Filter one column by matching to another column

Submitted by 雨燕双飞 on 2021-02-10 12:43:08
Question: I have a data frame with a variable containing elements to drop if they match an element in another variable; see a small example below:

    df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                     animal = rep(c("dog", "cat"), 3),
                     value = seq(1, 12, 2),
                     drop = c("no", "no", "dog", "dog", "cat", "cat"))

      pair animal value drop
    1    1    dog     1   no
    2    1    cat     3   no
    3    2    dog     5  dog
    4    2    cat     7  dog
    5    3    dog     9  cat
    6    3    cat    11  cat

I want to filter the data frame according to whether the value of animal matches the …
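The question is cut off, but assuming the goal is to drop each row whose animal equals its own drop value, filter() can compare the two columns row-wise directly:

```r
library(dplyr)

df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3),
                 value = seq(1, 12, 2),
                 drop = c("no", "no", "dog", "dog", "cat", "cat"))

# Keep only the rows where animal does not match its drop marker
df %>% filter(animal != drop)
#>   pair animal value drop
#> 1    1    dog     1   no
#> 2    1    cat     3   no
#> 3    2    cat     7  dog
#> 4    3    dog     9  cat
```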

Adding name of file when using sparklyr::spark_read_json

Submitted by 谁说我不能喝 on 2021-02-10 06:14:30
Question: I have millions of JSON files, each containing the same columns, say x and y. Note that x and y have equal length within a single file, but their lengths can differ between files. The only thing that distinguishes the data is the file name, so when combining the files I'd like to include the name of the file as a third column. Is this possible using sparklyr::spark_read_json, i.e. when using wildcards? MWE: …
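One possible approach (a sketch, not tested against a live cluster): Spark SQL exposes the source file of each row through the built-in function input_file_name(), and sparklyr passes functions it does not recognise straight through to Spark SQL, so it can appear inside mutate(). The connection settings, path, and column name below are placeholders.

```r
library(sparklyr)
library(dplyr)

# Placeholder connection; adjust master/config for a real cluster
sc <- spark_connect(master = "local")

# Read all files matching the wildcard as one Spark table, then tag
# each row with the file it came from via Spark SQL's input_file_name()
combined <- spark_read_json(sc, name = "combined", path = "/data/json/*.json") %>%
  mutate(source_file = input_file_name())
```

Because input_file_name() is evaluated by Spark, not by R, this works lazily on the distributed data without collecting anything to the driver.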

How to filter rows for every column independently using dplyr

Submitted by 丶灬走出姿态 on 2021-02-10 05:51:50
Question: I have the following tibble:

    library(tidyverse)
    df <- tibble::tribble(
      ~gene, ~colB, ~colC,
      "a", 1, 2,
      "b", 2, 3,
      "c", 3, 4,
      "d", 1, 1
    )
    df
    #> # A tibble: 4 x 3
    #>   gene   colB  colC
    #>   <chr> <dbl> <dbl>
    #> 1 a         1     2
    #> 2 b         2     3
    #> 3 c         3     4
    #> 4 d         1     1

What I want to do is filter every column after the gene column for values greater than or equal to 2 (>= 2), resulting in this:

    gene colB colC
    a      NA    2
    b       2    3
    c       3    4

How can I achieve that? The number of columns after gene is actually more than just two.

Answer 1: One …
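The answer is cut off, but the desired output (values below 2 replaced by NA, and rows with no surviving value dropped) can be produced with across() and if_any(); the latter requires dplyr 1.0.4 or newer:

```r
library(dplyr)

df <- tibble::tribble(
  ~gene, ~colB, ~colC,
  "a", 1, 2,
  "b", 2, 3,
  "c", 3, 4,
  "d", 1, 1
)

# Replace values < 2 with NA in every non-gene column, then keep only
# rows where at least one column still has a value
df %>%
  mutate(across(-gene, ~ replace(.x, .x < 2, NA))) %>%
  filter(if_any(-gene, ~ !is.na(.x)))
#> gene "a" keeps colC = 2; "b" and "c" survive whole; "d" is dropped
```

Using column selections like -gene (rather than listing colB, colC) means the same code works however many columns follow gene.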

How to subset data by filtering and grouping efficiently in R

Submitted by 僤鯓⒐⒋嵵緔 on 2021-02-10 05:09:26
Question: I'm working on a project and am looking for help making my code run more efficiently. I've searched for similar problems but can't find anything quite as granular as this one. The solution I've come up with is extremely clunky, and I'm confident there must be a more efficient way to do this with a package like dplyr or data.table. Problem: I have 3 columns of data, ids, x.group, and times. I need to extract the first 3 unique ids that appear in each times …
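The question is truncated, so the exact requirement is assumed: for each times value, take the first three distinct ids in order of appearance. With invented data, a dplyr sketch of that reading:

```r
library(dplyr)

# Invented stand-in for the question's ids / x.group / times columns
dat <- data.frame(
  ids     = c("a", "b", "a", "c", "d", "e", "e", "f"),
  x.group = rep(c("g1", "g2"), 4),
  times   = c(1, 1, 1, 1, 2, 2, 2, 2)
)

# Within each times group: keep the first row for each id,
# then take the first three of those
dat %>%
  group_by(times) %>%
  distinct(ids, .keep_all = TRUE) %>%
  slice_head(n = 3) %>%
  ungroup()
#> times 1 yields ids a, b, c; times 2 yields d, e, f
```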

How to combine two data frames using dplyr or other packages?

Submitted by 拈花ヽ惹草 on 2021-02-09 15:45:48
Question: I have two data frames:

    df1 = data.frame(index = c(0, 3, 4), n1 = c(1, 2, 3))
    df1
    #   index n1
    # 1     0  1
    # 2     3  2
    # 3     4  3

    df2 = data.frame(index = c(1, 2, 3), n2 = c(4, 5, 6))
    df2
    #   index n2
    # 1     1  4
    # 2     2  5
    # 3     3  6

I want to join these to:

      index n
    1     0 1
    2     1 4
    3     2 5
    4     3 8   (index 3 appears in both data frames, so add 2 and 6)
    5     4 3
    6     5 0   (index 5 exists in neither data frame, so set 0)
    7     6 0   (index 6 exists in neither data frame, so set 0)

The given data frames are just part of a larger dataset. Can I do this using dplyr or other packages?
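One way to get the desired output is a full join plus tidyr::complete(); the 0:6 index range is assumed from the desired output, since the question doesn't say how far the index should run. coalesce() turns the NAs produced by the join and the fill-in into zeros before summing:

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(index = c(0, 3, 4), n1 = c(1, 2, 3))
df2 <- data.frame(index = c(1, 2, 3), n2 = c(4, 5, 6))

# full_join keeps every index present in either frame; complete() adds
# the indices missing from both (range assumed); coalesce() maps NA to 0
full_join(df1, df2, by = "index") %>%
  complete(index = 0:6) %>%
  mutate(n = coalesce(n1, 0) + coalesce(n2, 0)) %>%
  select(index, n) %>%
  arrange(index)
#> n is 1, 4, 5, 8, 3, 0, 0 for index 0 through 6
```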