dplyr

How to filter rows with multiple conditions

Submitted by 余生长醉 on 2021-02-10 14:29:41
Question: I am new to R. I'm trying to filter rows from a data.frame (df) based on multiple conditions. An example of my data frame (posted as an image in the original):

    SNPA SNPB value    block1            block2              score_T
    A1   A22  0.379927 A1|A2|A3|A4|A5|A6 A22|A23|A24|A25     12
    A2   A23  0.449074 A1|A2|A3|A4|A5|A6 A22|A23|A24|A25     25
    A3   A24  0.464135 A1|A2|A3|A4|A5|A6 A22|A23|A24|A25     584
    A4   A22  0.328866 A1|A2|A3|A4|A5|A6 A22|A23|A24|A25     51
    A5   A22  0.326026 A1|A2|A3|A4|A5|A6 A22|A23|A24|A25     64
    A22  A27  0.57169  A22|A23|A24|A25   A27|A28|A29|A30|A31 …
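The question text is cut off before the actual conditions are stated, so the sketch below only illustrates the general pattern: dplyr's filter() accepts any number of logical conditions, combined with & (all must hold) or | (any may hold). The data frame is rebuilt from the table above and the thresholds are illustrative, not taken from the question.

```r
library(dplyr)

# Reconstruction of the question's data frame (block columns omitted)
df <- data.frame(
  SNPA    = c("A1", "A2", "A3", "A4", "A5"),
  SNPB    = c("A22", "A23", "A24", "A22", "A22"),
  value   = c(0.379927, 0.449074, 0.464135, 0.328866, 0.326026),
  score_T = c(12, 25, 584, 51, 64)
)

# Keep rows satisfying both conditions (thresholds are made up here)
df %>% filter(value > 0.4 & score_T >= 25)
#>   SNPA SNPB    value score_T
#> 1   A2  A23 0.449074      25
#> 2   A3  A24 0.464135     584
```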

R: Is there a way to sort messy data where it pivots from long to wide, and as it moves across variables, into one logical key:value column?

Submitted by 微笑、不失礼 on 2021-02-10 14:24:06
Question: I have extremely messy data. A portion of it looks like the following example.

    x1_01 = c("bearing_coordinates", "bearing_coordinates", "bearing_coordinates", "roadkill")
    x1_02 = c(146, 122, 68, 1)
    x2_01 = c("tree_density", "animals_on_road", "animals_on_road", "tree_density")
    x2_02 = c(13, 2, 5, 11)
    x3_01 = c("animals_on_road", "tree_density", "roadkill", "bearing_coordinates")
    x3_02 = c(3, 10, 1, 1000)
    x4_01 = c("roadkill", "roadkill", "tree_density", "animals_on_road")
    x4_02 = c(1, 1, 12, 6)
    testframe = data.frame(x1_01 …
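The question is cut off, so the goal is assumed from the data shape: each *_01 column holds a variable name and the paired *_02 column holds its value. A common tidyr approach is to pivot the column pairs long with a ".value" spec, then spread the recovered keys back out into one column per variable. The column names tidy/key/value below are my own.

```r
library(dplyr)
library(tidyr)

testframe <- data.frame(
  x1_01 = c("bearing_coordinates", "bearing_coordinates", "bearing_coordinates", "roadkill"),
  x1_02 = c(146, 122, 68, 1),
  x2_01 = c("tree_density", "animals_on_road", "animals_on_road", "tree_density"),
  x2_02 = c(13, 2, 5, 11),
  x3_01 = c("animals_on_road", "tree_density", "roadkill", "bearing_coordinates"),
  x3_02 = c(3, 10, 1, 1000),
  x4_01 = c("roadkill", "roadkill", "tree_density", "animals_on_road"),
  x4_02 = c(1, 1, 12, 6)
)

tidy <- testframe %>%
  mutate(row = row_number()) %>%
  # "x1_01" splits into set = "x1" and the suffix; .value keeps the
  # suffixes "01" (key) and "02" (value) as separate columns
  pivot_longer(-row, names_to = c("set", ".value"), names_sep = "_") %>%
  rename(key = `01`, value = `02`) %>%
  select(-set) %>%
  # one column per recovered variable name
  pivot_wider(names_from = key, values_from = value)
```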

Unused argument in n() inside summarise() in R

Submitted by 廉价感情. on 2021-02-10 14:14:58
Question: I am trying to run the following code:

    DF2 %>% group_by(doy, yearadded) %>% summarise(n_entries = n(doy, yearadded))

which gives me the error:

    Error in n(doy, yearadded) : unused arguments (doy, yearadded)

My yearadded field is a character and doy is numeric; is that why it's not working, or is there some other reason?

Answer 1: n() doesn't take any arguments. It would be

    library(dplyr)
    DF2 %>% group_by(doy, yearadded) %>% summarise(n_entries = n())

Or, more compactly, count(DF2, doy, …
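The answer's fix can be made runnable with a toy stand-in for DF2 (the real data is not shown in the question; the values below are invented). n() counts rows in the current group and takes no arguments, and count() is the shortcut for the same group-and-tally pattern.

```r
library(dplyr)

# Invented stand-in for the question's DF2
DF2 <- data.frame(
  doy = c(1, 1, 2, 2, 2),
  yearadded = c("2019", "2019", "2019", "2020", "2020")
)

# n() takes no arguments; the grouping is supplied by group_by()
DF2 %>%
  group_by(doy, yearadded) %>%
  summarise(n_entries = n(), .groups = "drop")
# → three groups, with counts 2, 1, 2

# Equivalent one-liner
count(DF2, doy, yearadded)
```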

Filter one column by matching to another column

Submitted by 雨燕双飞 on 2021-02-10 12:43:08
Question: I have a data frame with a variable containing elements to drop if they match an element in another variable; see a small example below:

    df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                     animal = rep(c("dog", "cat"), 3),
                     value = seq(1, 12, 2),
                     drop = c("no", "no", "dog", "dog", "cat", "cat"))

      pair animal value drop
    1    1    dog     1   no
    2    1    cat     3   no
    3    2    dog     5  dog
    4    2    cat     7  dog
    5    3    dog     9  cat
    6    3    cat    11  cat

I want to filter the data frame according to whether the value of animal matches the …
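The question is cut off, but assuming the goal is to drop each row whose animal equals its own drop value, filter() can compare the two columns row-wise directly:

```r
library(dplyr)

df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3),
                 value = seq(1, 12, 2),
                 drop = c("no", "no", "dog", "dog", "cat", "cat"))

# Keep only the rows where animal does not match its drop marker
df %>% filter(animal != drop)
#>   pair animal value drop
#> 1    1    dog     1   no
#> 2    1    cat     3   no
#> 3    2    cat     7  dog
#> 4    3    dog     9  cat
```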

Adding name of file when using sparklyr::spark_read_json

Submitted by 谁说我不能喝 on 2021-02-10 06:14:30
Question: I have millions of JSON files, each containing the same columns, say x and y. Note that x and y have equal length within a single file, but their lengths can differ between files. The only thing that distinguishes the data is the file name, so when combining the files I'd like to include the name of the file as a third column. Is this possible using sparklyr::spark_read_json, i.e. when using wildcards? MWE: …
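One possible approach (a sketch, not tested against a live cluster): Spark SQL exposes the source file of each row through the built-in function input_file_name(), and sparklyr passes functions it does not recognise straight through to Spark SQL, so it can appear inside mutate(). The connection settings, path, and column name below are placeholders.

```r
library(sparklyr)
library(dplyr)

# Placeholder connection; adjust master/config for a real cluster
sc <- spark_connect(master = "local")

# Read all files matching the wildcard as one Spark table, then tag
# each row with the file it came from via Spark SQL's input_file_name()
combined <- spark_read_json(sc, name = "combined", path = "/data/json/*.json") %>%
  mutate(source_file = input_file_name())
```

Because input_file_name() is evaluated by Spark, not by R, this works lazily on the distributed data without collecting anything to the driver.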

How to filter rows for every column independently using dplyr

Submitted by 丶灬走出姿态 on 2021-02-10 05:51:50
Question: I have the following tibble:

    library(tidyverse)
    df <- tibble::tribble(
      ~gene, ~colB, ~colC,
      "a", 1, 2,
      "b", 2, 3,
      "c", 3, 4,
      "d", 1, 1
    )
    df
    #> # A tibble: 4 x 3
    #>   gene   colB  colC
    #>   <chr> <dbl> <dbl>
    #> 1 a         1     2
    #> 2 b         2     3
    #> 3 c         3     4
    #> 4 d         1     1

What I want to do is filter every column after the gene column for values greater than or equal to 2 (>= 2), resulting in this:

    gene colB colC
    a      NA    2
    b       2    3
    c       3    4

How can I achieve that? The number of columns after gene is actually more than just two.

Answer 1: One …
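The answer is cut off, but the desired output (values below 2 replaced by NA, and rows with no surviving value dropped) can be produced with across() and if_any(); the latter requires dplyr 1.0.4 or newer:

```r
library(dplyr)

df <- tibble::tribble(
  ~gene, ~colB, ~colC,
  "a", 1, 2,
  "b", 2, 3,
  "c", 3, 4,
  "d", 1, 1
)

# Replace values < 2 with NA in every non-gene column, then keep only
# rows where at least one column still has a value
df %>%
  mutate(across(-gene, ~ replace(.x, .x < 2, NA))) %>%
  filter(if_any(-gene, ~ !is.na(.x)))
#> gene "a" keeps colC = 2; "b" and "c" survive whole; "d" is dropped
```

Using column selections like -gene (rather than listing colB, colC) means the same code works however many columns follow gene.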

How to subset data by filtering and grouping efficiently in R

Submitted by 僤鯓⒐⒋嵵緔 on 2021-02-10 05:09:26
Question: I'm working on a project and am looking for help making my code run more efficiently. I've searched for similar problems but can't find anything quite as granular as this one. The solution I've come up with is extremely clunky, and I'm confident there must be a more efficient way to do this with a package like dplyr or data.table. Problem: I have 3 columns of data, ids, x.group, and times. I need to extract the first 3 unique ids that appear in each times …
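The question is truncated, so the exact requirement is assumed: for each times value, take the first three distinct ids in order of appearance. With invented data, a dplyr sketch of that reading:

```r
library(dplyr)

# Invented stand-in for the question's ids / x.group / times columns
dat <- data.frame(
  ids     = c("a", "b", "a", "c", "d", "e", "e", "f"),
  x.group = rep(c("g1", "g2"), 4),
  times   = c(1, 1, 1, 1, 2, 2, 2, 2)
)

# Within each times group: keep the first row for each id,
# then take the first three of those
dat %>%
  group_by(times) %>%
  distinct(ids, .keep_all = TRUE) %>%
  slice_head(n = 3) %>%
  ungroup()
#> times 1 yields ids a, b, c; times 2 yields d, e, f
```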

How to combine two data frames using dplyr or other packages?

Submitted by 拈花ヽ惹草 on 2021-02-09 15:45:48
Question: I have two data frames:

    df1 = data.frame(index = c(0, 3, 4), n1 = c(1, 2, 3))
    df1
    #   index n1
    # 1     0  1
    # 2     3  2
    # 3     4  3

    df2 = data.frame(index = c(1, 2, 3), n2 = c(4, 5, 6))
    df2
    #   index n2
    # 1     1  4
    # 2     2  5
    # 3     3  6

I want to join these to:

      index n
    1     0 1
    2     1 4
    3     2 5
    4     3 8   (index 3 appears in both data frames, so add 2 and 6)
    5     4 3
    6     5 0   (index 5 exists in neither data frame, so set 0)
    7     6 0   (index 6 exists in neither data frame, so set 0)

The given data frames are just part of a larger dataset. Can I do this using dplyr or other packages?
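One way to get the desired output is a full join plus tidyr::complete(); the 0:6 index range is assumed from the desired output, since the question doesn't say how far the index should run. coalesce() turns the NAs produced by the join and the fill-in into zeros before summing:

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(index = c(0, 3, 4), n1 = c(1, 2, 3))
df2 <- data.frame(index = c(1, 2, 3), n2 = c(4, 5, 6))

# full_join keeps every index present in either frame; complete() adds
# the indices missing from both (range assumed); coalesce() maps NA to 0
full_join(df1, df2, by = "index") %>%
  complete(index = 0:6) %>%
  mutate(n = coalesce(n1, 0) + coalesce(n2, 0)) %>%
  select(index, n) %>%
  arrange(index)
#> n is 1, 4, 5, 8, 3, 0, 0 for index 0 through 6
```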