subset

subset data frame in R using loop

不打扰是莪最后的温柔 submitted on 2019-11-30 10:09:35
I have a data frame that looks like this:

    index  ID    date        Amount
    2      1001  2010-06-08  0
    21     1001  2010-10-08  10
    6      1002  2010-08-16  30
    5      1002  2010-11-25  20
    9      1003  2010-01-01  0
    8      1003  2011-03-06  10
    12     1004  2012-03-12  10
    11     1004  2012-06-21  10
    15     1005  2010-01-01  30
    13     1005  2010-04-06  20

I want to subset this data so that I have new data frames, one for each ID, like this:

    index  ID    date        Amount
    2      1001  2010-06-08  0
    21     1001  2010-10-08  10

and

    6      1002  2010-08-16  30
    5      1002  2010-11-25  20

and so on. I don't need to save the new data frames, only use them to perform some basic calculations. Also I want to do
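The excerpt is cut off before any answer, so what follows is only a minimal sketch of one common approach rather than the accepted solution: base R's split() produces one data frame per ID without an explicit loop. The per-ID sum of Amount is an assumed stand-in for the "basic calculations" the question mentions.

```r
# Data reconstructed from the table in the question
df <- data.frame(
  index  = c(2, 21, 6, 5, 9, 8, 12, 11, 15, 13),
  ID     = c(1001, 1001, 1002, 1002, 1003, 1003, 1004, 1004, 1005, 1005),
  date   = as.Date(c("2010-06-08", "2010-10-08", "2010-08-16", "2010-11-25",
                     "2010-01-01", "2011-03-06", "2012-03-12", "2012-06-21",
                     "2010-01-01", "2010-04-06")),
  Amount = c(0, 10, 30, 20, 0, 10, 10, 10, 30, 20)
)

# split() returns a named list holding one data frame per ID
pieces <- split(df, df$ID)

# Assumed example calculation: total Amount within each ID
totals <- sapply(pieces, function(piece) sum(piece$Amount))
```

Each element can be reached by name, for example pieces[["1001"]], so nothing needs to be saved as a separate variable.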

getting a sample of a data.frame in R

别等时光非礼了梦想. submitted on 2019-11-30 09:32:43
Question: I have the following data frame in R:

    id <- c(1,2,3,4,10,2,4,5,6,8,2,1,5,7,7)
    date <- c(19970807,19970902,19971010,19970715,19991212,19961212,19980909,19990910,19980707,19991111,19970203,19990302,19970605,19990808,19990706)
    spent <- c(1997,19,199,134,654,37,876,890,873,234,643,567,23,25,576)
    df <- data.frame(id, date, spent)

I need to take a random sample of 3 customers (based on id) such that all observations for the sampled customers are extracted.

Answer 1: You want to use %in% and unique:

    df[df$id %in% sample
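The answer above is truncated, so the following is a hedged sketch of how %in%, unique and sample can be combined; it may differ from the original answer's exact code. The set.seed() call is an assumption added only to make the draw reproducible.

```r
id    <- c(1, 2, 3, 4, 10, 2, 4, 5, 6, 8, 2, 1, 5, 7, 7)
date  <- c(19970807, 19970902, 19971010, 19970715, 19991212, 19961212,
           19980909, 19990910, 19980707, 19991111, 19970203, 19990302,
           19970605, 19990808, 19990706)
spent <- c(1997, 19, 199, 134, 654, 37, 876, 890, 873, 234, 643, 567, 23, 25, 576)
df <- data.frame(id, date, spent)

set.seed(1)  # assumption: fixed seed for a reproducible sample

# Draw 3 distinct customer ids, then keep every row belonging to them
chosen  <- sample(unique(df$id), 3)
sampled <- df[df$id %in% chosen, ]
```

Sampling from unique(df$id) rather than df$id gives every customer the same chance of selection regardless of how many observations they have.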

subset data frame based on percentage

空扰寡人 submitted on 2019-11-30 07:27:40
I have a data frame that contains data like this:

    V1  V2  V3
    1   2   0.34
    1   3   0.31
    1   4   0.12
    1   5   0.12

The data frame is bigger, but that's an example. I want to take a subset of this data frame containing the rows with the lowest 20% of V3. How can this be done? Thanks for the help.

The subset() function is handy because (among other benefits) it lets you avoid repeatedly mentioning the name of the data frame:

    subset(dataFrame, V3 <= quantile(V3, 0.2))

Jubbles:

    ss <- subset(dataFrame, subset=(dataFrame$V3 <= quantile(dataFrame$V3, 0.20)))

Source: https://stackoverflow.com/questions/6253837/subset-data-frame
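To make the quantile-based answers concrete, here is a small runnable sketch. The first four rows come from the question; the remaining six rows are invented for illustration, since the full data frame is not shown.

```r
# First four rows are from the question; the rest are made-up filler
dataFrame <- data.frame(
  V1 = rep(1, 10),
  V2 = 2:11,
  V3 = c(0.34, 0.31, 0.12, 0.12, 0.03, 0.05, 0.50, 0.80, 0.45, 0.67)
)

# 20th percentile of V3 (default interpolating quantile)
cutoff <- quantile(dataFrame$V3, 0.2)

# Both forms keep the rows whose V3 is at or below the cutoff
lowest_subset  <- subset(dataFrame, V3 <= quantile(V3, 0.2))
lowest_bracket <- dataFrame[dataFrame$V3 <= cutoff, ]
```

Note that with tied values at the cutoff, the result can contain more than exactly 20% of the rows.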

Choose variables based on name (simple regular expression)

守給你的承諾、 submitted on 2019-11-30 07:19:33
I would like to use variable names that imply what I should do with them. Imagine a data frame "survey":

    library(Rlab) # Needed for rbern() function.
    survey <- data.frame(cbind(
      id = seq(1:10),
      likert_this = sample(seq(1:7), 10, replace=T),
      likert_that = sample(seq(1:7), 10, replace=T),
      dim_bern_varx = rbern(10, 0.6),
      disc_1 = sample(letters[1:5], 10, replace=T)))

Now I would like to do certain things with all variables whose names contain likert, other things with variables whose names contain bern, and so on. How can this be done in R?

Shane: You can use grep() with colnames():

    survey[,grep("bern",
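Since the answer is cut off, the sketch below shows one way grep() and colnames() can select columns by name pattern; it is not necessarily the answer's exact code. Two assumptions: rbinom(10, 1, 0.6) stands in for Rlab's rbern() so no extra package is needed, and the data frame is built without cbind() so the numeric columns stay numeric.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

survey <- data.frame(
  id            = 1:10,
  likert_this   = sample(1:7, 10, replace = TRUE),
  likert_that   = sample(1:7, 10, replace = TRUE),
  dim_bern_varx = rbinom(10, 1, 0.6),   # stand-in for rbern(10, 0.6)
  disc_1        = sample(letters[1:5], 10, replace = TRUE)
)

# Names of all columns containing "likert"
likert_cols <- grep("likert", colnames(survey), value = TRUE)

# drop = FALSE keeps the result a data frame even for a single column
likert_data <- survey[, likert_cols, drop = FALSE]
bern_data   <- survey[, grep("bern", colnames(survey)), drop = FALSE]
```

The pattern argument is a regular expression, so an anchored pattern such as "^likert_" would restrict the match to names that start with that prefix.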

Read FASTA into a dataframe and extract subsequences of FASTA file

社会主义新天地 submitted on 2019-11-30 06:56:47
Question: I have a small FASTA file of DNA sequences which looks like this:

    >NM_000016 700 200 234
    ACATATTGGAGGCCGAAACAATGAGGCGTGATCAACTCAGTATATCAC
    >NM_000775 700 124 236
    CTAACCTCTCCCAGTGTGGAACCTCTATCTCATGAGAAAGCTGGGATGAG
    >NM_003820 700 111 222
    ATTTCCTCCTGCTGCCCGGGAGGTAACACCCTGGACCCCTGGAGTCTGCA

Questions: 1) How can I read this FASTA file into R as a data frame where each row is a sequence record, the 1st column is the refseqID and the 2nd column is the sequence? 2) How to extract subsequence at (start,
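No answer survives in this excerpt, so the following is only a base R sketch of one possible approach. It assumes the first whitespace-separated field of each header line is the refseqID, and the positions 5 and 10 passed to substr() are arbitrary illustrative values, because the question's own start and end definition is truncated.

```r
# Records copied from the question; for a real file one would use
# readLines() on its path instead of this inline vector
fasta_lines <- c(
  ">NM_000016 700 200 234",
  "ACATATTGGAGGCCGAAACAATGAGGCGTGATCAACTCAGTATATCAC",
  ">NM_000775 700 124 236",
  "CTAACCTCTCCCAGTGTGGAACCTCTATCTCATGAGAAAGCTGGGATGAG",
  ">NM_003820 700 111 222",
  "ATTTCCTCCTGCTGCCCGGGAGGTAACACCCTGGACCCCTGGAGTCTGCA"
)

is_header <- startsWith(fasta_lines, ">")
record    <- cumsum(is_header)  # record number for every line

# refseqID: header text without ">" up to the first space
headers   <- sub("^>", "", fasta_lines[is_header])
refseq_id <- sub(" .*$", "", headers)

# Join the sequence lines of each record (handles multi-line sequences)
sequences <- tapply(fasta_lines[!is_header], record[!is_header],
                    paste, collapse = "")

fasta_df <- data.frame(
  refseqID = refseq_id,
  sequence = as.vector(sequences),
  stringsAsFactors = FALSE
)

# Illustrative subsequence: characters 5 through 10 of each sequence
fasta_df$subseq <- substr(fasta_df$sequence, 5, 10)
```

Dedicated packages exist for FASTA parsing; this sketch deliberately stays with base R to keep it self-contained.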

How to remove groups of observation with dplyr::filter()

别说谁变了你拦得住时间么 submitted on 2019-11-30 06:47:39
For the following data:

    ds <- read.table(header = TRUE, text ="
    id year attend
    1 2007 1
    1 2008 1
    1 2009 1
    1 2010 1
    1 2011 1
    8 2007 3
    8 2008 NA
    8 2009 3
    8 2010 NA
    8 2011 3
    9 2007 2
    9 2008 3
    9 2009 3
    9 2010 5
    9 2011 5
    10 2007 4
    10 2008 4
    10 2009 2
    10 2010 NA
    10 2011 NA
    ")
    ds <- ds %>% dplyr::mutate(time=year-2000)
    print(ds)

How would I write a dplyr::filter() command to keep only the ids that don't have a single NA? Only subjects with ids 1 and 9 should remain after the filter.

Robert Krzyzanowski: Use filter in conjunction with base::ave:

    ds %>% dplyr::filter(ave(!is.na(attend), id, FUN = all))

To
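The ave() expression in the answer can be tried without dplyr. The sketch below applies the same logical vector through base R bracket indexing, which is a swapped-in equivalent used here only to avoid a package dependency.

```r
ds <- read.table(header = TRUE, text = "
id year attend
1 2007 1
1 2008 1
1 2009 1
1 2010 1
1 2011 1
8 2007 3
8 2008 NA
8 2009 3
8 2010 NA
8 2011 3
9 2007 2
9 2008 3
9 2009 3
9 2010 5
9 2011 5
10 2007 4
10 2008 4
10 2009 2
10 2010 NA
10 2011 NA
")

# For each id, all() is TRUE only when no attend value is missing;
# ave() spreads that group-level result back onto every row
complete_group <- ave(!is.na(ds$attend), ds$id, FUN = all)

kept <- ds[complete_group, ]
```

Because ave() returns a vector as long as the input, the same expression can serve as the condition inside dplyr::filter(), as the answer does.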

Tables whose sole purpose is specify a subset of another table

℡╲_俬逩灬. submitted on 2019-11-30 06:06:37
Question: The database I'm designing has an employees table; there can be multiple types of employees, one of which is medical employees. The database also needs to describe a many-to-many relation between medical employees and the competences they have. Is it okay to create a table medical_employees with only an id column, whose only purpose is to specify which employees are medics? The id column has a foreign key constraint that references the employees table. The code below should make my question

How do I select rows by two criteria in data.table in R

大城市里の小女人 submitted on 2019-11-30 04:33:51
Let's say I have a data.table and I want to select all the rows where the variable x has a value of b. That is easy:

    library(data.table)
    DT <- data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
    setkey(DT,x) # set a 1-column key
    DT["b"]

It appears that one has to set a key; if the key is not set to x, this does not work. What would happen if I set two columns as keys? Moving along, let's say that I want to select all the rows where the variable x is a or b.

    DT["b"|"a"]

does not work, but the following works:

    DT[x=="a"|x=="b"]

But that uses vector scanning
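As the excerpt ends before any answer, here is a hedged sketch of how keyed lookups can cover both cases raised in the question. It assumes the data.table package is installed, and the variable names ab_keyed, ab_scan and one_row are illustrative only.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# One-column key: a character vector in i is joined against the key,
# so several key values can be looked up at once
setkey(DT, x)
ab_keyed <- DT[c("a", "b")]

# Equivalent result written as an ordinary logical condition
ab_scan <- DT[x %in% c("a", "b")]

# Two-column key: .() supplies one value per key column,
# here x == "a" together with y == 3
setkey(DT, x, y)
one_row <- DT[.("a", 3)]
```

With a two-column key, a lookup that supplies only the first value, such as DT["a"], still matches on the leading key column.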

Subset a dataframe by multiple factor levels [duplicate]

送分小仙女□ submitted on 2019-11-30 03:46:52
This question already has an answer here: Select rows from a data frame based on values in a vector (3 answers)

How can I avoid using a loop to subset a data frame based on multiple factor levels? In the following example my desired output is a data frame. It should contain the rows of the original data frame where the value in "Code" equals one of the values in "selected". Working example:

    #sample data
    Code<-c("A","B","C","D","C","D","A","A")
    Value<-c(1, 2, 3, 4, 1, 2, 3, 4)
    data<-data.frame(cbind(Code, Value))
    selected<-c("A","B") #want rows that contain A and B
    #Begin subsetting
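The excerpt stops where the subsetting would begin. A minimal loop-free sketch uses %in%, which is the usual idiom for matching against a vector of values. One deliberate difference from the question's code: the data frame is built without cbind(), an assumption made so that Value stays numeric instead of being coerced to character.

```r
# Sample data from the question
Code  <- c("A", "B", "C", "D", "C", "D", "A", "A")
Value <- c(1, 2, 3, 4, 1, 2, 3, 4)
data  <- data.frame(Code, Value)

selected <- c("A", "B")  # keep rows whose Code is A or B

# %in% tests each Code against the whole 'selected' vector at once
result <- data[data$Code %in% selected, ]
```

The same condition works unchanged whether Code is stored as a character vector or as a factor.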

R finding rows of a data frame where certain columns match those of another [duplicate]

我是研究僧i submitted on 2019-11-30 02:24:17
This question already has an answer here: Extract data from one data frame to another data frame with different row length (4 answers)

I have an R question that I'm not even sure how to word in one sentence, and I couldn't find an answer for it yet. I have two data frames that I would like to 'intersect', finding all rows where the values match in two columns. I've tried connecting two intersect() and which() statements with &&, but neither has given me what I want yet. Here's what I mean. Let's say I have two data frames: > testData Email Manual Campaign Bounced Opened Clicked ClickThru
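The question's data is truncated, so the sketch below uses two small invented data frames. It assumes Email and Campaign are the two columns to match on, which the excerpt does not confirm. merge() with a two-column by argument keeps only the rows whose values agree in both columns.

```r
# Hypothetical stand-ins for the question's data frames
testData <- data.frame(
  Email    = c("a@example.com", "b@example.com", "c@example.com"),
  Campaign = c("spring", "spring", "fall"),
  Opened   = c(1, 0, 1)
)
otherData <- data.frame(
  Email    = c("a@example.com", "c@example.com", "c@example.com"),
  Campaign = c("spring", "spring", "fall")
)

# Inner join: rows of testData whose Email AND Campaign both
# appear together in a row of otherData
matches <- merge(testData, otherData, by = c("Email", "Campaign"))
```

Here b@example.com is dropped because it has no counterpart, and c@example.com is kept only for the "fall" campaign, where both columns agree.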