subset | 易学教程

How to subset data with advanced string matching

I have the following data frame, from which I would like to extract rows based on matching strings.

    > GEMA_EO5
    gene_symbol  fold_EO   p_value   RefSeq_ID                            BH_p_value
    KNG1         3.433049  8.56e-28  NM_000893,NM_001102416               1.234245e-24
    REXO4        3.245317  1.78e-27  NM_020385                            2.281367e-24
    VPS29        3.827665  2.22e-25  NM_057180,NM_016226                  2.560770e-22
    CYP51A1      3.363149  5.95e-25  NM_000786,NM_001146152               6.239386e-22
    TNPO2        4.707600  1.60e-23  NM_001136195,NM_001136196,NM_013433  1.538000e-20
    NSDHL        2.703922  6.74e-23  NM_001129765,NM_015922               5.980454e-20
    DPYSL2       5.097382  1.29e-22  NM_001386                            1.062868e-19

So I would like to extract e.g. two
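The excerpt is cut off before it says which strings should be matched, so the following is only a sketch under assumptions: it supposes the goal is to keep rows whose comma-separated RefSeq_ID field contains any ID from a given list. The vector `wanted` and the three-row copy of `GEMA_EO5` are hypothetical stand-ins for illustration.

```r
# Hypothetical three-row copy of the GEMA_EO5 data frame from the question
GEMA_EO5 <- data.frame(
  gene_symbol = c("KNG1", "REXO4", "VPS29"),
  RefSeq_ID = c("NM_000893,NM_001102416", "NM_020385", "NM_057180,NM_016226"),
  stringsAsFactors = FALSE
)

# Hypothetical list of IDs to look for (not given in the excerpt)
wanted <- c("NM_020385", "NM_016226")

# Split each comma-separated field and test for exact membership,
# so that a short ID can never match as a prefix of a longer one
hit <- vapply(
  strsplit(GEMA_EO5$RefSeq_ID, ",", fixed = TRUE),
  function(ids) any(ids %in% wanted),
  logical(1)
)

result <- GEMA_EO5[hit, ]
```

Splitting on the comma before matching avoids the partial matches that a plain `grepl()` over the whole field could produce.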

Bulk update in subset obtained from dataframe filtering [duplicate]

Question: I have a dataframe, which I filter on two conditions as follows:

    subset(sales_data, month == 'Jan' & dept_name == 'Production')

I want to bulk update the value of a particular column (say, status) for the above subset. Something like:

    subset(sales_data, month == 'Jan' & dept_name == 'Production')["status"] <- "Good result"

I am not sure how I can do this.

Answer 1: You could do sales_data
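The answer above is truncated, so what follows is not a reconstruction of it, only one common way to do this kind of update. `subset()` returns a copy, so assigning into its result cannot change `sales_data`; indexing the original data frame with a logical vector can. The small `sales_data` below is made-up illustration data.

```r
# Made-up stand-in for the sales_data frame from the question
sales_data <- data.frame(
  month = c("Jan", "Jan", "Feb"),
  dept_name = c("Production", "Sales", "Production"),
  status = c("", "", ""),
  stringsAsFactors = FALSE
)

# Logical vector marking the rows that the subset() call would return
rows <- sales_data$month == "Jan" & sales_data$dept_name == "Production"

# Assign into the original data frame, restricted to those rows
sales_data$status[rows] <- "Good result"
```

Only the first row satisfies both conditions here, so only its status changes.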

Efficient method to subset drop rows with NA values in R

Background: Before running a stepwise model selection, I need to remove rows with missing values for any of my model terms. With quite a few terms in my model, there are quite a few vectors that I need to check for NA values (dropping any rows that have NA values in any of those vectors). However, there are also vectors containing NA values that I do not want to use as criteria for dropping rows.

Question: How do I drop rows from a dataframe which contain NA values for any of a list of vectors? I'm currently using the clunky method of a long series of !is.na's

    > my.df[!is.na(my.df
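One way to avoid the long chain of `!is.na()` calls is `complete.cases()` applied only to the columns that act as model terms. This is a sketch with invented data; the column names `a`, `b`, `c` and the vector `model_terms` are assumptions, not taken from the question.

```r
# Invented example: NA values appear in all three columns
my.df <- data.frame(
  a = c(1, NA, 3, 4),
  b = c("x", "y", NA, "z"),
  c = c(NA, 2, 3, 4)
)

# Only these columns decide whether a row is dropped (assumed names)
model_terms <- c("a", "b")

# complete.cases() is TRUE for rows with no NA in the selected columns;
# drop = FALSE keeps a data frame even if only one term is listed
keep <- complete.cases(my.df[, model_terms, drop = FALSE])
cleaned <- my.df[keep, ]
```

Row 1 survives even though column `c` is NA there, because `c` is not among the listed terms.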

how do I grep in R?

I would like to choose rows based on subsets of their names. For example, if I have the following data:

    data <- structure(c(91, 92, 108, 104, 87, 91, 91, 97, 81, 98),
                      .Names = c("fee-", "fi", "fo-", "fum-", "foo-", "foo1234-", "123foo-", "fum-", "fum-", "fum-"))

how do I select the rows matching 'foo'? Using grep() doesn't work:

    grep('foo', data)

returns:

    integer(0)

What am I doing wrong? Or is there a better way? Thanks!

You need to grep the names property of data, not the values property. For your example, use

    > grep("foo",names(data))
    [1] 5 6 7
    > data[grep("foo",names(data))]
    foo-
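As a variation on the answer above, `grepl()` returns a logical vector instead of positions, which is convenient when the name match has to be combined with other conditions. The data is copied from the question; `foo_rows` is just an illustrative name.

```r
# Named numeric vector from the question
data <- structure(
  c(91, 92, 108, 104, 87, 91, 91, 97, 81, 98),
  .Names = c("fee-", "fi", "fo-", "fum-", "foo-", "foo1234-",
             "123foo-", "fum-", "fum-", "fum-")
)

# grepl() tests each name; fixed = TRUE treats "foo" as a literal string
is_foo <- grepl("foo", names(data), fixed = TRUE)

# Logical indexing keeps the matching elements along with their names
foo_rows <- data[is_foo]
```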

R subset a data frame with multiple keys [closed]

Question: I have the following data frame

    id  val
    a   1
    a   2
    a   3
    b   4
    b   5
    c   6

I would like to find a subset of this data frame using a subset of the id's. I know I can do the following if the subset criterion is just 1 value for

Logical condition while subsetting not giving correct values

Question: I wanted to subset the data frame project I was working with, using a logical condition, and I am getting a paradoxical result. The part of the logical condition preceding the ROLL.NO. argument is irrelevant to the question. Sorry, I could not give a reproducible example. Do let me know how I can make this question reproducible without having to show all 393 entries of the relevant columns in my data frame. D14 and DC31 are simple integer values, with some values being NA.

    culprits<-project$ROLL.NO.[(project
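The excerpt stops before the condition is shown, so the actual cause cannot be determined from it. One frequent source of surprising results when the columns contain NA, offered here purely as an assumption, is that a comparison with NA yields NA, and indexing with NA yields an NA element rather than dropping it. The tiny `project` frame and the threshold below are invented.

```r
# Invented miniature of the project data frame; D14 contains an NA
project <- data.frame(ROLL.NO. = 1:4, D14 = c(5, NA, 1, 7))

# The comparison gives TRUE, NA, FALSE, TRUE; indexing with it
# returns an NA element for the NA position
with_na <- project$ROLL.NO.[project$D14 > 3]

# which() keeps only the positions that are TRUE, dropping NA
without_na <- project$ROLL.NO.[which(project$D14 > 3)]
```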

R Not in subset [duplicate]

Possible Duplicate: Standard way to remove multiple elements from a dataframe

I know that in R, if you are searching for a subset of another group or matching based on id, you'd use something like

    subset(df1, df1$id %in% idNums1)

My question is how to do the opposite, i.e. choose items NOT matching a vector of ids. I tried using ! but get an error message:

    subset(df1, df1$id !%in% idNums1)

I think my backup is to do something like this:

    matches <- subset(df1, df1$id %in% idNums1)
    nonMatches <- df1[(-matches[,1]),]

but I'm hoping there's something a bit more efficient.

The expression df1$id %in%
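The negation has to wrap the whole `%in%` expression, since `!` is a prefix operator and cannot sit between the operands. A minimal sketch with invented data (the contents of `df1` and `idNums1` are assumptions):

```r
# Invented example data
df1 <- data.frame(id = c(1, 2, 3, 4, 5), val = c("a", "b", "c", "d", "e"))
idNums1 <- c(2, 4)

# Rows whose id is in the vector
matches <- subset(df1, id %in% idNums1)

# Rows whose id is NOT in the vector: negate the full %in% result
nonMatches <- subset(df1, !(id %in% idNums1))
```

Note that the backup approach in the question uses the id values as negative row indices, which gives the right rows only when ids happen to equal row positions.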

Selecting Specific Dates in R

Question: I am wondering how to create a subset of data in R based on a list of dates, rather than a date range. For example, I have the following data set data, which contains 3 years of 6-minute data.

      date              zone  month  day  year  hour  minute  temp  speed  gust  dir
    1 09/06/2009 00:00  PDT   9      6    2009  0     0       62    2      15    156
    2 09/06/2009 00:06  PDT   9      6    2009  0     6       62    13     16    157

I have used

    breeze<-subset(data, ws>=15 & wd>=247.5 & wd<=315, select=date:dir)

to select the rows which meet my criteria for a sea breeze, which is
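Selecting by a list of dates can be sketched with `as.Date()` and `%in%`. This assumes the date column is a character string in month/day/year order followed by a time, as the sample rows suggest; the extra rows and the vector `wanted_days` are invented for illustration.

```r
# Invented rows in the same layout as the question's date column
data <- data.frame(
  date = c("09/06/2009 00:00", "09/06/2009 00:06",
           "09/07/2009 00:00", "09/08/2009 00:00"),
  temp = c(62, 62, 60, 61),
  stringsAsFactors = FALSE
)

# Hypothetical list of days to keep
wanted_days <- as.Date(c("2009-09-06", "2009-09-08"))

# Parse only the calendar day; the trailing time is ignored by the format
day <- as.Date(data$date, format = "%m/%d/%Y")

# Keep every row whose day appears in the list
picked <- data[day %in% wanted_days, ]
```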

Transpose long to wide in SAS

Question: I have a very large data set (18 million observations) that I would like to transpose by subsetting based on one variable and creating 900 new variables out of those subsets. Example code and desired output format below:

Example data:

    data long1 ;
    input famid year faminc ;
    cards ;
    var1 96 40000
    var1 97 40500
    var1 98 41000
    var2 96 45000
    var2 97 45400
    var2 98 45800
    var3 96 75000

R: subsetting data frame by both certain column names (as a variable) and field values

Question: I have a list of names and a data frame whose colnames sometimes match the names in the list. Now I want to subset the data frame based on two criteria: the colnames (as a variable) in the list and the values of the fields in those columns. I tried it this way:

    names.list <- c("name1", "name2" , "name5")
    names <- as.data.frame(names.list)
    df <- *dataframe with colnames "name1", "name2", "name3", "name4", etc.*
    for (i in 1:nrow(names)){
      name <- names[i,1]
      df <- subset(df, name > 1.5)
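In the attempt above, `subset(df, name > 1.5)` compares the value stored in the variable `name` itself with 1.5 instead of looking up the column it names. A sketch of one alternative, using `[[` to fetch a column by a name held in a variable; the contents of `df` are invented, and skipping names that are absent from the data frame is an assumption about the intended behaviour.

```r
# Invented data frame; "name5" from the list is deliberately missing
df <- data.frame(
  name1 = c(2, 1, 3, 2),
  name2 = c(2, 3, 1, 2),
  name3 = c(0, 0, 0, 0)
)
names.list <- c("name1", "name2", "name5")

# Only filter on listed names that really are columns of df
present <- intersect(names.list, colnames(df))

for (name in present) {
  # df[[name]] looks the column up by the string held in `name`
  df <- df[df[[name]] > 1.5, , drop = FALSE]
}
```

After the loop, only rows whose values exceed 1.5 in every listed column that exists remain.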