subset | 易学教程

Subsetting R with dynamic variables [duplicate]

Question: This question already has answers here: "Brackets make a vector different. How exactly is vector expression evaluated?" (3 answers). Closed 3 years ago.

I have the example code below. I have a data frame ts with 16 rows. When I subset it with literal numbers it works fine, but when I subset it with calculated numbers my code behaves strangely. Can anyone please explain what is wrong here?

Case 1:

> a
[1] 12
> c
[1] 16
> ts$trend[13:16]
[1] 21.36926 21.48654 21.60383 21.72111
> ts$trend
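The excerpt is cut off before the calculated index appears, so the following is only a guess at the cause, prompted by the title of the linked duplicate. In R the colon operator binds more tightly than +, so a + 1:c is read as a + (1:c) rather than (a + 1):c. The sketch uses the values of a and c shown in the question; the vector trend is an invented stand-in for ts$trend.

```r
a <- 12
c <- 16
trend <- seq_len(16) * 10          # hypothetical stand-in for ts$trend

idx_unparenthesised <- a + 1:c     # parsed as a + (1:c): 13, 14, ..., 28
idx_parenthesised   <- (a + 1):c   # 13, 14, 15, 16

trend[idx_parenthesised]           # the four intended elements
trend[idx_unparenthesised]         # same four values, then NA for indices 17 to 28
```

Wrapping the calculated start in parentheses is usually enough to get the intended range.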

Subset columns of one data frame according to another data frame's rows

Question: I would like to subset some columns of one data frame according to the rows of another data frame. The two data frames are shown below:

df1 <- structure(list(ID = structure(c(3L, 1L, 2L, 5L, 4L), .Label = c("cg08", "cg09", "cg29", "cg36", "cg65"), class = "factor"), chr = c(16L, 3L, 3L, 1L, 8L), gene = c(534L, 376L, 171L, 911L, 422L), GS12 = c(0.15, 0.87, 0.6, 0.1, 0.72), GS32 = c(0.44, 0.93, 0.92, 0.07, 0.91), GS56 = c(0.46, 0.92, 0.62, 0.06, 0.87), GS87 = c(0.79, 0.93, 0.86, 0.08, 0.88)), .Names = c(

Using lapply to subset rows from data frames — incorrect number of dimensions error

Question: I have a list called "scenbase" that contains 40 data frames, each 326 rows by 68 columns. I would like to use lapply() to subset the data frames so that they retain only rows 33-152. I have written a simple function called trim() and am attempting to apply it to the list of data frames, but I get an error message. The function and my attempt at using it with lapply() are below:

trim <- function(i) { (i <- i[33:152,]) }
lapply(scenbase, trim)
Error in i[33:152, ] : incorrect
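The error text is cut off, but "incorrect number of dimensions" is what R reports when two-index subsetting such as i[33:152, ] is applied to an object that does not have two dimensions, for example a plain vector. The sketch below assumes, without confirmation from the question, that at least one element of the list is not a data frame. The list scen_demo and its contents are invented for illustration.

```r
# Hypothetical stand-in for scenbase: two data frames and one stray vector.
scen_demo <- list(
  a = data.frame(x = 1:326, y = 326:1),
  b = data.frame(x = 1:326, y = 1:326),
  c = 1:326
)

trim <- function(d) {
  # Only objects with rows and columns can be indexed as d[rows, ].
  if (!is.data.frame(d)) return(NULL)
  d[33:152, , drop = FALSE]
}

# Drop the elements that were skipped, keep the trimmed data frames.
trimmed <- Filter(Negate(is.null), lapply(scen_demo, trim))
sapply(trimmed, nrow)   # 120 rows for each remaining data frame
```

Checking sapply(scenbase, class) first would show whether every element really is a data frame.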

R: How to efficiently find out whether data.frame A is contained in data.frame B?

Question: In order to find out whether data frame df.a is a subset of data frame df.b, I did the following:

df.a <- data.frame( x=1:5, y=6:10 )
df.b <- data.frame( x=1:7, y=6:12 )
inds.x <- as.integer( lapply( df.a$x, function(x) which(df.b$x == x) ))
inds.y <- as.integer( lapply( df.a$y, function(y) which(df.b$y == y) ))
identical( inds.x, inds.y )

The last line gave TRUE, hence df.a is contained in df.b. Now I wonder whether there is a more elegant - and possibly more efficient - way to answer this
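One possible alternative, offered as a sketch rather than as the answer the asker received, is an inner join with base R's merge(). With no by argument, merge() joins on all columns the two data frames share, so only rows of df.a that also occur in df.b survive. This treats both data frames as sets of rows and ignores row order and duplicates, which is an assumption about what "contained" means here.

```r
df.a <- data.frame(x = 1:5, y = 6:10)
df.b <- data.frame(x = 1:7, y = 6:12)

# Inner join on all shared columns (x and y).
common <- merge(unique(df.a), unique(df.b))

# df.a is contained in df.b if every distinct row of df.a found a match.
contained <- nrow(common) == nrow(unique(df.a))
contained   # TRUE for the data above
```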

Subset pandas data frame with datetime columns

Question: Following up on this question, where a pandas data frame is subset by one string variable and one datetime variable using idxmin, how could we subset by two datetime variables? For the example data frame below, how would we select the rows from class == C with the minimum base_date and the maximum date_2? [answer would be row 3]:

print(example)
   slot_id class        day   base_date      date_2
0        1     A     Monday  2019-01-21  2019-01-24
1        2     B    Tuesday  2019-01-22  2019-01-23
2        3     C  Wednesday  2019-01-22  2019-01-24
3        4

Subset based on first three numbers

Question: I have a very large data set of variables and I need to subset it based on the first three digits of the zip code. I'm not sure how to do this and would appreciate any help you can provide. How would I subset this example dput to remove all zip codes that start with 721? Note that I can't simply use a greater-than (>) comparison, since there are zip codes larger than 721. Thanks!

dput:

data <- structure(list(state = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
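A minimal sketch of one way to filter on a prefix, using substr() on the zip code converted to text. The dput output is cut off, so the data frame zip_demo and the column name zip are invented for illustration.

```r
# Hypothetical data; the column name zip is an assumption.
zip_demo <- data.frame(
  state = c("AR", "AR", "AR", "AR"),
  zip   = c(72101, 71601, 72199, 72201)
)

# Compare the first three characters of each zip code with the prefix to drop.
keep <- substr(as.character(zip_demo$zip), 1, 3) != "721"
zip_demo[keep, ]
```

If zip codes are stored as numbers, any with a leading zero will already have lost it, so the prefix test assumes five-digit values.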

R - Speeding up approximate date match. idata.frame?

Question: I am struggling to efficiently perform a "close" date match between two data frames. This question explores a solution using idata.frame from the plyr package, but I would be very happy with other suggested solutions as well. Here is a very simplified version of the two data frames:

sampleticker<-data.frame(cbind(ticker=c("A","A","AA","AA"), date=c("2005-1-25","2005-03-30","2005-02-15","2005-04-21")))
sampleticker$date<-as.Date(sampleticker$date,format="%Y-%m-%d")
samplereport<-data.frame

R data subset restructuring [closed]

Question: Closed. This question needs to be more focused. It is not currently accepting answers. Closed 5 years ago.

I am fairly new to R/RStudio and I am still learning how to do certain operations. I have the following data set. The columns are Operating Region, type of element (CA, OBU), sub-element and Net Revenue. Currently the data is quite big (50 000 rows) and I want to get a summary of

How to automate subsetting multiple files using r

Question: Hi, I am new to R and I am comfortable with creating subsets when I handle one file at a time, but I am having trouble automating that for multiple files. In my case, I want to automate the process of subsetting multiple CSV files that are located in multiple subfolders of a given folder. I want to create multiple subset files that include, say, the 100 rows of each file and write them into new files, and the name of each subsetted file should be the same as that of the file from which
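A rough sketch of how such a loop could look with base R only. The question is truncated, so several details are assumptions: that "the 100 rows" means the first 100 rows, that the output goes to a single separate folder, and the folder names in_dir and out_dir, which are invented here and filled with a small demo file so the example runs on its own.

```r
# Hypothetical layout: CSV files somewhere under in_dir; output goes to out_dir.
in_dir  <- file.path(tempdir(), "input")
out_dir <- file.path(tempdir(), "subsets")
dir.create(file.path(in_dir, "sub1"), recursive = TRUE, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:250), file.path(in_dir, "sub1", "demo.csv"),
          row.names = FALSE)

# Find every CSV file in all subfolders.
files <- list.files(in_dir, pattern = "\\.csv$", recursive = TRUE,
                    full.names = TRUE)

for (f in files) {
  first_rows <- head(read.csv(f), 100)   # assumed: keep the first 100 rows
  # Reuse the original file name for the subset file.
  write.csv(first_rows, file.path(out_dir, basename(f)), row.names = FALSE)
}
```

Because only the base name is reused, two input files with the same name in different subfolders would overwrite each other in out_dir; recreating the subfolder structure would avoid that.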

Automatically subset data frame by factor

Question: I am looking for help writing a function that automatically subsets a data frame based on the values of a column. For example, df$x contains the values a, b, c, d, and I want to make separate data frames named a, b, c, d that contain all rows where x == 'a', x == 'b', etc. I know several ways to do this manually but am hoping for guidance on how to automate it. Thank you!

Answer 1: Maybe not the best way to do it, but it will get the job done.

vars_df = unique(df$x)
for (i in 1:length(vars_df)) { assign(paste0(vars
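The answer excerpt creates the variables with assign() in a loop. A commonly used alternative, sketched here on invented data and not taken from the original thread, is base R's split(), which returns one data frame per distinct value in a named list.

```r
# Hypothetical data frame with a grouping column x.
df <- data.frame(x = c("a", "b", "a", "c", "d", "b"), value = 1:6)

# One data frame per distinct value of x, collected in a named list.
by_x <- split(df, df$x)
names(by_x)   # "a" "b" "c" "d"
by_x$a        # the rows where x == "a"

# If separate variables are really wanted, the list can be unpacked:
# list2env(by_x, envir = globalenv())
```

Keeping the pieces in a list tends to be easier to work with afterwards, since lapply() can then be applied to all of them at once.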