subset | 易学教程

Subset with unique cases, based on multiple columns

I'd like to subset a dataframe to include only rows that have unique combinations of three columns. My situation is similar to the one presented in this question, but I'd like to preserve the other columns in my data as well. Here's my example:

    > df
      v1 v2 v3  v4 v5
    1  7  1  A 100 98
    2  7  2  A  98 97
    3  8  1  C  NA 80
    4  8  1  C  78 75
    5  8  1  C  50 62
    6  9  3  C  75 75

The requested output would be something like this, where I'm looking for unique cases based on v1, v2, and v3 only:

    > df.new
      v1 v2 v3  v4 v5
    1  7  1  A 100 98
    2  7  2  A  98 97
    3  8  1  C  NA 80
    6  9  3  C  75 75

If I could recover the non-unique rows that would be
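A minimal base R sketch of one way to get this result, assuming the data frame is rebuilt by hand from the printout above. `duplicated()` flags every row whose (v1, v2, v3) combination has already appeared, so negating it keeps the first row of each combination together with all other columns. The names `key_cols`, `is_dup`, and `df.dup` are illustrative and not from the question.

```r
# Rebuild the example data frame (values copied from the printed df).
df <- data.frame(
  v1 = c(7, 7, 8, 8, 8, 9),
  v2 = c(1, 2, 1, 1, 1, 3),
  v3 = c("A", "A", "C", "C", "C", "C"),
  v4 = c(100, 98, NA, 78, 50, 75),
  v5 = c(98, 97, 80, 75, 62, 75)
)

# Columns that define a "unique case" (illustrative name).
key_cols <- c("v1", "v2", "v3")

# TRUE for every row whose key combination was already seen further up.
is_dup <- duplicated(df[, key_cols])

df.new <- df[!is_dup, ]  # first occurrence of each combination
df.dup <- df[is_dup, ]   # the later, non-unique rows
```

With this data, `df.new` keeps rows 1, 2, 3 and 6, and `df.dup` holds rows 4 and 5.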

How to check whether the elements of an ArrayList are all contained in another ArrayList

Question: How can I easily check to see whether all the elements in one ArrayList are all elements of another ArrayList?

Answer 1: Use Collection.containsAll():

    boolean isSubset = listA.containsAll(listB);

Answer 2: There is a containsAll method in all collections.

Source: https://stackoverflow.com/questions/808394/how-to-check-whether-the-elements-of-an-arraylist-are-all-contained-in-another-a

R applying a function to a subset of a data frame [duplicate]

Question: I looked online extensively and did not see an answer to this particular question (I think). The best way for me to explain myself will be with some code that replicates my problem. I made some temp data:

    x <- runif(100,1,2)
    y <- runif(100,2,3)
    z <- c(rep(1,100))
    temp <- cbind(x,y,z)
    temp[1:25,3] = temp[1:25,3] +2
    temp <- as.data.frame(temp)

And this is what temp looks like:

    x y z
    1 1.512620

subsetting in xts using a parameter holding dates

Question: I am familiar with the xts subsetting abilities. However, I can't find an elegant way to subset a parameterized range of dates. Something like this:

    times = c(as.POSIXct("2012-11-03 09:45:00 IST"), as.POSIXct("2012-11-05 09:45:00 IST"))
    # create an xts object:
    xts.obj = xts(c(1,2),order.by = times)
    # filter with these dates:
    start.date = as.POSIXct("2012-11-03")
    end.date = as.POSIXct("2012-11-04")
    # instead of xts["2012-11-03"/"2012-11-04"], do something like this:
    xts[start.date:end.date]

Does

R: How to filter/subset a sequence of dates

I have this data (complete for December):

            date sessions
    1 2014-12-01     1932
    2 2014-12-02     1828
    3 2014-12-03     2349
    4 2014-12-04     8192
    5 2014-12-05     3188
    6 2014-12-06     3277

And I need to subset/filter this, for example from "2014-12-05" to "2014-12-25". I know that you can create a sequence with the operator ":". Example:

    b <- c(1:5)

But how do I filter a sequence? I tried this:

    NewDate <- filter(Dates, date("2014-12-05":"2014-12-12"))

But it says:

    Error: unexpected symbol in: "NewDate <- filter(Dates, date("2014-12-05":"2014-12-12") NewDate"

You could use subset. Generating your sample data: temp<- read.table
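A short base R sketch of the `subset` idea, assuming the `date` column is stored as class `Date` and rebuilding only the six rows shown in the question. The `:` operator does not work on character strings, so the range is written as two comparisons. The names `from_date` and `to_date` are illustrative.

```r
# Rebuild the six printed rows; the date column is assumed to be of class Date.
Dates <- data.frame(
  date = as.Date("2014-12-01") + 0:5,
  sessions = c(1932, 1828, 2349, 8192, 3188, 3277)
)

# Range bounds (illustrative names).
from_date <- as.Date("2014-12-05")
to_date <- as.Date("2014-12-25")

# Keep rows whose date falls inside the closed interval [from_date, to_date].
NewDate <- subset(Dates, date >= from_date & date <= to_date)
```

On these six rows only 2014-12-05 and 2014-12-06 fall inside the range, so `NewDate` has two rows.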

Why does dplyr's filter drop NA values from a factor variable?

When I use filter from the dplyr package to drop a level of a factor variable, filter also drops the NA values. Here's an example:

    library(dplyr)
    set.seed(919)
    (dat <- data.frame(var1 = factor(sample(c(1:3, NA), size = 10, replace = T))))
    #    var1
    # 1  <NA>
    # 2     3
    # 3     3
    # 4     1
    # 5     1
    # 6  <NA>
    # 7     2
    # 8     2
    # 9  <NA>
    # 10    1

    filter(dat, var1 != 1)
    #   var1
    # 1    3
    # 2    3
    # 3    2
    # 4    2

This does not seem ideal -- I only wanted to drop rows where var1 == 1. It looks like this is occurring because any comparison with NA returns NA, which filter then drops. So, for example, filter(dat, !(var1 %in% 1)) produces
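The NA behaviour described here can be sketched in base R without dplyr. The values below are copied from the printed `dat` rather than regenerated with `sample()`, and the names `cmp` and `keep_na` are illustrative. The comparison `var1 != 1` yields NA wherever `var1` is NA, so those rows have to be requested explicitly with `is.na()`.

```r
# Values copied from the printed data frame in the question.
dat <- data.frame(var1 = factor(c(NA, 3, 3, 1, 1, NA, 2, 2, NA, 1)))

# Comparing against NA gives NA, not TRUE or FALSE.
cmp <- dat$var1 != 1

# Keep NA rows explicitly while still dropping rows where var1 == 1.
keep_na <- dat[is.na(dat$var1) | dat$var1 != 1, , drop = FALSE]
```

The same condition, `is.na(var1) | var1 != 1`, can be passed to dplyr's `filter` to retain the NA rows.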

Faster way to subset on rows of a data frame in R?

I have been using these two methods interchangeably to subset data from a data frame in R.

Method 1:

    subset_df <- df[which(df$age>5) , ]

Method 2:

    subset_df <- subset(df, age>5)

I have two questions about them.

1. Which one is faster, considering I have very large data?
2. The post "Subsetting data frames in R" suggests that there is in fact a difference between the two methods above: one of them handles NA accurately. Which one is safe to use, then?

The question asks for a faster way to subset rows of a data frame. The fastest way is with data.table.

    set.seed(1) # for reproducible example # 1
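To make the NA question concrete, here is a small base R sketch on a made-up four-row data frame (the data and the names `by_logical`, `by_which`, and `by_subset` are assumptions for illustration). It compares the two methods from the question with plain logical indexing, which is added here only for contrast.

```r
# Hypothetical data with one missing age.
df <- data.frame(age = c(3, 7, NA, 10), id = 1:4)

# Plain logical indexing: the NA in the condition produces a row of NAs.
by_logical <- df[df$age > 5, ]

# Method 1: which() returns only the positions that are TRUE, so NA is dropped.
by_which <- df[which(df$age > 5), ]

# Method 2: subset() treats NA in the condition as FALSE, so NA is dropped.
by_subset <- subset(df, age > 5)
```

In this sketch both methods from the question return the same two rows (ids 2 and 4), while plain logical indexing returns three rows, one of them entirely NA.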

In R, how do I subset a data.frame by values from another data.frame?

I have two data frames. The first, df.1, contains two columns of paired numerical identifiers, where each column includes ~100,000 rows. The second data frame, df.2, includes one column (df.2$C) of numerical identifiers. This data frame has around 200 rows. How can I find the paired subset of data of df.1 that includes only the rows with values of the identifiers found in df.2$C? The final subset would include the paired data of df.1 which corresponds to identifiers found in df.2$C that match the identifiers found in df.1$A, df.1$B or both.

You could use ?"%in%" (similar to ?match): df1
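A minimal sketch of the `%in%` approach on tiny made-up data (the values and the names `matches` and `paired` are assumptions, since the question's data is not shown). A row of `df.1` is kept when its `A` value, its `B` value, or both appear in `df.2$C`.

```r
# Hypothetical stand-ins for the much larger data frames in the question.
df.1 <- data.frame(A = c(1, 2, 3, 4, 5), B = c(10, 20, 30, 40, 1))
df.2 <- data.frame(C = c(1, 30))

# TRUE where either identifier of the pair is found in df.2$C.
matches <- df.1$A %in% df.2$C | df.1$B %in% df.2$C

# Keep both columns of the matching pairs.
paired <- df.1[matches, ]
```

Unlike `==`, `%in%` never returns NA, so the logical vector can be used for row indexing directly.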

Subsetting data.table set by date range in R

I have a large dataset in data.table that I'd like to subset by a date range. My data set looks like this:

    testset <- data.table(date=as.Date(c("2013-07-02","2013-08-03","2013-09-04",
                                         "2013-10-05","2013-11-06")),
                          yr = c(2013,2013,2013,2013,2013),
                          mo = c(07,08,09,10,11),
                          da = c(02,03,04,05,06),
                          plant = LETTERS[1:5],
                          product = as.factor(letters[26:22]),
                          rating = runif(25))

I'd like to be able to choose a date range directly from the as.Date column without using the yr, mo, or da columns. Currently, I'm subsetting by mo and it's extremely clunky at times, especially when years switch over. A

Undefined columns selected when subsetting data frame

I have a data frame. Running str(data) to show more about it gives the following:

    > str(data)
    'data.frame': 153 obs. of 6 variables:
     $ Ozone  : int 41 36 12 18 NA 28 23 19 8 NA ...
     $ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
     $ Wind   : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
     $ Temp   : int 67 72 74 62 56 66 65 59 61 69 ...
     $ Month  : int 5 5 5 5 5 5 5 5 5 5 ...
     $ Day    : int 1 2 3 4 5 6 7 8 9 10 ...

However, when I want to subset, for example, the amounts of Ozone above 14, I use the following code, which gives me an error:

    > data[data$Ozone > 14 ]
    Error in [.data.frame
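A likely cause, given the heading, is the missing comma inside the brackets: with a single index, a data frame treats the logical vector as a column selector rather than a row selector. The sketch below assumes that reading and uses only the first six `Ozone` and `Wind` values shown in the `str()` output. The names `above` and `ozone_only` are illustrative.

```r
# First six values of two columns, copied from the str() output.
data <- data.frame(
  Ozone = c(41L, 36L, 12L, 18L, NA, 28L),
  Wind = c(7.4, 8, 12.6, 11.5, 14.3, 14.9)
)

# Rows go before the comma, columns after it; which() drops the NA comparison.
above <- data[which(data$Ozone > 14), ]

# If only the Ozone values are wanted, index the column vector directly.
ozone_only <- data$Ozone[which(data$Ozone > 14)]
```

Without `which()`, the row with a missing `Ozone` value would come back as a row of NAs, because `NA > 14` is NA.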