subset | 易学教程

Get a subset not containing a given value of the column

Question: I have a table called data:

    A  22
    B  333
    C  Not Av.
    D  Not Av.

How can I get a subset from which all rows containing "Not Av." are excluded? Note that I have the index of the column to be checked (in this case colnum = 2), but not its name. I tried this, but it does not work:

    data <- subset(data, colnum != "Not Available")

Answer 1:

    df <- read.csv(text="A,22\nB,333\nC,Not Av.\nD,Not Av.", header=FALSE)
    df[df[,2] != "Not Av.",]

Answer 2: You don't really need the subset function. Just use …
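
A minimal sketch of the index-based approach from Answer 1, written against a variable column index. The column names V1 and V2 are assumptions for illustration only. In the original attempt, colnum is not a column of data, so it is most likely evaluated as the number 2 and compared with the string, which yields a single TRUE and drops no rows.

```r
# Hypothetical reconstruction of the table from the question.
data <- data.frame(
  V1 = c("A", "B", "C", "D"),
  V2 = c("22", "333", "Not Av.", "Not Av."),
  stringsAsFactors = FALSE
)
colnum <- 2

# data[[colnum]] extracts the column by position, so no name is needed.
# drop = FALSE keeps the result a data frame even if one column remains.
kept <- data[data[[colnum]] != "Not Av.", , drop = FALSE]
print(kept)
```

If the column may contain NA, the comparison returns NA for those rows, so an explicit is.na() check would be needed as well.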

R - subset column based on condition on duplicate rows

Question: I have a data frame with an ID column that is repeated, with site counts. I want to know how I can remove the duplicate ID records only when the Site_count value is more than 0. Generate the data frame:

    DF <- data.frame(
      'ID' = sample(100:300, 100, replace=TRUE),
      'Site_count' = sample(0:1, 100, replace=TRUE)
    )

My attempt at the subset:

    subset(DF[!duplicated(DF$ID),], Site_count > 0)

But in this case it will remove all 0 site counts. I want to subset to only remove the record when there is a duplicate record with …

Subset by multiple ranges [duplicate]

Question: This question already has answers here: Efficient way to filter one data frame by ranges in another (3 answers). Closed 2 years ago.

I want to get a list of values that fall in between multiple ranges.

    library(data.table)
    values <- data.table(value = c(1:100))
    range <- data.table(start = c(6, 29, 87), end = c(10, 35, 92))

I need the results to include only the values that fall in between those ranges:

    results <- c(6, 7, 8, 9, 10, 29, 30, 31, 32, 33, 34, 35, 87, 88, 89, 90, 91, 92)

I am …
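
One possible way to get that result, sketched in base R rather than data.table so that it runs without extra packages. The names start, end, and in_any are illustrative, and this is not necessarily the approach taken in the linked answers.

```r
# Values to test and the inclusive range bounds from the question.
values <- 1:100
start <- c(6, 29, 87)
end   <- c(10, 35, 92)

# For each value, check whether it lies inside at least one range.
in_any <- vapply(
  values,
  function(v) any(v >= start & v <= end),
  logical(1)
)

results <- values[in_any]
print(results)
```

This does one pass over the ranges per value, which is fine for small inputs. For large tables a join-based method is likely to scale better.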

Subsetting data.table set by date range in R

Question: I have a large dataset in data.table that I'd like to subset by a date range. My data set looks like this:

    testset <- data.table(date=as.Date(c("2013-07-02","2013-08-03","2013-09-04",
                                         "2013-10-05","2013-11-06")),
                          yr = c(2013,2013,2013,2013,2013),
                          mo = c(07,08,09,10,11),
                          da = c(02,03,04,05,06),
                          plant = LETTERS[1:5],
                          product = as.factor(letters[26:22]),
                          rating = runif(25))

I'd like to be able to choose a date range directly from the as.Date column without using the yr, mo, or da columns.
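
A minimal sketch of filtering on the Date column directly. It uses a plain data frame with only two of the columns so that it runs without data.table installed. The bounds lo and hi are arbitrary values chosen for illustration.

```r
# Reduced version of the question's data: only date and plant are kept.
testset <- data.frame(
  date = as.Date(c("2013-07-02", "2013-08-03", "2013-09-04",
                   "2013-10-05", "2013-11-06")),
  plant = LETTERS[1:5],
  stringsAsFactors = FALSE
)

# Illustrative inclusive bounds; Date objects compare with >= and <=.
lo <- as.Date("2013-08-01")
hi <- as.Date("2013-10-31")

in_range <- testset[testset$date >= lo & testset$date <= hi, ]
print(in_range)

# With a data.table, the same condition can go in the i position:
#   testset[date >= lo & date <= hi]
```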

How to subset data in R without losing NA rows?

Question: I have some data that I am looking at in R. One particular column, titled "Height", contains a few rows of NA. I am looking to subset my data frame so that all Heights above a certain value are excluded from my analysis.

    df2 <- subset(df1, Height < 40)

However, whenever I do this, R automatically removes all rows that contain NA values for Height. I do not want this. I have tried including arguments for na.rm:

    f1 <- function(x, na.rm = FALSE) {
      df2 <- subset(x, Height < 40)
    }
    f1( …
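
subset() treats rows where the condition evaluates to NA as not selected, which is why the NA rows disappear. A minimal sketch of one way to keep them is to state the NA case explicitly in the condition. The sample data below is made up for illustration.

```r
# Hypothetical data: one NA and one value above the cutoff.
df1 <- data.frame(
  id = 1:4,
  Height = c(10, NA, 50, 39)
)

# Keep rows where Height is missing OR below the cutoff.
df2 <- subset(df1, is.na(Height) | Height < 40)
print(df2)
```

Only the row with Height 50 is dropped here. The row with NA is retained because is.na(Height) is TRUE for it.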

best way to pick a random subset from a collection?

Question: I have a set of objects in a Vector from which I'd like to select a random subset (e.g. 100 items coming back; pick 5 randomly). In my first (very hasty) pass I did an extremely simple and perhaps overly clever solution:

    Vector itemsVector = getItems();
    Collections.shuffle(itemsVector);
    itemsVector.setSize(5);

While this has the advantage of being nice and simple, I suspect it's not going to scale very well, i.e. Collections.shuffle() must be O(n) at least. My less clever alternative is …

Add data to data.frame with 0 rows

Question: Consider this:

    df <- data.frame(a=1:2, b=3:4)

I can add a new column and assign values to it like this:

    df$c <- 5

But if I subset this so that it's an empty data.frame and try to assign anything to it, it will return an error:

    df2 <- subset(df, a==3)
    df2$d <- 10
    Error in `$<-.data.frame`(`*tmp*`, "d", value = 10) :
      replacement has 1 row, data has 0

This will stop loops, so my question is whether there are other ways to assign values to a column in a data frame that do not return errors when the dataframe …
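
One way to avoid the error, sketched under the assumption that an empty column is an acceptable result for an empty data frame: size the replacement to the number of rows with rep(), so that it has length 0 when there are no rows.

```r
df <- data.frame(a = 1:2, b = 3:4)

# Subsetting on a value that does not occur yields a 0-row data frame.
df2 <- subset(df, a == 3)

# rep(10, nrow(df2)) has length 0 here, which matches the 0 rows,
# so the assignment succeeds and adds an empty column d.
df2$d <- rep(10, nrow(df2))

# The same line also works on a non-empty data frame.
df$d <- rep(10, nrow(df))

str(df2)
```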