subset

Get a subset not containing a given value of the column

99封情书 submitted on 2020-01-11 14:23:08
Question: I have a table called data: A 22 B 333 C Not Av. D Not Av. How can I get a subset from which all rows containing "Not Av." are excluded? It is important to mention that I have the index of the column to be checked (in this case colnum = 2), but I don't have its name. I tried this, but it does not work: data <- subset(data, colnum != "Not Available")
Answer 1: df <- read.csv(text = "A,22\nB,333\nC,Not Av.\nD,Not Av.", header = FALSE) df[df[, 2] != "Not Av.", ]
Answer 2: You don't really need the subset function. Just use
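A minimal sketch of the idea in Answer 1, rebuilt so it runs on its own. The column names V1 and V2 and the variable kept are assumptions made for illustration. Note that the attempt in the question compares against "Not Available" rather than "Not Av.", and that subset() sees colnum as the number 2, not as a column.

```r
# Rebuild the example table from the question (V1/V2 are assumed names).
data <- data.frame(V1 = c("A", "B", "C", "D"),
                   V2 = c("22", "333", "Not Av.", "Not Av."),
                   stringsAsFactors = FALSE)
colnum <- 2

# Select the column by position with [[ ]], then keep the rows that differ.
kept <- data[data[[colnum]] != "Not Av.", , drop = FALSE]
print(kept)
```

Indexing with data[[colnum]] works whether or not the column name is known, which is the constraint stated in the question.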

R - subset column based on condition on duplicate rows

橙三吉。 submitted on 2020-01-11 12:13:11
Question: I have a dataframe with a repeated id column and site counts. I want to know how I can remove the duplicate ID records only when the Site_count record is more than 0. Generate DF: DF <- data.frame('ID' = sample(100:300, 100, replace = T), 'Site_count' = sample(0:1, 100, replace = T)) My attempt at the subset: subset(DF[!duplicated(DF$ID), ], site_count > 0) But in this case it will remove all 0 site counts - I want to subset to only remove the record when there is a duplicate record with
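The excerpt is cut off before the rule is fully stated, so the following is only one possible reading: drop a row when its ID occurs more than once and its Site_count is above 0. The small fixed data set and the names is_dup and result are assumptions for illustration. Separately, the attempt in the question refers to site_count while the column is created as Site_count; R names are case-sensitive.

```r
# Small deterministic stand-in for the random DF in the question.
DF <- data.frame(ID = c(101, 101, 102, 103, 103),
                 Site_count = c(0, 1, 1, 0, 0))

# TRUE for every row whose ID occurs more than once (first occurrence included).
is_dup <- duplicated(DF$ID) | duplicated(DF$ID, fromLast = TRUE)

# Assumed rule: drop a row only if its ID is duplicated AND Site_count > 0.
result <- DF[!(is_dup & DF$Site_count > 0), ]
print(result)
```

If the intended rule is different (for example, keeping one row per ID), the condition inside the brackets is the only part that would need to change.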

Subset by multiple ranges [duplicate]

冷暖自知 submitted on 2020-01-10 04:25:45
Question: This question already has answers here: Efficient way to filter one data frame by ranges in another (3 answers). Closed 2 years ago. I want to get a list of values that fall in between multiple ranges. library(data.table) values <- data.table(value = c(1:100)) range <- data.table(start = c(6, 29, 87), end = c(10, 35, 92)) I need the results to include only the values that fall in between those ranges: results <- c(6, 7, 8, 9, 10, 29, 30, 31, 32, 33, 34, 35, 87, 88, 89, 90, 91, 92) I am
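A sketch in base R, assuming plain data frames instead of the data.table objects from the question so that no package is required. The name ranges (used instead of range, to avoid shadowing base::range) and the helper variable in_any are assumptions. It is a straightforward check of every value against every range, not necessarily the efficient approach the linked answers discuss.

```r
values <- data.frame(value = 1:100)
ranges <- data.frame(start = c(6, 29, 87), end = c(10, 35, 92))

# For each value, test whether it lies inside at least one [start, end] range.
in_any <- vapply(values$value,
                 function(v) any(v >= ranges$start & v <= ranges$end),
                 logical(1))

results <- values$value[in_any]
print(results)
```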

Subsetting data.table set by date range in R

谁都会走 submitted on 2020-01-09 06:25:34
Question: I have a large dataset in data.table that I'd like to subset by a date range. My data set looks like this: testset <- data.table(date = as.Date(c("2013-07-02", "2013-08-03", "2013-09-04", "2013-10-05", "2013-11-06")), yr = c(2013, 2013, 2013, 2013, 2013), mo = c(07, 08, 09, 10, 11), da = c(02, 03, 04, 05, 06), plant = LETTERS[1:5], product = as.factor(letters[26:22]), rating = runif(25)) I'd like to be able to choose a date range directly from the as.Date column without using the yr, mo, or da columns.
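A minimal sketch of filtering on the Date column directly. It assumes a cut-down data frame with only the date and plant columns, and the bounds from and to are hypothetical values chosen for illustration. Date objects compare with the usual operators, so no yr, mo, or da columns are involved.

```r
testset <- data.frame(
  date = as.Date(c("2013-07-02", "2013-08-03", "2013-09-04",
                   "2013-10-05", "2013-11-06")),
  plant = LETTERS[1:5]
)

# Hypothetical bounds of the range, both inclusive.
from <- as.Date("2013-08-01")
to <- as.Date("2013-10-31")

in_range <- testset[testset$date >= from & testset$date <= to, ]
print(in_range)
```

With a data.table, the same logical condition can be written in the first argument of the brackets, as in testset[date >= from & date <= to].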

How to subset data in R without losing NA rows?

随声附和 submitted on 2020-01-08 21:56:01
Question: I have some data that I am looking at in R. One particular column, titled "Height", contains a few rows of NA. I am looking to subset my data frame so that all Heights above a certain value are excluded from my analysis. df2 <- subset(df1, Height < 40) However, whenever I do this, R automatically removes all rows that contain NA values for Height. I do not want this. I have tried including arguments for na.rm: f1 <- function(x, na.rm = FALSE) { df2 <- subset(x, Height < 40) } f1(
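One way to keep the NA rows, sketched with a made-up df1 (the id column and the values are assumptions). The comparison Height < 40 yields NA for missing heights, and subset() drops rows whose condition is NA, so the condition has to accept NA explicitly.

```r
df1 <- data.frame(id = 1:5, Height = c(10, NA, 45, 30, NA))

# Keep rows where Height is missing OR below the threshold.
df2 <- df1[is.na(df1$Height) | df1$Height < 40, ]
print(df2)

# The same condition also works inside subset().
df3 <- subset(df1, is.na(Height) | Height < 40)
```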

Best way to pick a random subset from a collection?

女生的网名这么多〃 submitted on 2020-01-08 16:05:29
Question: I have a set of objects in a Vector from which I'd like to select a random subset (e.g. 100 items coming back; pick 5 randomly). In my first (very hasty) pass I did an extremely simple and perhaps overly clever solution: Vector itemsVector = getItems(); Collections.shuffle(itemsVector); itemsVector.setSize(5); While this has the advantage of being nice and simple, I suspect it's not going to scale very well, i.e. Collections.shuffle() must be O(n) at least. My less clever alternative is

Add data to data.frame with 0 rows

。_饼干妹妹 submitted on 2020-01-07 05:16:23
Question: Consider this: df <- data.frame(a = 1:2, b = 3:4) I can add a new column and assign values to it like this: df$c <- 5 But if I subset this so that it's an empty data.frame and try to assign anything to it, it returns an error: df2 <- subset(df, a == 3) df2$d <- 10 Error in `$<-.data.frame`(`*tmp*`, "d", value = 10) : replacement has 1 row, data has 0 This will stop loops, so my question is whether there are other ways to assign values to a column in a dataframe that do not return errors when the dataframe
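A sketch of one workaround, not necessarily the answer given in the original thread: size the replacement to the number of rows, so that an empty data frame receives a zero-length column instead of a length-1 value.

```r
df <- data.frame(a = 1:2, b = 3:4)
df2 <- subset(df, a == 3)  # zero rows, columns a and b

# rep(10, 0) is numeric(0), which matches the zero rows of df2.
df2$d <- rep(10, nrow(df2))
str(df2)

# The same line also works on a non-empty data frame.
df$d <- rep(10, nrow(df))
```

Because the same assignment handles both the empty and the non-empty case, a loop over subsets does not need a special branch for empty results.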