Subset multiple rows with condition

Posted by ∥☆過路亽.° on 2019-12-20 06:44:21

Question


I have a .txt file read into a table called power, with over 2 million observations of 9 variables. I am trying to subset power to only the rows whose Date is either "01/02/2007" or "02/02/2007". After creating the subset, the RStudio environment showed zero observations, although the variables were unchanged.

How can I get a subset of the data with only rows containing "01/02/2007" and "02/02/2007"?

I saw a similar post, but still got an error on my dataset. See link: Select multiple rows conditioning on ID in R

My data:

# load data
> power <- read.table("textfile.txt", stringsAsFactors = FALSE, header = TRUE)
# inspect the first column, called Date
> head(power$Date)
#[1] 16/12/2006 16/12/2006 16/12/2006 16/12/2006 16/12/2006 16/12/2006

> str(power$Date)
 chr [1:2075259] "16/12/2006" "16/12/2006" "16/12/2006" "16/12/2006" ...

My code:

> subpower <- subset(power, Date %in% c("01/02/2007", "02/02/2007"))

Subset data:

> str(powersub$Date)
 chr(0) 

Answer 1:


Try:

> subpower = power[power$Date %in% c("01/02/2007", "02/02/2007") ,]
> subpower
        Date Val
1 01/02/2007  14
8 02/02/2007  28

(Using power data from @akrun's answer)

Moreover, your own code works if you refer to the subset by the name you actually assigned: "subpower", not "powersub".

> subpower <- subset(power, Date %in% c("01/02/2007", "02/02/2007"))
> subpower
        Date Val
1 01/02/2007  14
8 02/02/2007  28
>
> str(subpower)
'data.frame':   2 obs. of  2 variables:
 $ Date: chr  "01/02/2007" "02/02/2007"
 $ Val : int  14 28



Answer 2:


I am guessing that the Date column in your dataset has leading or trailing spaces, because your code works on clean data:

subset(power, Date %in% c("01/02/2007", "02/02/2007"))
#       Date Val
#1 01/02/2007  14
#8 02/02/2007  28

If I change the rows to

power$Date[1] <- '01/02/2007 '
power$Date[8] <- ' 02/02/2007'

subset(power, Date %in% c("01/02/2007", "02/02/2007"))
#[1] Date Val 
#<0 rows> (or 0-length row.names)
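
Before trimming anything, it may be worth checking whether padding really is the cause. The following is a minimal sketch on made-up toy data (the three-row data frame and the names date_lengths and has_padding are illustrative assumptions, not part of the original dataset): it tabulates string lengths and flags values that start or end with whitespace.

```r
# Toy data (assumed for illustration): two dates carry stray spaces.
power <- data.frame(
  Date = c("01/02/2007 ", "16/12/2006", " 02/02/2007"),
  Val  = c(14L, 24L, 28L),
  stringsAsFactors = FALSE
)

# A clean dd/mm/yyyy string is exactly 10 characters long,
# so any other length in this table hints at padding.
date_lengths <- table(nchar(power$Date))

# Flag values that begin or end with a whitespace character.
has_padding <- grepl("^\\s|\\s$", power$Date)

print(date_lengths)
print(power[has_padding, ])
```

If has_padding is FALSE everywhere, whitespace is probably not the problem and the variable-name mix-up is the more likely explanation.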

You could use str_trim from stringr

library(stringr)
subset(power, str_trim(Date) %in% c('01/02/2007', '02/02/2007'))
#         Date Val
#1 01/02/2007   14
#8  02/02/2007  28

or use gsub

subset(power, gsub("^ +| +$", "", Date) %in% c('01/02/2007', '02/02/2007'))
#         Date Val
#1 01/02/2007   14
#8  02/02/2007  28

or, as another option that does not remove the spaces, use grepl

subset(power, grepl('01/02/2007|02/02/2007', Date))
#         Date Val
#1 01/02/2007   14
#8  02/02/2007  28
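
A base R alternative, assuming R 3.2.0 or later where trimws() is available, avoids both the stringr dependency and the regular expression. This is only a sketch on assumed toy data; the vector name wanted is illustrative.

```r
# Toy data (assumed for illustration) with padded date strings.
power <- data.frame(
  Date = c("01/02/2007 ", "16/12/2006", " 02/02/2007"),
  Val  = c(14L, 24L, 28L),
  stringsAsFactors = FALSE
)

wanted <- c("01/02/2007", "02/02/2007")

# Compare the trimmed strings while leaving the stored column untouched.
subpower <- power[trimws(power$Date) %in% wanted, ]

# Alternatively, clean the column once so later exact matches work as-is.
power$Date <- trimws(power$Date)
subpower_clean <- subset(power, Date %in% wanted)

print(subpower_clean)
```

Unlike the grepl pattern, which matches the date anywhere inside a longer string, the %in% comparison on trimmed values only accepts exact matches.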

data

power <- structure(list(Date = c("01/02/2007", "16/12/2006", "16/12/2006", 
"16/12/2006", "16/12/2006", "16/12/2006", "16/12/2006", "02/02/2007"
), Val = c(14L, 24L, 23L, 22L, 23L, 25L, 23L, 28L)), .Names = c("Date", 
"Val"), class = "data.frame", row.names = c(NA, -8L))



Answer 3:


Your approach is correct. Try reading in the text file with

power <- read.table("textfile.txt", stringsAsFactors = FALSE)


Source: https://stackoverflow.com/questions/26825507/subset-multiple-rows-with-condition
