subset | 易学教程

How can I subset rows in a data frame in R based on a vector of values?

I have two data sets that are supposed to be the same size but aren't. I need to trim the values from A that are not in B, and vice versa, in order to eliminate noise from a graph that's going into a report. (Don't worry, this data isn't being permanently deleted!) I have read the following:

    Selecting columns in R data frame based on those *not* in a vector
    http://www.ats.ucla.edu/stat/r/faq/subset_R.htm
    How to combine multiple conditions to subset a data-frame using "OR"?

But I'm still not able to get this to work right. Here's my code:

    bg2011missingFromBeg <- setdiff(x=eg2011$ID, y=bg2011$ID)
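One way to approach the trimming described in this question is to keep only the rows whose ID occurs in both data frames. The sketch below assumes both data frames have an `ID` column, as the excerpt suggests; the toy values and the `_trimmed` names are made up for illustration.

```r
# Hypothetical stand-ins for the two data sets in the question
bg2011 <- data.frame(ID = c(1, 2, 3, 5), value = c(10, 20, 30, 50))
eg2011 <- data.frame(ID = c(2, 3, 4, 5), value = c(21, 31, 41, 51))

# IDs that occur in both data frames
common_ids <- intersect(bg2011$ID, eg2011$ID)

# Keep only the rows whose ID is shared; the originals stay untouched
bg2011_trimmed <- bg2011[bg2011$ID %in% common_ids, ]
eg2011_trimmed <- eg2011[eg2011$ID %in% common_ids, ]

# setdiff() reports what was dropped from each side
only_in_bg <- setdiff(bg2011$ID, eg2011$ID)
only_in_eg <- setdiff(eg2011$ID, bg2011$ID)
```

Because `%in%` returns a logical vector the same length as the `ID` column, it can be used directly as a row index, whereas `setdiff()` only returns the IDs themselves.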

subset() a factor by its number of observation

Question: I have a problem with the subset() function. How can I subset my data frame by the number of observations in each factor combination?

    NAME     CLASS COLOR  VALUE
    antonio  B     YELLOW     5
    antonio  B     BLUE       8
    antonio  B     BLUE       7
    antonio  B     BLUE      12
    luca     C     YELLOW    99
    luca     B     YELLOW    87
    luca     B     YELLOW    98
    giovanni A     BLUE      48

I would like to keep only the data where the combination of the three factors "NAME", "CLASS" and "COLOR" appears at least three times, in order to compute the mean of VALUE. In this case I would obtain:

    NAME     CLASS COLOR  VALUE
    antonio  B     BLUE   mean

because
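A minimal base-R sketch of the idea in this question: count the rows in each NAME/CLASS/COLOR combination, keep the combinations with at least three rows, and average VALUE. The data frame reproduces the table from the question; the names `n_obs`, `frequent` and `result` are illustrative only.

```r
df <- data.frame(
  NAME  = c("antonio", "antonio", "antonio", "antonio",
            "luca", "luca", "luca", "giovanni"),
  CLASS = c("B", "B", "B", "B", "C", "B", "B", "A"),
  COLOR = c("YELLOW", "BLUE", "BLUE", "BLUE",
            "YELLOW", "YELLOW", "YELLOW", "BLUE"),
  VALUE = c(5, 8, 7, 12, 99, 87, 98, 48)
)

# For every row, the size of the NAME/CLASS/COLOR group it belongs to
n_obs <- ave(df$VALUE, df$NAME, df$CLASS, df$COLOR, FUN = length)

# Keep only rows from groups with at least three observations
frequent <- df[n_obs >= 3, ]

# Mean of VALUE per remaining group
result <- aggregate(VALUE ~ NAME + CLASS + COLOR, data = frequent, FUN = mean)
```

With the sample data only the antonio/B/BLUE combination has three rows, so `result` contains a single row.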

Generate a powerset of a set without keeping a stack in Erlang or Ruby

Question: I would like to generate the powerset of a rather big set (about 30-50 elements), and I know that the full powerset contains 2^n subsets. Is it possible to generate one subset at a time? That is, generate the powerset iteratively, saving each generated subset to disk or a database, removing it from the stack/memory, and only then continuing with the next subset? Unfortunately I have failed to adapt the Erlang and Ruby examples to my needs.

Answer 1: Edit: Added the enumerator (as @Jörg W Mittag) if no
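The one-subset-at-a-time idea can be sketched with a binary counter: each subset corresponds to a bit mask, and incrementing the mask yields the next subset while only the current mask is held in memory. The sketch is written in R to match the other examples on this page rather than Erlang or Ruby, and `next_mask` is a hypothetical helper, not code from the truncated answer.

```r
# Advance a logical mask to the next subset (binary increment).
# Returns NULL once every subset has been produced.
next_mask <- function(mask) {
  i <- 1
  while (i <= length(mask) && mask[i]) {
    mask[i] <- FALSE   # carry: reset trailing TRUE bits
    i <- i + 1
  }
  if (i > length(mask)) return(NULL)  # overflow: all subsets visited
  mask[i] <- TRUE
  mask
}

items <- c("a", "b", "c")
mask <- rep(FALSE, length(items))  # start with the empty subset
count <- 0
while (!is.null(mask)) {
  current <- items[mask]  # the current subset; save it to disk/DB here
  count <- count + 1
  mask <- next_mask(mask)
}
```

Memory use is proportional to the size of the set, not the size of the powerset, but the running time is still 2^n iterations, which is impractical for n near 50.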

Looping through t.tests for data frame subsets in r

I have a data frame 'math.numeric' with 32 variables. Each row represents a student and each variable is an attribute. The students have been put into 5 groups based on their final grade. The data looks as follows:

    head(math.numeric)
    school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason ... group
         1   1  18       2       1       1    4    4    1    5      1         2
         1   1  17       2       1       2    1    1    1    3      1         2
         1   1  15       2       2       2    1    1    1    3      3         3
         1   1  15       2       1       2    4    2    2    4      2         4
         1   1  16       2       1       2    3    3    3    3      2         3
         1   2  16       2       2       2    4    3    4    3      4         4

I am performing t-tests on each variable for group 1 vs. all the other groups to identify significantly different attributes with this group
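A minimal sketch of looping t-tests over the columns, comparing group 1 with all other groups. The data below is randomly generated toy data with only two attribute columns, standing in for the real `math.numeric`; `vars` and `p_values` are illustrative names.

```r
set.seed(1)
# Toy stand-in for math.numeric: two numeric attributes and a group column
math.numeric <- data.frame(
  age   = rnorm(50, mean = 16, sd = 1),
  Medu  = rnorm(50, mean = 3, sd = 1),
  group = rep(1:5, each = 10)
)

# Every column except the grouping variable
vars <- setdiff(names(math.numeric), "group")

# One t-test per variable: group 1 versus everyone else
in_group1 <- math.numeric$group == 1
p_values <- sapply(vars, function(v) {
  t.test(math.numeric[[v]][in_group1],
         math.numeric[[v]][!in_group1])$p.value
})
```

`p_values` is a named numeric vector, so `p_values[p_values < 0.05]` lists the attributes that differ at the 5% level. With 31 attributes tested at once, a multiple-comparison correction such as `p.adjust()` is worth considering.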

Subsetting data.table using variables with same name as column

I want to subset a data.table using a variable that has the same name as a column, which leads to some problems:

    dt <- data.table(a=sample(c('a', 'b', 'c'), 20, replace=TRUE),
                     b=sample(c('a', 'b', 'c'), 20, replace=TRUE),
                     c=sample(20), key=c('a', 'b'))
    evn <- environment()
    a <- 'b'
    dt[a == a]

    #Expected Result
    dt[a == 'b']

I came across this possible solution:

    env <- environment()
    dt[a == get('a',env)]

But it is as unhandy as:

    this.a = a
    dt[a == this.a]

So is there another elegant solution? For now, a temporary solution could be:

    `..` <- function (..., .env = globalenv()) { get(deparse

Split/subset a data frame by factors in one column [duplicate]

This question already has an answer here: Split data.frame based on levels of a factor into new data.frames

My data is like this (for example):

    ID Rate State
     1   24    AL
     2   35    MN
     3   46    FL
     4   34    AL
     5   78    MN
     6   99    FL

Data:

    structure(list(ID = 1:6, Rate = c(24L, 35L, 46L, 34L, 78L, 99L),
        State = structure(c(1L, 3L, 2L, 1L, 3L, 2L), .Label = c("AL","FL", "MN"),
        class = "factor")), .Names = c("ID", "Rate", "State"),
        class = "data.frame", row.names = c(NA, -6L))

I want to split the data by state and I want to get 3 data sets like below:

    data set 1
    ID Rate State
     1   24    AL
     4   34    AL

    data set 2
    ID Rate
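Splitting by the levels of a factor is what base R's `split()` does; a short sketch using the data from the question (`by_state` is an illustrative name):

```r
df <- data.frame(
  ID    = 1:6,
  Rate  = c(24L, 35L, 46L, 34L, 78L, 99L),
  State = factor(c("AL", "MN", "FL", "AL", "MN", "FL"))
)

# A named list with one data frame per level of State
by_state <- split(df, df$State)

# Individual pieces are accessed by level name
al <- by_state$AL
```

Keeping the pieces in a list is usually more convenient than creating three separate variables, since `lapply()` can then apply the same operation to each state.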

Pull nth Day of Month in XTS in R

Question: My question is closely related to the one asked here: Pull Return from first business day of the month from XTS object using R. Instead of extracting the first day of each month, I want to extract, say, the 10th data point of each month. How can I do this?

Answer 1: Using the same example data from the question you've linked to, you can do some basic subsetting. Here's the sample data:

    library(xts)
    data(sample_matrix)
    x <- as.xts(sample_matrix)

Here's the subsetting:

    x[format(index(x), "%d") ==

Pass subset argument through a function to subset

Question: I would like to have a function that calls subset and passes on a subset argument:

    df <- data.frame(abc=c("A","A","B","B"),value=1:4)
    subset(df,abc=="A")
    ## works of course:
    #   abc value
    # 1   A     1
    # 2   A     2

    mysubset <- function(df,ssubset) subset(df,ssubset)
    mysubset(df,abc=="A")
    ## Throws an error
    # Error in eval(expr, envir, enclos) : object 'abc' not found

    mysubset2 <- function(df,ssubset) subset(df,eval(ssubset))
    mysubset2(df,expression(abc=="A"))
    ## Works, but needs expression

I tried with
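One common pattern for this kind of wrapper is to capture the condition unevaluated with `substitute()` and evaluate it inside the data frame, indexing with `[` instead of calling `subset()`. This is a sketch of that pattern, not necessarily the answer given in the original thread.

```r
mysubset <- function(df, ssubset) {
  # Capture the condition as an unevaluated expression
  cond <- substitute(ssubset)
  # Evaluate it with the data frame's columns in scope, falling back
  # to the caller's environment for any other variables
  rows <- eval(cond, df, parent.frame())
  # Treat NA results as FALSE, as subset() does
  df[!is.na(rows) & rows, , drop = FALSE]
}

df <- data.frame(abc = c("A", "A", "B", "B"), value = 1:4)
res <- mysubset(df, abc == "A")
```

The documentation for `subset()` itself notes that it is a convenience function intended for interactive use, which is one reason to prefer `[` inside functions.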

Return data subset time frames within another timeframes?

There are very nifty ways of subsetting xts objects. For example, one can get the data for all years, months and days, restricted to times between 9:30 AM and 4 PM, by doing:

    my_xts["T09:30/T16:00"]

Or you can get all the observations between two dates by doing:

    my_xts["2012-01-01/2012-03-31"]

Or all the dates before/after a certain date by doing:

    my_xts["/2011"]  # from start of data until end of 2011
    my_xts["2011/"]  # from 2011 until the end of the data

How can I get all the data for only certain months for all years or only certain days for all months and years? Do any other subsetting tricks

How to subset data in R without losing NA rows?

I have some data that I am looking at in R. One particular column, titled "Height", contains a few rows of NA. I am looking to subset my data frame so that all Heights above a certain value are excluded from my analysis.

    df2 <- subset(df1, Height < 40)

However, whenever I do this, R automatically removes all rows that contain NA values for Height. I do not want this. I have tried including arguments for na.rm

    f1 <- function(x, na.rm = FALSE) {
      df2 <- subset(x, Height < 40)
    }
    f1(df1, na.rm = FALSE)

but this does not seem to do anything; the rows with NA still end up disappearing
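The NA rows disappear because `Height < 40` evaluates to NA for them and `subset()` treats NA as FALSE; the `na.rm` argument in the wrapper is never used. A minimal sketch that keeps the NA rows by testing for them explicitly (the toy `df1` is made up for illustration):

```r
# Toy data with missing heights
df1 <- data.frame(Height = c(10, 45, NA, 30, NA, 60))

# Keep rows below the cutoff OR with a missing Height
df2 <- subset(df1, Height < 40 | is.na(Height))

# Equivalent with plain indexing
df3 <- df1[is.na(df1$Height) | df1$Height < 40, , drop = FALSE]
```

Both forms drop only the rows whose Height is known to be 40 or more.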