subset

How can I subset rows in a data frame in R based on a vector of values?

三世轮回 submitted on 2019-11-27 06:25:05
I have two data sets that are supposed to be the same size but aren't. I need to trim the values from A that are not in B, and vice versa, in order to eliminate noise from a graph that's going into a report. (Don't worry, this data isn't being permanently deleted!)

I have read the following:

- Selecting columns in R data frame based on those *not* in a vector
- http://www.ats.ucla.edu/stat/r/faq/subset_R.htm
- How to combine multiple conditions to subset a data-frame using "OR"?

But I'm still not able to get this to work right. Here's my code:

    bg2011missingFromBeg <- setdiff(x=eg2011$ID, y=bg2011$ID)
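One possible approach, sketched here under assumptions: keep only the rows whose ID occurs in both data sets, using intersect() and %in%. The two small data frames below are hypothetical stand-ins for the question's eg2011 and bg2011, and the `value` column is invented for illustration.

```r
# Hypothetical stand-ins for the question's eg2011 and bg2011 data frames
eg2011 <- data.frame(ID = c(1, 2, 3, 4), value = c(10, 20, 30, 40))
bg2011 <- data.frame(ID = c(2, 3, 5), value = c(21, 31, 51))

# IDs that occur in both data sets
common_ids <- intersect(eg2011$ID, bg2011$ID)

# Keep only the rows whose ID also appears in the other data set
eg2011_trimmed <- eg2011[eg2011$ID %in% common_ids, ]
bg2011_trimmed <- bg2011[bg2011$ID %in% common_ids, ]
```

Because the result is assigned to new objects, the original data frames stay untouched.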

subset() a factor by its number of observations

醉酒当歌 submitted on 2019-11-27 06:22:43
Question: I have a problem with the subset() function. How can I subset a factor of my data frame by its number of observations?

    NAME      CLASS  COLOR   VALUE
    antonio   B      YELLOW  5
    antonio   B      BLUE    8
    antonio   B      BLUE    7
    antonio   B      BLUE    12
    luca      C      YELLOW  99
    luca      B      YELLOW  87
    luca      B      YELLOW  98
    giovanni  A      BLUE    48

I would like to obtain the data where the combination of the three factors "NAME", "CLASS" and "COLOR" appears at least three times, in order to compute the mean of VALUE. In this case I'll obtain:

    NAME      CLASS  COLOR   VALUE
    antonio   B      BLUE    mean

because
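A minimal base R sketch of one way to do this, not necessarily what the original answers proposed: count the rows in each NAME/CLASS/COLOR combination with ave(), keep the combinations with at least three rows, and average VALUE with aggregate(). The data frame is retyped from the question; the names `n`, `kept` and `result` are illustrative.

```r
# Data retyped from the question
df <- data.frame(
  NAME  = c("antonio", "antonio", "antonio", "antonio",
            "luca", "luca", "luca", "giovanni"),
  CLASS = c("B", "B", "B", "B", "C", "B", "B", "A"),
  COLOR = c("YELLOW", "BLUE", "BLUE", "BLUE",
            "YELLOW", "YELLOW", "YELLOW", "BLUE"),
  VALUE = c(5, 8, 7, 12, 99, 87, 98, 48)
)

# For every row, the number of rows sharing its NAME/CLASS/COLOR combination
n <- ave(df$VALUE, df$NAME, df$CLASS, df$COLOR, FUN = length)

# Keep combinations observed at least three times, then average VALUE
kept <- df[n >= 3, ]
result <- aggregate(VALUE ~ NAME + CLASS + COLOR, data = kept, FUN = mean)
```

With this data only the antonio/B/BLUE combination has three rows, so `result` holds a single row.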

Generate a powerset of a set without keeping a stack in Erlang or Ruby

佐手、 submitted on 2019-11-27 06:22:13
Question: I would like to generate a powerset of a rather big set (about 30-50 elements), and I know that it takes 2^n space to store the powerset. Is it possible to generate one subset at a time? That is, generate the powerset iteratively, saving each generated subset to disk or a database, removing it from the stack/memory, and only then continuing to generate other subsets? Unfortunately I have failed to modify the Erlang and Ruby examples to my needs.

Answer 1: Edit: Added the enumerator (as @Jörg W Mittag) if no

Looping through t.tests for data frame subsets in r

拥有回忆 submitted on 2019-11-27 05:31:30
I have a data frame 'math.numeric' with 32 variables. Each row represents a student and each variable is an attribute. The students have been put into 5 groups based on their final grade. The data looks as follows:

    head(math.numeric)
    school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason ... group
    1      1   18  2       1       1       4    4    1    5    1          2
    1      1   17  2       1       2       1    1    1    3    1          2
    1      1   15  2       2       2       1    1    1    3    3          3
    1      1   15  2       1       2       4    2    2    4    2          4
    1      1   16  2       1       2       3    3    3    3    2          3
    1      2   16  2       2       2       4    3    4    3    4          4

I am performing t-tests on each variable for group 1 vs. all the other groups to identify significantly different attributes with this group
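As a rough sketch of the loop described above, assuming a `group` column and numeric attribute columns: sapply() can run t.test() once per variable and collect the p-values. The data below is simulated purely for illustration and only borrows two column names from the question; it is not the asker's data.

```r
# Simulated stand-in for math.numeric (values are random, for illustration only)
set.seed(1)
math.numeric <- data.frame(
  age   = rnorm(50, mean = 16, sd = 1),
  Medu  = rnorm(50, mean = 3, sd = 1),
  group = rep(1:5, each = 10)
)

# Every column except the grouping column gets tested
vars <- setdiff(names(math.numeric), "group")
in_group1 <- math.numeric$group == 1

# Welch two-sample t-test per variable: group 1 vs. all other groups
p_values <- sapply(vars, function(v) {
  t.test(math.numeric[[v]][in_group1], math.numeric[[v]][!in_group1])$p.value
})
```

`p_values` is a named numeric vector, so something like `p_values[p_values < 0.05]` would list the attributes that differ at that threshold. With 32 variables, a multiple-testing correction such as p.adjust() is probably worth considering.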

Subsetting data.table using variables with same name as column

感情迁移 submitted on 2019-11-27 05:01:31
I want to subset a data.table using a variable which has the same name as the column, which leads to some problems:

    dt <- data.table(a=sample(c('a', 'b', 'c'), 20, replace=TRUE),
                     b=sample(c('a', 'b', 'c'), 20, replace=TRUE),
                     c=sample(20), key=c('a', 'b'))
    evn <- environment()
    a <- 'b'
    dt[a == a]

    #Expected Result
    dt[a == 'b']

I came across this possible solution:

    env <- environment()
    dt[a == get('a',env)]

But it is as unwieldy as:

    this.a = a
    dt[a == this.a]

So is there another elegant solution? For now, a temporary solution could be,

    `..` <- function (..., .env = globalenv()) { get(deparse

Split/subset a data frame by factors in one column [duplicate]

时光怂恿深爱的人放手 submitted on 2019-11-27 04:03:29
This question already has an answer here: Split data.frame based on levels of a factor into new data.frames (1 answer)

My data is like this (for example):

    ID Rate State
    1  24   AL
    2  35   MN
    3  46   FL
    4  34   AL
    5  78   MN
    6  99   FL

Data:

    structure(list(ID = 1:6, Rate = c(24L, 35L, 46L, 34L, 78L, 99L),
        State = structure(c(1L, 3L, 2L, 1L, 3L, 2L), .Label = c("AL","FL", "MN"),
        class = "factor")), .Names = c("ID", "Rate", "State"),
        class = "data.frame", row.names = c(NA, -6L))

I want to split the data by state and I want to get 3 data sets like below:

data set 1

    ID Rate State
    1  24   AL
    4  34   AL

data set 2 ID Rate
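The linked duplicate's title suggests splitting on the factor's levels; a minimal sketch of that idea with base R's split() follows. The data frame is rebuilt from the question's table, and the name `by_state` is illustrative.

```r
# Data rebuilt from the question's table
df <- data.frame(
  ID    = 1:6,
  Rate  = c(24L, 35L, 46L, 34L, 78L, 99L),
  State = factor(c("AL", "MN", "FL", "AL", "MN", "FL"))
)

# split() returns a named list with one data frame per level of State
by_state <- split(df, df$State)

# Individual pieces are accessed by level name
al_rows <- by_state$AL
```

Keeping the pieces in one list, rather than in three separate variables, makes it easy to process them with lapply().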

Pull nth Day of Month in XTS in R

余生长醉 submitted on 2019-11-27 03:40:26
Question: My question is closely related to the one asked here: Pull Return from first business day of the month from XTS object using R. Instead of extracting the first day of each month, I want to extract, say, the 10th data point of each month. How can I do this?

Answer 1: Using the same example data from the question you've linked to, you can do some basic subsetting. Here's the sample data:

    library(xts)
    data(sample_matrix)
    x <- as.xts(sample_matrix)

Here's the subsetting:

    x[format(index(x), "%d") ==

Pass subset argument through a function to subset

扶醉桌前 submitted on 2019-11-27 02:57:18
Question: I would like to have a function which calls subset and passes on a subset argument:

    df <- data.frame(abc=c("A","A","B","B"),value=1:4)
    subset(df,abc=="A")
    ## works of course:
    #  abc value
    #1   A     1
    #2   A     2

    mysubset <- function(df,ssubset) subset(df,ssubset)
    mysubset(df,abc=="A")
    ## Throws an error
    # Error in eval(expr, envir, enclos) : object 'abc' not found

    mysubset2 <- function(df,ssubset) subset(df,eval(ssubset))
    mysubset2(df,expression(abc=="A"))
    ## Works, but needs expression

I tried with
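One common way around this, shown as a sketch rather than as the accepted answer: capture the unevaluated condition with substitute() and evaluate it against the data frame's columns with eval(), indexing the rows directly instead of calling subset(). The body of `mysubset` below is a hypothetical reimplementation.

```r
df <- data.frame(abc = c("A", "A", "B", "B"), value = 1:4)

mysubset <- function(df, ssubset) {
  # Capture the condition as an unevaluated expression
  condition <- substitute(ssubset)
  # Evaluate it with the data frame's columns in scope,
  # falling back to the caller's environment for other names
  keep <- eval(condition, df, parent.frame())
  # Treat NA as FALSE, as subset() does
  df[keep & !is.na(keep), , drop = FALSE]
}

res <- mysubset(df, abc == "A")
```

The call site then looks like a plain subset() call, with no expression() wrapper needed.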

Return data subset time frames within another timeframes?

我是研究僧i submitted on 2019-11-27 02:38:19
There are very nifty ways of subsetting xts objects. For example, one can get the data for all years, months and days, restricted to the time between 9:30 AM and 4 PM, by doing:

    my_xts["T09:30/T16:00"]

Or you can get all the observations between two dates by doing:

    my_xts["2012-01-01/2012-03-31"]

Or all the dates before/after a certain date by doing:

    my_xts["/2011"] # from start of data until end of 2011
    my_xts["2011/"] # from 2011 until the end of the data

How can I get all the data for only certain months for all years or only certain days for all months and years? Do any other subsetting tricks

How to subset data in R without losing NA rows?

谁都会走 submitted on 2019-11-27 02:02:50
I have some data that I am looking at in R. One particular column, titled "Height", contains a few rows of NA. I am looking to subset my data frame so that all Heights above a certain value are excluded from my analysis.

    df2 <- subset(df1, Height < 40)

However, whenever I do this, R automatically removes all rows that contain NA values for Height. I do not want this. I have tried including arguments for na.rm:

    f1 <- function(x, na.rm = FALSE) {
      df2 <- subset(x, Height < 40)
    }
    f1(df1, na.rm = FALSE)

but this does not seem to do anything; the rows with NA still end up disappearing
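subset() treats an NA in the condition as FALSE, which is why those rows vanish, and it has no na.rm argument. A minimal sketch of one fix is to keep NA rows explicitly with is.na(). The small `df1` below is a hypothetical stand-in for the asker's data.

```r
# Hypothetical stand-in for the question's df1
df1 <- data.frame(Height = c(10, 55, NA, 39, NA, 40))

# Keep rows where Height is below 40 OR Height is missing
df2 <- subset(df1, Height < 40 | is.na(Height))
```

The same condition works with bracket indexing: `df1[df1$Height < 40 | is.na(df1$Height), , drop = FALSE]`.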