subset | 易学教程

Subset multiple columns in R - more elegant code?

I am subsetting a dataframe according to multiple criteria across several columns. I am choosing the rows in the dataframe that contain any one of several values defined in the vector "criteria" in any one of three different columns. I have some code that works, but wonder what other (more elegant?) ways there are to do this. Here is what I've done:

    criteria <- c(1:10)
    subset1 <- subset(data, data[, "Col1"] %in% criteria | data[, "Col2"] %in% criteria | data[, "Col3"] %in% criteria)

Suggestions warmly welcomed. (I am an R beginner, so very simple explanations about what you are suggesting are

In repeated measures data, how to subset to select matched cases and controls?

Question: I have a set of data clustered by family. The research question is whether two people in the same family with different characteristic x have the same binary (yes/no) outcome y. In some families, all members are "yes" for y. In other families, some are "yes" and some are "no" for y. I want to get only the families with discordant outcome statuses. I am guessing the code will be some sort of conditional logic statements but can't quite figure it out yet... In the sample data below, for example, I only

Pandas best way to subset a dataframe inplace, using a mask

I have a pandas dataset that I want to downsize (remove all values under x). The mask is

    df[my_column] > 50

I would typically just use df = df[mask], but want to avoid making a copy every time, particularly because it gets error prone when used in functions (as it only gets altered in the function scope). What is the best way to subset a dataset in place? I was thinking of something along the lines of

    df.drop(df.loc[mask].index, inplace = True)

Is there a better way to do this, or any situation where this won't work at all?

Answer 1: You are missing the inplace parameter: df.drop(df[df.my_column < 50]
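A minimal sketch of the in-place idea discussed above, assuming a toy frame with the column name my_column and a hypothetical helper keep_rows_inplace (neither the helper nor the sample data comes from the original post). Note that drop takes the index labels of the rows to discard, so the keep-mask is inverted first; the sketch also assumes the index labels are unique.

```python
import pandas as pd


def keep_rows_inplace(df, mask):
    """Drop, in place, every row of df where mask is False.

    Assumes mask is a boolean Series aligned with df and that
    df has unique index labels (drop removes rows by label).
    """
    df.drop(df[~mask].index, inplace=True)


# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"my_column": [10, 60, 50, 75], "other": list("abcd")})
same_object = df

keep_rows_inplace(df, df["my_column"] > 50)
print(df)
```

Because the helper mutates the frame it receives, the caller's variable sees the change without reassignment, which is the function-scope concern raised in the question.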

R: from a vector, list all subsets of elements so their sum just passes a value

Question: Sorry in advance if the answer is (1) trivial or (2) out there, but I haven't been able to solve this issue or find an answer online. Any pointers will be much appreciated! I am in need of a piece of code that can run through a vector and return all possible subsets of elements whose cumulative sum passes a threshold value. Note that I do not want only the subsets that give me exactly the threshold. The cumulative sum can be above the threshold, as long as the algorithm stops adding an extra

Subset dataframe where date is within x days of a vector of dates in R

I have a vector of dates, e.g.

    dates <- c('2013-01-01', '2013-04-02', '2013-06-10', '2013-09-30')

and a dataframe which contains a date column, e.g.

    df <- data.frame(
      'date' = c('2013-01-04', '2013-01-22', '2013-10-01', '2013-10-10'),
      'a' = c(1,2,3,4),
      'b' = c('a', 'b', 'c', 'd')
    )

I would like to subset the dataframe so it only contains rows where the date is less than 5 days after any of the dates in the 'dates' vector, i.e. the initial dataframe looks like this:

    date       a b
    2013-01-04 1 a
    2013-01-22 2 b
    2013-10-01 3 c
    2013-10-10 4 d

After the query I would only be left with the first

select maximum row value by group

Question: I've been trying to do this with my data by looking at other posts, but I keep getting an error. My data new looks like this:

    id year name  gdp
    1  1980 Jamie 45
    1  1981 Jamie 60
    1  1982 Jamie 70
    2  1990 Kate  40
    2  1991 Kate  25
    2  1992 Kate  67
    3  1994 Joe   35
    3  1995 Joe   78
    3  1996 Joe   90

I want to select the row with the highest year value by id. So the wanted output is:

    id year name  gdp
    1  1982 Jamie 70
    2  1992 Kate  67
    3  1996 Joe   90

From "Selecting Rows which contain daily max value in R" I tried the

Why subset doesn't mind missing subset argument for dataframes?

Question: Normally I wonder where mysterious errors come from, but now my question is where a mysterious lack of error comes from. Let

    numbers <- c(1, 2, 3)
    frame <- as.data.frame(numbers)

If I type subset(numbers, ) (so I want to take some subset but forget to specify the subset argument of the subset function), then R reminds me (as it should):

    Error in subset.default(numbers, ) : argument "subset" is missing, with no default

However, when I type subset(frame,) (so the same thing with a data.frame

Finding the product of each of the (n-1) subsets of a given array

I'm sorry for deleting the original question; here it is: We have a bag or an array of n integers, and we need to find the product of each of the (n-1) subsets, e.g.:

    S = {1, 0, 3, 6}
    ps[1] = 0*3*6 = 0;
    ps[2] = 1*3*6 = 18; etc.

After discussions, we need to take care of the three cases, and they are illustrated in the following:

1. S is a set (contains one zero element)

    for i=1 to n
        if s[i]=0
            sp[i] = s[1] * s[2] * ... * s[i-1] * s[i+1] * ... * s[n]
        else
            sp[i] = 0;

2. S is a bag (contains more than one zero element)

    for i=1 to n
        sp[i] = 0;

3. S is a set (contains no zero elements)

    product = 1
    for i=1
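The three cases above can be sketched in Python as follows. The function name products_excluding_each is hypothetical, and since the third case is cut off in the excerpt, the sketch assumes it computes the full product once and divides by each element; that is an assumption, not necessarily what the original post does.

```python
from math import prod


def products_excluding_each(s):
    """Return, for each index i, the product of all elements of s except s[i].

    Handles the three cases by counting zeros in the input list of integers.
    """
    zero_count = s.count(0)

    # More than one zero: every (n-1)-subset still contains a zero.
    if zero_count > 1:
        return [0] * len(s)

    # Exactly one zero: only the subset that leaves the zero out is nonzero.
    if zero_count == 1:
        nonzero_product = prod(x for x in s if x != 0)
        return [nonzero_product if x == 0 else 0 for x in s]

    # No zeros (assumed approach): divide the full product by each element.
    # The division is exact because every element divides the product.
    total = prod(s)
    return [total // x for x in s]


print(products_excluding_each([1, 0, 3, 6]))
```

Counting zeros first avoids any division by zero and needs only a constant number of passes over the input.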

Iterate through different subset of size k

Question: I have an array of n integers (not necessarily distinct!) and I would like to iterate over all subsets of size k. However, I'd like to exclude all duplicate subsets, e.g. array = {1,2,2,3,3,3,3}, n = 7, k = 2; then the subsets I want to iterate over (each once) are: {1,2},{1,3},{2,2},{2,3},{3,3}. What is an efficient algorithm for doing this? Is a recursive approach the most efficient/elegant? In case you have a language-specific answer, I'm using C++.

Answer 1: The same (or almost the same)
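One possible recursive approach to the question above, sketched in Python rather than C++ for brevity, and not necessarily the method the truncated answer goes on to describe. The idea is to count how often each distinct value occurs and then decide, value by value, how many copies to take. The name distinct_k_subsets is hypothetical.

```python
from collections import Counter


def distinct_k_subsets(items, k):
    """Yield each distinct size-k sub-multiset of items exactly once, as a tuple."""
    # Sorted (value, multiplicity) pairs, e.g. [(1, 1), (2, 2), (3, 4)].
    counts = sorted(Counter(items).items())

    def extend(position, remaining, chosen):
        if remaining == 0:
            yield tuple(chosen)
            return
        if position == len(counts):
            return  # ran out of values before reaching size k
        value, available = counts[position]
        # Take as many copies as allowed first, then fewer, down to none.
        for take in range(min(available, remaining), -1, -1):
            yield from extend(position + 1, remaining - take,
                              chosen + [value] * take)

    yield from extend(0, k, [])


for subset in distinct_k_subsets([1, 2, 2, 3, 3, 3, 3], 2):
    print(subset)
```

Because each distinct value is visited once and only the number of copies varies, no duplicate subset is ever generated, so no filtering step is needed afterwards.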

Subset data with dynamic conditions in R

Question: I have a dataset of 2500 rows, which are all bank loans. Each bank loan has an outstanding amount and a collateral type (real estate, machine tools, etc.). I need to draw a random selection out of this dataset where, for example, the sum of the outstanding amounts is 2.5 million ±5% and at most 25% of the loans have the same asset class. I found the function optim, but this asks for a function and looks to be constructed for optimizing a portfolio of stocks, which is much more complex. I would say that there