subset

Sample a single row, per column, within a subset of a data frame in R, while following conditions

社会主义新天地 submitted on 2020-01-07 03:03:21
Question: As an example of my data, I have GROUP 1 with three rows of data and GROUP 2 with two rows of data, in a data frame:

    GROUP  VARIABLE 1  VARIABLE 2  VARIABLE 3
    1      2           6           5
    1      4           NA          1
    1      NA          3           8
    2      1           NA          2
    2      9           NA          NA

I would like to sample a single value per column from GROUP 1 to make a new row representing GROUP 1. I do not want to sample one single, complete row from GROUP 1; rather, the sampling needs to occur individually for each column. I would like to do the same for GROUP 2. Also, the
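
One possible way to sample per column within each group is sketched below in base R. The column names (`VARIABLE1`, etc.) and the helper `sample_one` are assumptions made for illustration. The excerpt is cut off before it states its extra conditions, so this sketch treats `NA` as an ordinary value that may be drawn.

```r
# Example data from the question (column names without spaces are assumed)
df <- data.frame(
  GROUP     = c(1, 1, 1, 2, 2),
  VARIABLE1 = c(2, 4, NA, 1, 9),
  VARIABLE2 = c(6, NA, 3, NA, NA),
  VARIABLE3 = c(5, 1, 8, 2, NA)
)

# Hypothetical helper: draw exactly one element of x.
# Indexing with sample.int() avoids sample()'s special handling of a
# length-one numeric vector.
sample_one <- function(x) {
  if (length(x) == 0) return(NA)
  x[sample.int(length(x), 1)]
}

set.seed(1)  # only to make the sketch reproducible

# aggregate() applies sample_one to each column separately within each group,
# so every column is sampled independently of the others.
sampled <- aggregate(df[-1], by = list(GROUP = df$GROUP), FUN = sample_one)
print(sampled)
```

If missing values should never be drawn, `sample_one` could be changed to sample from `x[!is.na(x)]` instead; whether that is wanted is not visible in the excerpt.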

Merge two regression prediction models (with subsets of a data frame) back into the data frame (one column)

拈花ヽ惹草 submitted on 2020-01-06 17:46:51
Question: I am building on a similar question asked and answered on SO one year ago. It relates to this post: how to merge two linear regression prediction models (each per data frame's subset) into one column of the data frame. I will use the same data as was used there, but with a new column. I create the data:

    dat = read.table(text = "
    cats birds wolfs snakes trees
    0 3 8 7 2
    1 3 8 7 3
    1 1 2 3 2
    0 1 2 3 1
    0 1 2 3 2
    1 6 1 1 3
    0 6 1 1 1
    1 6 1 1 1
    ", header = TRUE)

Model the number of wolves, using

R - subset data frame - check if value lies in range

荒凉一梦 submitted on 2020-01-06 05:45:07
Question: I have the following two data frames:

    d1 <- data.frame(chr = c("chr1","chr2","chr2"), pos = c(11, 15, 21), type = c("type1","type2","type1"))
    > d1
       chr pos  type
    1 chr1  11 type1
    2 chr2  15 type2
    3 chr2  21 type1

    d2 <- data.frame(chr = c("chr1","chr2","chr4"), start = c(10, 15, 30), stop = c(13, 20, 40))
    > d2
       chr start stop
    1 chr1    10   13
    2 chr2    15   20
    3 chr4    30   40

I want to subset d1 on two conditions:
keep all lines where 'type' == "type1" (I know how to do this)
keep all lines where 'chr' matches any
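
The second condition is cut off in the excerpt. A minimal sketch follows, assuming (from the title) that it means "`pos` lies within `start`..`stop` of some `d2` row with the same `chr`". The name `in_range` and the inclusive bounds are assumptions, as is combining both conditions with `&`.

```r
d1 <- data.frame(chr = c("chr1", "chr2", "chr2"), pos = c(11, 15, 21),
                 type = c("type1", "type2", "type1"), stringsAsFactors = FALSE)
d2 <- data.frame(chr = c("chr1", "chr2", "chr4"), start = c(10, 15, 30),
                 stop = c(13, 20, 40), stringsAsFactors = FALSE)

# For every row of d1, check whether any row of d2 has the same chr and an
# interval [start, stop] (inclusive, by assumption) that contains pos.
in_range <- mapply(function(chr, pos) {
  any(d2$chr == chr & d2$start <= pos & pos <= d2$stop)
}, d1$chr, d1$pos, USE.NAMES = FALSE)

# Combine with the type filter; use | instead of & if either condition suffices.
subset_d1 <- d1[d1$type == "type1" & in_range, ]
print(subset_d1)
```

For large inputs a join-based approach would scale better than this row-by-row check; the loop form is used here only because it mirrors the wording of the question.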

Subsetting based on observations in a month

你说的曾经没有我的故事 submitted on 2020-01-05 11:04:29
Question: I'm trying to subset some data and am stuck on the last part of cleaning. What I need to do is calculate the number of observations for each individual (indivID) in each month (June, July, and August), return the percentage of observations without missing data for each, and then keep those observations that are over 75%. I was able to create a nested for loop, but it took probably 6 hours to process today. I would like to be able to take advantage of parallel computing by using ddply, or another function, but an

Non-standard evaluation of subset argument with mapply in R

吃可爱长大的小学妹 submitted on 2020-01-05 09:47:29
Question: I cannot use the subset argument of any function with mapply. The following calls fail with the subset argument, but they work without it:

    mapply(ftable,
           formula = list(wool ~ breaks, wool + tension ~ breaks),
           subset = list(breaks < 15, breaks < 20),
           MoreArgs = list(data = warpbreaks))
    # Error in mapply(ftable, formula = list(wool ~ breaks, wool + tension ~ :
    #   object 'breaks' not found

    mapply(xtabs,
           formula = list(~ wool, ~ wool + tension),
           subset = list(breaks < 15, breaks < 20),
           MoreArgs =
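
The error arises because `list(breaks < 15, breaks < 20)` is evaluated in the calling environment, where no object `breaks` exists, before `mapply` ever calls the function. One hedged workaround, sketched here with `xtabs`, is to filter the data explicitly inside an anonymous function rather than relying on the `subset` argument. The names `formulas`, `cutoffs`, and `tabs` are illustrative.

```r
# warpbreaks ships with R's datasets package
formulas <- list(~ wool, ~ wool + tension)
cutoffs  <- c(15, 20)

# Map() pairs each formula with its cutoff; the rows are selected with
# ordinary logical indexing, so no non-standard evaluation is involved.
tabs <- Map(function(f, cutoff) {
  xtabs(f, data = warpbreaks[warpbreaks$breaks < cutoff, ])
}, formulas, cutoffs)

print(tabs[[1]])
```

This sidesteps the problem rather than making `subset` itself work through `mapply`; passing unevaluated expressions and evaluating them inside the function would be another route.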

Subset an array for the pairs of indices in r

放肆的年华 submitted on 2020-01-05 07:20:12
Question: Although I have searched, I could not find a straightforward answer to my question. Suppose I have an array:

    vector1 <- c(5,9,3)
    vector2 <- c(10,11,12,13,14,15)
    result <- array(c(vector1,vector2),dim = c(3,3,2))

Now, I want to subset this array in such a way as to get elements from certain rows and columns. For example:

    result[1,3,1:2]
    result[3,1,1:2]

Since I have many indices, they are stored in

    rowind=c(1,3)
    colind=c(3,1)

For subsetting, I have tried to use vectors of rows and columns
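
Indexing with `result[rowind, colind, ]` returns every row/column combination rather than the matched pairs. A minimal sketch of one way to extract only the pairs (the name `pairs_out` is illustrative):

```r
vector1 <- c(5, 9, 3)
vector2 <- c(10, 11, 12, 13, 14, 15)
result  <- array(c(vector1, vector2), dim = c(3, 3, 2))

rowind <- c(1, 3)
colind <- c(3, 1)

# Walk through the (row, column) pairs in parallel and take the full third
# dimension for each pair. Column i of the output belongs to pair i.
pairs_out <- mapply(function(r, cl) result[r, cl, ], rowind, colind)
print(pairs_out)
```

Here column 1 of `pairs_out` equals `result[1, 3, 1:2]` and column 2 equals `result[3, 1, 1:2]`.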

subset of a data frame in R

陌路散爱 submitted on 2020-01-05 05:35:13
Question: I have 2 data frames, df2 and DF.

    > DF
            date tickers
    1 2000-01-01       B
    2 2000-01-01    GOOG
    3 2000-01-01       V
    4 2000-01-01    YHOO
    5 2000-01-02     XOM

    > df2
            date tickers quantities
    1 2000-01-01      BB         11
    2 2000-01-01     XOM         23
    3 2000-01-01    GOOG         42
    4 2000-01-01    YHOO         21
    5 2000-01-01       V       2112
    6 2000-01-01       B         13
    7 2000-01-02     XOM         24
    8 2000-01-02      BB        422

I need the rows of df2 that are present in DF. That means I require the following output:

    3 2000-01-01    GOOG         42
    4 2000-01-01    YHOO         21
    5 2000-01-01       V       2112
    6 2000-01-01 B
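
A minimal sketch of one way to do this in base R, assuming a row counts as "present" when both `date` and `tickers` match (the excerpt does not say so explicitly). The name `matched` is illustrative, and dates are kept as plain strings here.

```r
DF <- data.frame(
  date    = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-01", "2000-01-02"),
  tickers = c("B", "GOOG", "V", "YHOO", "XOM"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  date       = c(rep("2000-01-01", 6), rep("2000-01-02", 2)),
  tickers    = c("BB", "XOM", "GOOG", "YHOO", "V", "B", "XOM", "BB"),
  quantities = c(11, 23, 42, 21, 2112, 13, 24, 422),
  stringsAsFactors = FALSE
)

# An inner join on both key columns keeps only the df2 rows whose
# (date, tickers) combination also occurs in DF.
matched <- merge(df2, DF, by = c("date", "tickers"))
print(matched)
```

Note that `merge` sorts the result by the key columns, so the row order differs from the order shown in the question.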

Generate all possible permutations of subsets containing all the element of a set

二次信任 submitted on 2020-01-04 08:24:40
Question: Let S(w) be a set of words. I want to generate all the possible n-combinations of subsets s so that the union of those subsets is always equal to S(w). So you have a set (a, b, c, d, e) and you want all the 3-combinations:

    ((a, b, c), (d), (e))
    ((a, b), (c, d), (e))
    ((a), (b, c, d), (e))
    ((a), (b, c), (d, e))
    etc ...

For each combination you have 3 sets, and the union of those sets is the original set. No empty set, no missing element. There must be a way to do that using itertools.combination

Subsetting a dataframe based on values in another dataframe

倾然丶 夕夏残阳落幕 submitted on 2020-01-03 20:09:20
Question: Sorry, I am an absolute beginner, so I have some very basic questions! I have a very large data set that lists individual transactions by household. An example is below.

    #   hh_id trans_type transaction_value
    # 1   hh1       food                 4
    # 2   hh1      water                 5
    # 3   hh1  transport                 4
    # 4   hh2      water                 3
    # 5   hh3  transport                 1
    # 6   hh3       food                10
    # 7   hh4       food                 5
    # 8   hh4  transport                15
    # 9   hh4      water                10

I want to create a new data frame that has all transactions listed for ONLY the households that have transactions in the "water" category.
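
A minimal sketch of one way to do this in base R. The data frame name `trans` and the intermediate names are assumptions for illustration.

```r
trans <- data.frame(
  hh_id = c("hh1", "hh1", "hh1", "hh2", "hh3", "hh3", "hh4", "hh4", "hh4"),
  trans_type = c("food", "water", "transport", "water", "transport",
                 "food", "food", "transport", "water"),
  transaction_value = c(4, 5, 4, 3, 1, 10, 5, 15, 10),
  stringsAsFactors = FALSE
)

# Step 1: find the households with at least one "water" transaction.
water_hh <- unique(trans$hh_id[trans$trans_type == "water"])

# Step 2: keep every transaction (of any type) from those households.
water_trans <- trans[trans$hh_id %in% water_hh, ]
print(water_trans)
```

With the example data, this keeps all rows for hh1, hh2, and hh4 and drops hh3, which has no "water" transaction.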