subset | 易学教程 (Yixue Tutorial)

Subsetting a data frame conditionally on a factor (binary) column (a vector in R)

Question: I have a sequence of 1/0 values indicating whether a patient is in remission or not, and I assume the remission records were taken at discrete times. How can I check the Markov property for each patient and then summarize the findings? That is, the assumption that the probability of remission for any patient at any time depends only on whether the patient was in remission at the previous time or not (the same thing as saying the probability of remission for any patient at any time depends only on whether the patient
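
One common way to check the (first-order) Markov assumption is a likelihood-ratio test of a first-order chain against a second-order chain: if remission at time t depends only on the state at t-1, then also knowing the state at t-2 should not improve the fit. With triple counts n_abc = #(x[t-2]=a, x[t-1]=b, x[t]=c), the statistic is G² = 2 Σ n_abc ln(n_abc · n_·b· / (n_ab· · n_·bc)), compared with a chi-square distribution with 2 degrees of freedom for a binary chain. The sketch below is in Python rather than R; the function names and patient data are hypothetical. It assumes all four two-step histories occur (otherwise the degrees of freedom are smaller), and with short per-patient series the chi-square approximation is unreliable, so pooling patients (which assumes they share transition probabilities) is often the more practical summary. It only tests order 1 against order 2.

```python
import math
from collections import Counter


def triple_counts(sequences):
    """Count (x[t-2], x[t-1], x[t]) triples within each 0/1 sequence.

    Triples never span two different sequences (patients)."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:], seq[2:]))
    return counts


def markov_lr_test(sequences):
    """Likelihood-ratio (G^2) test of a first-order against a second-order
    Markov chain for binary sequences. Returns (g2, p_value)."""
    triples = triple_counts(sequences)
    n_ab = Counter()  # n(a, b, .): two-step history counts (order-2 model)
    n_bc = Counter()  # n(., b, c): one-step transition counts (order-1 model)
    n_b = Counter()   # n(., b, .): counts of the previous state
    for (a, b, c), n in triples.items():
        n_ab[(a, b)] += n
        n_bc[(b, c)] += n
        n_b[b] += n
    g2 = 0.0
    for (a, b, c), n in triples.items():
        # log of P(c | a, b) under order 2 divided by P(c | b) under order 1
        g2 += 2 * n * math.log(n * n_b[b] / (n_ab[(a, b)] * n_bc[(b, c)]))
    g2 = max(g2, 0.0)  # guard against tiny negative rounding error
    # Assumes 2 degrees of freedom; the chi-square(2) survival function is exp(-x / 2)
    p_value = math.exp(-g2 / 2)
    return g2, p_value


# Hypothetical remission records (1 = in remission) for two patients
patients = {
    "p1": [0, 1, 0, 1, 0, 1, 0],
    "p2": [0, 0, 1, 1] * 10,
}
for pid, seq in patients.items():
    print(pid, markov_lr_test([seq]))  # per-patient test
print("pooled", markov_lr_test(patients.values()))  # shared transition probabilities
```

A small p-value for a patient suggests that the state two steps back still carries information, i.e. evidence against the first-order assumption; the per-patient results can then be summarized, for example, as the share of patients with p < 0.05.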

Generate subsets of length n

Question: Given a Set, generate all subsets of length n. Sample input: set = new Set(['a', 'b', 'c']), length = 2. Sample output: Set {'a', 'b'}, Set {'a', 'c'}, Set {'b', 'c'}. How can I do this efficiently, without converting the Set to an Array? I already have a good solution for arrays as input: function* subsets(array, length, start = 0) { if (start >= array.length || length < 1) { yield new Set(); } else { while (start <= array.length - length) { let first = array[start]; for (subset of subsets(array
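
For comparison, Python's standard library generates k-element subsets lazily with `itertools.combinations`, which accepts any iterable, including a set. This is a Python analogue of the idea, not a JavaScript answer: `combinations` still copies its input into a tuple internally, so it does not avoid the conversion the question asks about. `subsets_of_length` is a hypothetical helper name.

```python
from itertools import combinations


def subsets_of_length(items, n):
    """Yield every n-element subset of the iterable `items` as a frozenset."""
    for combo in combinations(items, n):
        yield frozenset(combo)


# Set iteration order is arbitrary, so the subsets may print in any order
print(list(subsets_of_length({"a", "b", "c"}, 2)))
```

Each subset is produced on demand, so a caller can stop early without generating the remaining combinations.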

subsetting based on number of observations in a factor variable

Question: How do you subset based on the number of observations in each level of a factor variable? I have a dataset with 1,000,000 rows and nearly 3,000 levels, and I want to drop the levels with fewer than, say, 200 observations.

data <- read.csv("~/Dropbox/Shared/data.csv", sep=";")
summary(as.factor(data$factor))

10001 10002 10003 10004 10005 10006 10007 10009 10010 10011 10012 10013 10014 10016 10017 10018 10019 10020
  414   741  2202   205   159   591   194   678   581   774   778   738  1133   997   381   157   522     6
10021 10022
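
The usual approach is to count the observations per level and keep only rows whose level passes the threshold (in R with dplyr, for example, `data %>% group_by(factor) %>% filter(n() >= 200)`). A minimal pandas sketch of the same idea, with a hypothetical helper name and toy data in place of the real CSV:

```python
import pandas as pd


def drop_rare_levels(df, column, min_count):
    """Keep only rows whose value in `column` occurs at least `min_count` times."""
    counts = df[column].value_counts()  # observations per level
    frequent = counts.index[counts >= min_count]  # levels that pass the threshold
    return df[df[column].isin(frequent)]


# Toy data; the question would use min_count=200
data = pd.DataFrame({"factor": [10001, 10001, 10001, 10002, 10003, 10003],
                     "value": [1, 2, 3, 4, 5, 6]})
print(drop_rare_levels(data, "factor", 2))
```

Counting once and filtering with a membership test keeps the work roughly linear in the number of rows, which matters with a million rows and thousands of levels.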

Subsetting a data frame to the rows not appearing in another data frame

Question: I have a data frame A with the observations

Var1 Var2 Var3
   1    3    4
   2    5    6
   4    5    7
   4    5    8
   6    7    9

and a data frame B with the observations

Var1 Var2 Var3
   1    3    4
   2    5    6

which is basically a subset of A. Now I want to select the observations in A that are NOT in B, i.e., the data frame C with the observations

Var1 Var2 Var3
   4    5    7
   4    5    8
   6    7    9

Is there a way I can do this in R? The data frames I've used are just arbitrary data.

Answer 1: dplyr has a nice anti_join function that does exactly that:

> library(dplyr)
> anti_join(A, B)
Joining
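
For readers doing the same in pandas, a common pattern for an anti-join is a left merge with `indicator=True`, keeping the rows marked `left_only`. This is a sketch; `anti_join` below is a hypothetical helper named after the dplyr function, not a pandas function.

```python
import pandas as pd


def anti_join(left, right):
    """Rows of `left` with no exact match in `right`, comparing all shared columns."""
    # drop_duplicates stops repeated rows in `right` from multiplying matches
    merged = left.merge(right.drop_duplicates(), how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


A = pd.DataFrame({"Var1": [1, 2, 4, 4, 6], "Var2": [3, 5, 5, 5, 7], "Var3": [4, 6, 7, 8, 9]})
B = pd.DataFrame({"Var1": [1, 2], "Var2": [3, 5], "Var3": [4, 6]})
print(anti_join(A, B))
```

With no `on=` argument, `merge` joins on all columns the two frames share, which matches the "same observation" notion in the question.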

R: Subsetting a data.frame when two columns have different values

Question: I have a data.frame like this:

      Type1 rep1    Type2 rep2    stat p.value
17    DqSAD    1 rnzDqSAD    9  3.7946  0.0101
18    DqSAD    1    DqSAD   10 -0.5278  0.6428
19    DqSAD    1 rnzDqSAD   10  0.4111  0.2231
20 rnzDqSAD    1    DqSAD    2 -0.3111  0.5085
21 rnzDqSAD    1 rnzDqSAD    2 -0.8904  0.9080

and I would like to subset it to the rows where the columns Type1 and Type2 have different values. That is, I want to do it automatically, without explicitly checking for particular values such as Type1=="DqSAD" & Type2=="rnzDqSAD". I remember this could be done with SQL,
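
No SQL is needed: comparing the two columns element by element yields a logical row index. In R that is `df[as.character(df$Type1) != as.character(df$Type2), ]`, where `as.character` guards against factor columns with different level sets. The same idea in pandas, rebuilding the table above:

```python
import pandas as pd

df = pd.DataFrame(
    {"Type1": ["DqSAD", "DqSAD", "DqSAD", "rnzDqSAD", "rnzDqSAD"],
     "rep1": [1, 1, 1, 1, 1],
     "Type2": ["rnzDqSAD", "DqSAD", "rnzDqSAD", "DqSAD", "rnzDqSAD"],
     "rep2": [9, 10, 10, 2, 2],
     "stat": [3.7946, -0.5278, 0.4111, -0.3111, -0.8904],
     "p.value": [0.0101, 0.6428, 0.2231, 0.5085, 0.9080]},
    index=[17, 18, 19, 20, 21],
)

# Element-wise comparison of the two columns gives a boolean row mask
different = df[df["Type1"] != df["Type2"]]
print(different)
```

Rows 18 and 21 (same type in both columns) are dropped; rows 17, 19 and 20 remain.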

R: Subsetting on increasing value to max excluding the decreasing

Question: I have a number of trials in which one variable increases to a maximum of interest and then decreases back to a starting point. How would I go about retaining only the observations with values increasing up to the maximum? Thanks. For example:

Trial A B C
    1 2 4 1
    1 4 3 2
    1 3 7 3
    1 3 3 2
    1 4 1 1
    2 4 1 1
    2 6 2 2
    2 3 1 3
    2 1 1 2
    2 7 3 1
  ...

So we would check the maximum of C and retain the following rows:

Trial A B C
    1 2 4 1
    1 4 3 2
    1 3 7 3
    2 4 1 1
    2 6 2 2
    2 3 1 3
  ...

Ultimately I'll have a low cutoff value as well as varying
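
One way to express this is: within each trial, number the rows in order and keep those whose position is at or before the position of the first maximum of C. A pandas sketch of that idea, assuming rows within a trial are already in time order; `up_to_peak` is a hypothetical helper, and the low cutoff mentioned in the question is not handled here.

```python
import pandas as pd


def up_to_peak(df, group_col, value_col):
    """Keep each group's rows up to and including the first maximum of value_col.

    Assumes rows within a group are already in time order."""
    grouped = df.groupby(group_col)
    position = grouped.cumcount()  # 0, 1, 2, ... within each group
    # argmax returns the position of the first maximum; transform broadcasts it
    peak = grouped[value_col].transform(lambda s: s.to_numpy().argmax())
    return df[position <= peak]


df = pd.DataFrame({
    "Trial": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "A": [2, 4, 3, 3, 4, 4, 6, 3, 1, 7],
    "B": [4, 3, 7, 3, 1, 1, 2, 1, 1, 3],
    "C": [1, 2, 3, 2, 1, 1, 2, 3, 2, 1],
})
print(up_to_peak(df, "Trial", "C"))
```

Note that this keeps every row before the peak even if C dips on the way up; it does not check that the values are strictly increasing.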

Subsetting lists via logical index vectors

Question: I have a complex list and need to select a subset from it based on the value of a boolean element (I need the records whose hidden value is FALSE). I've tried the following code, based on index vectors, but it fails (as shown at the end of this output):

startups <- data$startups[data$startups$hidden == FALSE]

Or, alternatively:

startups <- data$startups[!as.logical(data$startups$hidden)]

An interactive R session shows that the data is there:

Browse[1]> str(data$startups, list.len=3)
List of
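
The `str` output begins with `List of`, so `data$startups` appears to be a plain list of records rather than a data frame. If so, a likely reason both attempts fail is that `data$startups$hidden` looks for a top-level element named `hidden` and returns `NULL`, leaving no logical vector to index with. The mask then has to be built record by record, for example in R with `Filter(function(s) identical(s$hidden, FALSE), data$startups)`. The same idea in Python, with hypothetical records, where `itertools.compress` plays the role of a logical index vector:

```python
from itertools import compress

# Hypothetical records standing in for data$startups (a list, not a data frame)
startups = [
    {"name": "alpha", "hidden": False},
    {"name": "beta", "hidden": True},
    {"name": "gamma", "hidden": False},
]

# Build the logical index vector one record at a time ...
visible_mask = [record["hidden"] is False for record in startups]
# ... then keep the records whose mask entry is True
visible = list(compress(startups, visible_mask))
print([record["name"] for record in visible])
```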

Dynamic Programming approach for a subset sum

Question: Given the following input:

10 4 3 5 5 7

where 10 = total score, 4 = number of players, 3 = score of player 1, 5 = score of player 2, 5 = score of player 3, and 7 = score of player 4. I am to print the players whose combined score adds up to the total, so the output can be 1 4, because player 1 + player 4 = 3 + 7 = 10, or the output can be 2 3, because player 2 + player 3 = 5 + 5 = 10. So it is quite similar to a subset sum problem. I am relatively new to dynamic programming, but after getting help on Stack Overflow and
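
A standard subset-sum table answers this: `reach[i][s]` records whether some subset of the first i players scores exactly s, and walking back through the table recovers one valid set of players. This is a sketch assuming non-negative integer scores; `players_for_total` is a hypothetical name. It runs in O(n · total) time and space, and for the sample input this particular walk-back returns players 2 and 3.

```python
def players_for_total(total, scores):
    """Return 1-based player numbers whose scores sum to `total`, or None.

    Assumes non-negative integer scores."""
    n = len(scores)
    # reach[i][s] is True if some subset of the first i players sums to s
    reach = [[False] * (total + 1) for _ in range(n + 1)]
    reach[0][0] = True
    for i in range(1, n + 1):
        score = scores[i - 1]
        for s in range(total + 1):
            reach[i][s] = reach[i - 1][s] or (s >= score and reach[i - 1][s - score])
    if not reach[n][total]:
        return None
    # Walk back through the table to recover one valid set of players
    chosen, s = [], total
    for i in range(n, 0, -1):
        if not reach[i - 1][s]:  # sum s is impossible without player i
            chosen.append(i)
            s -= scores[i - 1]
    return sorted(chosen)


# Parse "total, player count, scores..." as in the question
tokens = [int(t) for t in "10 4 3 5 5 7".split()]
total, count = tokens[0], tokens[1]
scores = tokens[2:2 + count]
print(*players_for_total(total, scores))  # 2 3
```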

Access entries in pandas data frame using a list of indices

Question: I'm facing the issue that I need only a subset of my original dataframe, distributed over different rows and columns. E.g.:

# My original dataframe
import pandas as pd
dfTest = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]])

Output:

   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9

I can provide a list of row and column indices where my desired values are located:

array_indices = [[0,2],[1,0],[2,1]]

My desired output is a series: 3, 4, 8. Can anyone help?

Answer 1: Use pd.DataFrame.lookup:

dfTest.lookup(*zip(*array
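
`DataFrame.lookup` was deprecated in pandas 1.2.0 and removed in pandas 2.0, so on current pandas the same selection can be done with NumPy fancy indexing on the underlying array: one row list and one column list pick one element per pair. This sketch uses positions; if the row and column labels differ from their positions, they could first be converted with `dfTest.index.get_indexer(...)` and `dfTest.columns.get_indexer(...)`.

```python
import pandas as pd

dfTest = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
array_indices = [[0, 2], [1, 0], [2, 1]]

# Split the (row, column) pairs into a row list and a column list
rows, cols = map(list, zip(*array_indices))
# Fancy indexing picks one element per (row, column) pair
values = pd.Series(dfTest.to_numpy()[rows, cols])
print(values.tolist())  # [3, 4, 8]
```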