subset | 易学教程

Performing calculations by subsets of data in R

Question: I want to perform calculations for each company number in the column PERMNO of my data frame, the summary of which can be seen here:

> summary(companydataRETS)
     PERMNO           RET
 Min.   :10000   Min.   :-0.971698
 1st Qu.:32716   1st Qu.:-0.011905
 Median :61735   Median : 0.000000
 Mean   :56788   Mean   : 0.000799
 3rd Qu.:80280   3rd Qu.: 0.010989
 Max.   :93436   Max.   :19.000000

My solution so far was to create a variable with all possible company numbers:

compns <- companydataRETS[!duplicated(companydataRETS[,"PERMNO"]),

Select observations from a subset to create a new subset based on a large dataframe in R

Question: I have a dataset (Purchase.df) that contains many columns and rows. The important variable names for this question are "Customer", "OrderDate", "DateRank" (which ranks the dates so I can find the smallest date), and "BrandName". Below is a very small sample of what I'm working with (I'm new to this website, so I hope what I paste below works):

Purchase.df <- structure(list(Customer = c(10071535L, 10071535L, 10071535L, 10071535L, 10071535L, 10071535L, 10071711L, 10071711L, 10071711L, 10071711L,

Make subset of array, based on values of two other arrays in Python

Question: I am using Python. How can I make a subselection of a vector based on the values of two other vectors of the same length? For example, take these three vectors:

c1 = np.array([1,9,3,5])
c2 = np.array([2,2,3,2])
c3 = np.array([2,3,2,3])

c2==2
array([ True, True, False, True], dtype=bool)
c3==3
array([False, True, False, True], dtype=bool)

I want to do something like this:

elem = (c2==2 and c3==3)
c1sel = c1[elem]

But the first statement results in an error:

Traceback (most recent call last): File "
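The excerpt is cut off before the traceback, but Python's and operator asks for a single truth value from each array, which NumPy refuses to give for arrays with more than one element. A minimal sketch of the usual alternative, the element-wise & operator, using the arrays from the question:

```python
import numpy as np

c1 = np.array([1, 9, 3, 5])
c2 = np.array([2, 2, 3, 2])
c3 = np.array([2, 3, 2, 3])

# Element-wise AND of the two boolean masks. The parentheses are required
# because & binds more tightly than ==.
elem = (c2 == 2) & (c3 == 3)

# Boolean-mask indexing keeps the entries of c1 where elem is True.
c1sel = c1[elem]
print(c1sel)  # [9 5]
```

np.logical_and(c2 == 2, c3 == 3) builds the same mask if a function call reads better than the operator.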

How to pass/use string in [ to subset

Question: How can I pass a string to [ to subset, e.g., an array? I've been thinking about something like this (for a 4-dimensional array):

inputDims <- ",,'CCC',"
outputArray[parse(text=inputDims)]

The above doesn't work. How can I achieve this? I am not interested in using a logical vector (or matrix) inside [, just a string (in a form like the one in the example), if this is possible.

Answer 1: (This seems like a horrible hack. Having trouble seeing the value in proceeding along these lines but perhaps it will clarify what is

Extract/subset minute values from each hour

Question: My data frame contains date values in the format YYYY-MM-DD HH:MM:SS across 125000+ rows, broken down by the minute (each row represents a single minute).

1      2018-01-01 00:04:00
2      2018-01-01 00:05:00
3      2018-01-01 00:06:00
4      2018-01-01 00:07:00
5      2018-01-01 00:08:00
6      2018-01-01 00:09:00
...
124998 2018-03-29 05:07:00
124999 2018-03-29 05:08:00
125000 2018-03-29 05:09:00

I want to subset the data by extracting all of the minute values within any given hour and saving the results into individual
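The question concerns an R data frame and the excerpt is truncated, so the following is only a rough Python sketch of the grouping idea, not the asker's solution. It assumes that "hour" means a specific date and hour, and it uses a handful of timestamps copied from the excerpt; the names stamps and by_hour are illustrative.

```python
from collections import defaultdict
from datetime import datetime

# A few timestamps copied from the excerpt (one row per minute).
stamps = [
    "2018-01-01 00:04:00",
    "2018-01-01 00:05:00",
    "2018-01-01 00:06:00",
    "2018-03-29 05:07:00",
    "2018-03-29 05:08:00",
]

# Assumption: the grouping key is the timestamp truncated to the start of
# its hour, so each date-and-hour pair gets its own subset.
by_hour = defaultdict(list)
for text in stamps:
    ts = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    by_hour[ts.replace(minute=0, second=0)].append(ts.minute)

for hour, minutes in sorted(by_hour.items()):
    print(hour, minutes)
```

Grouping on ts.hour alone would instead pool the same clock hour across all dates, if that is what is wanted.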

How to find all groups of subsets of set A? Set partitions in Python

Question: I want to find an algorithm that, given a set A, finds all groups of subsets that satisfy the following condition:

x ∪ y ∪ ... ∪ z = A, where x, y, ..., z ∈ Group, and ∀ x, y ∈ Group: x ⊆ A, y ⊆ A, x ∩ y = ∅ = {}, and ∀ x ∈ Group: x != ∅

Note: I hope I have defined it well; I'm not good with math symbols. I made the following approach to search for groups of two subsets only:

from itertools import product, combinations
def my_combos(A):
    subsets = []
    for i in xrange(1, len(A)):
        subsets.append(list
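The condition described (non-empty, pairwise disjoint subsets whose union is A) is the definition of a set partition. As a hedged sketch, not the asker's code or any particular answer, a short recursive generator can enumerate them; the name set_partitions is illustrative.

```python
def set_partitions(items):
    """Yield every partition of items as a list of non-empty, disjoint lists."""
    items = list(items)
    if not items:
        yield []  # the empty set has exactly one partition: no blocks
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition


partitions = list(set_partitions([1, 2, 3]))
for p in partitions:
    print(p)
print(len(partitions))  # 5
```

The number of partitions grows quickly with the size of A (the Bell numbers: 5 for three elements, 15 for four), so the generator form avoids building the full list when it is not needed.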

Deleting rows from a data frame that are not present in another data frame in R [duplicate]

Question: This question already has answers here: Find complement of a data frame (anti-join) (7 answers); How to join (merge) data frames (inner, outer, left, right) (13 answers). Closed 4 years ago.

I'm new to R but from what I've been reading this one is a bit hard for me. I have two data frames, say DF1 and DF2, both of which have a variable of interest, say idFriends, and I want to create a new data frame where all the rows that do not appear in DF2 are deleted from DF1 based on the values of

Selecting rows according to all covariate combinations of a different dataframe

Question: I am currently trying to figure out how to select all the rows of a long dataframe (long) that have the same x1 and x2 combinations as those in another dataframe (short). The simplified data are:

long <- read.table(text = "
id_type x1 x2
1 0 0
1 0 1
1 1 0
1 1 1
2 0 0
2 0 1
2 1 0
2 1 1
3 0 0
3 0 1
3 1 0
3 1 1
4 0 0
4 0 1
4 1 0
4 1 1", header=TRUE)

and

short <- read.table(text = "
x1 x2
0 0
0 1", header=TRUE)

The expected output would be:

id_type x1 x2
1 0 0
1 0 1
2 0 0
2 0 1
3 0 0
3
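The question is about R data frames; purely as an illustration of the underlying idea (keep a row when its (x1, x2) pair occurs in the other table, sometimes called a semi-join), here is a small Python sketch on the excerpt's data. The variable names long_rows, short_rows, wanted and selected are illustrative.

```python
# Rows of `long` as (id_type, x1, x2), rebuilt to match the excerpt's table.
long_rows = [(i, x1, x2) for i in range(1, 5) for x1 in (0, 1) for x2 in (0, 1)]

# Rows of `short` as (x1, x2), copied from the excerpt.
short_rows = [(0, 0), (0, 1)]

# A set gives constant-time membership tests on the (x1, x2) combinations.
wanted = set(short_rows)

# Keep each long row whose (x1, x2) pair appears in short.
selected = [row for row in long_rows if (row[1], row[2]) in wanted]

print(selected[:4])  # [(1, 0, 0), (1, 0, 1), (2, 0, 0), (2, 0, 1)]
```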

SAS - Keeping only observations with all variables

Question: I have a dataset of membership information, and I want to keep only the people who have been continuously enrolled for the entire year. There are 12 variables for each person, one for each month of the year, holding the number of days they were enrolled during that month. Is there a way to make a subset of the data for just those with a value >1 for each of the month variables? Thanks!

Answer 1: SAS has various summary functions that might well be what you're looking for. See min() (minimum) in particular,
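The answer's hint about the minimum rests on a simple observation: if the smallest of the 12 monthly values exceeds 1, then every month does. The question is about SAS; the following is only a Python sketch of that logic on made-up records (the member ids and day counts are hypothetical).

```python
# Hypothetical records: member id -> days enrolled in each of the 12 months.
members = {
    "A": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
    "B": [31, 28, 0, 0, 31, 30, 31, 31, 30, 31, 30, 31],
    "C": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 1],
}

# A member passes the ">1 in every month" test exactly when the minimum
# of the 12 monthly values is greater than 1.
continuous = {mid: days for mid, days in members.items() if min(days) > 1}

print(sorted(continuous))  # ['A']
```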

Issue in deleting supersets in Matlab

Question: I have a collection of sets, and I want to remove the supersets for which subsets are present, as follows:

a{1} = [5]
a{2} = [4 11 14]
a{3} = [1]
a{4} = [5 16]
a{5} = [5]
a{6} = [11 16]
a{7} = [11]
a{8} = [16]
a{9} = [9 14 17]
a{10} = [14]

[ii, jj] = ndgrid(1:numel(a));
s = cellfun(@(x,y) all(ismember(x,y)), a(ii), a(jj));
s = triu(s,1); %// count each pair just once, and remove self-pairs
similarity = a(~any(s,1));
celldisp(similarity)

The result is as follows:

a{1} = [5]
a{2} = [4 11 14]
a
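The excerpt ends before the asker states the issue, so the intended result is an assumption here: keep only those sets that have no proper subset anywhere in the collection, and drop exact duplicates. Under that assumption, a minimal Python sketch (not a translation of the Matlab code above) could look like this:

```python
# The ten sets from the excerpt.
sets = [[5], [4, 11, 14], [1], [5, 16], [5], [11, 16], [11], [16], [9, 14, 17], [14]]

# Drop exact duplicates while preserving first-seen order.
unique = []
for s in map(frozenset, sets):
    if s not in unique:
        unique.append(s)

# Keep a set only if no other set in the collection is a proper subset of it.
# For frozensets, t < s is True exactly when t is a proper subset of s.
minimal = [s for s in unique if not any(t < s for t in unique)]

print([sorted(s) for s in minimal])  # [[5], [1], [11], [16], [14]]
```

Unlike a comparison restricted to earlier positions in the list, this check looks at every other set, so a superset is dropped even when its subset appears later.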