subset | 易学教程

Finding all possible combinations of all possible subsets of lists

Question: I am trying to find all combinations of subsets of a list, where each element is used only once. To clarify what I mean, given an example list of [1, 2, 3, 4], I can generate the following combinations: [(1), (2), (3), (1, 2), (1, 3), (2, 3), (1, 2, 3)] (this should be exhaustive!). Then I can generate combinations of these combinations: [[(1), (2), (3)], [(1, 2), (3)], [(1, 3), (2)], [(1), (2, 3)], [(1, 2, 3)], ... (for many)]. The main point is that I am only allowed to use each element
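
The excerpt above is cut off, so the following is only a minimal sketch of one reading of it: first list every non-empty subset, then list every grouping in which each element appears exactly once (a set partition). The helper names `nonempty_subsets` and `partitions` are hypothetical, and the code assumes the list elements are distinct.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Return every non-empty subset of items as a tuple, shortest first."""
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]


def partitions(items):
    """Yield every way to split items into disjoint, non-empty groups.

    Assumes the elements of items are distinct.
    """
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # Decide which of the remaining elements share a group with `first`,
    # then recursively partition whatever is left over.
    for r in range(len(rest) + 1):
        for companions in combinations(rest, r):
            remaining = [x for x in rest if x not in companions]
            for tail in partitions(remaining):
                yield [(first, *companions)] + tail


subsets = nonempty_subsets([1, 2, 3])
groupings = list(partitions([1, 2, 3]))
```

For `[1, 2, 3]` this gives 7 subsets and 5 groupings, matching the five groupings listed in the question. The number of groupings grows quickly with list length (15 for four elements), so a generator is used rather than building one large list up front.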

how to efficiently subset a dataframe into several chunks to be passed to a list of lists

Question: I would appreciate any help to efficiently subset a data frame into several chunks to be passed to a list of lists based on the variable imput. My code below works for a few subsets, but I have 100 subsets to create and the code becomes too long and difficult to handle. Therefore, I need a more efficient approach that accomplishes the same outcome without too much code. The approach imputation_groups <- split(dat, dat$imput) discussed here allows me to split my data into a list of several

Pandas_Pivot table - making additional columns from division of merged columns

Question: I'm trying to run the following function:

def make_europe_view(data):
    data['% Rev'] = data.GrossRevenue_GBP / data.GrossRevenue_GBP.sum()
    tmean = lambda x: stats.trim_mean(x, 0.1)
    pivot = pd.pivot_table(
        data[(data['New_category_ID'] != 0) & (data['YYYY'] == 2016)],
        index='New_category',
        values=['GrossRevenue_GBP', 'MOVC_GBP', 'PM_GBP', '% Rev'],
        aggfunc={'MOVC_GBP': tmean, 'PM_GBP': tmean,
                 'GrossRevenue_GBP': [np.sum, tmean], '% Rev': np.sum})
    pivot['% PM'] = pivot['PM_GBP']/pivot[('GrossRevenue_GBP')][
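
The excerpt stops before the division is complete, so the actual error is not known. As a hedged illustration only: when `aggfunc` maps a column to a list of functions, `pd.pivot_table` returns columns with two levels, and each column is addressed by a (value, function) tuple. The toy data below is hypothetical; only the column names are taken from the question, and plain `"sum"` and `"mean"` stand in for the question's `np.sum` and trimmed mean.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
data = pd.DataFrame({
    "New_category": ["A", "A", "B"],
    "GrossRevenue_GBP": [100.0, 300.0, 200.0],
    "PM_GBP": [10.0, 30.0, 50.0],
})

pivot = pd.pivot_table(
    data,
    index="New_category",
    values=["GrossRevenue_GBP", "PM_GBP"],
    # A list of functions for one column produces two-level column labels.
    aggfunc={"GrossRevenue_GBP": ["sum", "mean"], "PM_GBP": "mean"},
)

# Each column is selected with a (value, function) tuple.
ratio = pivot[("PM_GBP", "mean")] / pivot[("GrossRevenue_GBP", "sum")]

# A new column on a two-level header also needs a two-part label.
pivot[("% PM", "")] = ratio
```

Selecting with the full tuple returns a Series, so the division lines up row by row on `New_category`.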

Subset a data frame for each factor level [duplicate]

Question: Given the dataset red_wine_data below, how can I create the list l which contains the following four subsetted data frames for all values in unique(red_wine_data$condition)? I'm looking for a flexible and dynamic solution that produces a result similar to these hard-coded commands, but that will work for any similar data frame even if the factor levels

mongodb - retrieve array subset

Question: What seemed a simple task turned out to be a challenge for me. I have the following MongoDB structure: { (...) "services": { "TCP80": { "data": [{ "status": 1, "delay": 3.87, "ts": 1308056460 },{ "status": 1, "delay": 2.83, "ts": 1308058080 },{ "status": 1, "delay": 5.77, "ts": 1308060720 }] } }} Now, the following query returns the whole document: { 'services.TCP80.data.ts':{$gt:1308067020} } I wonder: is it possible for me to receive only those "data" array entries matching the $gt criteria (kind of

Conditional subsetting by POSIXct interval and another field containing interval

Question: I have a dataset Dat with species (SP), area (AR), and time (TM, in POSIXct). I want to subset the data for individuals that were present with species A, within a half hour before and after it was recorded, and within the same area, including the two adjacent areas (+ and - 1). For example, if species A was present at 1:00 in area 4, I wish to subset all species present from 12:30 to 1:30 on the same day in areas 3, 4 and 5. As an example:

SP  TM               AR
B   1-jan-03 07:22   1
F   1-jan-03 09:22   4
A   1

Using lapply and which to subset dataframe by both characteristic and function

Question: I have a dataframe with 5 dimensions of data that looks like this:

> dim(alldata)
[1] 162   6
> head(alldata)
         value layer Kmultiplier Resolution      Season           Variable
1:  0.01308008     b        .01K        1km    Baseflow Evapotranspiration
2:  0.03974779     b        .01K        1km   Peak Flow Evapotranspiration
3:  0.02396524     b        .01K        1km Summer Flow Evapotranspiration
4: -0.15670996     b        .01K        1km    Baseflow          Discharge
5:  0.06774948     b        .01K        1km   Peak Flow          Discharge
6: -0.04138313     b        .01K        1km Summer Flow          Discharge

What I'd like to do is get the mean

R data.table join/ subsetting/ match by group and by a condition

Question: I am trying to subset/match data by groups from 2 data.tables and cannot figure out how to do this in R. I have the following data.table that has a City_ID and a time stamp (column name = Time).

library(data.table)
timetable <- data.table(City_ID=c("12","9"), Time=c("12-29-2013-22:05:03","12-29-2013-11:59:00"))

I have a second data.table with several observations for cities and time stamps (plus additional data). The table looks like this:

DT = data.table(City_ID =c("12","12","12","9","9","9"),

regression on subsets for unique factor combinations using lm

Question: I would like to automate a simple multiple regression for the subsets defined by the unique combinations of the grouping variables. I have a dataframe with several grouping variables df1[,1:6], some independent variables df1[,8:10], and a response df1[,7]. This is an excerpt from the data. structure(list(Surface = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("NiAu", "Sn"), class = "factor"), Supplier = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L), .Label =

Subset/filter in dplyr chain with ggplot2

Question: I'd like to make a slopegraph, along the lines (no pun intended) of this. Ideally, I'd like to do it all in a dplyr-style chain, but I hit a snag when I try to subset the data to add specific geom_text labels. Here's a toy example:

# make tbl:
df <- tibble(
  area = rep(c("Health", "Education"), 6),
  sub_area = rep(c("Staff", "Projects", "Activities"), 4),
  year = c(rep(2016, 6), rep(2017, 6)),
  value = rep(c(15000, 12000, 18000), 4)
) %>% arrange(area)

# plot:
df %>% filter(area == "Health") %>%