subset

Number of distinct sums from non-empty groupings of (possibly very large) lists

穿精又带淫゛_ submitted on 2019-12-04 22:37:08
Assume you are given a set of coin types (at most 20 distinct types), with at most 10^5 instances of each type, such that the total number of coins in your list is at most 10^6. What is the number of distinct sums you can make from non-empty groupings of these coins? For example, given coins = [10, 50, 100] and quantity = [1, 2, 1] — one coin of 10, two coins of 50, and one coin of 100 — the output should be possibleSums(coins, quantity) = 9. Among the possible sums are: 50 = 50; 10 + 50 = 60; 50 + 100 = 150; 10 + 50 + 100 = 160
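A direct set-based bounded-knapsack sketch in Python. This is the straightforward DP, not necessarily fast enough for the stated 10^6 bound (a bitset-style DP would be needed there), but it makes the counting logic concrete:

```python
def possible_sums(coins, quantity):
    """Count distinct sums over all non-empty groupings of the coins."""
    sums = {0}  # sums reachable so far; 0 represents the empty grouping
    for coin, q in zip(coins, quantity):
        # Extend every reachable sum with 1..q copies of this coin type.
        new_sums = set()
        for s in sums:
            for k in range(1, q + 1):
                new_sums.add(s + k * coin)
        sums |= new_sums
    return len(sums) - 1  # exclude the empty grouping

print(possible_sums([10, 50, 100], [1, 2, 1]))  # → 9
```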

How to find all subsets of a multiset that are in a given set?

六月ゝ 毕业季﹏ submitted on 2019-12-04 19:32:41
Say I have a set D of multisets: D = { {d, g, o}, {a, e, t, t}, {a, m, t} }. Given a multiset M, like M = {a, a, m, t}, I would like an algorithm f to give me all elements of D that are subsets (or more precisely, "sub-multisets") of M: f(M) = { {a, m, t} }. If we do only one such query, scanning over all elements of D (in O(#D) time) is clearly optimal. But if we want to answer many such queries for the same D and different M, we might be able to make it faster by preprocessing D into some smarter data structure. We could toss all of D into a hashtable and iterate over all possible subsets of M,
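The baseline linear scan can be written compactly with collections.Counter; the sub-multiset test is the all(...) check. This is only the O(#D)-per-query baseline, with no preprocessing of D:

```python
from collections import Counter

def submultisets(D, M):
    """Return the elements of D that are sub-multisets of M."""
    m = Counter(M)
    return [d for d in D
            if all(m[x] >= c for x, c in Counter(d).items())]

D = [["d", "g", "o"], ["a", "e", "t", "t"], ["a", "m", "t"]]
print(submultisets(D, ["a", "a", "m", "t"]))  # → [['a', 'm', 't']]
```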

R: How to apply moving averages to subset of columns in a data frame?

自闭症网瘾萝莉.ら submitted on 2019-12-04 19:21:45
I have a data frame (training.set) of 150 observations of 83 variables. I want to transform 82 of those columns with a moving average. The problem is that the result ends up being only 150 numeric values (i.e. 1 column). How would I apply the moving-average function to each column individually and keep the 83rd column unchanged? I feel like this is super simple, but I can't find a solution. My current code:

# apply moving average to 82 of the 83 columns of training.set
library(TTR)  # load TTR for the SMA function
ts.sma <- SMA(training.set[, 1:82], n = 10)
ts.sma

Thanks for
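The single-column result comes from SMA treating its input as one series; the usual fix is to apply it per column. A pandas analogue of that per-column idea (illustrative toy data, not the asker's TTR code — pandas' rolling(...).mean() plays the role of SMA, with leading NaNs where SMA has leading NAs):

```python
import numpy as np
import pandas as pd

# Toy stand-in for training.set: smooth every column except the last.
df = pd.DataFrame(np.arange(30.0).reshape(10, 3),
                  columns=["x1", "x2", "label"])

smoothed = df.copy()
smoothed[["x1", "x2"]] = df[["x1", "x2"]].rolling(window=3).mean()
# 'label' is untouched; the first window-1 rows of x1/x2 are NaN.
```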

Elegant way to drop rare factor levels from data frame

泄露秘密 submitted on 2019-12-04 19:07:20
Question: I want to subset a data frame by factor. I only want to retain factor levels above a certain frequency.

df <- data.frame(factor = c(rep("a", 5), rep("b", 5), rep("c", 2)),
                 variable = rnorm(12))

This code creates the data frame:

   factor    variable
1       a -1.55902013
2       a  0.22355431
3       a -1.52195456
4       a -0.32842689
5       a  0.85650212
6       b  0.00962240
7       b -0.06621508
8       b -1.41347823
9       b  0.08969098
10      b  1.31565582
11      c -1.26141417
12      c -0.33364069

And I want to drop factor levels which are repeated fewer than 5 times. I
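A pandas sketch of the frequency-threshold subset (an analogue, not the R answer — value_counts plays the role of table(), and there is no droplevels step because pandas object columns carry no level set):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"factor": ["a"] * 5 + ["b"] * 5 + ["c"] * 2,
                   "variable": np.random.randn(12)})

# Keep only the levels that occur at least 5 times.
counts = df["factor"].value_counts()
kept = df[df["factor"].isin(counts[counts >= 5].index)]
```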

Get all possible subsets - preserving order

时光总嘲笑我的痴心妄想 submitted on 2019-12-04 17:47:39
This is a follow-up to this question: Generate all "unique" subsets of a set (not a powerset). My problem is the same, but I think there might be a more optimized solution when the order of items within the new subsets and across the subsets needs to be preserved. Example: [1, 2, 3] would result in:

[[1], [2], [3]]
[[1, 2], [3]]
[[1], [2, 3]]
[[1, 2, 3]]

Niklas B.: I've already answered this question for Python, so I quickly ported my solution over to Ruby:

def spannings(lst)
  return enum_for(:spannings, lst) unless block_given?
  yield [lst]
  (1...lst.size).each do |i|
    spannings(lst[i..-1]) do |rest|
      yield [lst[0...i]] + rest
    end
  end
end
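The same recursion written as a Python generator (a port of the Ruby spannings above): each ordered partition is a first block lst[:i] followed by a partition of the rest, which yields all 2^(n-1) compositions:

```python
def spannings(lst):
    """Yield every ordered partition (composition) of lst."""
    yield [lst]
    for i in range(1, len(lst)):
        for rest in spannings(lst[i:]):
            yield [lst[:i]] + rest

for p in spannings([1, 2, 3]):
    print(p)
```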

Subset data table without using <-

你离开我真会死。 submitted on 2019-12-04 17:19:00
Question: I want to subset some rows of a data table, like this:

# load data
data("mtcars")
# convert to data table
setDT(mtcars, keep.rownames = T)
# subset data
mtcars <- mtcars[like(rn, "Mer"), ]
# or
mtcars <- mtcars[mpg > 20, ]

However, I'm working with a huge data set and I want to avoid using <-, which is not memory efficient because it makes a copy of the data. Is this correct? Is it possible to update the filtered data without <-?

Answer 1: What you are asking for would be deleting rows by reference. It

subset an additional variable and append it to the previous one in R

為{幸葍}努か submitted on 2019-12-04 17:10:48
I have a function that subsets whatever variable the user requests out of this dataset. The function works perfectly. But I was wondering whether, in addition to what the user requests, the function could always subset the entries that contain control == TRUE and append those to what the user has requested. For example, suppose the user wants to subset entries with type == 4. In this dataset, there are 4 such entries. As the reproducible code and data below show, this is done easily, BUT there are also 4 other entries for which control == TRUE; how can the function find and append these 4 other
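One way to express "requested rows plus all control rows" is a single OR-ed boolean mask. A pandas sketch with hypothetical column names (type, control — the asker's real dataset and function are not shown):

```python
import pandas as pd

df = pd.DataFrame({
    "type":    [1,     4,     4,     2,    3,     4,     4,     1],
    "control": [True,  False, False, True, False, False, False, True],
})

def subset_with_controls(df, col, value):
    """Rows matching the user's request, plus every control == True row."""
    return df[(df[col] == value) | df["control"]]

result = subset_with_controls(df, "type", 4)  # 4 requested + 3 control rows
```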

PySpark: Search For substrings in text and subset dataframe

馋奶兔 submitted on 2019-12-04 17:01:29
I am brand new to pyspark and want to translate my existing pandas / Python code to PySpark. I want to subset my dataframe so that only rows containing the specific keywords I'm looking for in the 'original_problem' field are returned. Below is the Python code I tried in PySpark:

def pilot_discrep(input_file):
    df = input_file
    searchfor = ['cat', 'dog', 'frog', 'fleece']
    df = df[df['original_problem'].str.contains('|'.join(searchfor))]
    return df

When I try to run the above, I get the following error:

AnalysisException: u"Can't extract value from original_problem#207: need struct type but got string;
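The pandas .str.contains idiom doesn't carry over: df[...] with a Column triggers the "need struct type" error. PySpark's Column.rlike accepts the same joined regex. A minimal sketch of the pattern-building, with the Spark call shown as a comment since it needs a running SparkSession; the plain-string filter below it is only to illustrate what the pattern matches:

```python
import re

searchfor = ["cat", "dog", "frog", "fleece"]
pattern = "|".join(map(re.escape, searchfor))

# PySpark equivalent (assuming df is a Spark DataFrame):
#   df = df.filter(df["original_problem"].rlike(pattern))

# The same filter over plain strings, for illustration:
rows = ["the cat sat", "hydraulic leak", "fleece liner torn"]
matches = [r for r in rows if re.search(pattern, r)]
print(matches)  # → ['the cat sat', 'fleece liner torn']
```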

Using attributes of `ftable` for extracting data

荒凉一梦 submitted on 2019-12-04 16:06:48
Question: I sometimes use the ftable function purely for its presentation of hierarchical categories. Sometimes, however, when the table is large, I would like to further subset it before using it. Let's say we're starting with:

mytable <- ftable(Titanic, row.vars = 1:3)
mytable
##                     Survived  No Yes
## Class Sex    Age
## 1st   Male   Child             0   5
##              Adult           118  57
##       Female Child             0   1
##              Adult             4 140
## 2nd   Male   Child             0  11
##              Adult           154  14
##       Female Child             0  13
##              Adult            13  80
## 3rd   Male   Child            35  13
##              Adult           387
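For comparison, the same hierarchical layout in pandas is a MultiIndex, where subsetting one level is a single .xs call — an illustrative analogue of the desired operation, not the R answer (values hand-copied from the 1st-class rows above):

```python
import pandas as pd

data = {("1st", "Male",   "Child"): (0, 5),
        ("1st", "Male",   "Adult"): (118, 57),
        ("1st", "Female", "Child"): (0, 1),
        ("1st", "Female", "Adult"): (4, 140)}
idx = pd.MultiIndex.from_tuples(data.keys(),
                                names=["Class", "Sex", "Age"])
table = pd.DataFrame(list(data.values()), index=idx,
                     columns=["No", "Yes"])

# Subset the hierarchy before display: adults only.
adults = table.xs("Adult", level="Age")
```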

Mapping MongoDB documents to case class with types but without embedded documents

别说谁变了你拦得住时间么 submitted on 2019-12-04 14:51:27
Subset looks like an interesting, thin MongoDB wrapper. In one of the examples given, there are Tweets and Users; however, User is a subdocument of Tweet. In classical SQL, this would be normalized into two separate tables with a foreign key from Tweet to User. In MongoDB this wouldn't necessitate a DBRef; storing the user's ObjectId would be sufficient. In both Subset and Salat this would result in these case classes:

case class Tweet(_id: ObjectId, content: String, userId: ObjectId)
case class User(_id: ObjectId, name: String)

So there's no guarantee that the ObjectId in Tweet actually