subset

Number of distinct sums from non-empty groupings of (possibly very large) lists

穿精又带淫゛_ submitted on 2019-12-04 22:37:08
Assume you are given a set of coin types (at most 20 distinct types), with at most 10^5 instances of each type, such that the total number of coins in your list is at most 10^6. What is the number of distinct sums you can make from non-empty groupings of these coins? For example, given coins = [10, 50, 100] and quantity = [1, 2, 1] — one coin of 10, two coins of 50, and one coin of 100 — the output should be possibleSums(coins, quantity) = 9. Among the possible sums are: 50 = 50; 10 + 50 = 60; 50 + 100 = 150; 10 + 50 + 100 = 160
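A direct set-based bounded-knapsack sketch in Python. This is the straightforward DP, not necessarily fast enough for the stated 10^6 bound (a bitset-style DP would be needed there), but it makes the counting logic concrete:

```python
def possible_sums(coins, quantity):
    """Count distinct sums over all non-empty groupings of the coins."""
    sums = {0}  # sums reachable so far; 0 represents the empty grouping
    for coin, q in zip(coins, quantity):
        # Extend every reachable sum with 1..q copies of this coin type.
        new_sums = set()
        for s in sums:
            for k in range(1, q + 1):
                new_sums.add(s + k * coin)
        sums |= new_sums
    return len(sums) - 1  # exclude the empty grouping

print(possible_sums([10, 50, 100], [1, 2, 1]))  # → 9
```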

How to find all subsets of a multiset that are in a given set?

六月ゝ 毕业季﹏ submitted on 2019-12-04 19:32:41
Say I have a set D of multisets: D = { {d, g, o}, {a, e, t, t}, {a, m, t} }. Given a multiset M, like M = {a, a, m, t}, I would like an algorithm f to give me all elements of D that are subsets (or more precisely, "sub-multisets") of M: f(M) = { {a, m, t} }. If we do only one such query, scanning over all elements of D (in O(#D) time) is clearly optimal. But if we want to answer many such queries for the same D and different M, we might be able to make it faster by preprocessing D into some smarter data structure. We could toss all of D into a hashtable and iterate over all possible subsets of M,
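The baseline linear scan can be written compactly with collections.Counter; the sub-multiset test is the all(...) check. This is only the O(#D)-per-query baseline, with no preprocessing of D:

```python
from collections import Counter

def submultisets(D, M):
    """Return the elements of D that are sub-multisets of M."""
    m = Counter(M)
    return [d for d in D
            if all(m[x] >= c for x, c in Counter(d).items())]

D = [["d", "g", "o"], ["a", "e", "t", "t"], ["a", "m", "t"]]
print(submultisets(D, ["a", "a", "m", "t"]))  # → [['a', 'm', 't']]
```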

R: How to apply moving averages to subset of columns in a data frame?

自闭症网瘾萝莉.ら submitted on 2019-12-04 19:21:45
I have a data frame (training.set) of 150 observations of 83 variables. I want to transform 82 of those columns with a moving average. The problem is that the result ends up being only 150 numeric values (i.e. 1 column). How would I apply the moving-average function to each column individually and keep the 83rd column unchanged? I feel like this is super simple, but I can't find a solution. My current code:

# apply moving average to 82 of the 83 columns of training.set
library(TTR)  # load TTR for the SMA function
ts.sma <- SMA(training.set[, 1:82], n = 10)
ts.sma

Thanks for
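The single-column result comes from SMA treating its input as one series; the usual fix is to apply it per column. A pandas analogue of that per-column idea (illustrative toy data, not the asker's TTR code — pandas' rolling(...).mean() plays the role of SMA, with leading NaNs where SMA has leading NAs):

```python
import numpy as np
import pandas as pd

# Toy stand-in for training.set: smooth every column except the last.
df = pd.DataFrame(np.arange(30.0).reshape(10, 3),
                  columns=["x1", "x2", "label"])

smoothed = df.copy()
smoothed[["x1", "x2"]] = df[["x1", "x2"]].rolling(window=3).mean()
# 'label' is untouched; the first window-1 rows of x1/x2 are NaN.
```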

Elegant way to drop rare factor levels from data frame

泄露秘密 submitted on 2019-12-04 19:07:20
Question: I want to subset a data frame by factor. I only want to retain factor levels above a certain frequency.

df <- data.frame(factor = c(rep("a", 5), rep("b", 5), rep("c", 2)),
                 variable = rnorm(12))

This code creates the data frame:

   factor    variable
1       a -1.55902013
2       a  0.22355431
3       a -1.52195456
4       a -0.32842689
5       a  0.85650212
6       b  0.00962240
7       b -0.06621508
8       b -1.41347823
9       b  0.08969098
10      b  1.31565582
11      c -1.26141417
12      c -0.33364069

And I want to drop factor levels which are repeated fewer than 5 times. I
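A pandas sketch of the frequency-threshold subset (an analogue, not the R answer — value_counts plays the role of table(), and there is no droplevels step because pandas object columns carry no level set):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"factor": ["a"] * 5 + ["b"] * 5 + ["c"] * 2,
                   "variable": np.random.randn(12)})

# Keep only the levels that occur at least 5 times.
counts = df["factor"].value_counts()
kept = df[df["factor"].isin(counts[counts >= 5].index)]
```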

Get all possible subsets - preserving order

时光总嘲笑我的痴心妄想 submitted on 2019-12-04 17:47:39
This is a follow-up to this question: Generate all "unique" subsets of a set (not a powerset). My problem is the same, but I think there might be a more optimized solution when the order of items within the new subsets and across the subsets needs to be preserved. Example: [1, 2, 3] would result in:

[[1], [2], [3]]
[[1, 2], [3]]
[[1], [2, 3]]
[[1, 2, 3]]

Niklas B.: I've already answered this question for Python, so I quickly ported my solution over to Ruby:

def spannings(lst)
  return enum_for(:spannings, lst) unless block_given?
  yield [lst]
  (1...lst.size).each do |i|
    spannings(lst[i..-1]) do |rest|
      yield [lst[0...i]] + rest
    end
  end
end
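The same recursion written as a Python generator (a port of the Ruby spannings above): each ordered partition is a first block lst[:i] followed by a partition of the rest, which yields all 2^(n-1) compositions:

```python
def spannings(lst):
    """Yield every ordered partition (composition) of lst."""
    yield [lst]
    for i in range(1, len(lst)):
        for rest in spannings(lst[i:]):
            yield [lst[:i]] + rest

for p in spannings([1, 2, 3]):
    print(p)
```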

Subset data table without using <-

你离开我真会死。 submitted on 2019-12-04 17:19:00
Question: I want to subset some rows of a data table, like this:

# load data
data("mtcars")
# convert to data table
setDT(mtcars, keep.rownames = T)
# subset data
mtcars <- mtcars[like(rn, "Mer"), ]
# or
mtcars <- mtcars[mpg > 20, ]

However, I'm working with a huge data set and I want to avoid using <-, which is not memory efficient because it makes a copy of the data. Is this correct? Is it possible to update the filtered data without <-?

Answer 1: What you are asking for would be deleting rows by reference. It

subset an additional variable and append it to the previous one in R

為{幸葍}努か submitted on 2019-12-04 17:10:48
I have a function that subsets whatever variable the user requests out of this dataset. The function works perfectly. But I was wondering whether, in addition to what the user requests, the function could always subset the entries that contain control == TRUE and append those to what the user has requested. For example, suppose the user wants to subset entries with type == 4. In this dataset, there are 4 such entries. As the reproducible code and data below show, this is done easily, BUT there are also 4 other entries for which control == TRUE; how can the function find and append these 4 other
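One way to express "requested rows plus all control rows" is a single OR-ed boolean mask. A pandas sketch with hypothetical column names (type, control — the asker's real dataset and function are not shown):

```python
import pandas as pd

df = pd.DataFrame({
    "type":    [1,     4,     4,     2,    3,     4,     4,     1],
    "control": [True,  False, False, True, False, False, False, True],
})

def subset_with_controls(df, col, value):
    """Rows matching the user's request, plus every control == True row."""
    return df[(df[col] == value) | df["control"]]

result = subset_with_controls(df, "type", 4)  # 4 requested + 3 control rows
```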

PySpark: Search For substrings in text and subset dataframe

馋奶兔 submitted on 2019-12-04 17:01:29
I am brand new to pyspark and want to translate my existing pandas / Python code to PySpark. I want to subset my dataframe so that only rows containing the specific keywords I'm looking for in the 'original_problem' field are returned. Below is the Python code I tried in PySpark:

def pilot_discrep(input_file):
    df = input_file
    searchfor = ['cat', 'dog', 'frog', 'fleece']
    df = df[df['original_problem'].str.contains('|'.join(searchfor))]
    return df

When I try to run the above, I get the following error:

AnalysisException: u"Can't extract value from original_problem#207: need struct type but got string;
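The pandas .str.contains idiom doesn't carry over: df[...] with a Column triggers the "need struct type" error. PySpark's Column.rlike accepts the same joined regex. A minimal sketch of the pattern-building, with the Spark call shown as a comment since it needs a running SparkSession; the plain-string filter below it is only to illustrate what the pattern matches:

```python
import re

searchfor = ["cat", "dog", "frog", "fleece"]
pattern = "|".join(map(re.escape, searchfor))

# PySpark equivalent (assuming df is a Spark DataFrame):
#   df = df.filter(df["original_problem"].rlike(pattern))

# The same filter over plain strings, for illustration:
rows = ["the cat sat", "hydraulic leak", "fleece liner torn"]
matches = [r for r in rows if re.search(pattern, r)]
print(matches)  # → ['the cat sat', 'fleece liner torn']
```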

Using attributes of `ftable` for extracting data

荒凉一梦 submitted on 2019-12-04 16:06:48
Question: I sometimes use the ftable function purely for its presentation of hierarchical categories. Sometimes, however, when the table is large, I would like to further subset it before using it. Let's say we're starting with:

mytable <- ftable(Titanic, row.vars = 1:3)
mytable
##                     Survived  No Yes
## Class Sex    Age
## 1st   Male   Child             0   5
##              Adult           118  57
##       Female Child             0   1
##              Adult             4 140
## 2nd   Male   Child             0  11
##              Adult           154  14
##       Female Child             0  13
##              Adult            13  80
## 3rd   Male   Child            35  13
##              Adult           387
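For comparison, the same hierarchical layout in pandas is a MultiIndex, where subsetting one level is a single .xs call — an illustrative analogue of the desired operation, not the R answer (values hand-copied from the 1st-class rows above):

```python
import pandas as pd

data = {("1st", "Male",   "Child"): (0, 5),
        ("1st", "Male",   "Adult"): (118, 57),
        ("1st", "Female", "Child"): (0, 1),
        ("1st", "Female", "Adult"): (4, 140)}
idx = pd.MultiIndex.from_tuples(data.keys(),
                                names=["Class", "Sex", "Age"])
table = pd.DataFrame(list(data.values()), index=idx,
                     columns=["No", "Yes"])

# Subset the hierarchy before display: adults only.
adults = table.xs("Adult", level="Age")
```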

Mapping MongoDB documents to case class with types but without embedded documents

别说谁变了你拦得住时间么 submitted on 2019-12-04 14:51:27
Subset looks like an interesting, thin MongoDB wrapper. In one of the examples given, there are Tweets and Users; however, User is a subdocument of Tweet. In classical SQL, this would be normalized into two separate tables with a foreign key from Tweet to User. In MongoDB this wouldn't necessitate a DBRef; storing the user's ObjectId would be sufficient. In both Subset and Salat this would result in these case classes:

case class Tweet(_id: ObjectId, content: String, userId: ObjectId)
case class User(_id: ObjectId, name: String)

So there's no guarantee that the ObjectId in Tweet actually