subset

MongoDB LinQ “Select” method will really retrieve only a subset of fields?

梦想的初衷 posted on 2019-12-03 21:45:32
Question: While searching the internet for how to retrieve a subset of fields in MongoDB using the official C# driver (with LINQ as the base architecture), I found how to do this in the MongoDB shell:
    // selecting only "field" of a collection
    db.collection.find( { field : 'value' }, { field: 1 } );
Then I found the Select method in the C# LINQ Tutorial, which is equivalent to this:
    collection.AsQueryable<T>().Select(x => new { x.field });
However, the tutorial says the method " is used to project a new result

Find all unique subsets of a set of values

浪子不回头ぞ posted on 2019-12-03 21:38:54
I have an algorithm problem. I am trying to find all unique subsets of values from a larger set of values. For example, say I have the set {1,3,7,9}. What algorithm can I use to find these subsets of 3?
    {1,3,7}
    {1,3,9}
    {1,7,9}
    {3,7,9}
Subsets should not repeat, and order is unimportant: the set {1,2,3} is the same as the set {3,2,1} for these purposes. Pseudocode (or the regular kind) is encouraged. A brute-force approach is obviously possible, but not desired. For example, such a brute-force method would be as follows:
    for i = 0 to size
        for j = i + 1 to size
            for k = j + 1 to size
                subset[] = {set[i],set
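The subsets the question asks for are the k-element combinations of the input. As a minimal sketch in Python (the function name `k_subsets` is made up for illustration and is not from the original post), a recursive generator picks elements by increasing index, so each subset is produced exactly once and the nesting depth is not hard-coded the way the three loops above are. The standard library's `itertools.combinations` yields the same result and is imported only for comparison.

```python
from itertools import combinations


def k_subsets(values, k):
    """Yield each k-element subset once, as a tuple in input order."""
    items = list(values)

    def extend(start, chosen):
        if len(chosen) == k:
            yield tuple(chosen)
            return
        # Stop early when too few items remain to complete the subset.
        last_start = len(items) - (k - len(chosen))
        for i in range(start, last_start + 1):
            chosen.append(items[i])
            yield from extend(i + 1, chosen)
            chosen.pop()

    yield from extend(0, [])


subsets = list(k_subsets([1, 3, 7, 9], 3))
print(subsets)

# The standard library produces the same subsets in the same order.
print(subsets == list(combinations([1, 3, 7, 9], 3)))
```

Because indices only ever increase, {1,3,7} and {7,3,1} can never both appear. This assumes the input values are themselves distinct.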

Remove the rows of data frame whose cells match a given vector

 ̄綄美尐妖づ posted on 2019-12-03 20:46:30
I have a big data frame with varying numbers of columns and rows. I would like to search the data frame for the values of a given vector and remove the rows whose cells match any of those values. I'd like to have this as a function because I have to run it on multiple data frames with varying numbers of rows and columns, and I would like to avoid for loops. For example:
    ff <- structure(list(j.1 = 1:13, j.2 = 2:14, j.3 = 3:15), .Names = c("j.1", "j.2", "j.3"), row.names = c(NA, -13L), class = "data.frame")
Remove all rows that have cells containing the values 8, 9, 10. I guess I could use ff[ !ff[,1] %in%

Find sum of subset with multiplication

佐手、 posted on 2019-12-03 20:07:29
Let's say we have a set {a_1, a_2, a_3, ..., a_n}. The goal is to find a sum that we create in the following way: we find all subsets of length 3, then multiply each subset's elements (for the subset {b_1, b_2, b_3} the result is b_1*b_2*b_3). At the end we sum up all these products. I am looking for the algorithm with the shortest execution time. Example:
    SET: {3, 2, 1, 2}
Let S be our sum.
    S = 3*2*1 + 3*2*2 + 2*1*2 + 3*1*2 = 28
It is easier to calculate the sum of multiplied triplets when repetitions are allowed (like a_1*a_1*a_1). This sum is just (sum^3). Since repetitions are not allowed,
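The original post is cut off here, so what follows is not the poster's own derivation. One standard way to get this sum without enumerating triples is to note that it is the third elementary symmetric polynomial of the values. Two hedged Python sketches follow, with function names invented for illustration: a single pass that keeps running sums of 1-, 2- and 3-element products, and a closed form from Newton's identities using the power sums p1, p2, p3 (integer inputs assumed, so the division by 6 is exact).

```python
from itertools import combinations


def triple_product_sum(values):
    """Sum of a_i*a_j*a_k over all i < j < k, in one pass."""
    e1 = e2 = e3 = 0  # running sums of products of 1, 2 and 3 elements
    for x in values:
        e3 += e2 * x  # extend every earlier pair with x
        e2 += e1 * x  # extend every earlier single element with x
        e1 += x
    return e3


def triple_product_sum_closed_form(values):
    """Same sum via Newton's identities; assumes integer inputs."""
    p1 = sum(values)
    p2 = sum(x * x for x in values)
    p3 = sum(x ** 3 for x in values)
    return (p1 ** 3 - 3 * p1 * p2 + 2 * p3) // 6


def triple_product_sum_brute(values):
    """O(n^3) reference used only to cross-check the fast versions."""
    return sum(a * b * c for a, b, c in combinations(values, 3))


example = [3, 2, 1, 2]
print(triple_product_sum(example))              # 28
print(triple_product_sum_closed_form(example))  # 28
print(triple_product_sum_brute(example))        # 28
```

For the example, p1 = 8, p2 = 18 and p3 = 44, so (512 - 432 + 88) / 6 = 28, matching S above. Both fast versions run in O(n) time, and the one-pass version also works for non-integer values.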

Filtering rows in a dataset by columns

可紊 posted on 2019-12-03 19:15:06
Question: I have the following table:
    FN LN LN1 LN2 LN3 LN4 LN5
    a  b  b   x   x   x   x
    a  c  b   d   e   NA  NA
    a  d  c   a   b   x   x
    a  e  b   c   d   x   e
I'm filtering records for which LN is present in LN1 to LN5. The code I used:
    testFilter = filter(test, LN %in% c(LN1, LN2, LN3, LN4, LN5))
The result is not what I expect:
      ï..FN LN LN1 LN2 LN3  LN4  LN5
    1 a     b  b   x   x    x    x
    2 a     c  b   d   e    <NA> <NA>
    3 a     d  c   a   b    x    x
    4 a     e  b   c   d    x    e
I understand that c(LN1, LN2, LN3, LN4, LN5) gives: "b" "b" "c" "b" "x" "d" "a" "c" "x" "e" "b" "d" "x" NA "x" "x
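The difference between pooled and row-wise membership can be sketched outside R. The Python below is only an illustration using plain lists, not dplyr code: the rows are copied from the question's table, `None` stands in for `NA`, and the variable names are invented. Pooling every LN1..LN5 value into one collection makes every LN match, while comparing each LN only against its own row keeps just the rows the question appears to be after.

```python
# (LN, [LN1, LN2, LN3, LN4, LN5]) for each row of the question's table.
rows = [
    ("b", ["b", "x", "x", "x", "x"]),
    ("c", ["b", "d", "e", None, None]),
    ("d", ["c", "a", "b", "x", "x"]),
    ("e", ["b", "c", "d", "x", "e"]),
]

# Pooled membership: values from all rows and columns go into one bag,
# which mirrors what c(LN1, ..., LN5) builds over whole columns.
pooled = {value for _, others in rows for value in others}
pooled_matches = [ln for ln, _ in rows if ln in pooled]

# Row-wise membership: each LN is compared only with its own row's values.
rowwise_matches = [ln for ln, others in rows if ln in others]

print(pooled_matches)   # ['b', 'c', 'd', 'e']
print(rowwise_matches)  # ['b', 'e']
```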

Why does subsetting a column from a data frame vs. a tibble give different results

半世苍凉 posted on 2019-12-03 18:09:54
Question: This is a 'why' question and not a 'how to' question. I have a tibble as a result of a dplyr aggregation:
    > str(urls)
    Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 144 obs. of 4 variables:
     $ BRAND  : chr "Bobbi Brown" "Calvin Klein" "Chanel" "Clarins" ...
     $ WEBSITE: chr "http://www.bobbibrowncosmetics.com/" "http://www.calvinklein.com/shop/en/ck" "http://www.chanel.com/en_US/" "http://www.clarinsusa.com/" ...
     $ domain : chr "bobbibrowncosmetics.com/" "calvinklein.com/shop/en/ck" "chanel.com/en_US

Subsetting in a second level R function

孤街浪徒 posted on 2019-12-03 16:25:54
Function foo1 can subset a list by a requested variable (e.g., by = type == 1). Otherwise, foo1 simply returns the input list itself. For my purposes, I need to use foo1 within a new function called foo2. In my code below, my desired output is obtained like so:
    foo2(data = D, by = G[[1]]) ; foo2(data = D, by = G[[2]]) ; foo2(data = D, by = G[[3]])
But I wonder why, when I loop over G using lapply, I get an error as shown below?
    foo1 <- function(data, by){
      L <- split(data, data$study.name) ; L[[1]] <- NULL
      if(!missing(by)){
        L <- lapply(L, function(x) do.call("subset", list(x, by)))

Randomly sample a percentage of rows within a data frame

烈酒焚心 posted on 2019-12-03 15:41:47
Question: Related to this question.
    gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
    age <- c(23, 25, 27, 29, 31, 33, 35, 37)
    mydf <- data.frame(gender, age)
    mydf[ sample( which(mydf$gender=='F'), 3 ), ]
Instead of selecting a fixed number of rows (3 in the case above), how can I randomly select 20% of the rows with "F"? That is, of the five rows with "F", how do I randomly sample 20%?
Answer 1: How about this:
    mydf[ sample( which(mydf$gender=='F'), round(0.2*length(which(mydf$gender=='F')))), ]
Where 0.2 is
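The idea in the answer is to turn the fraction into a row count first, then sample that many matching indices. A rough Python equivalent is sketched here with plain lists standing in for the data frame; `sample_fraction` is a hypothetical helper name, not something from the original answer.

```python
import random

gender = ["F", "M", "M", "F", "F", "M", "F", "F"]
age = [23, 25, 27, 29, 31, 33, 35, 37]


def sample_fraction(indices, fraction, rng=random):
    """Pick round(fraction * len(indices)) indices without replacement."""
    count = round(fraction * len(indices))
    return rng.sample(indices, count)


# Positions of the rows where gender is "F" (the analogue of which(...)).
female_rows = [i for i, g in enumerate(gender) if g == "F"]

picked = sample_fraction(female_rows, 0.2)
print([(gender[i], age[i]) for i in picked])
```

With five "F" rows, 20% rounds to exactly one row. For small groups the rounded count can be zero, so a minimum of one row may be worth enforcing depending on the use case.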

mongodb query subset of an array

时光毁灭记忆、已成空白 posted on 2019-12-03 14:29:48
Question: I have a field _keywords which is an array of strings. I want to get the documents whose _keywords are a superset of the query array. For example:
    db.article.insert({'_keywords': ['foo', 'foo1', 'foo2']})
I want to retrieve this record when I query with a subset of ['foo', 'foo1', 'foo2'], e.g. ['foo'] or ['foo1', 'foo2'].
EDIT: something like:
    db.article.find({'_keywords': {$contains: array}})
Answer 1: Use the $all operator:
    db.article.find( { _keywords: { $all: [ 'foo1', 'foo2' ] } } );
Source: http://www
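To make the matching rule concrete, here is a small pure-Python sketch of the condition that `$all` expresses for an array of plain strings: a document matches when every queried keyword appears in its `_keywords` array. It does not talk to MongoDB or use any driver, and `matches_all` is an illustrative name only.

```python
def matches_all(document_keywords, query_keywords):
    """True when every queried keyword is present in the document's array."""
    return set(query_keywords) <= set(document_keywords)


article = {"_keywords": ["foo", "foo1", "foo2"]}

print(matches_all(article["_keywords"], ["foo"]))           # True
print(matches_all(article["_keywords"], ["foo1", "foo2"]))  # True
print(matches_all(article["_keywords"], ["foo", "bar"]))    # False
```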

Filtering data in a dataframe based on criteria

╄→гoц情女王★ posted on 2019-12-03 13:47:57
I am new to R and can't get to grips with this concept. Suppose I have a table loaded called "places" with, say, 3 columns: city, population and average summer temperature. Say I want to "filter": produce a new table object where population is less than 1 million and average summer temperature is greater than 70 degrees. In any other program I have used this would be pretty easy, but having done some research I'm working myself up into greater confusion. Given the purpose of R and what it does, this must be pretty simple stuff. How would I apply the above conditions to the table? What would the