subset

optimal way to find sum(S) of all contiguous sub-array's max difference

Submitted by 荒凉一梦 on 2019-12-01 01:25:45
You are given an array with n elements: d[0], d[1], ..., d[n-1]. Calculate the sum S of the max difference (max minus min) over all contiguous sub-arrays. Formally:

    S = sum{max{d[l, ..., r]} - min{d[l, ..., r]}}, ∀ 0 <= l <= r < n

Input:
    4
    1 3 2 4
Output:
    12
Explanation:
    l = 0; r = 0; array: [1]; sum = max([1]) - min([1]) = 0
    l = 0; r = 1; array: [1,3]; sum = max([1,3]) - min([1,3]) = 3 - 1 = 2
    l = 0; r = 2; array: [1,3,2]; sum = max([1,3,2]) - min([1,3,2]) = 3 - 1 = 2
    l = 0; r = 3; array: [1,3,2,4]; sum = max([1,3,2,4]) - min([1,3,2,4]) = 4 - 1 = 3
    l = 1; r = 1 will result in zero
    l = 1; r = 2; array: [3,2] sum = max([3,2
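
The excerpt is cut off before any answer, so the following is only a sketch of one common approach, not necessarily what the original thread settled on. It assumes S can be split into (sum of all sub-array maxima) minus (sum of all sub-array minima), and computes each part in O(n) with a monotonic stack. The names `sum_subarray_ranges` and `total` are hypothetical.

```python
def sum_subarray_ranges(d):
    """Sum of (max - min) over all contiguous sub-arrays of d, in O(n)."""
    n = len(d)

    def total(beats):
        # Sum, over all sub-arrays, of the element that "wins" under `beats`
        # (the maximum when beats is >, the minimum when beats is <).
        res = 0
        stack = []  # indices whose winning range is still open
        for i in range(n + 1):
            # Index n acts as a sentinel that closes every remaining range.
            while stack and (i == n or beats(d[i], d[stack[-1]])):
                mid = stack.pop()
                left = stack[-1] if stack else -1
                # d[mid] wins in every sub-array with l in (left, mid] and r in [mid, i).
                res += d[mid] * (mid - left) * (i - mid)
            if i < n:
                stack.append(i)
        return res

    sum_of_max = total(lambda new, top: new > top)
    sum_of_min = total(lambda new, top: new < top)
    return sum_of_max - sum_of_min


print(sum_subarray_ranges([1, 3, 2, 4]))  # 12, matching the sample output
```

The strict comparison means equal elements are popped only by the sentinel or a strictly better element, so each sub-array is attributed to exactly one index even when values repeat.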

Filter by ranges supplied by two vectors, without a join operation

Submitted by 懵懂的女人 on 2019-12-01 01:12:25
Question: I wish to do exactly this: "Take dates from one dataframe and filter data in another dataframe - R", except without joining, as I am afraid that the joined data will be too big to fit in memory before the filter is applied. Here is sample data:

    tmp_df <- data.frame(a = 1:10)

I wish to do an operation that looks like this:

    lower_bound <- c(2, 4)
    upper_bound <- c(2, 5)
    tmp_df %>% filter(a >= lower_bound & a <= upper_bound) # does not work as <= is vectorised inappropriately

and my desired

How to subset a list based on the length of its elements in R

Submitted by 試著忘記壹切 on 2019-12-01 00:56:10
In R I have a function ( coordinates from the package sp ) which looks up 11 fields of data for each IP address you supply. I have a list of IPs called ip.addresses :

    > head(ip.addresses)
    [1] "128.177.90.11" "71.179.12.143" "66.31.55.111" "98.204.243.187" "67.231.207.9" "67.61.248.12"

Note: Those or any other IPs can be used to reproduce this problem. So I apply the function to that object with sapply :

    ips.info <- sapply(ip.addresses, ip2coordinates)

and get a list called ips.info as my result. This is all well and good, but I can't do much more with a list, so I need to convert it to a

subset based on frequency level [duplicate]

Submitted by 柔情痞子 on 2019-12-01 00:21:18
This question already has an answer here: Subset data frame based on number of rows per group (2 answers)

I want to generate a df that selects the rows whose "ID" meets a threshold held in a variable called cutoff. For this example, I set the cutoff to 9, meaning that I want to select rows in df1 whose ID value is associated with more than 9 rows. The last line of my code generates a df that I don't understand. The correct df would have 24 rows, all with either a 3 or a 4 in the ID column. Can someone explain what my last line of code is actually doing and suggest a different

Count the total number of subsets that don't have consecutive elements

Submitted by 非 Y 不嫁゛ on 2019-12-01 00:02:52
I'm trying to solve a pretty complex combinatorics problem about counting subsets. First of all, let's say we are given the set A = {1, 2, 3, ... N} where N <= 10^18. Now we want to count the subsets that don't contain consecutive numbers.

Example: Let's say N = 3, and A = {1,2,3}. There are 2^3 subsets in total, but we don't want to count the subsets (1,2), (2,3) and (1,2,3). So for this question the answer is 5, because we count only the remaining 5 subsets: (Empty subset), (1), (2), (3), (1,3). Also we want to print the result modulo 10^9 + 7
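
The excerpt ends before any answer, so this is a sketch under a stated assumption rather than the thread's solution. The assumption is the standard recurrence: a valid subset of {1, ..., N} either omits N (count for N-1) or contains N and omits N-1 (count for N-2), which gives the Fibonacci number F(N+2) with F(1) = F(2) = 1. For N up to 10^18 the sketch uses fast doubling, which needs about log2(N) steps. The function names are hypothetical.

```python
MOD = 10**9 + 7


def fib_pair(n):
    """Return (F(n), F(n+1)) modulo MOD, with F(0) = 0 and F(1) = 1."""
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n // 2)              # a = F(k), b = F(k+1), k = n // 2
    c = a * ((2 * b - a) % MOD) % MOD    # F(2k)   = F(k) * (2*F(k+1) - F(k))
    d = (a * a + b * b) % MOD            # F(2k+1) = F(k)^2 + F(k+1)^2
    if n % 2 == 0:
        return (c, d)
    return (d, (c + d) % MOD)


def count_non_consecutive_subsets(n):
    """Number of subsets of {1..n} with no two consecutive elements, mod MOD."""
    return fib_pair(n + 2)[0]


print(count_non_consecutive_subsets(3))  # 5, matching the example
```

The recursion depth is only about 60 for N = 10^18, so Python's default recursion limit is not a concern here.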

r subset array using vector

Submitted by 久未见 on 2019-11-30 23:10:27
I feel like this question should have already been answered, but I found nothing. I have an array and I want to subset it using a vector. I know how to do it the hard way, but I'm sure there's got to be an easy way. Any ideas? Here's my example:

    dat <- data.frame(a = rep(letters[1:3], 2), b = rep(letters[1:2], 3), c = c(rep("a", 5), "b"), x = rnorm(6), stringsAsFactors = FALSE)
    l <- by(dat[ , "x"], dat[ , 1:3], mean)
    l["a", "a", "a"] # works
    l[c("a", "a", "a")] # does not work

So I guess I need a way to remove the c() wrapper from c("a", "a", "a") before passing it to l . Instead of a vector

R: Efficiently subsetting dataframe based on time of day

Submitted by 邮差的信 on 2019-11-30 22:31:24
I have a large (150,000x7) dataframe that I intend to use for back-testing and real-time analysis of a financial market. The data represents the condition of an investment vehicle at 5 minute intervals (although holes do exist). It looks like this (but much longer):

      pTime      Time     Price  M1       M2       M3       M4
    1 1212108300 20:45:00 1.5518 12.21849 -0.37125  4.50549 -31.00559
    2 1212108900 20:55:00 1.5516 11.75350 -0.81792 -1.53846 -32.12291
    3 1212109200 21:00:00 1.5512 10.75070 -1.47438 -8.24176 -34.35754
    4 1212109500 21:05:00 1.5514 10.23529 -1.06044 -8.46154 -33.24022
    5 1212109800 21:10:00 1.5514  9.74790 -1

creating a new list with subset of list using index in python

Submitted by 大城市里の小女人 on 2019-11-30 21:58:08
Question: A list:

    a = ['a', 'b', 'c', 3, 4, 'd', 6, 7, 8]

I want a new list built from a subset of a, namely a[0:2], a[4], a[6:] ; that is, I want the list ['a', 'b', 4, 6, 7, 8]

Answer 1: Try new_list = a[0:2] + [a[4]] + a[6:] . Or more generally, something like this:

    from itertools import chain
    new_list = list(chain(a[0:2], [a[4]], a[6:]))

This works with other sequences as well, and is likely to be faster. Or you could do this:

    def chain_elements_or_slices(*elements_or_slices):
        new_list = []
        for i in elements_or_slices
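
The answer's last function is truncated in this excerpt, so the block below does not try to complete it. It is a separate minimal sketch of the same idea, using a hypothetical helper `pick` that accepts either integer indices or `slice` objects and collects the selected items in order.

```python
def pick(seq, *specs):
    """Hypothetical helper: each spec is an int index or a slice object."""
    out = []
    for spec in specs:
        if isinstance(spec, slice):
            out.extend(seq[spec])   # a slice contributes several items
        else:
            out.append(seq[spec])   # a single index contributes one item
    return out


a = ['a', 'b', 'c', 3, 4, 'd', 6, 7, 8]
new_list = pick(a, slice(0, 2), 4, slice(6, None))
print(new_list)  # ['a', 'b', 4, 6, 7, 8]
```

Here `slice(0, 2)` and `slice(6, None)` are the object forms of `a[0:2]` and `a[6:]`.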

Subsetting based on co-occurrence within a time window

Submitted by 人走茶凉 on 2019-11-30 21:20:15
Question: I am having trouble subsetting data based on different attributes in different columns. Here is a dummy data set with species, area where it was found, and time (already in POSIXct).

    SP Time  Area
    B  07:22 1
    F  09:22 4
    A  09:22 1
    C  08:17 3
    D  09:20 1
    E  06:55 4
    D  09:03 1
    E  09:12 2
    F  09:45 1
    B  09:15 1

I need to subset the rows that have SP==A, plus all other species occurring in the same area (in this case 1) within a time window of +30 and -30 minutes, returning this:

    SP Time  Area
    A  09:22 1
    D  09:20

Loop linear regression and saving ALL coefficients

Submitted by 杀马特。学长 韩版系。学妹 on 2019-11-30 20:59:09
Question: Based on the link below, I wrote code to run a regression on subsets of my data, split by a variable.

Loop linear regression and saving coefficients

In this example I created a DUMMY (0 or 1) to define the subsets (in reality I have 3000 subsets):

    res <- do.call(rbind, lapply(split(mydata, mydata$DUMMY), function(x){
      fit <- lm(y~x1 + x2, data=x)
      res <- data.frame(DUMMY=unique(x$DUMMY), coeff=coef(fit))
      res
    }))

This results in the following dataset:

                  DUMMY coeff
    0.(Intercept) 0     22.8419956
    0.x1          0