subset | 易学教程

optimal way to find sum(S) of all contiguous sub-array's max difference


You are given an array with n elements: d[0], d[1], ..., d[n-1]. Calculate the sum (S) of the max difference over all contiguous sub-arrays. Formally:

S = sum{ max{d[l, ..., r]} - min{d[l, ..., r]} }, ∀ 0 <= l <= r < n

Input:
4
1 3 2 4

Output:
12

Explanation:
l = 0; r = 0; array: [1] sum = max([1]) - min([1]) = 0
l = 0; r = 1; array: [1,3] sum = max([1,3]) - min([1,3]) = 3 - 1 = 2
l = 0; r = 2; array: [1,3,2] sum = max([1,3,2]) - min([1,3,2]) = 3 - 1 = 2
l = 0; r = 3; array: [1,3,2,4] sum = max([1,3,2,4]) - min([1,3,2,4]) = 4 - 1 = 3
l = 1; r = 1 will result in zero
l = 1; r = 2; array: [3,2] sum = max([3,2
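One possible way to compute S without enumerating every sub-array is sketched below. It is not taken from the original thread: it assumes the usual monotonic-stack technique, which counts for each element how many sub-arrays it is the maximum (or minimum) of, and runs in O(n). The function and helper names are hypothetical.

```python
def sum_of_subarray_ranges(d):
    """Sum of (max - min) over all contiguous sub-arrays of d, in O(n)."""
    n = len(d)

    def sum_of_extremes(beats):
        # Sum over all sub-arrays of the element that "wins" under `beats`
        # (the maximum when beats is ">", the minimum when beats is "<").
        total = 0
        stack = []  # indices whose contribution is still open to the right
        for i in range(n + 1):
            # i == n acts as a sentinel that flushes the remaining stack.
            while stack and (i == n or beats(d[i], d[stack[-1]])):
                j = stack.pop()
                left = stack[-1] if stack else -1
                # d[j] is the extreme for every sub-array that starts in
                # (left, j] and ends in [j, i).
                total += d[j] * (j - left) * (i - j)
            stack.append(i)
        return total

    sum_max = sum_of_extremes(lambda new, top: new > top)
    sum_min = sum_of_extremes(lambda new, top: new < top)
    return sum_max - sum_min


print(sum_of_subarray_ranges([1, 3, 2, 4]))  # 12, matching the sample output
```

Ties are handled by popping only on a strict comparison, so each sub-array is credited to exactly one of its equal extreme elements.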

Filter by ranges supplied by two vectors, without a join operation


Question: I wish to do exactly this: Take dates from one dataframe and filter data in another dataframe - R, except without joining, as I am afraid that after I join my data the result will be too big to fit in memory, prior to the filter. Here is sample data:

tmp_df <- data.frame(a = 1:10)

I wish to do an operation that looks like this:

lower_bound <- c(2, 4)
upper_bound <- c(2, 5)
tmp_df %>% filter(a >= lower_bound & a <= upper_bound) # does not work as <= is vectorised inappropriately

and my desired

How to subset a list based on the length of its elements in R


In R I have a function (coordinates from the package sp) which looks up 11 fields of data for each IP address you supply. I have a list of IPs called ip.addresses:

> head(ip.addresses)
[1] "128.177.90.11" "71.179.12.143" "66.31.55.111" "98.204.243.187" "67.231.207.9" "67.61.248.12"

Note: Those or any other IPs can be used to reproduce this problem. So I apply the function to that object with sapply:

ips.info <- sapply(ip.addresses, ip2coordinates)

and get a list called ips.info as my result. This is all good and fine, but I can't do much more with a list, so I need to convert it to a

subset based on frequency level [duplicate]


This question already has an answer here: Subset data frame based on number of rows per group (2 answers)

I want to generate a df that selects rows associated with an "ID" that in turn is associated with a variable called cutoff. For this example, I set the cutoff to 9, meaning that I want to select rows in df1 whose ID value is associated with more than 9 rows. The last line of my code generates a df that I don't understand. The correct df would have 24 rows, all with either a 3 or a 4 in the ID column. Can someone explain what my last line of code is actually doing and suggest a different

Count the total number of subsets that don't have consecutive elements


I'm trying to solve a pretty complex problem involving combinatorics and counting subsets. First of all, let's say we are given the set A = {1, 2, 3, ..., N} where N <= 10^18. Now we want to count the subsets that don't contain consecutive numbers.

Example: Let's say N = 3, and A = {1, 2, 3}. There are 2^3 subsets in total, but we don't want to count the subsets (1,2), (2,3) and (1,2,3). So for this example the answer is 5, because we count only the remaining 5 subsets. Those subsets are (Empty subset), (1), (2), (3), (1,3). Also we want to print the result modulo 10^9 + 7
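A sketch of one way to handle N up to 10^18, not taken from the original thread. It relies on the known fact that the number of subsets of {1, ..., N} with no two consecutive elements is the Fibonacci number F(N+2) (with F(1) = F(2) = 1), and it assumes the fast-doubling identities to compute that value modulo 10^9 + 7 in O(log N) steps. The function names are hypothetical.

```python
MOD = 10**9 + 7


def fib_pair(n):
    """Return (F(n), F(n+1)) modulo MOD, with F(0) = 0 and F(1) = 1."""
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n // 2)            # a = F(k), b = F(k+1), k = n // 2
    c = a * ((2 * b - a) % MOD) % MOD  # F(2k)   = F(k) * (2*F(k+1) - F(k))
    d = (a * a + b * b) % MOD          # F(2k+1) = F(k)^2 + F(k+1)^2
    if n % 2 == 0:
        return (c, d)
    return (d, (c + d) % MOD)


def count_non_consecutive_subsets(n):
    """Count subsets of {1..n} with no two consecutive numbers, mod 10^9 + 7."""
    return fib_pair(n + 2)[0]


print(count_non_consecutive_subsets(3))  # 5, matching the example above
```

The recursion depth is about log2(N), roughly 60 levels for N = 10^18, so Python's default recursion limit is not a concern here.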

r subset array using vector


I feel like this question should have already been answered, but I found no answer. I have an array and I want to subset it using a vector. I know how to do it the hard way, but I'm sure there's got to be an easy way. Any ideas? Here's my example:

dat <- data.frame(a = rep(letters[1:3], 2), b = rep(letters[1:2], 3), c = c(rep("a", 5), "b"), x = rnorm(6), stringsAsFactors = FALSE)
l <- by(dat[ , "x"], dat[ , 1:3], mean)
l["a", "a", "a"] # works
l[c("a", "a", "a")] # does not work

So I guess I need a way to remove the c() wrapper from c("a", "a", "a") before passing it to l. Instead of a vector

R: Efficiently subsetting dataframe based on time of day


I have a large (150,000x7) dataframe that I intend to use for back-testing and real-time analysis of a financial market. The data represents the condition of an investment vehicle at 5-minute intervals (although holes do exist). It looks like this (but much longer):

       pTime     Time  Price       M1       M2       M3        M4
1 1212108300 20:45:00 1.5518 12.21849 -0.37125  4.50549 -31.00559
2 1212108900 20:55:00 1.5516 11.75350 -0.81792 -1.53846 -32.12291
3 1212109200 21:00:00 1.5512 10.75070 -1.47438 -8.24176 -34.35754
4 1212109500 21:05:00 1.5514 10.23529 -1.06044 -8.46154 -33.24022
5 1212109800 21:10:00 1.5514  9.74790 -1

creating a new list with subset of list using index in python


Question: A list:

a = ['a', 'b', 'c', 3, 4, 'd', 6, 7, 8]

I want a list built from a subset of a, using a[0:2], a[4], a[6:]; that is, I want the list ['a', 'b', 4, 6, 7, 8].

Answer 1: Try new_list = a[0:2] + [a[4]] + a[6:]. Or more generally, something like this:

from itertools import chain
new_list = list(chain(a[0:2], [a[4]], a[6:]))

This works with other sequences as well, and is likely to be faster. Or you could do this:

def chain_elements_or_slices(*elements_or_slices):
    new_list = []
    for i in elements_or_slices
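The two complete approaches from the answer above can be checked side by side with the short sketch below. Because the answer's third function is cut off in this excerpt, the helper `pick` is a hypothetical stand-in written for illustration, not a reconstruction of the original; it assumes each selector is either an integer index or a `slice` object.

```python
from itertools import chain


def pick(seq, *specs):
    """Hypothetical helper: gather items of seq named by int indices or slices."""
    out = []
    for spec in specs:
        if isinstance(spec, slice):
            out.extend(seq[spec])  # a slice contributes several items
        else:
            out.append(seq[spec])  # a plain index contributes one item
    return out


a = ['a', 'b', 'c', 3, 4, 'd', 6, 7, 8]

by_concat = a[0:2] + [a[4]] + a[6:]
by_chain = list(chain(a[0:2], [a[4]], a[6:]))
by_pick = pick(a, slice(0, 2), 4, slice(6, None))

print(by_concat)  # ['a', 'b', 4, 6, 7, 8]
```

Note that `slice(6, None)` is the object form of `6:`, which is what lets index and slice selectors be passed as ordinary arguments.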

Subsetting based on co-occurrence within a time window


Question: I am having trouble subsetting data based on different attributes in different columns. Here is a dummy data set with species, area where it was found, and time (already in POSIXct).

SP Time  Area
B  07:22 1
F  09:22 4
A  09:22 1
C  08:17 3
D  09:20 1
E  06:55 4
D  09:03 1
E  09:12 2
F  09:45 1
B  09:15 1

I need to subset the rows that have SP==A, plus all other species occurring in the same area (in this case 1) within a time window of +30 and -30 minutes, returning this:

SP Time  Area
A  09:22 1
D  09:20

Loop linear regression and saving ALL coefficients


Question: Based on the link below, I created code to run a regression on subsets of my data based on a variable.

Loop linear regression and saving coefficients

In this example I created a DUMMY (0 or 1) to create the subsets (in reality I have 3000 subsets):

res <- do.call(rbind, lapply(split(mydata, mydata$DUMMY), function(x) {
  fit <- lm(y ~ x1 + x2, data = x)
  res <- data.frame(DUMMY = unique(x$DUMMY), coeff = coef(fit))
  res
}))

This results in the following dataset:

              DUMMY coeff
0.(Intercept) 0     22.8419956
0.x1          0