data-partitioning

Creating data partitions over a selected range of data to be fed into caret::train function for cross-validation

非 Y 不嫁゛ posted on 2020-01-04 14:15:07
Question: I want to create jackknife data partitions for the data frame below, with the partitions to be used in caret::train (like those caret::groupKFold() produces). However, the catch is that I want to restrict the test points to, say, those greater than 16 days, whilst using the remainder of the data as the training set.

    df <- data.frame(Effect = seq(from = 0.05, to = 1, by = 0.05), Time = seq(1:20))

The reason I want to do this is that I am only really interested in how well the model is predicting the

Generating unique sorted partitions in Ruby

只谈情不闲聊 posted on 2020-01-01 06:14:41
Question: I'm trying to generate the set of sequences shown below, not in any particular order, but here it's shown as a descending sequence. Note that each sequence also descends, as I'm interested in combinations, not permutations. I'd like to store each sequence as an array, or preferably the set of sequences as an array of arrays, but first things first.

    6
    5 1
    4 2
    4 1 1
    3 3
    3 2 1
    3 1 1 1
    2 2 2
    2 2 1 1
    2 1 1 1 1
    1 1 1 1 1 1

Right now I am simply focusing on generating these sets and I'm
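
The listing in the question is the set of integer partitions of 6, each written in non-increasing order. The question asks about Ruby; the following is only a minimal Python sketch of one common recursive approach, and the function name `partitions` and its `max_part` parameter are illustrative assumptions, not taken from the original post.

```python
def partitions(n, max_part=None):
    """Yield every partition of n as a non-increasing list of positive integers."""
    if max_part is None:
        max_part = n
    if n == 0:
        # Exactly one way to partition zero: the empty list.
        yield []
        return
    # Pick the first (largest) part from big to small, then partition the
    # remainder using parts no larger than the one just chosen.
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


if __name__ == "__main__":
    for p in partitions(6):
        print(*p)
```

Collecting the generator with `list(partitions(6))` gives the array-of-arrays form the question mentions. Capping each later part at the previous one is what rules out permutations of the same combination.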

U-SQL Split a CSV file to multiple files based on Distinct values in file

不打扰是莪最后的温柔 posted on 2019-12-30 10:35:27
Question: I have data in Azure Data Lake Store and I am processing it with an Azure Data Lake Analytics job using U-SQL. I have several CSV files which contain spatial data, similar to this:

File_20170301.csv

    longtitude| lattitude | date         | hour | value1
    ----------+-----------+--------------+------+-------
    45.121    | 21.123    | 2017-03-01   | 01   | 20
    45.121    | 21.123    | 2017-03-01   | 02   | 10
    45.121    | 21.123    | 2017-03-01   | 03   | 50
    48.121    | 35.123    | 2017-03-01   | 01   | 60
    48.121    | 35.123    | 2017-03-01   |

How does createDataPartition function from caret package split data?

拟墨画扇 posted on 2019-12-30 09:00:38
Question: From the documentation:

    "For bootstrap samples, simple random sampling is used. For other data splitting, the random sampling is done within the levels of y when y is a factor in an attempt to balance the class distributions within the splits. For numeric y, the sample is split into groups sections based on percentiles and sampling is done within these subgroups. For createDataPartition, the number of percentiles is set via the groups argument."

I don't understand why this "balance" thing is

Enumerate all k-partitions of 1d array with N elements?

▼魔方 西西 posted on 2019-12-30 08:52:36
Question: This seems like a simple request, but Google is not my friend, because "partition" scores a bunch of hits in the database and filesystem space. I need to enumerate all partitions of an array of N values (N is constant) into k sub-arrays. The sub-arrays are just that: a starting index and an ending index. The overall order of the original array will be preserved. For example, with N=4 and k=2:

    [ | a b c d ]   (0, 4)
    [ a | b c d ]   (1, 3)
    [ a b | c d ]   (2, 2)
    [ a b c | d ]   (3, 1)
    [ a b c d | ]   (4, 0)

And
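
One way to read the N=4, k=2 example is that each split is determined by k-1 cut positions chosen from 0..N, with repeated cuts allowed so that sub-arrays may be empty. The following is a minimal Python sketch under that assumption; the function name `k_partitions` and the half-open `(start, end)` output format are illustrative choices, not from the original question.

```python
from itertools import combinations_with_replacement


def k_partitions(items, k):
    """Yield each split of items into k contiguous, possibly empty, sub-arrays.

    Every split is a list of k half-open (start, end) index pairs.
    """
    n = len(items)
    # Choose k-1 non-decreasing cut positions in 0..n; equal cuts give empty pieces.
    for cuts in combinations_with_replacement(range(n + 1), k - 1):
        bounds = (0, *cuts, n)
        yield [(bounds[i], bounds[i + 1]) for i in range(k)]


if __name__ == "__main__":
    for split in k_partitions(list("abcd"), 2):
        sizes = tuple(end - start for start, end in split)
        print(split, sizes)
```

The sizes computed in the usage loop correspond to the `(0, 4)` through `(4, 0)` pairs in the question. If empty sub-arrays are not wanted, `itertools.combinations(range(1, n), k - 1)` could be swapped in for the cut selection.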

Find next record where status field is different from current

前提是你 posted on 2019-12-24 11:25:21
Question: I have a table that is used to log events, of two types specifically: ON and OFF. There are sometimes overlapping log entries, as there can be 2 devices logging simultaneously. This is not crucial, as the end report should give a [mostly] correct overview of ON -> OFF periods. Below is a sample, with the 3rd column just for illustration; it does not exist.

    ActionTaken    ID   ID_of_next_OFF
    Switched ON    1    3
    Switched ON    2    6
    Switched OFF   3
    Switched ON    4    7
    Switched ON    5    8
    Switched OFF   6
    Switched OFF   7

In-place partition when the array may or may not contain the pivot element

天涯浪子 posted on 2019-12-24 03:14:45
Question: Is there an in-place partitioning algorithm (of the kind used in a Quicksort implementation) that does not rely on the pivot element being present in the array? In other words, the array elements must be arranged in this order:

    Elements less than the pivot (if any)
    Elements equal to the pivot (if any)
    Elements greater than the pivot (if any)

It must still return the index (after sorting) of the pivot element if it happens to be present in the array, or a special value if not; This could be
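
The ordering described in the question matches what a three-way ("Dutch national flag") partition produces, and that scheme compares against a pivot value rather than a pivot position, so the pivot need not be in the array. The following is a minimal Python sketch of that technique, not necessarily the answer the original thread settled on; the name `partition3` and the choice of `-1` as the "special value" are assumptions.

```python
def partition3(a, pivot):
    """Rearrange a in place into [< pivot | == pivot | > pivot].

    Returns the index of the first element equal to pivot,
    or -1 (an assumed sentinel) if pivot does not occur in a.
    """
    lo, i, hi = 0, 0, len(a) - 1
    while i <= hi:
        if a[i] < pivot:
            # Move the small element to the end of the "less than" region.
            a[lo], a[i] = a[i], a[lo]
            lo += 1
            i += 1
        elif a[i] > pivot:
            # Move the large element to the start of the "greater than" region;
            # do not advance i, since the swapped-in element is still unexamined.
            a[i], a[hi] = a[hi], a[i]
            hi -= 1
        else:
            i += 1
    # Now a[:lo] < pivot, a[lo:hi + 1] == pivot, a[hi + 1:] > pivot.
    return lo if lo <= hi else -1


if __name__ == "__main__":
    data = [5, 1, 4, 9, 4, 2]
    print(partition3(data, 4), data)
    print(partition3(data, 3), data)
```

When the pivot is absent, `lo` is still the boundary between the smaller and larger elements, so a caller that needs the split point could return it alongside the sentinel.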

SQL to check when pairs don't match

回眸只為那壹抹淺笑 posted on 2019-12-23 01:47:05
Question: I am using SQL Server 2012. I have the following sample data:

    Date        Type   Symbol      Price
    6/30/1995   gaus   313586U72   109.25
    6/30/1995   gbus   313586U72   108.94
    6/30/1995   csus   NES         34.5
    6/30/1995   lcus   NES         34.5
    6/30/1995   lcus   NYN         40.25
    6/30/1995   uaus   NYN         40.25
    6/30/1995   agus   SRR         10.25
    6/30/1995   lcus   SRR         0.45
    7/1/1995    gaus   313586U72   109.25
    7/1/1995    gbus   313586U72   108.94

I want to filter out rows when symbol and price match. It's OK if type doesn't match. Thus, with the above data, I would expect to only see Date

Need algorithm for fast storage and retrieval (search) of sets and subsets

自古美人都是妖i posted on 2019-12-20 14:22:27
Question: I need a way of storing sets of arbitrary size for fast querying later on. I'll need to query the resulting data structure for subsets or sets that are already stored. === Later edit: To clarify, an accepted answer to this question would be a link to a study that proposes a solution to this problem. I'm not expecting people to develop the algorithm themselves. I've been looking over the tuple clustering algorithm found here, but it's not exactly what I want since from what I understand

How to sort an integer array into negative, zero, positive part without changing relative position?

允我心安 posted on 2019-12-20 10:36:22
Question: Give an O(n) algorithm which takes as input an array S, then divides S into three sets: negatives, zeros, and positives. Show how to implement this in place, that is, without allocating new memory. You also have to keep the numbers' relative order. For example:

    {-1, 4, 0, -2, 1, 2} ==> {-1, -2, 0, 4, 1, 2}

I am not sure whether or not such a solution exists. The best solutions I can think of are: Solution 1: Using an extra integer array, then traverse the whole array to get negatives, then
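
The description of Solution 1 is cut off above. The following is a minimal Python sketch of one plausible reading: an extra buffer filled by three passes (negatives, then zeros, then positives). That continuation is an assumption, and the function name `stable_sign_partition` is hypothetical. The sketch keeps relative order and runs in O(n) time, but it uses O(n) extra memory, so it does not meet the in-place requirement the question asks about.

```python
def stable_sign_partition(s):
    """Reorder s into negatives, zeros, positives, preserving relative order.

    Uses an auxiliary list (O(n) extra space), so this is NOT the in-place
    algorithm the question asks for; it only illustrates the extra-array idea.
    """
    buf = []
    # One pass per group; each pass appends matching elements in original order.
    for keep in (lambda x: x < 0, lambda x: x == 0, lambda x: x > 0):
        buf.extend(x for x in s if keep(x))
    s[:] = buf  # Copy the result back into the caller's list.
    return s


if __name__ == "__main__":
    print(stable_sign_partition([-1, 4, 0, -2, 1, 2]))
```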