sampling

How to perform undersampling (the right way) with Python scikit-learn?

柔情痞子 submitted 2020-01-04 06:00:15

Question: I am attempting to perform undersampling of the majority class using Python scikit-learn. Currently my code looks for the N of the minority class and then tries to undersample exactly that N from the majority class, so both the test and training data end up with this 1:1 distribution. But what I really want is this 1:1 distribution on the training data ONLY, while testing on the original distribution in the testing data. I am not quite sure how to do the latter as there is some dict…
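The key is the order of operations: split first, balance afterwards, and only the training portion. A minimal pure-Python sketch of that split-then-balance order (function names are made up for illustration; in practice imbalanced-learn's RandomUnderSampler applied after scikit-learn's train_test_split achieves the same effect):

```python
import random

def split_then_balance(X, y, test_frac=0.3, seed=0):
    """Split first; undersample the majority class in the TRAINING part only."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
    Xte, yte = [X[i] for i in te], [y[i] for i in te]

    # Group training rows by class and downsample to the minority count.
    by_class = {}
    for xi, yi in zip(Xtr, ytr):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        for xi in rng.sample(rows, n_min):
            Xb.append(xi)
            yb.append(label)
    return Xb, yb, Xte, yte   # the test set keeps the original ratio

X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10            # 9:1 imbalance
Xb, yb, Xte, yte = split_then_balance(X, y)
```

Because the undersampling never sees the held-out indices, the test distribution stays at the original 9:1 while the balanced training set is exactly 1:1.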

C++ discrete distribution sampling with frequently changing probabilities

一笑奈何 submitted 2020-01-03 14:15:10

Question: Problem: I need to sample from a discrete distribution constructed from weights, e.g. {w1, w2, w3, …}, and thus from the probability distribution {p1, p2, p3, …}, where pi = wi/(w1 + w2 + …). Some of the wi change very frequently, but only a very low proportion of all of them. The distribution itself therefore has to be renormalised every time this happens, so I believe the alias method does not work efficiently here, because one would need to rebuild the whole distribution from scratch each time. The method I…
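One structure that avoids the full rebuild is a Fenwick (binary indexed) tree over the weights: updating a single wi and drawing a sample are both O(log n), and no explicit renormalisation is needed because the tree's total prefix sum is always the current sum of weights. A sketch (class and method names are made up, not from the question):

```python
import random

class FenwickSampler:
    """Draw index i with probability w[i] / sum(w).  Updating one weight
    and drawing are both O(log n), so frequently changing a few weights
    never forces an alias-table-style rebuild from scratch."""

    def __init__(self, weights):
        self.n = len(weights)
        self.w = [0.0] * self.n
        self.tree = [0.0] * (self.n + 1)   # 1-based Fenwick tree
        for i, wi in enumerate(weights):
            self.update(i, wi)

    def update(self, i, new_weight):
        delta = new_weight - self.w[i]
        self.w[i] = new_weight
        j = i + 1
        while j <= self.n:
            self.tree[j] += delta
            j += j & (-j)

    def _prefix(self, j):                  # sum of w[0..j-1]
        s = 0.0
        while j > 0:
            s += self.tree[j]
            j -= j & (-j)
        return s

    def sample(self, rng=random):
        # Descend the tree to the smallest index whose prefix sum exceeds u.
        u = rng.random() * self._prefix(self.n)
        idx, bit = 0, 1 << (self.n.bit_length() - 1)
        while bit:
            nxt = idx + bit
            if nxt <= self.n and self.tree[nxt] < u:
                u -= self.tree[nxt]
                idx = nxt
            bit >>= 1
        return idx                         # 0-based index

sampler = FenwickSampler([1.0, 2.0, 3.0])
sampler.update(1, 5.0)    # cheap O(log n) change, no rebuild
pick = sampler.sample()   # drawn with weights 1:5:3
```

The same idea translates directly to C++ (a `std::vector<double>` tree with the identical bit tricks).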

How to get a random (bootstrap) sample from pandas multiindex

∥☆過路亽.° submitted 2020-01-03 12:24:29

Question: I'm trying to create a bootstrapped sample from a multiindex dataframe in Pandas. Below is some code to generate the kind of data I need: from itertools import product import pandas as pd import numpy as np df = pd.DataFrame({'group1': [1, 1, 1, 2, 2, 3], 'group2': [13, 18, 20, 77, 109, 123], 'value1': [1.1, 2, 3, 4, 5, 6], 'value2': [7.1, 8, 9, 10, 11, 12] }) df = df.set_index(['group1', 'group2']) print(df) The df dataframe looks like: value1 value2 group1 group2 1 13 1.1 7.1 18 2.0 8.0 20 3…
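Assuming the goal is to resample whole group1 blocks with replacement (a cluster bootstrap), one sketch using the dataframe from the question draws the level-0 labels with replacement and concatenates the selected blocks under a new replicate level, so a group drawn twice stays distinguishable:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group1': [1, 1, 1, 2, 2, 3],
                   'group2': [13, 18, 20, 77, 109, 123],
                   'value1': [1.1, 2, 3, 4, 5, 6],
                   'value2': [7.1, 8, 9, 10, 11, 12]})
df = df.set_index(['group1', 'group2'])

def bootstrap_groups(df, seed=0):
    """Draw group1 labels with replacement; .loc[[g]] keeps the multiindex."""
    rng = np.random.default_rng(seed)
    labels = df.index.get_level_values('group1').unique()
    drawn = rng.choice(labels, size=len(labels), replace=True)
    # keys= adds a 'replicate' level so duplicated groups do not collide
    return pd.concat([df.loc[[g]] for g in drawn],
                     keys=range(len(drawn)), names=['replicate'])

boot = bootstrap_groups(df)
```

Passing a list to `.loc` (`df.loc[[g]]` rather than `df.loc[g]`) is what preserves the group1 level in the returned block.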

Android accelerometer sampling rate/delay stabilization

て烟熏妆下的殇ゞ submitted 2020-01-03 03:27:06

Question: I'm trying to detect the force of a tap using the accelerometer data together with the onTouch method. As far as I know, the fastest sampling frequency for the accelerometer is 200–202 Hz, but this variability is giving me problems when trying to match the timestamps of the onTouch event and the peak in the accelerometer data. Is there a way to stabilize the accelerometer readings to avoid this problem? Like controlling the specific thread or something? Answer 1: If you want…

Selecting nodes with probability proportional to trust

六眼飞鱼酱① submitted 2020-01-02 09:38:51

Question: Does anyone know of an algorithm or data structure for selecting items with a probability of selection proportional to some attached value? In other words: http://en.wikipedia.org/wiki/Sampling_%28statistics%29#Probability_proportional_to_size_sampling The context here is a decentralized reputation system, and the attached value is therefore the trust one user has in another. In this system all nodes start either as friends, which are completely trusted, or as unknowns…
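For a one-off proportional draw, the textbook approach is a cumulative sum over the trust values plus a binary search: node i is chosen with probability trust[i] / total trust. A stdlib sketch (the function name and node names are made up):

```python
import bisect
import itertools
import random

def pick_proportional(nodes, trusts, rng=random):
    """Select one node with probability trust / total trust."""
    cum = list(itertools.accumulate(trusts))   # e.g. [1, 3, 10]
    u = rng.random() * cum[-1]                 # uniform in [0, total)
    return nodes[bisect.bisect_right(cum, u)]  # first prefix sum > u

node = pick_proportional(['alice', 'bob', 'carol'], [1, 2, 7])
```

Building the cumulative array is O(n) and each draw is O(log n); if trust values change often between draws, a Fenwick tree over the weights supports O(log n) updates as well.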

Reproducible splitting of data into training and testing in R

本小妞迷上赌 submitted 2020-01-01 22:15:12

Question: A common way of sampling/splitting data in R is using sample, e.g. on row numbers. For example: require(data.table) set.seed(1) population <- as.character(1e5:(1e6-1)) # some made-up ID names N <- 1e4 # sample size sample1 <- data.table(id = sort(sample(population, N))) # randomly sample N ids test <- sample(N-1, N/2, replace = F) test1 <- sample1[test, .(id)] The problem is that this isn't very robust to changes in the data. For example, if we drop just one observation: sample2 <- sample1[…
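A split that is robust in the sense the question asks for can be had by deciding train/test membership from a hash of each ID alone, so dropping or adding other observations never reassigns an existing ID. A Python sketch of the idea (the R question's spirit, not its code; the salt string is arbitrary):

```python
import hashlib

def in_test(sample_id, test_frac=0.5, salt="split-v1"):
    """Deterministic membership: depends only on the ID itself,
    never on which other rows happen to be present."""
    digest = hashlib.sha256((salt + str(sample_id)).encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2.0**64  # in [0, 1)
    return bucket < test_frac

ids = [str(i) for i in range(100000, 100100)]   # made-up ID names
test_ids = [i for i in ids if in_test(i)]
train_ids = [i for i in ids if not in_test(i)]
```

The same scheme ports directly to R (e.g. hashing IDs with the digest package); changing the salt gives an independent, equally reproducible split.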

Efficiently picking a random element from a chained hash table?

一笑奈何 submitted 2019-12-31 08:49:47

Question: Just for practice (and not as a homework assignment) I have been trying to solve this problem (CLRS, 3rd edition, exercise 11.2-6): Suppose we have stored n keys in a hash table of size m, with collisions resolved by chaining, and that we know the length of each chain, including the length L of the longest chain. Describe a procedure that selects a key uniformly at random from among the keys in the hash table and returns it in expected time O(L · (1 + m/n)). What I have thought of so far is that the…
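The usual solution to this exercise is rejection sampling over (slot, position) pairs: pick a uniform slot and a uniform position in 0..L-1, and accept only if that position actually holds a key. Every key is accepted with the same probability 1/(mL), so accepted keys are uniform, and the expected number of trials mL/n gives the stated O(L · (1 + m/n)) bound. A sketch (hypothetical helper, not the asker's code):

```python
import random

def random_key(table, L, rng=random):
    """table: list of m chains (lists of keys); L: longest chain length.
    Pick a uniform (slot, position) pair; accept only when that position
    actually holds a key, so every key is equally likely."""
    m = len(table)
    while True:
        slot = rng.randrange(m)
        pos = rng.randrange(L)
        if pos < len(table[slot]):
            return table[slot][pos]

table = [['a'], [], ['b', 'c']]   # m = 3 slots, L = 2
key = random_key(table, L=2)
```

Each trial costs O(1) given the chain lengths, which is why the expected-time bound depends only on the acceptance rate n/(mL).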

Fast Poisson Disk Sampling [Robert Bridson] in Python

非 Y 不嫁゛ submitted 2019-12-31 04:50:05

Question: First of all, I implemented the ordinary, slow Poisson disk sampling algorithm in the 2D plane, and it works just fine. This slow version computes the distances between all points and checks that the point you wish to place is at least R away from all the others. The fast version by Robert Bridson, available here: https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, suggests discretizing your 2D plane into square cells with side length R/sqrt(2), since each cell can at…
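Bridson's speed-up in brief: with cell side R/sqrt(2) a cell holds at most one sample, so the minimum-distance check only scans a small fixed neighbourhood of cells instead of all points. A compact reimplementation sketched from the linked paper (not the asker's code; parameter k is the paper's per-point trial budget):

```python
import math
import random

def poisson_disk(width, height, R, k=30, seed=0):
    rng = random.Random(seed)
    cell = R / math.sqrt(2)                       # at most one sample per cell
    cols = int(math.ceil(width / cell))
    rows = int(math.ceil(height / cell))
    grid = [[None] * cols for _ in range(rows)]

    def cell_of(p):
        return int(p[1] // cell), int(p[0] // cell)

    def fits(p):
        if not (0 <= p[0] < width and 0 <= p[1] < height):
            return False
        r0, c0 = cell_of(p)
        # points within R are at most 2 cells away in each direction
        for r in range(max(r0 - 2, 0), min(r0 + 3, rows)):
            for c in range(max(c0 - 2, 0), min(c0 + 3, cols)):
                q = grid[r][c]
                if q is not None and (q[0] - p[0])**2 + (q[1] - p[1])**2 < R * R:
                    return False
        return True

    first = (rng.uniform(0, width), rng.uniform(0, height))
    samples, active = [first], [first]
    r0, c0 = cell_of(first)
    grid[r0][c0] = first
    while active:
        i = rng.randrange(len(active))
        base = active[i]
        for _ in range(k):
            # candidate in the annulus [R, 2R] around an active point
            ang = rng.uniform(0, 2 * math.pi)
            rad = rng.uniform(R, 2 * R)
            p = (base[0] + rad * math.cos(ang), base[1] + rad * math.sin(ang))
            if fits(p):
                samples.append(p)
                active.append(p)
                r, c = cell_of(p)
                grid[r][c] = p
                break
        else:
            active.pop(i)    # no room left around this point
    return samples

points = poisson_disk(50, 50, 5.0)
```

The all-pairs distance check of the slow version is replaced by the 5x5 cell scan in `fits`, which is what makes the whole run O(n).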

Efficient algorithm for generating unique (non-repeating) random numbers

空扰寡人 submitted 2019-12-30 10:39:06

Question: I want to solve the following problem. I have to sample from an extremely large set, on the order of 10^20 elements, extracting a sample without repetitions of about 10%–20% of the set. Given the size of the set, I believe an algorithm like Fisher–Yates is not feasible. I'm thinking that something like a random path tree might work in O(n log n) and that it can't be done faster, but I want to ask whether something like this has already been implemented. Thank you for your time! Answer 1: I don't know how…
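Worth noting: a sample that is literally 10–20% of 10^20 elements cannot even be written out, so any practical answer assumes the sample size k itself is manageable. For that case, Floyd's algorithm draws k distinct values from an arbitrarily large range in O(k) expected time and O(k) memory, with no Fisher–Yates shuffle of the universe. A sketch:

```python
import random

def floyd_sample(n, k, seed=0):
    """Floyd's algorithm: k distinct values from range(n) in O(k)
    expected time and O(k) memory -- the full universe is never touched."""
    rng = random.Random(seed)
    chosen = set()
    for j in range(n - k, n):
        t = rng.randrange(j + 1)
        # if t was already chosen, j itself is guaranteed fresh
        chosen.add(t if t not in chosen else j)
    return chosen

ids = floyd_sample(10**20, 1000)   # instant, despite the 10^20 range
```

If k is too large for a set but a stream of the sample suffices, iterating a pseudorandom permutation of the index range (e.g. via format-preserving encryption) gives repetition-free samples with O(1) memory.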

How to get sound data sample value in c#

邮差的信 submitted 2019-12-29 09:14:09

Question: I need to get the sample values of the sound data of a WAV file, so that from those sample values I can compute the amplitude of the sound data in every second. Important: is there any way to get audio data sample values using the NAudio library or the wmp library? I am getting the sample values this way: byte[] data = File.ReadAllBytes(File_textBox.Text); var samples = new int[data.Length]; int x = 0; for (int i = 44; i < data.Length; i += 2) { samples[x] = BitConverter.ToInt16(data, i); x++;…