sampling

Random sampling from a dataset, while preserving original probability distribution

Submitted by 丶灬走出姿态 on 2019-12-07 03:30:53
Question: I have a set of >2000 numbers gathered from measurement. I want to sample from this data set, ~10 times in each test, while preserving the probability distribution overall and within each test (to the extent approximately possible). For example, in each test I want some small values, some middle-range values, and some large values, with the mean and variance approximately matching the original distribution. Combining all the tests, I also want the total mean and variance of all the samples to be approximately close to those of the original distribution. As my dataset follows a long-tail probability distribution, the amount of …
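A common way to make each small batch mirror the overall shape is stratified sampling: sort the data into quantile bins and draw one value per bin. A minimal NumPy sketch of that idea, assuming the goal as stated in the question (the function name, bin count, and toy long-tailed data are illustrative, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_sample(data, n_per_test=10):
    """Draw one value from each of n_per_test quantile bins,
    so every test spans small, middle, and large values."""
    data = np.sort(np.asarray(data))
    edges = np.linspace(0, len(data), n_per_test + 1).astype(int)
    return np.array([rng.choice(data[lo:hi])
                     for lo, hi in zip(edges[:-1], edges[1:])])

data = rng.lognormal(mean=0.0, sigma=1.0, size=2500)  # long-tailed toy data
test = stratified_sample(data)
print(test.mean(), data.mean())  # the batch mean tracks the overall mean
```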

Logarithmic sampling

Submitted by 白昼怎懂夜的黑 on 2019-12-06 17:14:39
I am working with values in the range [minValue, maxValue] and I want to create a vector of values within this range, but with more values near minValue. Example: min = 1, max = 100, vector = [1, 1.1, 1.5, 2, 3, 5, 10, 15, 30, 50, 100]. Something like that; the goal is to be more accurate around the minimum. Is it possible to implement that? You can start by generating numbers from 0 to 1 with a constant step (for example 0.1). Then raise them to some exponent: the bigger the exponent, the sharper the curve. Then shift and scale to map into your desired min-max range. Pseudocode: min = 1.0, max = …
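A minimal Python sketch of that power-curve idea (the function name and the default exponent are my choices, not from the answer):

```python
import numpy as np

def dense_near_min(min_val, max_val, n, exponent=3.0):
    """Spacing that clusters points near min_val; a larger
    exponent bends the curve harder toward the minimum."""
    t = np.linspace(0.0, 1.0, n)   # constant step in [0, 1]
    t = t ** exponent              # warp: small values stay small
    return min_val + (max_val - min_val) * t

print(dense_near_min(1, 100, 11))
```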

Tensorflow: Efficient multinomial sampling (Theano x50 faster?)

Submitted by 佐手、 on 2019-12-06 13:26:13
I want to be able to sample from a multinomial distribution very efficiently, and apparently my TensorFlow code is very... very slow... The idea is that I have: a vector counts = [40, 50, 26, ..., 19], for example, and a matrix of probabilities probs = [[0.1, ..., 0.5], ..., [0.3, ..., 0.02]] such that np.sum(probs, axis=1) = 1. Let's say len(counts) = N and probs.shape = (N, 50). What I want to do is (in our example): sample 40 times from the first probability vector of the matrix probs, sample 50 times from the second probability vector of the matrix probs, ..., sample 19 times from the Nth …
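For reference, here is a plain NumPy sketch of the target behaviour (this is not the asker's TensorFlow code; the Dirichlet-generated rows are just toy probabilities):

```python
import numpy as np

rng = np.random.default_rng(0)

counts = np.array([40, 50, 26, 19])                    # toy counts
probs = rng.dirichlet(np.ones(50), size=len(counts))   # each row sums to 1

# Draw counts[i] category indices from row i of probs.
samples = [rng.choice(probs.shape[1], size=c, p=p)
           for c, p in zip(counts, probs)]

# If only per-category tallies are needed, multinomial is enough:
tallies = np.array([rng.multinomial(c, p) for c, p in zip(counts, probs)])
print(tallies.sum(axis=1))  # [40 50 26 19]
```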

What does it mean to put an `rnorm` as an argument of another `rnorm` in R?

Submitted by 守給你的承諾、 on 2019-12-06 11:49:41
Question: I have difficulty understanding what it means when an rnorm is used as one of the arguments of another rnorm (I'll explain more below). For example, in the first line of my R code I use an rnorm() and call the result mu; mu consists of 10,000 values. Now I put mu itself as the mean argument of a new rnorm() called distribution. My question is: how can mu, which itself holds 10,000 values, be used as the mean argument of this new rnorm() called distribution? P.S.: the mean argument of any …
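The key point is that rnorm is vectorized over its mean argument: draw i uses mu[i] as its mean, which is exactly two-stage (hierarchical) sampling. A sketch of the same idiom in NumPy, keeping one example language for this page (the sizes and standard deviations are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# R: mu <- rnorm(10000)
mu = rng.normal(loc=0.0, scale=1.0, size=10_000)

# R: distribution <- rnorm(10000, mean = mu)
# The mean is a vector: element i of the output is drawn from N(mu[i], 1).
distribution = rng.normal(loc=mu, scale=1.0)

# Marginally this is N(0, 2), so the spread widens to sqrt(2) ~ 1.41.
print(distribution.std())
```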

What is the correct method to upsample?

Submitted by 我们两清 on 2019-12-06 10:44:06
I have an array of samples at 75 Hz, and I want to store them at 128 Hz. If it were 64 Hz and 128 Hz it would be very simple: I would just double all samples. But what is the correct way when the sample rates are not an integer multiple of each other? When you want to avoid filtering, you can handle the signal as a set of joined interpolation cubics, but at that point it is much the same as using linear interpolation. Without knowing something more about your signal and purpose, you cannot construct valid coefficients (without damaging signal accuracy); for an example of how to construct such a cubic, look here: my …
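A minimal linear-interpolation resampler sketch for the non-integer-ratio case (the function name is mine; for production audio a filtered method such as scipy.signal.resample_poly is usually preferable):

```python
import numpy as np

def resample_linear(x, fs_in, fs_out):
    """Resample a 1-D signal by linear interpolation (no filtering)."""
    t_in = np.arange(len(x)) / fs_in          # original sample times
    t_out = np.arange(0.0, t_in[-1], 1.0 / fs_out)  # new sample times
    return np.interp(t_out, t_in, x)

fs_in, fs_out = 75, 128
x75 = np.sin(2 * np.pi * 5 * np.arange(fs_in) / fs_in)  # ~1 s of a 5 Hz tone
x128 = resample_linear(x75, fs_in, fs_out)
print(len(x75), len(x128))  # 75 -> 127 samples covering the same duration
```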

PCM algorithm for upsampling

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-06 09:25:12
Question: I have 8 kHz 16-bit PCM audio and I want to upsample it to 16 kHz 16-bit. I have to do this manually. Can someone tell me the algorithm for linear interpolation? Should I interpolate between every two samples? Also, when I upsample I have to change the WAV header: what should I change? Answer 1: As others have mentioned, linear interpolation doesn't give the best sound quality, but it's simple and cheap. For each new sample you create, just average the sample with the next one, e.g. short[] source = ...; …
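The original answer is in Java; to keep one example language on this page, here is the same averaging idea as a NumPy sketch (function name mine):

```python
import numpy as np

def upsample_2x_linear(pcm):
    """Double a 16-bit PCM stream: keep every original sample and
    insert the average of each adjacent pair between them."""
    pcm = np.asarray(pcm, dtype=np.int16)
    out = np.empty(2 * len(pcm) - 1, dtype=np.int16)
    out[0::2] = pcm
    # Widen to int32 so the sum cannot overflow int16.
    out[1::2] = ((pcm[:-1].astype(np.int32) + pcm[1:]) // 2).astype(np.int16)
    return out

x8k = np.array([0, 1000, -2000, 3000], dtype=np.int16)
print(upsample_2x_linear(x8k))  # [0 500 1000 -500 -2000 500 3000]
```

For the WAV header, update the sample-rate field (8000 to 16000), the byte-rate field (sample rate times block align), and the data and RIFF chunk sizes so they match the new number of samples.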

Selecting nodes with probability proportional to trust

Submitted by 本小妞迷上赌 on 2019-12-06 04:30:49
Does anyone know of an algorithm or data structure for selecting items with a probability of selection proportional to some attached value? In other words: http://en.wikipedia.org/wiki/Sampling_%28statistics%29#Probability_proportional_to_size_sampling The context here is a decentralized reputation system, and the attached value is therefore the value of trust one user has in another. In this system, all nodes start either as friends, which are completely trusted, or as unknowns, which are completely untrusted. This isn't useful by itself in a large P2P network, because there will be …
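The standard trick is a cumulative-sum table plus binary search: O(n) setup, O(log n) per draw. A minimal sketch (the node names and trust values are made up; for one-off draws, Python's built-in random.choices(population, weights=...) does the same thing):

```python
import bisect
import itertools
import random

def weighted_choice(items, weights):
    """Pick one item with probability proportional to its weight."""
    cumulative = list(itertools.accumulate(weights))  # e.g. [0.9, 0.95, 1.0]
    r = random.random() * cumulative[-1]              # uniform in [0, total)
    return items[bisect.bisect(cumulative, r)]

nodes = ["alice", "bob", "carol"]   # hypothetical peers
trust = [0.90, 0.05, 0.05]          # trust scores used as weights
picks = [weighted_choice(nodes, trust) for _ in range(1000)]
print(picks.count("alice"))         # around 900
```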

Shuffling a vector - all possible outcomes of sample()?

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-06 01:02:01
I have a vector with five items. my_vec <- c("a","b","a","c","d") If I want to rearrange those values into a new vector (shuffle), I can use sample(): shuffled_vec <- sample(my_vec) Easy, but the sample() function only gives me one possible shuffle. What if I want to know all possible shuffle orderings? The various "combn" functions don't seem to help, and expand.grid() gives me every possible combination with replacement, when I need it without replacement. What's the most efficient way to do this? Note that in my vector I have the value "a" twice; therefore, in the set of shuffled …
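The question is about R, but the enumerate-then-deduplicate idea is easy to show in the single example language used on this page: generate every ordering and keep each distinct one once, which also collapses orderings made identical by the duplicate "a"s.

```python
from itertools import permutations

my_vec = ["a", "b", "a", "c", "d"]

# permutations() treats the two "a"s as distinct positions, so it
# yields 5! = 120 tuples; the set keeps each distinct ordering once.
unique_shuffles = set(permutations(my_vec))
print(len(unique_shuffles))  # 60, i.e. 5! / 2! because "a" appears twice
```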

Latin hypercube sampling with python

Submitted by 自闭症网瘾萝莉.ら on 2019-12-05 07:56:53
I would like to sample a distribution defined by a function in multiple dimensions (2, 3, 4): f(x, y, ...) = ... The distributions might be ugly and non-standard (like a 3D spline on data, a sum of Gaussians, etc.). To this end I would like to uniformly sample the 2-4 dimensional space, and then, with an additional random number, accept or reject the given point of the space into my sample. Is there a ready-to-use Python lib for this purpose? Is there a Python lib for generating the points in this 2-4 dimensional space with Latin hypercube sampling, or with another uniform sampling method? Brute force …
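One ready-made option (an assumption about what the asker can install) is scipy.stats.qmc, available since SciPy 1.7. The target density f and its bound f_max below are placeholders standing in for the asker's function:

```python
import numpy as np
from scipy.stats import qmc

d = 3                                     # 2, 3 or 4 dimensions
sampler = qmc.LatinHypercube(d=d, seed=0)
points = sampler.random(n=10_000)         # Latin hypercube points in [0, 1)^d

def f(x):
    """Placeholder target density: a narrow Gaussian bump at the centre."""
    return np.exp(-np.sum((x - 0.5) ** 2, axis=-1) / 0.02)

f_max = 1.0                               # upper bound on f over the cube
u = np.random.default_rng(0).random(len(points))
accepted = points[u < f(points) / f_max]  # rejection step from the question
print(accepted.shape)
```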
