sampling

Random sampling from a dataset, while preserving original probability distribution

Submitted by 丶灬走出姿态 on 2019-12-07 03:30:53
Question: I have a set of >2000 numbers gathered from measurement. I want to sample from this data set, ~10 times in each test, while preserving the probability distribution overall and within each test (to the extent approximately possible). For example, in each test I want some small values, some middle-range values, and some large values, with the mean and variance approximately matching the original distribution. Combining all the tests, I also want the total mean and variance of all the samples to be approximately close to those of the original distribution. As my dataset follows a long-tail probability distribution, the amount of …
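A common way to make each small batch mirror the overall shape is stratified sampling: sort the data into quantile bins and draw one value per bin. A minimal NumPy sketch of that idea, assuming the goal as stated in the question (the function name, bin count, and toy long-tailed data are illustrative, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_sample(data, n_per_test=10):
    """Draw one value from each of n_per_test quantile bins,
    so every test spans small, middle, and large values."""
    data = np.sort(np.asarray(data))
    edges = np.linspace(0, len(data), n_per_test + 1).astype(int)
    return np.array([rng.choice(data[lo:hi])
                     for lo, hi in zip(edges[:-1], edges[1:])])

data = rng.lognormal(mean=0.0, sigma=1.0, size=2500)  # long-tailed toy data
test = stratified_sample(data)
print(test.mean(), data.mean())  # the batch mean tracks the overall mean
```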

Logarithmic sampling

Submitted by 白昼怎懂夜的黑 on 2019-12-06 17:14:39
I am working with values in the range [minValue, maxValue] and I want to create a vector of values within this range, but with more values near minValue. Example: min = 1, max = 100, vector = [1, 1.1, 1.5, 2, 3, 5, 10, 15, 30, 50, 100]. Something like that; the goal is to be more accurate around the minimum. Is it possible to implement that? You can start by generating numbers from 0 to 1 with a constant step (for example 0.1). Then raise them to some exponent: the bigger the exponent, the sharper the curve. Then shift and scale to map into your desired min-max range. Pseudocode: min = 1.0, max = …
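A minimal Python sketch of that power-curve idea (the function name and the default exponent are my choices, not from the answer):

```python
import numpy as np

def dense_near_min(min_val, max_val, n, exponent=3.0):
    """Spacing that clusters points near min_val; a larger
    exponent bends the curve harder toward the minimum."""
    t = np.linspace(0.0, 1.0, n)   # constant step in [0, 1]
    t = t ** exponent              # warp: small values stay small
    return min_val + (max_val - min_val) * t

print(dense_near_min(1, 100, 11))
```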

Tensorflow: Efficient multinomial sampling (Theano x50 faster?)

Submitted by 佐手、 on 2019-12-06 13:26:13
I want to be able to sample from a multinomial distribution very efficiently, and apparently my TensorFlow code is very... very slow... The idea is that I have: a vector counts = [40, 50, 26, ..., 19], for example, and a matrix of probabilities probs = [[0.1, ..., 0.5], ..., [0.3, ..., 0.02]] such that np.sum(probs, axis=1) = 1. Let's say len(counts) = N and probs.shape = (N, 50). What I want to do is (in our example): sample 40 times from the first probability vector of the matrix probs, sample 50 times from the second probability vector of the matrix probs, ..., sample 19 times from the Nth …
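For reference, here is a plain NumPy sketch of the target behaviour (this is not the asker's TensorFlow code; the Dirichlet-generated rows are just toy probabilities):

```python
import numpy as np

rng = np.random.default_rng(0)

counts = np.array([40, 50, 26, 19])                    # toy counts
probs = rng.dirichlet(np.ones(50), size=len(counts))   # each row sums to 1

# Draw counts[i] category indices from row i of probs.
samples = [rng.choice(probs.shape[1], size=c, p=p)
           for c, p in zip(counts, probs)]

# If only per-category tallies are needed, multinomial is enough:
tallies = np.array([rng.multinomial(c, p) for c, p in zip(counts, probs)])
print(tallies.sum(axis=1))  # [40 50 26 19]
```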

What does it mean to put an `rnorm` as an argument of another `rnorm` in R?

Submitted by 守給你的承諾、 on 2019-12-06 11:49:41
Question: I have difficulty understanding what it means when an rnorm is used as one of the arguments of another rnorm (I'll explain more below). For example, in the first line of my R code I use an rnorm() and call the result mu; mu consists of 10,000 values. Now I put mu itself as the mean argument of a new rnorm() called distribution. My question is: how can mu, which itself holds 10,000 values, be used as the mean argument of this new rnorm() called distribution? P.S.: the mean argument of any …
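The key point is that rnorm is vectorized over its mean argument: draw i uses mu[i] as its mean, which is exactly two-stage (hierarchical) sampling. A sketch of the same idiom in NumPy, keeping one example language for this page (the sizes and standard deviations are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# R: mu <- rnorm(10000)
mu = rng.normal(loc=0.0, scale=1.0, size=10_000)

# R: distribution <- rnorm(10000, mean = mu)
# The mean is a vector: element i of the output is drawn from N(mu[i], 1).
distribution = rng.normal(loc=mu, scale=1.0)

# Marginally this is N(0, 2), so the spread widens to sqrt(2) ~ 1.41.
print(distribution.std())
```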

What is the correct method to upsample?

Submitted by 我们两清 on 2019-12-06 10:44:06
I have an array of samples at 75 Hz, and I want to store them at 128 Hz. If it were 64 Hz and 128 Hz it would be very simple: I would just double all samples. But what is the correct way when the sample rates are not an integer multiple of each other? When you want to avoid filtering, you can handle the signal as a set of joined interpolation cubics, but at that point it is much the same as using linear interpolation. Without knowing something more about your signal and purpose, you cannot construct valid coefficients (without damaging signal accuracy); for an example of how to construct such a cubic, look here: my …
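A minimal linear-interpolation resampler sketch for the non-integer-ratio case (the function name is mine; for production audio a filtered method such as scipy.signal.resample_poly is usually preferable):

```python
import numpy as np

def resample_linear(x, fs_in, fs_out):
    """Resample a 1-D signal by linear interpolation (no filtering)."""
    t_in = np.arange(len(x)) / fs_in          # original sample times
    t_out = np.arange(0.0, t_in[-1], 1.0 / fs_out)  # new sample times
    return np.interp(t_out, t_in, x)

fs_in, fs_out = 75, 128
x75 = np.sin(2 * np.pi * 5 * np.arange(fs_in) / fs_in)  # ~1 s of a 5 Hz tone
x128 = resample_linear(x75, fs_in, fs_out)
print(len(x75), len(x128))  # 75 -> 127 samples covering the same duration
```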

PCM algorithm for upsampling

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-06 09:25:12
Question: I have 8 kHz 16-bit PCM audio and I want to upsample it to 16 kHz 16-bit. I have to do this manually. Can someone tell me the algorithm for linear interpolation? Should I interpolate between every two samples? Also, when I upsample I have to change the WAV header: what should I change? Answer 1: As others have mentioned, linear interpolation doesn't give the best sound quality, but it's simple and cheap. For each new sample you create, just average the sample with the next one, e.g. short[] source = ...; …
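The original answer is in Java; to keep one example language on this page, here is the same averaging idea as a NumPy sketch (function name mine):

```python
import numpy as np

def upsample_2x_linear(pcm):
    """Double a 16-bit PCM stream: keep every original sample and
    insert the average of each adjacent pair between them."""
    pcm = np.asarray(pcm, dtype=np.int16)
    out = np.empty(2 * len(pcm) - 1, dtype=np.int16)
    out[0::2] = pcm
    # Widen to int32 so the sum cannot overflow int16.
    out[1::2] = ((pcm[:-1].astype(np.int32) + pcm[1:]) // 2).astype(np.int16)
    return out

x8k = np.array([0, 1000, -2000, 3000], dtype=np.int16)
print(upsample_2x_linear(x8k))  # [0 500 1000 -500 -2000 500 3000]
```

For the WAV header, update the sample-rate field (8000 to 16000), the byte-rate field (sample rate times block align), and the data and RIFF chunk sizes so they match the new number of samples.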

Selecting nodes with probability proportional to trust

Submitted by 本小妞迷上赌 on 2019-12-06 04:30:49
Does anyone know of an algorithm or data structure for selecting items with a probability of selection proportional to some attached value? In other words: http://en.wikipedia.org/wiki/Sampling_%28statistics%29#Probability_proportional_to_size_sampling The context here is a decentralized reputation system, and the attached value is therefore the value of trust one user has in another. In this system, all nodes start either as friends, which are completely trusted, or as unknowns, which are completely untrusted. This isn't useful by itself in a large P2P network, because there will be …
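The standard trick is a cumulative-sum table plus binary search: O(n) setup, O(log n) per draw. A minimal sketch (the node names and trust values are made up; for one-off draws, Python's built-in random.choices(population, weights=...) does the same thing):

```python
import bisect
import itertools
import random

def weighted_choice(items, weights):
    """Pick one item with probability proportional to its weight."""
    cumulative = list(itertools.accumulate(weights))  # e.g. [0.9, 0.95, 1.0]
    r = random.random() * cumulative[-1]              # uniform in [0, total)
    return items[bisect.bisect(cumulative, r)]

nodes = ["alice", "bob", "carol"]   # hypothetical peers
trust = [0.90, 0.05, 0.05]          # trust scores used as weights
picks = [weighted_choice(nodes, trust) for _ in range(1000)]
print(picks.count("alice"))         # around 900
```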

Shuffling a vector - all possible outcomes of sample()?

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-06 01:02:01
I have a vector with five items. my_vec <- c("a","b","a","c","d") If I want to rearrange those values into a new vector (shuffle), I can use sample(): shuffled_vec <- sample(my_vec) Easy, but the sample() function only gives me one possible shuffle. What if I want to know all possible shuffle orderings? The various "combn" functions don't seem to help, and expand.grid() gives me every possible combination with replacement, when I need it without replacement. What's the most efficient way to do this? Note that in my vector I have the value "a" twice; therefore, in the set of shuffled …
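The question is about R, but the enumerate-then-deduplicate idea is easy to show in the single example language used on this page: generate every ordering and keep each distinct one once, which also collapses orderings made identical by the duplicate "a"s.

```python
from itertools import permutations

my_vec = ["a", "b", "a", "c", "d"]

# permutations() treats the two "a"s as distinct positions, so it
# yields 5! = 120 tuples; the set keeps each distinct ordering once.
unique_shuffles = set(permutations(my_vec))
print(len(unique_shuffles))  # 60, i.e. 5! / 2! because "a" appears twice
```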

Latin hypercube sampling with python

Submitted by 自闭症网瘾萝莉.ら on 2019-12-05 07:56:53
I would like to sample a distribution defined by a function in multiple dimensions (2, 3, 4): f(x, y, ...) = ... The distributions might be ugly and non-standard (like a 3D spline on data, a sum of Gaussians, etc.). To this end I would like to uniformly sample the 2-4 dimensional space, and then, with an additional random number, accept or reject the given point of the space into my sample. Is there a ready-to-use Python lib for this purpose? Is there a Python lib for generating the points in this 2-4 dimensional space with Latin hypercube sampling, or with another uniform sampling method? Brute force …
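One ready-made option (an assumption about what the asker can install) is scipy.stats.qmc, available since SciPy 1.7. The target density f and its bound f_max below are placeholders standing in for the asker's function:

```python
import numpy as np
from scipy.stats import qmc

d = 3                                     # 2, 3 or 4 dimensions
sampler = qmc.LatinHypercube(d=d, seed=0)
points = sampler.random(n=10_000)         # Latin hypercube points in [0, 1)^d

def f(x):
    """Placeholder target density: a narrow Gaussian bump at the centre."""
    return np.exp(-np.sum((x - 0.5) ** 2, axis=-1) / 0.02)

f_max = 1.0                               # upper bound on f over the cube
u = np.random.default_rng(0).random(len(points))
accepted = points[u < f(points) / f_max]  # rejection step from the question
print(accepted.shape)
```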
