I am working on a simulation system. I will soon have experimental data (histograms) for the real-world distribution of values for several simulation inputs.
When the simulation runs, I want to be able to generate random values for each input that match the measured distributions.
So it seems that what I want in order to generate values from a given probability distribution is a quantile function, which is the inverse of the cumulative distribution function, as @dmckee says.
The question becomes: What is the best way to generate and store a quantile function describing a given continuous histogram? I have a feeling the answer will depend greatly on the shape of the input - if it follows any kind of pattern there should be simplifications over the most general case. I'll update here as I go.
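Here is a minimal sketch of that table-driven quantile approach, assuming the measured histogram is stored as counts over equal-width bins; the names (hist_cdf_t, hist_cdf_create, hist_cdf_sample) and the equal-width assumption are mine, not part of any existing code. Build the cumulative table once, then each sample is one uniform draw, a binary search, and a linear interpolation inside the chosen bin.

#include <stdlib.h>

typedef struct {
  double left, width; //left edge and width of the equal-width bins
  size_t nbins;
  double* cdf;        //cdf[i] = cumulative fraction up to the right edge of bin i
} hist_cdf_t;

hist_cdf_t* hist_cdf_create(const double* counts, size_t nbins, double left, double width) {
  hist_cdf_t* h = malloc(sizeof(*h));
  h->left = left; h->width = width; h->nbins = nbins;
  h->cdf = malloc(nbins * sizeof(double));
  double total = 0, running = 0;
  for (size_t i = 0; i < nbins; i++) { total += counts[i]; }
  for (size_t i = 0; i < nbins; i++) {
    running += counts[i];
    h->cdf[i] = running / total;
  }
  return h;
}

//Quantile function: map a uniform u in [0,1) back through the stored CDF.
double hist_cdf_sample(const hist_cdf_t* h, double u) {
  //binary search for the first bin whose cumulative value exceeds u
  size_t lo = 0, hi = h->nbins - 1;
  while (lo < hi) {
    size_t mid = (lo + hi) / 2;
    if (h->cdf[mid] <= u) { lo = mid + 1; } else { hi = mid; }
  }
  //interpolate linearly within that bin
  double prev = (lo == 0) ? 0.0 : h->cdf[lo - 1];
  double denom = h->cdf[lo] - prev;
  double frac = (denom > 0.0) ? (u - prev) / denom : 0.5;
  return h->left + (lo + frac) * h->width;
}

Feeding it u from whatever uniform generator the simulation already uses (e.g. rand()/(RAND_MAX+1.0)) gives values distributed like the original histogram, with linear interpolation inside each bin.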
Edit:
I had a conversation this week that reminded me of this problem. If I forgo describing the histogram as an equation and just store the table, can I do selections in O(1) time? It turns out you can, without any loss of precision, at the cost of O(N lg N) construction time.
Create an array of N items. A uniform random selection into the array will find an item with probability 1/N. For each item, store the fraction of hits for which this item should actually be selected, and the index of another item which will be selected if this one is not. (This is the classic alias method for weighted random sampling.)
Weighted Random Sampling, C implementation:
#include <stdlib.h>  /* malloc, qsort */

//data structure: one bucket per weight
typedef struct wrs_data {
  double share; //fraction of this bucket's hits that keep this bucket
  int pair;     //bucket (array position) that takes the remaining hits
  int idx;      //index of the original weight
} wrs_t;
//sort helper: orders buckets by ascending share
int wrs_sharecmp(const void* a, const void* b) {
  double delta = ((wrs_t*)a)->share - ((wrs_t*)b)->share;
  return (delta<0) ? -1 : (delta>0);
}
//Initialize the data structure: O(N lg N) because of the sort
wrs_t* wrs_create(int* weights, size_t N) {
  wrs_t* data = malloc(N * sizeof(wrs_t));
  double sum = 0;
  int i;
  for (i = 0; i < N; i++) { sum += weights[i]; }
  for (i = 0; i < N; i++) {
    //each bucket's share of the total weight, scaled so the average share is 1.0
    data[i].share = weights[i] * (double)N / sum;
    data[i].pair = i;
    data[i].idx = i;
  }
  //sort so the under-full buckets (share < 1) come first, over-full last
  qsort(data, N, sizeof(wrs_t), wrs_sharecmp);
  int small = 0;    //next under-full bucket to top up
  int big = N - 1;  //over-full bucket currently donating its excess
  while (small < big) {
    //the small bucket's leftover hits go to the big bucket
    data[small].pair = big;
    data[big].share -= (1.0 - data[small].share);
    small++;
    //if the donor fell below a full share, it is topped up by the next bucket down
    while (big > small && data[big].share < 1.0) {
      data[big].pair = big - 1;
      data[big - 1].share -= (1.0 - data[big].share);
      big--;
    }
  }
  data[small].share = 1.0;  //the last bucket holds exactly one full share
  data[small].pair = small;
  return data;
}
//O(1) weighted random sampling (after preparing the collection).
//Randomly select a bucket, and a percentage.
//If the percentage is greater than that bucket's share of hits,
// use its paired bucket instead.
int wrs_pick(wrs_t* collection, size_t N)
{
  int idx = rand_in_range(0, N);
  double pct = rand_percent();
  if (pct > collection[idx].share) { idx = collection[idx].pair; }
  return collection[idx].idx;
}
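rand_in_range() and rand_percent() are assumed to be whatever uniform generators the simulation already has (an integer in [0,N) and a double in [0,1)). A crude stand-in using the C library's rand() (fine for a sketch, not for serious statistics), plus a small usage example against the listing above, might look like this:

#include <stdlib.h>

//uniform integer in [lo, hi) -- crude; use a better RNG for real work
int rand_in_range(int lo, int hi) {
  return lo + (int)((double)rand() / ((double)RAND_MAX + 1) * (hi - lo));
}

//uniform double in [0, 1)
double rand_percent(void) {
  return (double)rand() / ((double)RAND_MAX + 1);
}

int example(void) {
  //the histogram's bin counts become the weights;
  //wrs_pick() then returns a bin index with the matching probability
  int weights[] = { 5, 20, 60, 10, 5 };
  wrs_t* table = wrs_create(weights, 5);
  int bin = wrs_pick(table, 5);
  free(table);
  return bin;
}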
Edit 2: After a little research, I found it's even possible to do the construction in O(N) time. With careful tracking, you don't need to sort the array to find the large and small bins. Updated implementation here
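I won't reproduce the linked code, but for completeness, the usual way to get the O(N) construction is to keep two worklists of bucket positions, under-full and over-full, instead of sorting. A sketch of that idea, building the same kind of table wrs_pick() uses (wrs_create_linear and the details are mine, not a copy of the linked implementation):

#include <stdlib.h>

//O(N) construction: classify buckets once, then pair them off
//from two worklists instead of sorting.
wrs_t* wrs_create_linear(int* weights, size_t N) {
  wrs_t* data = malloc(N * sizeof(wrs_t));
  int* small = malloc(N * sizeof(int));  //positions with share < 1
  int* big = malloc(N * sizeof(int));    //positions with share >= 1
  int ns = 0, nb = 0, i;
  double sum = 0;
  for (i = 0; i < N; i++) { sum += weights[i]; }
  for (i = 0; i < N; i++) {
    data[i].share = weights[i] * (double)N / sum;
    data[i].pair = i;
    data[i].idx = i;
    if (data[i].share < 1.0) { small[ns++] = i; } else { big[nb++] = i; }
  }
  while (ns > 0 && nb > 0) {
    int s = small[--ns];
    int b = big[nb - 1];
    //the small bucket's leftover hits go to the big bucket
    data[s].pair = b;
    data[b].share -= (1.0 - data[s].share);
    //reclassify the donor if it dropped below a full share
    if (data[b].share < 1.0) { nb--; small[ns++] = b; }
  }
  //anything left over holds a full share, up to rounding
  while (nb > 0) { data[big[--nb]].share = 1.0; }
  while (ns > 0) { data[small[--ns]].share = 1.0; }
  free(small); free(big);
  return data;
}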