How do I generate points that match a histogram?

迷失自我 2020-12-14 04:10

I am working on a simulation system. I will soon have experimental data (histograms) for the real-world distribution of values for several simulation inputs.

Whe

6 Answers
  •  再見小時候
    2020-12-14 04:56

    So it seems that what I want in order to generate a given probability distribution is a quantile function, which is the inverse of the cumulative distribution function, as @dmckee says.

    The question becomes: What is the best way to generate and store a quantile function describing a given continuous histogram? I have a feeling the answer will depend greatly on the shape of the input - if it follows any kind of pattern there should be simplifications over the most general case. I'll update here as I go.
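
    One straightforward way to store a quantile function for a binned histogram is as a table of cumulative counts, inverted by binary search. The sketch below assumes equal-width bins starting at x_min and a uniform value u in [0, 1) supplied by the caller; the names hist_cdf, hist_quantile_bin and hist_quantile are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fill cdf[i] with counts[0] + ... + counts[i]. */
void hist_cdf(const int *counts, size_t n, double *cdf) {
    double running = 0.0;
    for (size_t i = 0; i < n; i++) {
        running += counts[i];
        cdf[i] = running;
    }
}

/* Return the first bin whose cumulative count exceeds u * total.
   Requires u in [0, 1) and a positive total, so empty bins are never chosen. */
size_t hist_quantile_bin(const double *cdf, size_t n, double u) {
    assert(n > 0 && cdf[n - 1] > 0.0);
    double target = u * cdf[n - 1];
    size_t lo = 0, hi = n - 1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cdf[mid] > target) hi = mid;
        else lo = mid + 1;
    }
    return lo;
}

/* Turn u into a continuous value by interpolating linearly inside the bin. */
double hist_quantile(const double *cdf, size_t n,
                     double x_min, double bin_width, double u) {
    size_t bin = hist_quantile_bin(cdf, n, u);
    double below = (bin == 0) ? 0.0 : cdf[bin - 1];
    double frac = (u * cdf[n - 1] - below) / (cdf[bin] - below);
    return x_min + bin_width * ((double)bin + frac);
}
```

    Each lookup costs O(log N) and the table is exact for the histogram as given; the linear interpolation inside a bin is a modelling choice, not something the data dictates.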


    Edit:

    I had a conversation this week that reminded me of this problem. If I forgo describing the histogram as an equation and just store the table, can I do selections in O(1) time? It turns out you can, without any loss of precision, at the cost of O(N log N) construction time.

    Create an array of N items. A uniform random selection into the array will find any given item with probability 1/N. For each item, store the fraction of hits for which this item should actually be selected, and the index of another item which will be selected if this one is not.
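
    As a quick check of the scheme, take two items with weights 1 and 3 (illustrative values, not from the post). Writing share for the stored fraction, each weight is first divided by the average weight, and the under-full item then hands its spare hits to the over-full one:

```latex
\begin{align*}
\text{share}_0 &= \frac{1}{(1+3)/2} = 0.5, \qquad
\text{share}_1 = \frac{3}{(1+3)/2} = 1.5 \\
\text{share}_1 &\leftarrow 1.5 - (1 - 0.5) = 1.0
  \quad \text{(item 0 is paired with item 1)} \\
P(\text{item 0}) &= \tfrac{1}{2} \cdot 0.5 = \tfrac{1}{4} \\
P(\text{item 1}) &= \tfrac{1}{2} \cdot 1.0 + \tfrac{1}{2} \cdot (1 - 0.5) = \tfrac{3}{4}
\end{align*}
```

    The resulting probabilities match the weights 1:3, even though the initial array index is drawn uniformly.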

    Weighted Random Sampling, C implementation:

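    #include <stdlib.h> //malloc, qsort

    //data structure
    typedef struct wrs_data {
      double share; //fraction of hits on this bucket that it keeps
      int pair;     //bucket to use for the remaining hits
      int idx;      //index of the item this bucket represents
    } wrs_t;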
    
    
    //sort helper
    int wrs_sharecmp(const void* a, const void* b) {
      double delta = ((wrs_t*)a)->share - ((wrs_t*)b)->share;
      return (delta<0) ? -1 : (delta>0);
    }
    
    
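    //Initialize the data structure
    //(the loop bodies were cut off in the post; they are reconstructed
    // here from the description above)
    wrs_t* wrs_create(int* weights, size_t N) {
      wrs_t* data = malloc(N * sizeof(wrs_t));
      double sum = 0;
      int i;
      if (data == NULL) { return NULL; }
      for (i=0;i<N;i++) { sum += weights[i]; }
      for (i=0;i<N;i++) {
        //this bucket's weight relative to a flat distribution's share (sum/N)
        data[i].share = weights[i]/(sum/N);
        data[i].idx = i;
      }
      //sort ascending by share
      qsort(data, N, sizeof(wrs_t), wrs_sharecmp);
      //until it is paired, a bucket falls back to itself
      for (i=0;i<N;i++) { data[i].pair = i; }

      int j = N-1; //the largest bucket
      for (i=0;i<j;i++) {
        int check = i;
        double excess = 1.0 - data[check].share;
        while (excess>0 && i<j) {
          //an under-full bucket is hit more often than it should be,
          //so send its excess hits to the largest remaining bucket
          data[check].pair = j;
          data[j].share -= excess;
          excess = 1.0 - data[j].share;
          //if the paired bucket is now under-full too, it donates next
          if (excess >= 0) { check=j--;} 
        }
      }
      return data;
    }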
    
    
    int wrs_pick(wrs_t* collection, size_t N)
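    //O(1) weighted random sampling (after preparing the collection).
    //Randomly select a bucket, and a percentage.
    //If the percentage is greater than that bucket's share of hits,
    // use its paired bucket.
    //rand_in_range and rand_percent are not defined in this listing; they are
    // assumed to return an int in [0,N) and a double in [0,1) respectively.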
    {
      int idx = rand_in_range(0,N);
      double pct = rand_percent();
      if (pct > collection[idx].share) { idx = collection[idx].pair; }
      return collection[idx].idx;
    } 
    

    Edit 2: After a little research, I found it's even possible to do the construction in O(N) time. With careful tracking, you don't need to sort the array to find the large and small bins. Updated implementation here
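
    A common way to get O(N) construction is Vose's variant of the alias method, which replaces the sort with two work lists of under-full and over-full bins. The following is a minimal sketch of that idea and not the updated implementation the answer refers to; alias_t, alias_create and alias_pick are hypothetical names, and the caller is assumed to supply the random slot and fraction.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct alias_entry {
    double share; /* fraction of hits on this slot that it keeps */
    size_t pair;  /* slot that receives the remaining hits */
} alias_t;

/* Build the table in O(n). Returns NULL on allocation failure or if the
   weights do not sum to a positive value. Caller frees the result. */
alias_t *alias_create(const int *weights, size_t n) {
    assert(n > 0);
    alias_t *table = malloc(n * sizeof *table);
    size_t *small = malloc(n * sizeof *small); /* slots with share < 1 */
    size_t *large = malloc(n * sizeof *large); /* slots with share >= 1 */
    size_t n_small = 0, n_large = 0;
    double sum = 0.0;

    if (!table || !small || !large) goto fail;
    for (size_t i = 0; i < n; i++) sum += weights[i];
    if (sum <= 0.0) goto fail;

    for (size_t i = 0; i < n; i++) {
        table[i].share = weights[i] * (double)n / sum; /* 1.0 is a fair share */
        table[i].pair = i;
        if (table[i].share < 1.0) small[n_small++] = i;
        else large[n_large++] = i;
    }
    while (n_small > 0 && n_large > 0) {
        size_t s = small[--n_small];
        size_t l = large[n_large - 1];
        table[s].pair = l;                      /* s sends its spare hits to l */
        table[l].share -= 1.0 - table[s].share;
        if (table[l].share < 1.0) {             /* l has become under-full */
            n_large--;
            small[n_small++] = l;
        }
    }
    /* Anything left differs from 1.0 only by rounding, so clamp it. */
    while (n_small > 0) table[small[--n_small]].share = 1.0;
    while (n_large > 0) table[large[--n_large]].share = 1.0;

    free(small);
    free(large);
    return table;
fail:
    free(table);
    free(small);
    free(large);
    return NULL;
}

/* slot must be uniform in [0, n) and pct uniform in [0, 1). */
size_t alias_pick(const alias_t *table, size_t slot, double pct) {
    return (pct < table[slot].share) ? slot : table[slot].pair;
}
```

    Because nothing is sorted, the table stays in the original item order and no separate idx field is needed; the two work lists cost O(N) extra memory during construction only.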
