Searching for a fast/efficient histogram algorithm (with pre-specified bins)

前端 未结 3 1202
难免孤独
难免孤独 2020-12-24 07:11

I don\'t do much coding outside of Matlab, but I have a need to export my Matlab code to another language, most likely C. My Matlab code includes a histogram function, histc

3条回答
  •  一整个雨季
    2020-12-24 07:40

    The "ideal" histogram algorithm will depend upon the range you expect to capture. Generally any histogram algorithm will look like this:

    const int NSAMPLES = whatever;
    double samples[NSAMPLES] = { 1.0, 3.93, 1e30, ... }; // your data set
    const int NBUCKETS = 10; // or whatever
    int counts[NBUCKETS] = { 0 };
    for (int i = 0; i != NSAMPLES; ++i) {
        counts[TRANSFER(samples[i])]++;
    }
    

    where TRANSFER() is some function that maps your inputs to a bin (0th or Nth bin mapping to "out of range" of applicable).

    The exact implementation of TRANSFER() depends a lot on the expected distribution of your sample and where you are interested in detail. Some common approaches I have seen:

    • uniform distribution in range [a,b] (requires linear transform)
    • logarithmic distribution of unsigned integer values (best when combined with some bit twiddling hacks to quickly determine the nearest power-of-two or similar).

    If you don't know the distribution up-front, then you really can't have an efficient mechanism to bin them effectively: you'll either have to guess (biased or uninformative results) or store everything and sort it at the end, binning into equal-sized buckets (poor performance).

提交回复
热议问题