Distributions and internal state

Asked by 说谎, 2020-12-28 17:05

On Stack Overflow there are many questions about generating uniformly distributed integers from ranges that are not known a priori. E.g.

  • C++11 Generating random numbers fr
1 answer
  •  悲哀的现实
    2020-12-28 17:30

    Interesting question.

    So I was wondering whether, by interfering with how the distribution works and constantly resetting it (i.e. recreating the distribution at every call of get_int_from_range), I still get properly distributed results.

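    I've written code to test this with uniform_int_distribution and poisson_distribution. The test compares draws made through a single long-lived engine against draws made through an engine that is re-created and re-seeded on every iteration. It's easy enough to extend this to test another distribution if you wish. The answer seems to be yes.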
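
    For reference, the pattern being asked about might look like the sketch below. Only the name get_int_from_range comes from the question; its signature and body are assumptions, as is the second helper get_int_with_params, which shows the standard alternative of keeping one distribution object and passing the range per call through param_type.

```cpp
#include <cassert>
#include <random>

using engine_type = std::mt19937_64;
using int_dist    = std::uniform_int_distribution<int>;

// Hypothetical body for the function named in the question:
// the engine lives outside, the distribution is rebuilt on every call.
int get_int_from_range(engine_type& engine, int lo, int hi)
{
    assert(lo <= hi);          // uniform_int_distribution requires lo <= hi
    int_dist dist(lo, hi);     // fresh distribution object for this call
    return dist(engine);       // only the engine's state carries over
}

// Hypothetical alternative: reuse one distribution object and supply
// the range for this call only, via the distribution's param_type.
int get_int_with_params(engine_type& engine, int_dist& dist, int lo, int hi)
{
    assert(lo <= hi);
    return dist(engine, int_dist::param_type(lo, hi));
}
```

    Both helpers take the engine by reference, so the caller creates it once (for example engine_type engine(42);) and reuses it for every draw.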

    Boiler-plate code:

    #include <chrono>
    #include <cstddef>
    #include <memory>
    #include <random>
    
    typedef std::mt19937_64 engine_type;
    
    inline size_t get_seed()
        { return std::chrono::system_clock::now().time_since_epoch().count(); }
    
    engine_type& engine_singleton()
    {  
        static std::unique_ptr<engine_type> ptr;
    
        if ( !ptr ) 
            ptr.reset( new engine_type(get_seed()) );
        return *ptr;
    }
    
    // ------------------------------------------------------------------------
    
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <string>
    #include <vector>
    
    void plot_distribution( const std::vector<double>& D, size_t mass = 200 )
    {
        const size_t n = D.size();
        for ( size_t i = 0; i < n; ++i ) 
        {
            printf("%02zu: %s\n", i, 
                std::string(static_cast<size_t>(D[i]*mass),'*').c_str() );
        }
    }
    
    double maximum_difference( const std::vector<double>& x, const std::vector<double>& y )
    {
        const size_t n = x.size(); 
    
        double m = 0.0;
        for ( size_t i = 0; i < n; ++i )
            m = std::max( m, std::abs(x[i]-y[i]) );
    
        return m;
    }
    

    Code for the actual tests:

    #include <cmath>
    #include <cstdio>
    #include <iostream>
    #include <random>
    #include <string>
    #include <vector>
    
    void compare_uniform_distributions( int lo, int hi )
    {
        const size_t sample_size = 1e5;
    
        // Initialize histograms
        std::vector<double> H1( hi-lo+1, 0.0 ), H2( hi-lo+1, 0.0 );
    
        // Initialize distribution
        auto U = std::uniform_int_distribution<int>(lo,hi);
    
        // Count!
        for ( size_t i = 0; i < sample_size; ++i )
        {
            engine_type E(get_seed());
    
            H1[ U(engine_singleton())-lo ] += 1.0;
            H2[ U(E)-lo ] += 1.0;
        }
    
        // Normalize histograms to obtain "densities"
        for ( size_t i = 0; i < H1.size(); ++i )
        {
            H1[i] /= sample_size; 
            H2[i] /= sample_size; 
        }
    
        printf("Engine singleton:\n"); plot_distribution(H1);
        printf("Engine creation :\n"); plot_distribution(H2);
        printf("Maximum difference: %.3f\n", maximum_difference(H1,H2) );
        std::cout<< std::string(50,'-') << std::endl << std::endl;
    }
    
    void compare_poisson_distributions( double mean )
    {
        const size_t sample_size = 1e5;
        const size_t nbins = static_cast<size_t>(std::ceil(2*mean));
    
        // Initialize histograms
        std::vector<double> H1( nbins, 0.0 ), H2( nbins, 0.0 );
    
        // Initialize distribution
        auto U = std::poisson_distribution<int>(mean);
    
        // Count!
        for ( size_t i = 0; i < sample_size; ++i )
        {
            engine_type E(get_seed());
            int u1 = U(engine_singleton());
            int u2 = U(E);
    
            if (static_cast<size_t>(u1) < nbins) H1[u1] += 1.0;
            if (static_cast<size_t>(u2) < nbins) H2[u2] += 1.0;
        }
    
        // Normalize histograms to obtain "densities"
        for ( size_t i = 0; i < H1.size(); ++i )
        {
            H1[i] /= sample_size; 
            H2[i] /= sample_size; 
        }
    
        printf("Engine singleton:\n"); plot_distribution(H1);
        printf("Engine creation :\n"); plot_distribution(H2);
        printf("Maximum difference: %.3f\n", maximum_difference(H1,H2) );
        std::cout<< std::string(50,'-') << std::endl << std::endl;
    
    }
    
    // ------------------------------------------------------------------------
    
    int main()
    {
        compare_uniform_distributions( 0, 25 );
        compare_poisson_distributions( 12 );
    }
    

    Run it here.


    Does the C++ standard make any guarantee regarding this topic?

    Not that I know of. However, I would say that the standard makes an implicit recommendation not to re-create the engine every time; for any distribution Distrib, the prototype of Distrib::operator() takes a reference URNG& and not a const reference. This is understandably required because the engine might need to update its internal state, but it also implies that code looking like this

    auto U = std::uniform_int_distribution<int>(0,10);
    for (;;) U(engine_type()); // error: a temporary engine cannot bind to URNG&
    

    does not compile, which to me is a clear incentive not to write code like this.
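
    The same point can be checked at compile time. The sketch below assumes C++17 and a conforming compiler (some compilers accept binding a temporary to a non-const reference as an extension). The aliases and the helper draw are illustrative names, not part of the tests above.

```cpp
#include <cassert>
#include <random>
#include <type_traits>

using engine_type = std::mt19937_64;
using dist_type   = std::uniform_int_distribution<int>;

// A named engine is an lvalue, so it binds to the URNG& parameter.
static_assert(std::is_invocable_v<dist_type&, engine_type&>,
              "a distribution can be called with an lvalue engine");

// A temporary engine is an rvalue and cannot bind to URNG&.
static_assert(!std::is_invocable_v<dist_type&, engine_type>,
              "a distribution cannot be called with a temporary engine");

// Hypothetical helper: the caller owns the engine and passes it by
// reference, so its state advances from one draw to the next.
int draw(engine_type& engine, dist_type& dist)
{
    const int value = dist(engine);
    assert(dist.min() <= value && value <= dist.max());
    return value;
}
```

    In use, the engine is constructed once outside the loop and draw(engine, U) is called inside it.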


    I'm sure there is plenty of advice out there on how to use the random library properly. It does get complicated if you have to handle the possibility of using std::random_device and allow deterministic seeding for testing purposes, but I thought it might be useful to throw my own recommendation out there too:

    #include <chrono>
    #include <functional>
    #include <random>
    #include <utility>
    
    inline size_t get_seed()
        { return std::chrono::system_clock::now().time_since_epoch().count(); }
    
    template <class Distrib>
    using generator_type = std::function< typename Distrib::result_type () >;
    
    template <class Distrib, class Engine = std::mt19937_64, class... Args>
    inline generator_type<Distrib> get_generator( Args&&... args )
    { 
        return std::bind( Distrib( std::forward<Args>(args)... ), Engine(get_seed()) ); 
    }
    
    // ------------------------------------------------------------------------
    
    #include <iostream>
    
    int main()
    {
        auto U = get_generator<std::uniform_int_distribution<int>>(0,10);
        std::cout<< U() << std::endl;
    }
    

    Run it here. Hope this helps!

    EDIT: My first recommendation was a mistake, and I apologise for that. We can't use a singleton engine like in the tests above, because this would mean that two uniform int distributions would produce the same random sequence. Instead I rely on the fact that std::bind copies the newly created engine, with its own seed, into the std::function. This yields the expected behaviour: different generators with the same distribution produce different random sequences.
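
    To make the copy semantics and the deterministic-seeding case concrete, here is a minimal sketch of a variant that takes the seed explicitly. The names get_seeded_generator and random_seed are hypothetical and not part of the recommendation above. Because each generator owns its own engine copy, two generators built from the same seed replay the same sequence independently of each other, which is handy in tests. Distinct seeds (for example from std::random_device) are what give distinct sequences.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>
#include <utility>

template <class Distrib>
using generator_type = std::function<typename Distrib::result_type()>;

// Hypothetical variant of get_generator with an explicit seed.
// std::bind stores its own copies of the distribution and the engine,
// so every returned generator carries private engine state.
template <class Distrib, class Engine = std::mt19937_64, class... Args>
generator_type<Distrib> get_seeded_generator(std::uint64_t seed, Args&&... args)
{
    return std::bind(Distrib(std::forward<Args>(args)...), Engine(seed));
}

// Hypothetical helper: build a 64-bit seed from std::random_device
// for runs that do not need to be reproducible.
inline std::uint64_t random_seed()
{
    std::random_device rd;
    const std::uint64_t hi = rd();
    const std::uint64_t lo = rd();
    return (hi << 32) | lo;
}
```

    A test would pass a fixed seed, e.g. get_seeded_generator<std::uniform_int_distribution<int>>(7, 0, 10), while production code would pass random_seed().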
