Is 1.0 a valid output from std::generate_canonical?

后端 未结 3 1775
孤城傲影
孤城傲影 2020-12-01 00:43

I always thought random numbers would lie between zero and one, without 1, i.e. they are numbers from the half-open interval [0,1). The documen

相关标签:
3条回答
  • 2020-12-01 01:04

    I just ran into a similar question with uniform_real_distribution, and here's how I interpret the Standard's parsimonious wording on the subject:

    The Standard always defines math functions in terms of math, never in terms of IEEE floating-point (because the Standard still pretends that floating-point might not mean IEEE floating point). So, any time you see mathematical wording in the Standard, it's talking about real math, not IEEE.

    The Standard says that both uniform_real_distribution<T>(0,1)(g) and generate_canonical<T,1000>(g) should return values in the half-open range [0,1). But these are mathematical values. When you take a real number in the half-open range [0,1) and represent it as IEEE floating-point, well, a significant fraction of the time it will round up to T(1.0).

    When T is float (24 mantissa bits), we expect to see uniform_real_distribution<float>(0,1)(g) == 1.0f about 1 in 2^25 times. My brute-force experimentation with libc++ confirms this expectation.

    template<class F>
    void test(long long N, const F& get_a_float) {
        int count = 0;
        for (long long i = 0; i < N; ++i) {
            float f = get_a_float();
            if (f == 1.0f) {
                ++count;
            }
        }
        printf("Expected %d '1.0' results; got %d in practice\n", (int)(N >> 25), count);
    }
    
    int main() {
        std::mt19937 g(std::random_device{}());
        auto N = (1uLL << 29);
        test(N, [&g]() { return std::uniform_real_distribution<float>(0,1)(g); });
        test(N, [&g]() { return std::generate_canonical<float, 32>(g); });
    }
    

    Example output:

    Expected 16 '1.0' results; got 19 in practice
    Expected 16 '1.0' results; got 11 in practice
    

    When T is double (53 mantissa bits), we expect to see uniform_real_distribution<double>(0,1)(g) == 1.0 about 1 in 2^54 times. I don't have the patience to test this expectation. :)

    My understanding is that this behavior is fine. It may offend our sense of "half-open-rangeness" that a distribution claiming to return numbers "less than 1.0" can in fact return numbers that are equal to 1.0; but those are two different meanings of "1.0", see? The first is the mathematical 1.0; the second is the IEEE single-precision floating-point number 1.0. And we've been taught for decades not to compare floating-point numbers for exact equality.

    Whatever algorithm you feed the random numbers into isn't going to care if it sometimes gets exactly 1.0. There's nothing you can do with a floating-point number except mathematical operations, and as soon as you do some mathematical operation, your code will have to deal with rounding. Even if you could legitimately assume that generate_canonical<float,1000>(g) != 1.0f, you still wouldn't be able to assume that generate_canonical<float,1000>(g) + 1.0f != 2.0f — because of rounding. You just can't get away from it; so why would we pretend in this single instance that you can?

    0 讨论(0)
  • 2020-12-01 01:17

    The problem is in mapping from the codomain of std::mt19937 (std::uint_fast32_t) to float; the algorithm described by the standard gives incorrect results (inconsistent with its description of the output of the algorithm) when loss of precision occurs if the current IEEE754 rounding mode is anything other than round-to-negative-infinity (note that the default is round-to-nearest).

    The 7549723rd output of mt19937 with your seed is 4294967257 (0xffffffd9u), which when rounded to 32-bit float gives 0x1p+32, which is equal to the max value of mt19937, 4294967295 (0xffffffffu) when that is also rounded to 32-bit float.

    The standard could ensure correct behavior if it were to specify that when converting from the output of the URNG to the RealType of generate_canonical, rounding is to be performed towards negative infinity; this would give a correct result in this case. As QOI, it would be good for libstdc++ to make this change.

    With this change, 1.0 will no longer be generated; instead the boundary values 0x1.fffffep-N for 0 < N <= 8 will be generated more often (approximately 2^(8 - N - 32) per N, depending on the actual distribution of MT19937).

    I would recommend to not use float with std::generate_canonical directly; rather generate the number in double and then round towards negative infinity:

        double rd = std::generate_canonical<double,
            std::numeric_limits<float>::digits>(rng);
        float rf = rd;
        if (rf > rd) {
          rf = std::nextafter(rf, -std::numeric_limits<float>::infinity());
        }
    

    This problem can also occur with std::uniform_real_distribution<float>; the solution is the same, to specialize the distribution on double and round the result towards negative infinity in float.

    0 讨论(0)
  • 2020-12-01 01:17

    According to the standard, 1.0 is not valid.

    C++11 §26.5.7.2 Function template generate_canonical

    Each function instantiated from the template described in this section 26.5.7.2 maps the result of one or more invocations of a supplied uniform random number generator g to one member of the specified RealType such that, if the values gi produced by g are uniformly distributed, the instantiation’s results tj , 0 ≤ tj < 1, are distributed as uniformly as possible as specified below.

    0 讨论(0)
提交回复
热议问题