Is there any legitimate use for Intel's RDRAND?

问题

Today I thought: well, even if there is great suspicion on RDRAND implementation of NIST SP 800-90A, it is still a hardware implementation of pseudo-random number generator (PRNG) that must be good enough for non-sensitive applications. So I thought of using it on my game instead of Mersenne Twister.

So, to see if there was any performance gain on using the instruction, I compared the time of the two following codes:

// test.cpp
#include <cstdio>

int main()
{
    unsigned int rnd = 0;
    for(int i = 0; i < 10000000; ++i) {
        __builtin_ia32_rdrand32_step(&rnd);
    }
    printf("%x\n", rnd);
}

and

//test2.cpp
#include <cstdio>
#include <random>

int main()
{
    unsigned int rnd = 0;
    __builtin_ia32_rdrand32_step(&rnd);
    std::mt19937 gen(rnd);
    for(int i = 0; i < 10000000; ++i) {
        rnd ^= gen();
    }
    printf("%x\n", rnd);
}

and by running the two I get:

$ time ./test
d230449a

real    0m0.361s
user    0m0.358s
sys     0m0.002s

$ time ./test2 
bfc4e472

real    0m0.051s
user    0m0.050s
sys     0m0.002s

So, Mersenne Twister is much faster than RDRAND on my CPU. Well, I was disappointed, ruled out from my game. But RDRAND is a cryptographically secure PRNG (CSPRNG), so it does much behind the scenes... more fair would be compare it to other CSPRNG. So I took my Rabbit implementation (plain translation of the RFC to C, no fancy tricks for performance), and wrote the following test:

// test3.cpp
#include <cstdio>

extern "C"
{
#include "rabbit.h"
}

int main()
{
    rabbit_state s;
    unsigned long long buf[2];
    __builtin_ia32_rdrand64_step(&buf[0]);
    __builtin_ia32_rdrand64_step(&buf[1]);
    rabbit_init_key(&s, (uint8_t*)&buf[0]);

    for(int i = 0; i < 10000000; ++i) {
        rabbit_extract(&s, (uint8_t*)&buf[0]);
    }
    printf("%llx\n", buf[0]);
}

And for my surprise, generating twice as much pseudo-random data as the first two of them, I got a better time than RDRAND:

$ time ./test3 
8ef9772277b70aba

real    0m0.344s
user    0m0.341s
sys     0m0.002s

All three were compiled with optimization enabled.

So, we have a widespread paranoia that RDRAND was made to embed NSA backdoors into everybody's software cryptography. Also we have at least one software CSPRNG faster than RDRAND, and the most widely used decent PRNG, Mersenne Twister, is much faster than RDRAND. Finally, we have open-source auditable software entropy pools, like /dev/random and /dev/urandom, that are not hidden behind twofold scrambler layers of AES, like RDRAND.

So, the question: should be people be using RDRAND? Is there any legitimate use for it? Or should we stop using it altogether?

回答1:

As pointed out in the other answer, RDRAND is seeded with true randomness. In particular, it frequently reseeds its internal CSPRNG with 128 bits of hardware-generated randomness, guaranteeing a reseed at least once every 511 * 128 bits. See section 4.2.5 of this doc:

https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide

So in your examples, you used a single 128-bit seed to generate 10 million random draws from rabbit_extract. In the RDRAND version, you had the equivalent of 2.5 million 128-bit draws, meaning that the CSPRING was reseeded at least 2,500,000/511 = 4,892 times.

So instead of 128 bits of entropy going into your rabbit example, there were at least 4,892*128 = 626,176 bits of entropy going into the RDRAND example.

That's much, much more entropy than you're going to get in 0.361 seconds without hardware support. That could matter if you're doing stuff where lots of real randomness is important. One example is Shamir secret sharing of large quantities of data -- not sure if there are others.

So in conclusion -- it's not for speed, it's for high security. The question of whether it's backdoored is troubling, of course, but you can always XOR it with other sources, and at the very least it's not hurting you.

回答2:

RDRAND is not just a PRNG. It is a whitened TRNG that is FIPS compliant. The difference is that you can rely on RDRAND to contain quite a lot of actual entropy directly retrieved from the CPU. So the main use of RDRAND is to supply entropy to OS/libraries/applications.

The only other good way for applications to retrieve entropy is usually using an OS supplied entropy source such as /dev/random or /dev/urandom (which usually draws the entropy from /dev/random). However, that OS also requires to find the entropy somewhere. Usually tiny differences in disk and network access times are used for this (+ other semi-random input). These devices are not always present, and are not designed as sources of entropy; they are often not very good sources, nor are they very fast. So on systems that support it, RDRAND is often used as an entropy source for the cryptographically secure random number generator of the operating system.

With regards to speed, especially for games, it is completely valid to use a (non-secure) PRNG. If you want to have a reasonable random seed then seeding it with the result of RDRAND may be a good idea, although seeding it from the OS supplied RNG may be a more portable and even a more secure option (in case you don't fully trust Intel or the US).

Note that currently RDRAND is implemented using (AES) CTR_DRBG instead of a (less well analysed) stream cipher that was created for speed such as Rabbit, so it should come as no surprise that Rabbit is faster. Even more importantly, it also has to retrieve the entropy from the entropy source within the CPU before it can run.

回答3:

Is there any legitimate use for Intel's RDRAND?

Yes.

Consider a Monte-Carlo simulation. It has no cryptographic needs, so it does not matter if its backdoored by the NSA.

Or should we stop using it altogether?

We can't answer that. That's a confluence use cases, requirements and personal preferences.

... Also we have at least one software CSPRNG faster than RDRAND, and the most widely used decent PRNG..."

Mersenne Twister may be faster for a word at at time after initialization and without Twists because its returning a word from the state array. But I doubt its as fast as RDRAND for a continuous stream. I know RDRAND can achieve theoretical limits based on bus width in a continuous stream.

According to David Johnston of Intel (who designed the circuit), that's something like 800+ MB/s. See DJ's answer at What is the latency and throughput of the RDRAND instruction on Ivy Bridge?.

So, we have a widespread paranoia that RDRAND was made to embed NSA backdoors into everybody's software cryptography.

Paranoid folks have at least two choices. First, they can forgo using RDRAND and RDSEED. Second, they can use the output of RDRAND and RDSEED to seed another generator, and then use the output of the second generator. I believe the Linux kernel takes the second approach.

回答4:

There is an astrophysics research paper here (http://iopscience.iop.org/article/10.3847/1538-4357/aa7ede/meta;jsessionid=A9DA9DDB925E6522D058F3CEEC7D0B21.ip-10-40-2-120), (non-paywalled) version here (https://arxiv.org/abs/1707.02212) that gives a legitimate use for RdRand.

It examines the effects of RdRand on a Monte Carlo simulator, as an earlier post advised. But the author didn't find any statistical difference in the results that use or do not use RdRand. From the performance standpoint, it looks like the Mersenne Twister is much faster. I think Sections 2.2.1 and 5 have all of the details.

回答5:

True Randomness vs Pseudorandom.

rdrand is a hardware-source of true entropy, and at that a quite powerfull one (also it is not Intel-exclusive - AMD offers it the same - why do you single out Intel then? well, i'm concentrating on the implementation of Intel as they offer a bit more information and have the vastly higher throughput compared to even Zen2 [ryzen 3xxx])

Forget about the clockcyle-throughput - that is a misleading metric here as it is not really related. The throughput is limited by the roundtrip-delay for a single thread and by the actual hardware implementation running at 800 Mhz, has a fixed 8 cycles and can deliver a 64bit random value per transaction. Looking just at the clock-cycles taken by the thread running the code will very depending on the current clockrate - idling at 800 MHz it would seem like at was 6 times faster than running at 4.8 Ghz. But the delay between the cores and the DRNG is in the order of inter-Core-latency, and you have that twice, so about 80ns - with results in roughly 12MHz@8byte => ~100 MByte/s/Thread and maximum is 800 MB/s.

one other benefit of this: It is Not using up your CPU-Resources like most PRNG would do, so while other generators can have quite a bit higher throughput (1-2 orders of magnitude) they also require more resources. In situations were you need a lot of random numbers and are in a tight computationally heavy loop rdrand might actually give the better performance, but that of course is heavily dependent on the exact scenario and hardware.

If you just need a lot of random numbers for some simulation or need quasi-random numbers for games then rdrand is likely not the best choice. If you need true secure random number? Use it - and combine it with other sources of entropy if you are worried about some backdoors.

来源：https://stackoverflow.com/questions/26771329/is-there-any-legitimate-use-for-intels-rdrand

标签

random

cryptography

stream-cipher

rdrand