Sieve of Eratosthenes has huge 'overdraw' - is Sundaram's better after all?

蹲街弑〆低调 提交于 2019-12-04 12:38:22

Since there weren't any takers for the 'Sundaram vs. Eratosthenes' problem, I sat down and analysed it. Result: classic Sundaram's has strictly higher overdraw than an odds-only Eratosthenes; if you apply an obvious, small optimisation then the overdraw is exactly the same - for reasons that shall become obvious. If you fix Sundaram's to avoid overdraw entirely then you get something like Pritchard's Sieve, which is massively more complicated.

The exposition of Sundaram's Sieve in Lucky's Notes is probably the best by far; slightly rewritten to use a hypothetical (i.e. non-standard and not supplied here) type bitmap_t it looks somewhat like this. In order to measure overdraw the bitmap type needs an operation corresponding to the BTS (bit test and set) CPU instruction, which is available via the _bittestandset() intrinsic with Wintel compilers and with MinGW editions of gcc. The intrinsics are very bad for performance but very convenient for counting overdraw.

Note: for sieving all primes up to N one would call the sieve with max_bit = N/2; if bit i of the resulting bitmap is set then the number (2 * i + 1) is composite. The function has '31' in its name because the index math breaks for bitmaps greater than 2^31; hence this code can only sieve numbers up to 2^32-1 (corresponding to max_bit <= 2^31-1).

uint64_t Sundaram31_a (bitmap_t &bm, uint32_t max_bit)
{
   assert( max_bit <= UINT32_MAX / 2 );

   uint32_t m = max_bit;
   uint64_t overdraw = 0;

   bm.set_all(0);

   for (uint32_t i = 1; i < m / 2; ++i)
   {
      for (uint32_t j = i; j <= (m - i) / (2 * i + 1); ++j)
      {
         uint32_t k = i + j + 2 * i * j;

         overdraw += bm.bts(k);
      }
   }

   return overdraw;
}

Lucky's bound on j is exact but the one on i is very loose. Tightening it and losing the m alias that I had added to make the code look more like common expositions on the net, we get:

uint64_t Sundaram31_b (bitmap_t &bm, uint32_t max_bit)
{
   uint32_t i_max = uint32_t(std::sqrt(double(2 * max_bit + 1)) - 1) / 2;
   uint64_t overdraw = 0;

   bm.set_all(0);

   for (uint32_t i = 1; i <= i_max; ++i)
   {
      for (uint32_t j = i; j <= (max_bit - i) / (2 * i + 1); ++j)
      {
         uint32_t k = i + j + 2 * i * j;

         overdraw += bm.bts(k);
      }
   }

   return overdraw;
}

The assert was dumped in order to reduce noise but it is actually still valid and necessary. Now it's time for a bit of strength reduction, turning multiplication into iterated addition:

uint64_t Sundaram31_c (bitmap_t &bm, uint32_t max_bit)
{
   uint32_t i_max = uint32_t(std::sqrt(double(2 * max_bit + 1)) - 1) / 2;
   uint64_t overdraw = 0;

   bm.set_all(0);

   for (uint32_t i = 1; i <= i_max; ++i)
   {
      uint32_t n = 2 * i + 1;
      uint32_t k = n * i + i;   // <= max_bit because that's how we computed i_max
      uint32_t j_max = (max_bit - i) / n;

      for (uint32_t j = i; j <= j_max; ++j, k += n)
      {
         overdraw += bm.bts(k);
      }
   }

   return overdraw;
}

Transforming the loop condition to use k allows us to lose j; things should be looking exceedingly familiar by now...

uint64_t Sundaram31_d (bitmap_t &bm, uint32_t max_bit)
{
   uint32_t i_max = uint32_t(std::sqrt(double(2 * max_bit + 1)) - 1) / 2;
   uint64_t overdraw = 0;

   bm.set_all(0);

   for (uint32_t i = 1; i <= i_max; ++i)
   {
      uint32_t n = 2 * i + 1;     
      uint32_t k = n * i + i;   // <= max_bit because that's how we computed i_max

      for ( ; k <= max_bit; k += n)
      {        
         overdraw += bm.bts(k);
      }
   }

   return overdraw;
}

With things looking the way they do, it's time to analyse whether a certain obvious small change is warranted by the math. The proof is left as an exercise for the reader...

uint64_t Sundaram31_e (bitmap_t &bm, uint32_t max_bit)
{
   uint32_t i_max = unsigned(std::sqrt(double(2 * max_bit + 1)) - 1) / 2;
   uint64_t overdraw = 0;

   bm.set_all(0);

   for (uint32_t i = 1; i <= i_max; ++i)
   {
      if (bm.bt(i))  continue;

      uint32_t n = 2 * i + 1;     
      uint32_t k = n * i + i;   // <= m because we computed i_max to get this bound

      for ( ; k <= max_bit; k += n)
      {        
         overdraw += bm.bts(k);
      }
   }

   return overdraw;
}

The only thing that still differs from classic odds-only Eratosthenes (apart from the name) is the initial value for k, which is normally (n * n) / 2 for the old Greek. However, substituting 2 * i + 1 for n the difference turns out to be 1/2, which rounds to 0. Hence, Sundaram's is odds-only Eratosthenes without the 'optimisation' of skipping composites to avoid at least some crossings-out of already crossed-out numbers. The value for i_max is the same as the Greek's max_factor_bit, just arrived at using completely different logical steps and computed using a marginally different formula.

P.S.: after seeing overdraw in the code so many times, people will probably want to know what it actually is... Sieving the numbers up to 2^32-1 (i.e. a full 2^31 bit bitmap) Sundaram's has an overdraw of 8,643,678,027 (roughly 2 * 2^32) or 444.6%; with the small fix that turns it into odds-only Eratosthenes the overdraw becomes 2,610,022,328 or 134.2%.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!