Enumerate factors of a number directly in ascending order without sorting?

野趣味 2020-11-27 20:00

Is there an efficient algorithm to enumerate the factors of a number n, in ascending order, without sorting? By “efficient” I mean:

The algorithm a

4 answers

囚心锁ツ (original poster)

2020-11-27 20:41

[I'm posting this answer just as a formality for completeness. I've already chosen someone else's answer as the accepted answer.]

Overview of algorithm. In searching for the fastest way to generate an in-memory list of factors (64-bit unsigned values in my case), I settled upon a hybrid algorithm that implements a two-dimensional bucket sort, which takes advantage of the internal knowledge of the sort keys (i.e., they are just integers and can therefore be computed upon). The specific method is something closer to a “ProxMapSort” but with two levels of keys (major, minor) instead of just one. The major key is simply the base-2 logarithm of the value. The minor key is the minimal number of most significant digits of the value needed to produce a reasonable spread at the second layer of buckets. Factors are produced first into a temporary work array of unsorted factors. Next, their distribution is analyzed and an array of bucket indexes is allocated and populated. Finally, the factors are stored directly into place in the final sorted array, using insertion sort. The vast majority of buckets have only 1, 2, or 3 values. Examples are given in the source code, which is attached at the bottom of this answer.
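As a minimal sketch of the two keys described above: the function names below are illustrative, not the ones used in the attached program, and the major key is computed with a plain shift loop rather than an intrinsic. The value 1463 with a 6-bit minor key is the same worked example that appears in the source comments further down.

```c
#include <assert.h>
#include <stdint.h>

// Major key: floor(log2(n)) for n > 0, found by shifting (portable, not fast).
static unsigned sketch_major_key(uint64_t n)
{
    assert(n > 0);
    unsigned k1 = 0;
    while (n >>= 1)
        k1++;
    return k1;
}

// Minor key: the `bits` most significant bits of n that follow its leading
// 1-bit.  Requires bits <= k1 so that the shift count cannot go negative.
static unsigned sketch_minor_key(uint64_t n, unsigned k1, unsigned bits)
{
    assert(n > 0 && bits <= k1);
    const uint64_t without_top_bit = n ^ (UINT64_C(1) << k1);
    return (unsigned)(without_top_bit >> (k1 - bits));
}
```

For n = 1463 this gives a major key of 10 and a minor key of 27. Values that share a major key keep their relative order under the minor key, which is why each second-level bucket only needs a tiny insertion sort.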

Spatial complexity. Memory utilization is approximately 4x that of a Quicksort-based solution, but this is actually rather insignificant, as the maximum memory ever used in the worst case (for 64-bit input) is 5.5 MB, of which 4.0 MB is held for only a small handful of milliseconds.
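For comparison, here is a minimal sketch of what a sort-based baseline could look like. The answer does not show its hand-coded Quicksort, so this stand-in uses the C library's qsort(), and the names baseline_sorted_factors and compare_u64 are made up for illustration. It needs just one array of d 64-bit values, which is the footprint the bucket method is being compared against.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Comparison callback for qsort() over uint64_t values.
static int compare_u64(const void *a, const void *b)
{
    const uint64_t x = *(const uint64_t *)a;
    const uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

// Returns a malloc'ed, ascending array of all divisors of
// p[0]^e[0] * ... * p[count-1]^e[count-1] and stores its length in *out_d.
// Assumes the product fits in 64 bits.  Returns NULL on allocation failure.
static uint64_t *baseline_sorted_factors(const uint64_t p[], const unsigned e[],
                                         unsigned count, size_t *out_d)
{
    size_t d = 1;
    for (unsigned i = 0; i < count; i++)
        d *= (size_t)e[i] + 1;

    uint64_t *factors = malloc(d * sizeof(*factors));
    if (!factors)
        return NULL;

    // Start with the single divisor 1, then for each prime multiply every
    // divisor found so far by p, p^2, ..., p^e.
    size_t len = 1;
    factors[0] = 1;
    for (unsigned i = 0; i < count; i++)
    {
        const size_t prev_len = len;
        uint64_t power = 1;
        for (unsigned k = 1; k <= e[i]; k++)
        {
            power *= p[i];
            for (size_t j = 0; j < prev_len; j++)
                factors[len++] = factors[j] * power;
        }
    }

    qsort(factors, d, sizeof(*factors), compare_u64);
    *out_d = d;
    return factors;
}
```

Calling it with primes {2, 3, 5, 7} and exponents {4, 3, 2, 1} yields the 120 divisors of 75600 in ascending order; the caller frees the array.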

Runtime complexity. Performance is far better than a hand-coded Quicksort-based solution: for numbers with a nontrivial number of factors, it is uniformly about 2.5x faster. On my 3.4 GHz Intel i7, it produces the 184,320 factors of 18,401,055,938,125,660,800 in sorted order in 0.0052 seconds, or about 96 clock cycles per factor, or about 35 million factors per second.
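As a quick sanity check, the throughput figures follow directly from the numbers quoted above (nothing is assumed beyond the stated factor count, runtime, and clock rate):

```latex
\frac{184{,}320\ \text{factors}}{0.0052\ \text{s}} \approx 3.54 \times 10^{7}\ \text{factors/s},
\qquad
\frac{3.4 \times 10^{9}\ \text{cycles/s}}{3.54 \times 10^{7}\ \text{factors/s}} \approx 96\ \text{cycles/factor}.
```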

Graphs. Memory and runtime performance were profiled for the 47,616 canonical representatives of the equivalence classes of prime signatures of numbers up to 2⁶⁴–1. These are the so-called “highly factorable numbers” in 64-bit search space.
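To make "prime signature" concrete, here is a small sketch. It assumes that the canonical representative of a signature is the smallest number having it (largest exponent on the smallest prime); the function names are hypothetical, and the trial division is only practical for numbers with small prime factors.

```c
#include <assert.h>
#include <stdint.h>

// Writes the prime signature of n (the exponents of its prime factorization,
// sorted in decreasing order) into sig[] and returns how many there are.
// sig[] needs room for 15 entries, the maximum for a 64-bit value.
static unsigned prime_signature(uint64_t n, unsigned sig[])
{
    assert(n > 0);
    unsigned count = 0;
    for (uint64_t p = 2; p <= n / p; p++)
    {
        if (n % p != 0)
            continue;
        unsigned e = 0;
        while (n % p == 0) { n /= p; e++; }
        sig[count++] = e;
    }
    if (n > 1)
        sig[count++] = 1;  // One leftover prime factor.

    // Insertion sort into decreasing order (at most 15 entries).
    for (unsigned i = 1; i < count; i++)
        for (unsigned j = i; j > 0 && sig[j - 1] < sig[j]; j--)
        {
            unsigned t = sig[j]; sig[j] = sig[j - 1]; sig[j - 1] = t;
        }
    return count;
}

// Smallest number with the given signature: the largest exponent goes on
// the smallest prime.  Assumes the result fits in 64 bits.
static uint64_t canonical_representative(const unsigned sig[], unsigned count)
{
    static const uint64_t primes[15] =
        { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47 };
    assert(count <= 15);
    uint64_t n = 1;
    for (unsigned i = 0; i < count; i++)
        for (unsigned k = 0; k < sig[i]; k++)
            n *= primes[i];
    return n;
}
```

For example, 686 = 2 x 7^3 has signature (3, 1), and under the assumption above its class is represented by 2^3 x 3 = 24. Every number in a class has the same divisor count, so profiling one representative per class covers all the distinct factor-list shapes.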

Total runtime is ~2.5x better than a Quicksort-based solution for nontrivial factor counts, shown below on this log–log plot:

[Figure: Total Runtime]

The number of sorted factors produced per second is essentially the inverse of the above. Performance on a per-factor basis declines after the sweet spot of approximately 2000 factors, but not by much. Performance is affected by L1, L2, and L3 cache sizes, as well as the count of unique prime factors of the number being factored, which goes up roughly with the logarithm of the input value.

[Figure: Sorted Factors Per Second]

Peak memory usage is a straight line in this log–log plot, since it is proportional to the number of factors. Note that peak memory usage is only for a very brief period of time; short-lived work arrays are discarded within milliseconds. After the temporary arrays are discarded, what remains is the final list of factors, which is the same minimal usage as seen in the Quicksort-based solution.

[Figure: Memory Usage]

Source Code. Attached below is a proof-of-concept program in the C programming language (specifically, C11). It has been tested on x86-64 with Clang/LLVM, although it should work fine with GCC as well.

 /*==============================================================================

 DESCRIPTION

    This is a small proof-of-concept program to test the idea of "sorting"
    factors using a form of bucket sort.  The method is essentially a 2D version
    of ProxMapSort that has been tuned for vast, nonlinear distributions using two
    keys (major, minor) rather than one.  The major key is simply the floor of
    the base-2 logarithm of the value, and the minor key is derived from the most
    significant bits of the value.


 INPUT

    Input is given on the command line, either as a single argument giving the
    number to be factored or an even number of arguments giving the 2-tuples that
    comprise the prime-power factorization of the desired number.  For example,
    the number

       75600 = 2^4 x 3^3 x 5^2 x 7

    can be given by the following list of arguments:

       2 4 3 3 5 2 7 1

    Note:  If a single number is given, it will require factoring to produce its
    prime-power factorization.  Since this is just a small test program, a very
    crude factoring method is used that is extremely fast for small prime factors
    but extremely slow for large prime factors.  This is actually fine, because
    the largest factor lists occur with small prime factors anyway, and it is the
    production of large factor lists at which this program aims to be proficient.
    It is simply not interesting to be fast at producing the factor list of a
    number like 17293823921105882610 = 2 x 3 x 5 x 576460797370196087, because
    it has only 32 factors.  Numbers with tens or hundreds of thousands of
    factors are much more interesting.


 OUTPUT

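    Results are written to standard output.  A list of factors in ascending order
    is produced, followed by a summary line giving the factor count, the average
    CPU time (in milliseconds, and in nanoseconds per factor) required to
    generate the list (not including time to print it), and the memory used.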


 STATISTICS

    Bucket size statistics for the 47616 canonical representatives of the prime
    signature equivalence classes of 64-bit numbers:

    ==============================================================
    Bucket size     Total count of factored       Total count of
         b          numbers needing size b      buckets of size b
    --------------------------------------------------------------
         1               47616 (100.0%)         514306458  (76.2%)
         2               47427  (99.6%)         142959971  (21.2%)
         3               43956  (92.3%)          16679329   (2.5%)
         4               27998  (58.8%)            995458   (0.1%)
         5                6536  (13.7%)             33427  (<0.1%)
         6                 400   (0.8%)               729  (<0.1%)
         7                  12  (<0.1%)                18  (<0.1%)
    --------------------------------------------------------------
         ~               47616 (100.0%)         674974643 (100.0%)
    --------------------------------------------------------------

    Thus, no 64-bit number (of the input set) ever requires a bucket of size
    greater than 7, and the larger the bucket size the less frequent it is.
    This is highly desirable.  Note that although most numbers need at least 1
    bucket of size 4, the vast majority of buckets (99.9%) are of size 1, 2, or
    3, meaning that insertions are extremely efficient.  Therefore, the use of
    insertion sort for the buckets is clearly the right choice and is arguably
    optimal for performance.


 AUTHOR

    Todd Lehman
    2015/05/08

 */

 #include <assert.h>
 #include <inttypes.h>
 #include <math.h>
 #include <stdarg.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <time.h>

 typedef  unsigned int  uint;
 typedef  uint8_t       uint8;
 typedef  uint16_t      uint16;
 typedef  uint32_t      uint32;
 typedef  uint64_t      uint64;

 #define  ARRAY_CAPACITY(x)  (sizeof(x) / sizeof((x)[0]))

 //-----------------------------------------------------------------------------
 // This structure is sufficient to represent the prime-power factorization of
 // all 64-bit values.  The field names ω and Ω are derived from the standard
 // number theory functions ω(n) and Ω(n), which count the number of unique and
 // non-unique prime factors of n, respectively.  The field name d is derived
 // from the standard number theory function d(n), which counts the number of
 // divisors of n, including 1 and n.
 //
 // The maximum possible value here of ω is 15, which occurs for example at
 // n = 7378677391061896920 = 2^3 x 3^2 x 5 x 7 x 11 x 13 x 17 x 19 x 23 x 29 x
 // 31 x 37 x 41 x 43 x 47, which has 15 unique prime factors.
 //
 // The maximum possible value of Ω here is 63, which occurs for example at
 // n = 2^63 and n = 2^62 x 3, both of which have 63 non-unique prime factors.
 //
 // The maximum possible value of d here is 184320, which occurs at
 // n = 18401055938125660800 = 2^7 x 3^4 x 5^2 x 7^2 x 11 x 13 x 17 x 19 x 23 x
 // 29 x 31 x 37 x 41.
 //
 // Maximum possible exponents when exponents are sorted in decreasing order:
 //
 //    Index   Maximum   Bits   Example of n
 //    -----   -------   ----   --------------------------------------------
 //        0        63      6   (2)^63
 //        1        24      5   (2*3)^24
 //        2        13      4   (2*3*5)^13
 //        3         8      4   (2*3*5*7)^8
 //        4         5      3   (2*3*5*7*11)^5
 //        5         4      3   (2*3*5*7*11*13)^4
 //        6         3      2   (2*3*5*7*11*13*17)^3
 //        7         2      2   (2*3*5*7*11*13*17*19)^2
 //        8         2      2   (2*3*5*7*11*13*17*19*23)^2
 //        9         1      1   (2*3*5*7*11*13*17*19*23*29)^1
 //       10         1      1   (2*3*5*7*11*13*17*19*23*29*31)^1
 //       11         1      1   (2*3*5*7*11*13*17*19*23*29*31*37)^1
 //       12         1      1   (2*3*5*7*11*13*17*19*23*29*31*37*41)^1
 //       13         1      1   (2*3*5*7*11*13*17*19*23*29*31*37*41*43)^1
 //       14         1      1   (2*3*5*7*11*13*17*19*23*29*31*37*41*43*47)^1
 //    -----   -------   ----   --------------------------------------------
 //       15        63     37
 //
 #pragma pack(push, 8)
 typedef struct
 {
   uint8   e[16];  // Exponents.
   uint64  p[16];  // Primes in increasing order.
   uint8   ω;      // Count of prime factors without multiplicity.
   uint8   Ω;      // Count of prime factors with multiplicity.
   uint32  d;      // Count of factors of n, including 1 and n.
   uint64  n;      // Value of n on which all other fields of this struct depend.
 }
 PrimePowerFactorization;  // 160 bytes with 8-byte packing
 #pragma pack(pop)

 #define  MAX_ω  15
 #define  MAX_Ω  63

 //-----------------------------------------------------------------------------
 // Fatal error:  print error message and abort.

 void fatal_error(const char *format, ...)
 {
   va_list args;
   va_start(args, format);
   vfprintf(stderr, format, args);
   va_end(args);
   fputc('\n', stderr);
   exit(1);
 }

 //-----------------------------------------------------------------------------
 // Compute 64-bit 2-adic integer inverse.

 uint64 uint64_inv(const uint64 x)
 {
   assert(x != 0);

   uint64 y = 1;
   for (uint i = 0; i < 6; i++)  // 6 = log2(log2(2**64)) = log2(64)
     y = y * (2 - (x * y));

   return y;
 }

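 //------------------------------------------------------------------------------
 // Compute 2 to arbitrary power.  This is just a portable and abstract way to
 // write a left-shift operation.  Note that the use of the UINT64_C macro here
 // is actually required, because the result of 1U << x is not guaranteed to be
 // 64 bits wide.

 static inline
 uint64 uint64_pow2(const uint x)
 {
   assert(x < 64);
   return UINT64_C(1) << x;
 }

 //------------------------------------------------------------------------------
 // Compute floor of the base-2 logarithm of x, where x > 0.  Uses fast
 // intrinsic function if available; otherwise resorts to hand-rolled method.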

 static inline
 uint uint64_log2(uint64 x)
 {
   assert(x > 0);

   #if defined(UINT64_CLZ)
     return 63 - UINT64_CLZ(x);
   #else
     #define S(k) if ((x >> k) != 0) { y += k; x >>= k; }
     uint y = 0; S(32); S(16); S(8); S(4); S(2); S(1); return y;
     #undef S
   #endif
 }

 //------------------------------------------------------------------------------
 // Compute major key, given a nonzero number.  The major key is simply the
 // floor of the base-2 logarithm of the number.

 static inline
 uint major_key(const uint64 n)
 {
   assert(n > 0);
   uint k1 = uint64_log2(n);
   return k1;
 }

 //------------------------------------------------------------------------------
 // Compute minor key, given a nonzero number, its major key, k1, and the
 // bit-size b of major bucket k1.  The minor key, k2, is computed by first
 // removing the most significant 1-bit from the number, because it adds no
 // information, and then extracting the desired number of most significant bits
 // from the remainder.  For example, given the number n=1463 and a major bucket
 // size of b=6 bits, the keys are computed as follows:
 //
 //    Step 0:  Given number              n = 0b10110110111 = 1463
 //
 //    Step 1:  Compute major key:        k1 = floor(log_2(n)) = 10
 //
 //    Step 2:  Remove high-order 1-bit:  n' = 0b0110110111 = 439
 //
 //    Step 3:  Compute minor key:        k2 = n' >> (k1 - b)
 //                                          = 0b0110110111 >> (10 - 6)
 //                                          = 0b0110110111 >> 4
 //                                          = 0b011011
 //                                          = 27

 static inline
 uint minor_key(const uint64 n, const uint k1, const uint b)
 {
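   assert(n > 0); assert(b > 0);
   // In this program b > k1 arises only for n = 1 (k1 = 0, b = 1); clamp the
   // shift to 0 there rather than letting the unsigned subtraction wrap around.
   const uint shift = (k1 > b)? (k1 - b) : 0;
   const uint k2 = (uint)((n ^ uint64_pow2(k1)) >> shift);
   return k2;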
 }

 //------------------------------------------------------------------------------
 // Raw unsorted factor.

 #pragma pack(push, 4)

 typedef struct
 {
   uint64  n;   // Value of factor.
   uint32  k1;  // Major key.
   uint32  k2;  // Minor key.
 }
 UnsortedFactor;

 #pragma pack(pop)

 //------------------------------------------------------------------------------
 // Compute sorted list of factors, given a prime-power factorization.

 static uint64 memory_usage;

 uint64 *compute_factors(const PrimePowerFactorization ppf)
 {
   memory_usage = 0;

   if (ppf.n == 0)
     return NULL;

   uint64 *sorted_factors = calloc(ppf.d, sizeof(*sorted_factors));
   if (!sorted_factors)
     fatal_error("Failed to allocate array of %"PRIu32" factors.", ppf.d);
   memory_usage += ppf.d * sizeof(*sorted_factors);

   UnsortedFactor *unsorted_factors = malloc(ppf.d * sizeof(*unsorted_factors));
   if (!unsorted_factors)
     fatal_error("Failed to allocate array of %"PRIu32" factors.", ppf.d);
   memory_usage += ppf.d * sizeof(*unsorted_factors);


   // These arrays are indexed by the major key of a number.
   uint32 major_counts[64];   // Counts of factors in major buckets.
   uint32 major_spans[64];    // Counts rounded up to power of 2.
   uint32 major_bits[64];     // Base-2 logarithm of bucket size.
   uint32 major_indexes[64];  // Indexes into minor array.
   memset(major_counts,  0, sizeof(major_counts));
   memset(major_spans,   0, sizeof(major_spans));
   memset(major_bits,    0, sizeof(major_bits));
   memset(major_indexes, 0, sizeof(major_indexes));


   // --- Step 1:  Produce unsorted list of factors from prime-power
   //     factorization.  At the same time, count groups of factors by their
   //     major keys.
   {
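     // This array is for counting in the multi-radix number system dictated by
     // the exponents of the prime-power factorization.  An invariant is that
     // e[i] <= ppf.e[i] for all i (0 <= i < ppf.ω).
     uint8 e[ARRAY_CAPACITY(ppf.e)] = { 0 };

     // Precompute the 2-adic inverse of each odd prime power p[i] ** e[i], so
     // that the main loop can divide by it with a single multiplication.
     // Even numbers have no 2-adic inverse, so powers of 2 are instead
     // divided out with a right shift.
     uint64 pe_inv[ARRAY_CAPACITY(ppf.p)] = { 0 };
     for (uint i = 0; i < ppf.ω; i++)
     {
       if (ppf.p[i] == 2)
         continue;
       uint64 pe = 1;
       for (uint j = 0; j < ppf.e[i]; j++)
         pe *= ppf.p[i];
       pe_inv[i] = uint64_inv(pe);
     }

     uint64 n = 1;  // Current factor; the enumeration starts at 1.
     for (uint32 k = 0; k < ppf.d; k++)
     {
       unsorted_factors[k].n = n;

       const uint k1 = major_key(n);
       unsorted_factors[k].k1 = k1;
       major_counts[k1] += 1;

       // Advance to the next factor by incrementing the multi-radix number e[].
       for (uint i = 0; i < ppf.ω; i++)
       {
         if (e[i] == ppf.e[i])  // Carrying is occurring.
         {
           if (ppf.p[i] == 2)
             n >>= ppf.e[i];  // Divide n by 2 ** ppf.e[i].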
           else
             n *= pe_inv[i];  // Divide n by ppf.p[i] ** ppf.e[i].

           e[i] = 0;
         }
         else  // Carrying is not occurring.
         {
           n *= ppf.p[i];
           e[i] += 1;
           break;
         }
       }
     }
     assert(n == 1);  // n always cycles back to 1, not to ppf.n.

     assert(unsorted_factors[ppf.d-1].n == ppf.n);
   }


   // --- Step 2:  Define the major bits array, the major spans array, the major
   //     index array, and count the total spans.

   uint32 total_spans = 0;
   {
     uint32 k = 0;
     for (uint k1 = 0; k1 < ARRAY_CAPACITY(major_counts); k1++)
     {
       uint32 count = major_counts[k1];
       uint32 bits = (count <= 1)? count : uint64_log2(count - 1) + 1;
       major_bits[k1] = bits;
       major_spans[k1] = (count > 0)? (UINT32_C(1) << bits) : 0;
       major_indexes[k1] = k;
       k += major_spans[k1];
     }
     total_spans = k;
   }


   // --- Step 3:  Allocate and populate the minor counts array.  Note that it
   //     must be initialized to zero.

   uint32 *minor_counts = calloc(total_spans, sizeof(*minor_counts));
   if (!minor_counts)
     fatal_error("Failed to allocate array of %"PRIu32" counts.", total_spans);
   memory_usage += total_spans * sizeof(*minor_counts);

   for (uint k = 0; k < ppf.d; k++)
   {
     const uint64 n = unsorted_factors[k].n;
     const uint k1 = unsorted_factors[k].k1;
     const uint k2 = minor_key(n, k1, major_bits[k1]);
     assert(k2 < major_spans[k1]);
     unsorted_factors[k].k2 = k2;
     minor_counts[major_indexes[k1] + k2] += 1;
   }


   // --- Step 4:  Define the minor indexes array.
   //
   // NOTE:  Instead of allocating a separate array, the earlier-allocated array
   // of minor counts is simply repurposed here using an alias.

   uint32 *minor_indexes = minor_counts;  // Alias the array for repurposing.

   {
     uint32 k = 0;
     for (uint i = 0; i < total_spans; i++)
     {
       uint32 count = minor_counts[i];  // This array is the same array...
       minor_indexes[i] = k;            // ...as this array.
       k += count;
     }
   }


   // --- Step 5:  Populate the sorted factors array.  Note that the array must
   //              be initialized to zero earlier because values of zero are used
   //              as sentinels in the bucket lists.

   for (uint32 i = 0; i < ppf.d; i++)
   {
     uint64 n = unsorted_factors[i].n;
     const uint k1 = unsorted_factors[i].k1;
     const uint k2 = unsorted_factors[i].k2;

     // Insert factor into bucket using insertion sort (which happens to be
     // extremely fast because we know the bucket sizes are always very small).
     uint32 k;
     for (k = minor_indexes[major_indexes[k1] + k2];
          sorted_factors[k] != 0;
          k++)
     {
       assert(k < ppf.d);
       if (sorted_factors[k] > n)
         { uint64 t = sorted_factors[k]; sorted_factors[k] = n; n = t; }
     }
     sorted_factors[k] = n;
   }


   // --- Step 6:  Validate array of sorted factors.
   {
     for (uint32 k = 1; k < ppf.d; k++)
     {
       if (sorted_factors[k] == 0)
         fatal_error("Produced a factor of 0 at index %"PRIu32".", k);

       if (ppf.n % sorted_factors[k] != 0)
         fatal_error("Produced non-factor %"PRIu64" at index %"PRIu32".",
                     sorted_factors[k], k);

       if (sorted_factors[k-1] == sorted_factors[k])
         fatal_error("Duplicate factor %"PRIu64" at index %"PRIu32".",
                     sorted_factors[k], k);

       if (sorted_factors[k-1] > sorted_factors[k])
         fatal_error("Out-of-order factors %"PRIu64" and %"PRIu64" "
                     "at indexes %"PRIu32" and %"PRIu32".",
                     sorted_factors[k-1], sorted_factors[k], k-1, k);
     }
   }


   free(minor_counts);
   free(unsorted_factors);

   return sorted_factors;
 }

 //------------------------------------------------------------------------------
 // Compute prime-power factorization of a 64-bit value.  Note that this function
 // is designed to be fast *only* for numbers with very simple factorizations,
 // e.g., those that produce large factor lists.  Do not attempt to factor
 // large semiprimes with this function.  (The author does know how to factor
 // large numbers efficiently; however, efficient factorization is beyond the
 // scope of this small test program.)

 PrimePowerFactorization compute_ppf(const uint64 n)
 {
   PrimePowerFactorization ppf;

   if (n == 0)
   {
     ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 0, .n = 0 };
   }
   else if (n == 1)
   {
     ppf = (PrimePowerFactorization){ .p = { 1 }, .e = { 1 },
                                      .ω = 1, .Ω = 1, .d = 1, .n = 1 };
   }
   else
   {
     ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 1, .n = n };

     uint64 m = n;
     for (uint64 p = 2; p <= m / p; p += 1 + (p > 2))  // p <= m/p avoids p*p overflow.
     {
       if (m % p == 0)
       {
         assert(ppf.ω <= MAX_ω);
         ppf.p[ppf.ω] = p;
         ppf.e[ppf.ω] = 0;
         while (m % p == 0)
           { m /= p; ppf.e[ppf.ω] += 1; }
         ppf.d *= (1 + ppf.e[ppf.ω]);
         ppf.Ω += ppf.e[ppf.ω];
         ppf.ω += 1;
       }
     }
     if (m > 1)
     {
       assert(ppf.ω <= MAX_ω);
       ppf.p[ppf.ω] = m;
       ppf.e[ppf.ω] = 1;
       ppf.d *= 2;
       ppf.Ω += 1;
       ppf.ω += 1;
     }
   }

   return ppf;
 }

 //------------------------------------------------------------------------------
 // Parse prime-power factorization from a list of ASCII-encoded base-10 strings.
 // The values are assumed to be 2-tuples (p,e) of prime p and exponent e.
 // Primes must not exceed 2^64 - 1.  Exponents must not exceed 2^8 - 1.  The
 // constructed value must not exceed 2^64 - 1.

 PrimePowerFactorization parse_ppf(const uint pairs, const char *const values[])
 {
   assert(pairs <= MAX_ω);

   PrimePowerFactorization ppf;

   if (pairs == 0)
   {
     ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 0, .n = 0 };
   }
   else
   {
     ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 1, .n = 1 };

     for (uint i = 0; i < pairs; i++)
     {
       ppf.p[i] = (uint64)strtoumax(values[(i*2)+0], NULL, 10);
       ppf.e[i] =  (uint8)strtoumax(values[(i*2)+1], NULL, 10);

       // Validate prime value.
       if (ppf.p[i] < 2)  // (Ideally this would actually do a primality test.)
         fatal_error("Factor %"PRIu64" is invalid.", ppf.p[i]);

       // Accumulate count of unique prime factors.
       if (ppf.ω > UINT8_MAX - 1)
         fatal_error("Small-omega overflow at factor %"PRIu64"^%"PRIu8".",
                     ppf.p[i], ppf.e[i]);
       ppf.ω += 1;

       // Accumulate count of total prime factors.
       if (ppf.Ω > UINT8_MAX - ppf.e[i])
         fatal_error("Big-omega overflow at factor %"PRIu64"^%"PRIu8".",
                     ppf.p[i], ppf.e[i]);
       ppf.Ω += ppf.e[i];

       // Accumulate total divisor count.
       if (ppf.d > UINT32_MAX / (1 + ppf.e[i]))
         fatal_error("Divisor count overflow at factor %"PRIu64"^%"PRIu8".",
                     ppf.p[i], ppf.e[i]);
       ppf.d *= (1 + ppf.e[i]);

       // Accumulate value.
       for (uint8 k = 1; k <= ppf.e[i]; k++)
       {
         if (ppf.n > UINT64_MAX / ppf.p[i])
           fatal_error("Value overflow at factor %"PRIu64".", ppf.p[i]);
         ppf.n *= ppf.p[i];
       }
     }
   }

   return ppf;
 }

 //------------------------------------------------------------------------------
 // Main control.  Parse command line and produce list of factors.

 int main(const int argc, const char *const argv[])
 {
   PrimePowerFactorization ppf;

   uint values = (uint)argc - 1;  // argc is always guaranteed to be at least 1.

   if (values == 1)
   {
     ppf = compute_ppf((uint64)strtoumax(argv[1], NULL, 10));
   }
   else
   {
     if (values % 2 != 0)
       fatal_error("Odd number of arguments (%u) given.", values);
     uint pairs = values / 2;
     ppf = parse_ppf(pairs, &argv[1]);
   }

   // Run for (as close as possible to) a fixed amount of time, tallying the
   // elapsed CPU time.
   uint64 iterations = 0;
   double cpu_time = 0.0;
   const double cpu_time_limit = 0.05;
   while (cpu_time < cpu_time_limit)
   {
     clock_t clock_start = clock();
     uint64 *factors = compute_factors(ppf);
     clock_t clock_end = clock();
     cpu_time += (double)(clock_end - clock_start) / (double)CLOCKS_PER_SEC;

     if (++iterations == 1)
     {
       for (uint32 i = 0; i < ppf.d; i++)
         printf("%"PRIu64"\n", factors[i]);
     }

     if (factors) free(factors);
   }

   // Print the average amount of CPU time required for each iteration.
   uint mem_scale = (memory_usage >= 1e9)? 9:
                    (memory_usage >= 1e6)? 6:
                    (memory_usage >= 1e3)? 3:
                                           0;
   char *mem_units = (mem_scale == 9)? "GB":
                     (mem_scale == 6)? "MB":
                     (mem_scale == 3)? "KB":
                                        "B";

   printf("%"PRIu64"  %"PRIu32" factors  %.6f ms  %.3f ns/factor  %.3f %s\n",
          ppf.n,
          ppf.d,
          cpu_time/iterations * 1e3,
          cpu_time/iterations * 1e9 / (double)(ppf.d? ppf.d : 1),
          (double)memory_usage / pow(10, mem_scale),
          mem_units);

   return 0;
 }
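
The `n *= pe_inv[i]` step in the listing relies on a trick worth spelling out: when an odd d divides n exactly, n / d equals n times the inverse of d modulo 2^64, so the division becomes one multiplication. Below is a minimal standalone sketch of that idea. The function names are illustrative; the iteration is the same Newton step used by uint64_inv.

```c
#include <assert.h>
#include <stdint.h>

// Multiplicative inverse of odd x modulo 2^64 via Newton's iteration.
// Each step doubles the number of correct low-order bits: 1, 2, 4, ..., 64.
static uint64_t inverse_mod_2_64(uint64_t x)
{
    assert(x % 2 == 1);
    uint64_t y = 1;
    for (int i = 0; i < 6; i++)
        y *= 2 - x * y;
    return y;
}

// Exact division n / d for odd d that divides n, done as one multiplication.
static uint64_t exact_divide_by_odd(uint64_t n, uint64_t d)
{
    assert(d % 2 == 1 && n % d == 0);
    return n * inverse_mod_2_64(d);
}
```

For instance, dividing 75600 by 675 = 3^3 x 5^2 this way gives 112. The result is only meaningful when the division is exact, which is always the case in the carry step, since the current factor contains the full prime power being removed.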
