Fast Algorithm to Factorize All Numbers Up to a Given Number

Question

I am looking for an algorithm that factorizes numbers by reusing the factorizations it has already computed. In other words, I want a fast algorithm that factorizes all numbers up to a given number and stores the results in a list or tuple of tuples (I guess this is the easiest data structure to use). I want an "up to n" algorithm because I need every number up to "n", and I guess that is faster than factorizing them one by one.

I want this algorithm to run within a reasonable time (less than an hour) for n = 2*10^8, for a program I am running. I have tried one of the more naive approaches in Python: find all primes up to "n" first, and then for each number "k" find its prime factorization by checking each prime until one divides it (call it p). Its factorization is then the factorization of k/p plus p.

from math import *
max=1000000 # We will check all numbers up to this number

lst = [True] * (max - 2) # This is an algorithm I found online that fills the "PRIMES" list with all the primes up to "max"; very efficient
for i in range(2, int(sqrt(max) + 1)):
  if lst[i - 2]:
    for j in range(i ** 2, max, i):
      lst[j - 2] = False

PRIMES = tuple([m + 2 for m in range(len(lst)) if lst[m]]) # (all primes up to "max")

FACTORS = [(0,),(1,)] #This will be a list of tuples where FACTORS[i] = the prime factors of i
for c in range(2,max): #check all numbers until max
  if c in PRIMES:
    FACTORS.append((c,)) #If it's a prime just add it in
  else: #if it's not a prime...
    i=0
    while PRIMES[i]<= c: #Run through all primes until you find one that divides it,
      if c%PRIMES[i] ==0: 
        FACTORS.append(FACTORS[c//PRIMES[i]] + (PRIMES[i],)) #If it does, add the prime with the factors of the division
        break
      i+=1

From testing, the vast majority of the time is spent in the else section, AFTER checking whether the candidate is prime. This takes more than an hour for max = 200000000.

P.S. - WHAT I'M USING THIS FOR - NOT IMPORTANT

The program I am running this for finds, for a given "a", the smallest "n" such that (2n)!/((n+a)!^2) is a whole number. Basically, I defined a_n = the smallest k such that (2k)!/((k+n)!^2) is an integer. It turns out that a_1 = 0, a_2 = 208, a_3 = 3475, a_4 = 8174, a_5 = 252965, a_6 = 3648835, a_7 = 72286092. By the way, I noticed that a_n + n is squarefree, although I can't prove it mathematically. Using Legendre's formula (https://en.wikipedia.org/wiki/Legendre%27s_formula), I wrote this code:

from math import *
from bisect import bisect_right
max=100000000 # We will check all numbers up to this number

lst = [True] * (max - 2) # This is an algorithm I found online that fills the "PRIMES" list with all the primes up to "max"; very efficient
for i in range(2, int(sqrt(max) + 1)):
  if lst[i - 2]:
    for j in range(i ** 2, max, i):
      lst[j - 2] = False

PRIMES = tuple([m + 2 for m in range(len(lst)) if lst[m]]) # (all primes up to "max")
print("START")

def v(p,m):
  return sum([ (floor(m/(p**i))) for i in range(1,1+ceil(log(m,p)))]) #This checks for the max power of prime p, so that p**(v(p,m)) divides factorial(m)

def check(a,n): #This function checks whether a number n meets the criterion for a given a
  if PRIMES[bisect_right(PRIMES, n)]<= n + a: #First, it is obvious that if there is a prime between n+1 and n+a the criterion isn't met
    return False
  i=0
  while PRIMES[i] <= n: #We will run through the primes smaller than n... THIS IS THE ROOM FOR IMPROVEMENT - instead of checking all the primes, check all primes that divide (n+1),(n+2),...,(n+a)
    if v(PRIMES[i],2*n)<2*v(PRIMES[i],n+a): # If any prime divides the denominator more than the numerator, the fraction is obviously not a whole number
      return False
    i+=1
  return True #If for all primes less than n, the numerator has a bigger max power of p than the denominator, the fraction is a whole number.

#Next, is a code that will just make sure that the program runs all numbers in order, and won't repeat anything.

start = 0 #start checking from this value
for a in range(1,20): #check for these values of a.
  j=start
  while not check(a,j):
    if j%100000==0:
      print("LOADING ", j) #just so i know how far the program has gotten.
    j+=1
  print("a-",a," ",j) #We found a number. great. print the result.
  start=j #start from this value again, because the check obviously won't work for smaller values with a higher "a"
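
The valuation computed by v(p, m) above can also be obtained with integer arithmetic only, which avoids the floating-point log and division. The following is a minimal sketch and not the asker's code: the names legendre_valuation, primes_up_to and ratio_is_integer are made up for illustration, and the prime list is built by plain trial division only to keep the example short.

```python
def legendre_valuation(p, m):
    """Exponent of the prime p in m!, by Legendre's formula: sum of m // p**i."""
    total = 0
    while m:
        m //= p        # after the i-th pass, m holds floor(original m / p**i)
        total += m
    return total


def primes_up_to(limit):
    """Primes <= limit by trial division (illustrative helper, small limits only)."""
    return [q for q in range(2, limit + 1)
            if all(q % d for d in range(2, int(q ** 0.5) + 1))]


def ratio_is_integer(a, n):
    """True if (2n)! / ((n+a)!)**2 is an integer, compared prime by prime."""
    return all(legendre_valuation(p, 2 * n) >= 2 * legendre_valuation(p, n + a)
               for p in primes_up_to(n + a))


print(legendre_valuation(2, 10))   # 10 // 2 + 10 // 4 + 10 // 8 = 5 + 2 + 1 = 8
print(ratio_is_integer(1, 0))      # 0! / (1!)**2 = 1, so True
```

For large n, the primes would come from the sieve rather than from trial division; the comparison of valuations stays the same.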

Answer 1:

You can use the first part of your script in order to do that!

Code:

from math import *
import time

MAX = 40000000

t = time.time()
# factors[i] = the distinct prime factors of i that are no larger than sqrt(MAX)
factors = {}
# Loop over the numbers up to sqrt(MAX): every composite below MAX has a prime factor in this range
for i in range(2, int(sqrt(MAX) + 1)):
    # If this number has already been factored - it is not prime
    if i not in factors:
        # Find all the future numbers that this number will factor
        for j in range(i * 2, MAX, i):
            if j not in factors:
                factors[j] = [i]
            else:
                factors[j].append(i)
print(time.time() - t)

for i in range(3, 15):
    if i not in factors:
        print(f"{i} is prime")
    else:
        print(f"{i}: {factors[i]}")

Result:

3 is prime
4: [2]
5 is prime
6: [2, 3]
7 is prime
8: [2]
9: [3]
10: [2, 5]
11 is prime
12: [2, 3]
13 is prime
14: [2, 7]

Explanation:

As mentioned in the comments, this is a modification of the Sieve of Eratosthenes.
For each prime we find, we record it as a factor of every larger multiple of it.
If a number does not appear in the result dictionary, it is prime, since no smaller prime divides it. Note that each list holds only the distinct primes up to sqrt(MAX), without multiplicity (8 is stored as [2]), so a prime factor larger than sqrt(MAX) is not recorded. We use a dictionary instead of a list so that the prime numbers do not need to be stored at all, which is a bit more memory friendly but also a bit slower.
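
If the full factorization with multiplicity is needed, one common variant stores only the smallest prime factor of each number and rebuilds the factorization by repeated division. This is a sketch of that variant, not the code from this answer; the names smallest_prime_factors and factorize are illustrative, and the small LIMIT is chosen only for the demonstration.

```python
from math import isqrt


def smallest_prime_factors(limit):
    """Return spf where spf[k] is the smallest prime factor of k, for 2 <= k < limit."""
    spf = list(range(limit))            # start with spf[k] = k (true for primes)
    for i in range(2, isqrt(limit) + 1):
        if spf[i] == i:                 # i is prime: nothing smaller divided it
            for j in range(i * i, limit, i):
                if spf[j] == j:         # keep only the first (smallest) prime found
                    spf[j] = i
    return spf


def factorize(n, spf):
    """Prime factors of n in non-decreasing order, with multiplicity."""
    factors = []
    while n > 1:
        p = spf[n]
        factors.append(p)
        n //= p
    return factors


LIMIT = 1000                            # illustrative; the question targets 2*10**8
spf = smallest_prime_factors(LIMIT)
print(factorize(360, spf))              # [2, 2, 2, 3, 3, 5]
print(factorize(97, spf))               # [97]
```

This keeps one integer per number instead of a list per number. For limits near 2*10^8 a plain Python list of integers is still likely to need several gigabytes, so a more compact container such as array.array would probably be needed in practice.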

Time:

According to a simple check with time.time(), MAX = 40000000 takes 110.14351892471313 seconds.
For MAX = 1000000: 1.0785243511199951 seconds.
For MAX = 200000000: not finished after 1.5 hours. It had reached the 111th item in the main loop out of 6325 items (this is not so bad, since the loops become shorter the farther they go).

I do believe, however, that well-written C code could do it in half an hour (if you are willing to consider it, I might write another answer). Further optimizations could include multithreading and a primality test such as Miller–Rabin. Of course, these results are from my laptop; on a PC or a dedicated machine it may run faster or slower.
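
For reference, a Miller–Rabin test can be sketched in a few lines of Python. This is only an illustration of the test named above, not part of the timed code: the function name is_prime is made up here, and the fixed base set (the first twelve primes) is an assumption chosen because it is known to give correct answers for every 64-bit input.

```python
def is_prime(n):
    """Miller-Rabin with a fixed base set; exact for all n below 2**64."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:                     # quick trial division by the bases
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:                   # write n - 1 as d * 2**r with d odd
        d //= 2
        r += 1
    for a in bases:
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue                    # this base does not prove n composite
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False                # base a is a witness: n is composite
    return True


print(is_prime(6619))                   # True
print(is_prime(41 * 43))                # False
```

A test like this answers "is this single number prime?" and does not produce factors, so it complements the sieve rather than replacing it.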

Answer 2:

Edit:

I actually asked a question on Code Review about this answer, and it has some cool graphs about the runtime!

Edit #2:

Someone answered my question, and with some modifications the code now runs in 2.5 seconds.

Since the previous answer was written in Python, it was slow. The following code does exactly the same thing in C++. It has a thread that reports, every 10 seconds, which prime the main loop has reached.

#include <math.h>
#include <unistd.h>
#include <list>
#include <vector>
#include <ctime>
#include <thread>
#include <iostream>
#include <atomic>

#ifndef MAX
#define MAX 200000000
#define TIME 10
#endif


std::atomic<bool> exit_thread_flag{false};

void timer(int *i_ptr) {
    for (int i = 1; !exit_thread_flag; i++) {
        sleep(TIME);
        if (exit_thread_flag) {
            break;
        }
        std::cout << "i = " << *i_ptr << std::endl;
        std::cout << "Time elapsed since start: " 
                  << i * TIME 
                  << " Seconds" << std::endl;
    }
}

int main(int argc, char const *argv[])
{
    int i, upper_bound, j;
    std::time_t start_time;
    std::thread timer_thread;
    std::vector< std::list< int > > factors;

    std::cout << "Initializing" << std::endl;
    start_time = std::time(nullptr);
    timer_thread = std::thread(timer, &i);
    factors.resize(MAX);
    std::cout << "Initialization took "
              << std::time(nullptr) - start_time 
              << " Seconds" << std::endl;

    std::cout << "Starting calculation" << std::endl;
    start_time = std::time(nullptr);
    upper_bound = sqrt(MAX) + 1;
    for (i = 2; i < upper_bound; ++i)
    {
        if (factors[i].empty())
        {
            for (j = i * 2; j < MAX; j += i)
            {
                factors[j].push_back(i);
            }
        }
    }
    std::cout << "Calculation took " 
              << std::time(nullptr) - start_time 
              << " Seconds" << std::endl;

    // Closing timer thread
    exit_thread_flag = true;

    std::cout << "Validating results" << std::endl;
    for (i = 2; i < 20; ++i)
    {
        std::cout << i << ": ";
        if (factors[i].empty()) {
            std::cout << "Is prime";
        } else {
            for (int v : factors[i]) {
                std::cout << v << ", ";
            }
        }
        std::cout << std::endl;
    }
    
    timer_thread.join();
    return 0;
}

It needs to be compiled with the line:

g++ main.cpp -std=c++0x -pthread

If you do not want to convert your entire code to C++, you can call the compiled program from Python with the subprocess library.
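
A minimal sketch of that subprocess approach follows. The helper name run_and_capture is made up for illustration, and the demo deliberately runs a Python one-liner as a stand-in so that it works without the compiled binary.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run an external program and return its standard output as a list of lines."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout.splitlines()


# Stand-in command so the sketch runs anywhere: a Python one-liner that prints
# two lines shaped like the C++ program's validation output.
# With the compiled binary, the command would be something like ["./a.out"]
# (an assumption: a.out is g++'s default output name when no -o flag is given).
demo_command = [sys.executable, "-c", "print('2: Is prime'); print('4: 2, ')"]
for line in run_and_capture(demo_command):
    print(line)
```

Because check=True is set, a non-zero exit code from the program raises subprocess.CalledProcessError instead of returning partial output silently.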

Time:

Well, I tried my best, but it still runs for over an hour. It reached 6619, which is the 855th prime (much better!), in 1.386111 hours (4990 seconds). So it is an improvement, but there is still some way to go! (It might be faster without the extra thread.)

Source: https://stackoverflow.com/questions/62698250/fast-algorithm-to-factorize-all-numbers-up-to-a-given-number

Tags

python

algorithm

prime-factoring