How to create the most compact mapping n → isprime(n) up to a limit N?

后端未结

关注

 30  3366

遇见更好的自我 2020-11-22 02:11

Naturally, for bool isprime(number) there would be a data structure I could query.
I define the best algorithm, to be the algorithm that pr

30条回答

庸人自扰 (楼主)

2020-11-22 02:45

I compared the efficiency of the most popular suggestions to determine if a number is prime. I used python 3.6 on ubuntu 17.10; I tested with numbers up to 100.000 (you can test with bigger numbers using my code below).

This first plot compares the functions (which are explained further down in my answer), showing that the last functions do not grow as fast as the first one when increasing the numbers.

And in the second plot we can see that in case of prime numbers the time grows steadily, but non-prime numbers do not grow so fast in time (because most of them can be eliminated early on).

Here are the functions I used:

this answer and this answer suggested a construct using all():

def is_prime_1(n):
    return n > 1 and all(n % i for i in range(2, int(math.sqrt(n)) + 1))

This answer used some kind of while loop:

def is_prime_2(n):
    if n <= 1:
        return False
    if n == 2:
        return True
    if n == 3:
        return True
    if n % 2 == 0:
        return False
    if n % 3 == 0:
        return False

    i = 5
    w = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += w
        w = 6 - w

    return True

This answer included a version with a for loop:

def is_prime_3(n):
    if n <= 1:
        return False

    if n % 2 == 0 and n > 2:
        return False

    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False

    return True

And I mixed a few ideas from the other answers into a new one:

def is_prime_4(n):
    if n <= 1:          # negative numbers, 0 or 1
        return False
    if n <= 3:          # 2 and 3
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False

    for i in range(5, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False

    return True

Here is my script to compare the variants:

import math
import pandas as pd
import seaborn as sns
import time
from matplotlib import pyplot as plt


def is_prime_1(n):
    ...
def is_prime_2(n):
    ...
def is_prime_3(n):
    ...
def is_prime_4(n):
    ...

default_func_list = (is_prime_1, is_prime_2, is_prime_3, is_prime_4)

def assert_equal_results(func_list=default_func_list, n):
    for i in range(-2, n):
        r_list = [f(i) for f in func_list]
        if not all(r == r_list[0] for r in r_list):
            print(i, r_list)
            raise ValueError
    print('all functions return the same results for integers up to {}'.format(n))

def compare_functions(func_list=default_func_list, n):
    result_list = []
    n_measurements = 3

    for f in func_list:
        for i in range(1, n + 1):
            ret_list = []
            t_sum = 0
            for _ in range(n_measurements):
                t_start = time.perf_counter()
                is_prime = f(i)
                t_end = time.perf_counter()

                ret_list.append(is_prime)
                t_sum += (t_end - t_start)

            is_prime = ret_list[0]
            assert all(ret == is_prime for ret in ret_list)
            result_list.append((f.__name__, i, is_prime, t_sum / n_measurements))

    df = pd.DataFrame(
        data=result_list,
        columns=['f', 'number', 'is_prime', 't_seconds'])
    df['t_micro_seconds'] = df['t_seconds'].map(lambda x: round(x * 10**6, 2))
    print('df.shape:', df.shape)

    print()
    print('', '-' * 41)
    print('| {:11s} | {:11s} | {:11s} |'.format(
        'is_prime', 'count', 'percent'))
    df_sub1 = df[df['f'] == 'is_prime_1']
    print('| {:11s} | {:11,d} | {:9.1f} % |'.format(
        'all', df_sub1.shape[0], 100))
    for (is_prime, count) in df_sub1['is_prime'].value_counts().iteritems():
        print('| {:11s} | {:11,d} | {:9.1f} % |'.format(
            str(is_prime), count, count * 100 / df_sub1.shape[0]))
    print('', '-' * 41)

    print()
    print('', '-' * 69)
    print('| {:11s} | {:11s} | {:11s} | {:11s} | {:11s} |'.format(
        'f', 'is_prime', 't min (us)', 't mean (us)', 't max (us)'))
    for f, df_sub1 in df.groupby(['f', ]):
        col = df_sub1['t_micro_seconds']
        print('|{0}|{0}|{0}|{0}|{0}|'.format('-' * 13))
        print('| {:11s} | {:11s} | {:11.2f} | {:11.2f} | {:11.2f} |'.format(
            f, 'all', col.min(), col.mean(), col.max()))
        for is_prime, df_sub2 in df_sub1.groupby(['is_prime', ]):
            col = df_sub2['t_micro_seconds']
            print('| {:11s} | {:11s} | {:11.2f} | {:11.2f} | {:11.2f} |'.format(
                f, str(is_prime), col.min(), col.mean(), col.max()))
    print('', '-' * 69)

    return df

Running the function compare_functions(n=10**5) (numbers up to 100.000) I get this output:

df.shape: (400000, 5)

 -----------------------------------------
| is_prime    | count       | percent     |
| all         |     100,000 |     100.0 % |
| False       |      90,408 |      90.4 % |
| True        |       9,592 |       9.6 % |
 -----------------------------------------

 ---------------------------------------------------------------------
| f           | is_prime    | t min (us)  | t mean (us) | t max (us)  |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_1  | all         |        0.57 |        2.50 |      154.35 |
| is_prime_1  | False       |        0.57 |        1.52 |      154.35 |
| is_prime_1  | True        |        0.89 |       11.66 |       55.54 |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_2  | all         |        0.24 |        1.14 |      304.82 |
| is_prime_2  | False       |        0.24 |        0.56 |      304.82 |
| is_prime_2  | True        |        0.25 |        6.67 |       48.49 |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_3  | all         |        0.20 |        0.95 |       50.99 |
| is_prime_3  | False       |        0.20 |        0.60 |       40.62 |
| is_prime_3  | True        |        0.58 |        4.22 |       50.99 |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_4  | all         |        0.20 |        0.89 |       20.09 |
| is_prime_4  | False       |        0.21 |        0.53 |       14.63 |
| is_prime_4  | True        |        0.20 |        4.27 |       20.09 |
 ---------------------------------------------------------------------

Then, running the function compare_functions(n=10**6) (numbers up to 1.000.000) I get this output:

df.shape: (4000000, 5)

 -----------------------------------------
| is_prime    | count       | percent     |
| all         |   1,000,000 |     100.0 % |
| False       |     921,502 |      92.2 % |
| True        |      78,498 |       7.8 % |
 -----------------------------------------

 ---------------------------------------------------------------------
| f           | is_prime    | t min (us)  | t mean (us) | t max (us)  |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_1  | all         |        0.51 |        5.39 |     1414.87 |
| is_prime_1  | False       |        0.51 |        2.19 |      413.42 |
| is_prime_1  | True        |        0.87 |       42.98 |     1414.87 |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_2  | all         |        0.24 |        2.65 |      612.69 |
| is_prime_2  | False       |        0.24 |        0.89 |      322.81 |
| is_prime_2  | True        |        0.24 |       23.27 |      612.69 |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_3  | all         |        0.20 |        1.93 |       67.40 |
| is_prime_3  | False       |        0.20 |        0.82 |       61.39 |
| is_prime_3  | True        |        0.59 |       14.97 |       67.40 |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_4  | all         |        0.18 |        1.88 |      332.13 |
| is_prime_4  | False       |        0.20 |        0.74 |      311.94 |
| is_prime_4  | True        |        0.18 |       15.23 |      332.13 |
 ---------------------------------------------------------------------

I used the following script to plot the results:

def plot_1(func_list=default_func_list, n):
    df_orig = compare_functions(func_list=func_list, n=n)
    df_filtered = df_orig[df_orig['t_micro_seconds'] <= 20]
    sns.lmplot(
        data=df_filtered, x='number', y='t_micro_seconds',
        col='f',
        # row='is_prime',
        markers='.',
        ci=None)

    plt.ticklabel_format(style='sci', axis='x', scilimits=(3, 3))
    plt.show()

0 讨论(0)

查看其它30个回答