How to create the most compact mapping n → isprime(n) up to a limit N?

后端 未结 30 3366
遇见更好的自我
遇见更好的自我 2020-11-22 02:11

Naturally, for bool isprime(number) there would be a data structure I could query.
I define the best algorithm, to be the algorithm that pr

30条回答
  •  庸人自扰
    2020-11-22 02:45

    I compared the efficiency of the most popular suggestions to determine if a number is prime. I used python 3.6 on ubuntu 17.10; I tested with numbers up to 100.000 (you can test with bigger numbers using my code below).

    This first plot compares the functions (which are explained further down in my answer), showing that the last functions do not grow as fast as the first one when increasing the numbers.

    And in the second plot we can see that in case of prime numbers the time grows steadily, but non-prime numbers do not grow so fast in time (because most of them can be eliminated early on).

    Here are the functions I used:

    1. this answer and this answer suggested a construct using all():

      def is_prime_1(n):
          return n > 1 and all(n % i for i in range(2, int(math.sqrt(n)) + 1))
      
    2. This answer used some kind of while loop:

      def is_prime_2(n):
          if n <= 1:
              return False
          if n == 2:
              return True
          if n == 3:
              return True
          if n % 2 == 0:
              return False
          if n % 3 == 0:
              return False
      
          i = 5
          w = 2
          while i * i <= n:
              if n % i == 0:
                  return False
              i += w
              w = 6 - w
      
          return True
      
    3. This answer included a version with a for loop:

      def is_prime_3(n):
          if n <= 1:
              return False
      
          if n % 2 == 0 and n > 2:
              return False
      
          for i in range(3, int(math.sqrt(n)) + 1, 2):
              if n % i == 0:
                  return False
      
          return True
      
    4. And I mixed a few ideas from the other answers into a new one:

      def is_prime_4(n):
          if n <= 1:          # negative numbers, 0 or 1
              return False
          if n <= 3:          # 2 and 3
              return True
          if n % 2 == 0 or n % 3 == 0:
              return False
      
          for i in range(5, int(math.sqrt(n)) + 1, 2):
              if n % i == 0:
                  return False
      
          return True
      

    Here is my script to compare the variants:

    import math
    import pandas as pd
    import seaborn as sns
    import time
    from matplotlib import pyplot as plt
    
    
    def is_prime_1(n):
        ...
    def is_prime_2(n):
        ...
    def is_prime_3(n):
        ...
    def is_prime_4(n):
        ...
    
    default_func_list = (is_prime_1, is_prime_2, is_prime_3, is_prime_4)
    
    def assert_equal_results(func_list=default_func_list, n):
        for i in range(-2, n):
            r_list = [f(i) for f in func_list]
            if not all(r == r_list[0] for r in r_list):
                print(i, r_list)
                raise ValueError
        print('all functions return the same results for integers up to {}'.format(n))
    
    def compare_functions(func_list=default_func_list, n):
        result_list = []
        n_measurements = 3
    
        for f in func_list:
            for i in range(1, n + 1):
                ret_list = []
                t_sum = 0
                for _ in range(n_measurements):
                    t_start = time.perf_counter()
                    is_prime = f(i)
                    t_end = time.perf_counter()
    
                    ret_list.append(is_prime)
                    t_sum += (t_end - t_start)
    
                is_prime = ret_list[0]
                assert all(ret == is_prime for ret in ret_list)
                result_list.append((f.__name__, i, is_prime, t_sum / n_measurements))
    
        df = pd.DataFrame(
            data=result_list,
            columns=['f', 'number', 'is_prime', 't_seconds'])
        df['t_micro_seconds'] = df['t_seconds'].map(lambda x: round(x * 10**6, 2))
        print('df.shape:', df.shape)
    
        print()
        print('', '-' * 41)
        print('| {:11s} | {:11s} | {:11s} |'.format(
            'is_prime', 'count', 'percent'))
        df_sub1 = df[df['f'] == 'is_prime_1']
        print('| {:11s} | {:11,d} | {:9.1f} % |'.format(
            'all', df_sub1.shape[0], 100))
        for (is_prime, count) in df_sub1['is_prime'].value_counts().iteritems():
            print('| {:11s} | {:11,d} | {:9.1f} % |'.format(
                str(is_prime), count, count * 100 / df_sub1.shape[0]))
        print('', '-' * 41)
    
        print()
        print('', '-' * 69)
        print('| {:11s} | {:11s} | {:11s} | {:11s} | {:11s} |'.format(
            'f', 'is_prime', 't min (us)', 't mean (us)', 't max (us)'))
        for f, df_sub1 in df.groupby(['f', ]):
            col = df_sub1['t_micro_seconds']
            print('|{0}|{0}|{0}|{0}|{0}|'.format('-' * 13))
            print('| {:11s} | {:11s} | {:11.2f} | {:11.2f} | {:11.2f} |'.format(
                f, 'all', col.min(), col.mean(), col.max()))
            for is_prime, df_sub2 in df_sub1.groupby(['is_prime', ]):
                col = df_sub2['t_micro_seconds']
                print('| {:11s} | {:11s} | {:11.2f} | {:11.2f} | {:11.2f} |'.format(
                    f, str(is_prime), col.min(), col.mean(), col.max()))
        print('', '-' * 69)
    
        return df
    

    Running the function compare_functions(n=10**5) (numbers up to 100.000) I get this output:

    df.shape: (400000, 5)
    
     -----------------------------------------
    | is_prime    | count       | percent     |
    | all         |     100,000 |     100.0 % |
    | False       |      90,408 |      90.4 % |
    | True        |       9,592 |       9.6 % |
     -----------------------------------------
    
     ---------------------------------------------------------------------
    | f           | is_prime    | t min (us)  | t mean (us) | t max (us)  |
    |-------------|-------------|-------------|-------------|-------------|
    | is_prime_1  | all         |        0.57 |        2.50 |      154.35 |
    | is_prime_1  | False       |        0.57 |        1.52 |      154.35 |
    | is_prime_1  | True        |        0.89 |       11.66 |       55.54 |
    |-------------|-------------|-------------|-------------|-------------|
    | is_prime_2  | all         |        0.24 |        1.14 |      304.82 |
    | is_prime_2  | False       |        0.24 |        0.56 |      304.82 |
    | is_prime_2  | True        |        0.25 |        6.67 |       48.49 |
    |-------------|-------------|-------------|-------------|-------------|
    | is_prime_3  | all         |        0.20 |        0.95 |       50.99 |
    | is_prime_3  | False       |        0.20 |        0.60 |       40.62 |
    | is_prime_3  | True        |        0.58 |        4.22 |       50.99 |
    |-------------|-------------|-------------|-------------|-------------|
    | is_prime_4  | all         |        0.20 |        0.89 |       20.09 |
    | is_prime_4  | False       |        0.21 |        0.53 |       14.63 |
    | is_prime_4  | True        |        0.20 |        4.27 |       20.09 |
     ---------------------------------------------------------------------
    

    Then, running the function compare_functions(n=10**6) (numbers up to 1.000.000) I get this output:

    df.shape: (4000000, 5)
    
     -----------------------------------------
    | is_prime    | count       | percent     |
    | all         |   1,000,000 |     100.0 % |
    | False       |     921,502 |      92.2 % |
    | True        |      78,498 |       7.8 % |
     -----------------------------------------
    
     ---------------------------------------------------------------------
    | f           | is_prime    | t min (us)  | t mean (us) | t max (us)  |
    |-------------|-------------|-------------|-------------|-------------|
    | is_prime_1  | all         |        0.51 |        5.39 |     1414.87 |
    | is_prime_1  | False       |        0.51 |        2.19 |      413.42 |
    | is_prime_1  | True        |        0.87 |       42.98 |     1414.87 |
    |-------------|-------------|-------------|-------------|-------------|
    | is_prime_2  | all         |        0.24 |        2.65 |      612.69 |
    | is_prime_2  | False       |        0.24 |        0.89 |      322.81 |
    | is_prime_2  | True        |        0.24 |       23.27 |      612.69 |
    |-------------|-------------|-------------|-------------|-------------|
    | is_prime_3  | all         |        0.20 |        1.93 |       67.40 |
    | is_prime_3  | False       |        0.20 |        0.82 |       61.39 |
    | is_prime_3  | True        |        0.59 |       14.97 |       67.40 |
    |-------------|-------------|-------------|-------------|-------------|
    | is_prime_4  | all         |        0.18 |        1.88 |      332.13 |
    | is_prime_4  | False       |        0.20 |        0.74 |      311.94 |
    | is_prime_4  | True        |        0.18 |       15.23 |      332.13 |
     ---------------------------------------------------------------------
    

    I used the following script to plot the results:

    def plot_1(func_list=default_func_list, n):
        df_orig = compare_functions(func_list=func_list, n=n)
        df_filtered = df_orig[df_orig['t_micro_seconds'] <= 20]
        sns.lmplot(
            data=df_filtered, x='number', y='t_micro_seconds',
            col='f',
            # row='is_prime',
            markers='.',
            ci=None)
    
        plt.ticklabel_format(style='sci', axis='x', scilimits=(3, 3))
        plt.show()
    

提交回复
热议问题