Why is my computation so much faster in C# than Python

Backend · 6 answers · 1643 views
Asked by 日久生厌 on 2020-12-07 21:40

Below is a simple piece of computation coded in C# and Python respectively (for those of you curious about the process, it's the solution for Project Euler Problem 5: the smallest number evenly divisible by every number from 1 to 20).

6 Answers
  • 2020-12-07 21:47

    TL;DR: A long-winded post in which I try to defend Python (my language of choice) against C#. In this example C# performs better: it takes more lines of code to do the same amount of work, but when both are coded correctly, C# ends up ~5x faster than a comparable Python approach. The end result is that you should use the language that suits you.

    When I ran the C# example, it took about 3 seconds to complete on my machine and gave a result of 232,792,560. It can be optimized using the fact that a number divisible by every number from 1 to 20 must itself be a multiple of 20, so you can increment by 20 instead of 1. That single optimization made the code execute ~10x faster, in a mere 353 milliseconds.

    When I ran the Python example, I gave up on waiting and wrote my own version using itertools, which fared no better: it took about as long as your example. Then I hit upon an acceptable itertools version that exploits the fact that only multiples of the largest number can be divisible by every number from smallest to largest. The refined Python (3.6) code is below, with a timing decorator that prints the number of seconds the call took to execute:

    import time
    from itertools import count, filterfalse
    
    
    def timer(func):
        def wrapper(*args, **kwargs):
            start = time.time()
            res = func(*args, **kwargs)
            print(time.time() - start)
            return res
        return wrapper
    
    
    @timer
    def test(stop):
        return next(filterfalse(lambda x: any(x%i for i in range(2, stop)), count(stop, stop)))
    
    
    print("Test Function")
    print(test(20))
    # 11.526668787002563
    # 232792560
    

    This also reminded me of a question I recently had to answer on CodeFights: computing the Least Common Multiple using the Greatest Common Divisor (GCD) function in Python. That code is as follows:

    import time
    from math import gcd  # fractions.gcd was deprecated in 3.5 and removed in Python 3.9
    from functools import reduce
    
    
    def timer(func):
        def wrapper(*args, **kwargs):
            start = time.time()
            res = func(*args, **kwargs)
            print(time.time() - start)
            return res
        return wrapper
    
    
    @timer
    def leastCommonDenominator(denominators):
        return reduce(lambda a, b: a * b // gcd(a, b), denominators)
    
    
    print("LCM Function")
    print(leastCommonDenominator(range(1, 21)))
    # 0.001001596450805664
    # 232792560
    

    As with most programming tasks, the simplest approach isn't always the fastest, and unfortunately it really stuck out when attempted in Python this time. That said, the beauty of Python is how simply you can get a performant execution: where it took 10 lines of C#, I was able to return the correct answer in (potentially) a one-line lambda expression, ~300 times faster than my simple optimization of the C#. I'm no specialist in C#, but implementing the same GCD-based approach there gives the code below, which ran about 5x faster than the Python:

    using System;
    using System.Diagnostics;
    
    namespace ConsoleApp1
    {
        class Program
        {
            public static void Main(string[] args)
            {
                Stopwatch t0 = new Stopwatch();
                int maxNumber = 20;
    
                long start;
                t0.Start();
                start = Orig(maxNumber);
                t0.Stop();
    
                Console.WriteLine("Original | {0:d}, {1:d}", maxNumber, start);
                // Original | 20, 232792560
                Console.WriteLine("Original | time elapsed = {0}.", t0.Elapsed);
                // Original | time elapsed = 00:00:02.0585575
    
                t0.Restart();
                start = Test(maxNumber);
                t0.Stop();
    
                Console.WriteLine("Test | {0:d}, {1:d}", maxNumber, start);
                // Test | 20, 232792560
                Console.WriteLine("Test | time elapsed = {0}.", t0.Elapsed);
                // Test | time elapsed = 00:00:00.0002763
    
                Console.ReadLine();
            }
    
            public static long Orig(int maxNumber)
            {
                bool found = false;
                long start = 0;
                while (!found)
                {
                    start += maxNumber;
                    found = true;
                    for (int i = 2; i <= maxNumber; i++)
                    {
                        if (start % i != 0)
                            found = false;
                    }
                }
                return start;
            }
    
            public static long Test(int maxNumber)
            {
                long result = 1;
    
                for (long i = 2; i <= maxNumber; i++)
                {
                    result = (result * i) / GCD(result, i);
                }
    
                return result;
            }
    
            public static long GCD(long a, long b)
            {
                while (b != 0)
                {
                    long c = b;
                    b = a % b;
                    a = c;
                }
    
                return a;
            }
        }
    }
    

    For most higher-level tasks, however, I usually see Python doing exceptionally well compared to a .NET implementation, though I cannot substantiate those claims at this time, aside from saying that the Python Requests library has given me as much as double to triple the performance of a C# WebRequest written the same way. The same was true when writing Selenium processes: I could read text elements in Python in 100 milliseconds or less, while each element retrieval in C# took over a second to return. That said, I actually prefer the C# Selenium implementation because of its object-oriented approach, where Python's implementation goes functional and gets very hard to read at times.

  • 2020-12-07 21:57

    Try Python JIT implementations like PyPy and Numba, or ahead-of-time compilation with Cython, if you want something close to C speed at the cost of a bit of code readability.

    e.g. in PyPy:

    # PyPy
    
    number 232792560
    
    time elapsed = 4.000000 sec.
    

    e.g. in Cython:

    # Cython
    
    number 232792560
    
    time elapsed = 1.000000 sec.
    

    Cython Source:

    from datetime import datetime
    
    cpdef void run():
        t0 = datetime.now()
        cdef int max_number = 20
        found = False
        cdef int start = max_number
        cdef int i
        while not found:
            found = True
            i = 2
            while ((i < max_number + 1) and found):
                if (start % i) != 0:
                    found = False
                i += 1
            start += 1
    
        print("number {0:d}\n".format(start - 1))
    
        print("time elapsed = {0:f} sec.\n".format((datetime.now() - t0).total_seconds()))
    
  • 2020-12-07 21:57

    Python (and every scripting language, including MATLAB) is not intended to be used directly for large-scale numerical calculation. To get results comparable to compiled programs, avoid loops at all costs and convert the formula into matrix/array form (which takes a little mathematical understanding and skill), so that as much work as possible is pushed down to the compiled C libraries behind numpy, scipy, etc.

    Again, DO NOT write loops for numerical calculation in Python whenever a matrix equivalent is possible!
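As a concrete sketch of that advice (assuming numpy is installed; `lcm_1_to_20` is an illustrative name), this very problem reduces to a single vectorized reduction with no Python-level loop:

```python
import numpy as np

# LCM of 1..20 as one vectorized reduction: numpy's C implementation of
# lcm is folded pairwise across the array, with no Python-level loop.
values = np.arange(1, 21, dtype=np.int64)
lcm_1_to_20 = np.lcm.reduce(values)
print(lcm_1_to_20)  # 232792560
```

The loop that CPython would otherwise interpret step by step runs entirely inside numpy's compiled ufunc machinery.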

  • 2020-12-07 22:04

    The answer is simply that Python represents everything as an object and doesn't have a JIT by default. So rather than efficiently modifying a few bytes on the stack and optimizing the hot parts of the code (i.e., the iteration), Python chugs along with rich objects representing numbers and no on-the-fly optimizations.

    If you tried this in a variant of Python that has JIT (for example, PyPy) I guarantee you that you'll see a massive difference.

    A general tip is to avoid standard Python for very computationally expensive operations (especially if this is for a backend serving requests from multiple clients). Java, C#, JavaScript, etc. with JIT are incomparably more efficient.
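The per-number object overhead is easy to observe in CPython (the exact sizes below are typical for a 64-bit CPython 3 build and may vary):

```python
import sys

# Every int is a full heap object with a header, not a bare machine word.
print(sys.getsizeof(1))        # e.g. 28 bytes on 64-bit CPython 3
print(sys.getsizeof(10**100))  # larger still: size grows with magnitude
```

Compare that with the 4 or 8 bytes a C# `int` or `long` occupies in the tight loop above.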

    By the way, if you want to write your example in a more Pythonic manner, you could do it like this:

    from datetime import datetime
    start_time = datetime.now()
    
    max_number = 20
    x = max_number
    while True:
        i = 2
        while i <= max_number:
            if x % i: break
            i += 1
        else:
            # x was not divisible by 2...20
            break
        x += 1
    
    print('number:       %d' % x)
    print('time elapsed: %d seconds' % (datetime.now() - start_time).seconds)
    

    The above executed in 90 seconds for me. It's faster mostly for small reasons: fewer variable assignments per iteration, and relying on Python's own control structures (break and while/else) rather than flag checking to jump in and out of the loops.
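A compact sketch combining the loop/else idiom above with the step-by-max_number optimization from another answer (`smallest_multiple` is an illustrative name):

```python
def smallest_multiple(max_number):
    # Any qualifying value must be a multiple of max_number, so step by it.
    x = max_number
    while True:
        for i in range(2, max_number + 1):
            if x % i:
                break        # x is not divisible by i; try the next candidate
        else:
            return x         # no break fired: x is divisible by 2..max_number
        x += max_number

print(smallest_multiple(10))  # 2520
```

The else branch of a for (or while) loop runs only when the loop finishes without hitting `break`, which is exactly the "no divisor failed" case here.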

  • 2020-12-07 22:06

    First of all, you need to change the algorithm to solve this problem efficiently:

    #!/usr/bin/env python

    import sys
    from timeit import default_timer as timer

    pyver = sys.version_info

    print(">")
    print(">  Smallest multiple of 2 ... K")
    print(">")
    print(">  Python version, interpreter version: {0}.{1}.{2}-{3}-{4}".format(
        pyver.major, pyver.minor, pyver.micro, pyver.releaselevel, pyver.serial))
    print(">")

    K = 20

    print("  K = {0:d}".format(K))
    print("")

    t0 = timer()

    N = K
    NP1 = N + 1
    N2 = (N >> 1) + 1
    vec = list(range(0, NP1))  # list(), so the entries can be modified in Python 3
    smallestMultiple = 1

    # Divide each entry by its smaller (already reduced) factors, so that
    # only the prime-power factors the LCM actually needs remain.
    for i in range(2, N2):
        divider = vec[i]
        if divider == 1:
            continue
        for j in range(i << 1, NP1, i):
            if (vec[j] % divider) == 0:
                vec[j] //= divider  # floor division keeps the entries integers

    for i in range(2, NP1):
        if vec[i] != 1:
            smallestMultiple = smallestMultiple * vec[i]

    t1 = timer()

    print("  smallest multiple = {0:d}".format(smallestMultiple))
    print("  time elapsed = {0:f} sec.".format(t1 - t0))


    Output on Linux/Fedora 28/Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz:

    >
    >  Smallest multiple of 2 ... K
    >
    >  Python version, interpreter version: 2.7.15-final-0
    >
      K = 20

      smallest multiple = 232792560
      time elapsed = 0.000032 sec.
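For reference, on Python 3.9+ the same answer can be cross-checked directly against the standard library, whose `math.lcm` accepts any number of arguments and runs in C:

```python
import math

# math.lcm (Python 3.9+) computes the least common multiple of all arguments.
print(math.lcm(*range(1, 21)))  # 232792560
```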
    
  • 2020-12-07 22:08

    As some people said, the best way is to use JIT implementations. I know this is an old topic, but I was curious about the difference in execution time between the implementations, so I ran some tests in a Jupyter Notebook with Numba and Cython. These were my results:

    Your code inside a function

    %%time
    
    def test():
        max_number = 20
        found = False
        start = max_number
        while not found:
            found = True
            i = 2
            while ((i < max_number + 1) and found):
                if (start % i) != 0:
                    found = False
                i += 1
            start += 1
        return start-1
    test()
    

    CPU times: user 1min 18s, sys: 462 ms, total: 1min 19s Wall time: 1min 21s

    The way @Blixt wrote it

    %%time
    
    def test():
        max_number = 20
        x = max_number
        while True:
            i = 2
            while i <= max_number:
                if x % i: break
                i += 1
            else:
                # x was not divisible by 2...20
                break
            x += 1
        
        return x
        
    test()
    

    CPU times: user 40.1 s, sys: 305 ms, total: 40.4 s Wall time: 41.9 s

    Numba

    %%time
    from numba import jit
    
    @jit(nopython=True)
    def test():
        max_number = 20
        x = max_number
        while True:
            i = 2
            while i <= max_number:
                if x % i: break
                i += 1
            else:
                # x was not divisible by 2...20
                break
            x += 1
        
        return x
        
    test()
    

    CPU times: user 4.48 s, sys: 70.5 ms, total: 4.55 s Wall time: 5.01 s

    Numba with function signature

    %%time
    from numba import jit, int32
    
    @jit(int32())
    def test():
        max_number = 20
        x = max_number
        while True:
            i = 2
            while i <= max_number:
                if x % i: break
                i += 1
            else:
                # x was not divisible by 2...20
                break
            x += 1
        
        return x
        
    test()
    

    CPU times: user 3.56 s, sys: 43.1 ms, total: 3.61 s Wall time: 3.79 s

    Cython

    %load_ext Cython
    
    %%time
    %%cython
    
    def test():
        cdef int max_number = 20
        cdef int x = max_number
        cdef int i = 2
        while True:
            i = 2
            while i <= max_number:
                if x % i: break
                i += 1
            else:
                # x was not divisible by 2...20
                break
            x += 1
        
        return x
        
    test()
    

    CPU times: user 617 ms, sys: 20.7 ms, total: 637 ms Wall time: 1.31 s
