How to calculate as quickly as possible the base-3 value of an integer that is given as a huge sequence of decimal digits (more than one million)?

Submitted by 青春壹個敷衍的年華 on 2019-12-24 10:30:03

Question


We got this task from our professor. Prerequisites are:

  • Use Python 3 and use only built-in functions (no numpy).
  • Main task: Find and store the result within 5 sec.
  • Minor task, just nice to have: Find not only the value for base b=3, but also for the bases b=3**k (with k = 2,3,4).

Compared to our first straightforward solution, we achieved an improvement by a factor of 96 (almost 100 times faster), but it still doesn't meet the 5 sec limit (currently, we are at 25 sec on an i7 laptop). [Our prof also has no solution in pure Python, so it's a bit of a research task.]

The complete code (including test calls) is below. Overall, it shows an improvement from originally 2400 sec (= 40 min) to 25 sec. However, we still need another performance improvement by a factor of 5. Does anyone have ideas and can help?

# -*- coding: utf-8 -*-
#
# Convert a long random sequence of base-10 digits to integers base 3**k with k=1,2,3,4
# 
# Task for phdgroupA: length of sequence is 1.5*(10**6)
#                     time < 5 sec
#                     Use Python 3 (standard libraries only, no numpy) !
#
# Testcase with a very small sequence, made purely of the digit 7:
# (see sagemath or www.math.com/tables/general/base_conv.htm)
# numlen = 12  -->  777777777777_base10
#                =  2202100120200002212221010_base3
#                =  2670520085833_base9
#                =  2k9fi2np3_base27   ("digits": 0123456789ab...pq)
#                   [2, 20, 9, 15, 18, 2, 23, 25, 3]
#                =  2[61]5[18]8[53][30]_base81
#                   [2, 61, 5, 18, 8, 53, 30]
# 


# Convert integer n to a list of digits (integer values in the range 0 to b-1), most significant first.
# The result has no leading zeros, so n == 0 gives an empty list.
# With divmod, it's ca. 1/3 faster than using n%b and then n//=b.
def numberToBase(n, b):
    digits = []
    while n:
        n, rem = divmod(n, b)
        digits.append(rem)
    return digits[::-1]


# Step 0: Create string of nlen digits
def step0(nlen):
    rd = 7  # which digit to repeat
    string_val = str(rd) * nlen
    return string_val  # end of step0()


# Step 1: Convert string to int (the string contains only decimal digits)
def step1(string_val, option_chunk=True):
    if option_chunk:
        string_val_len = len(string_val)
        Chunk_len = 90000
        Read_len = 0
        int_valChunk = 0
        int_valLocal = 0
        ii = 0
        while Read_len < string_val_len:
            string_val_ChunkRead = string_val[ii*Chunk_len:(ii+1)*Chunk_len]
            Chunk_lenRead = len(string_val_ChunkRead)
            int_valChunk = int(string_val_ChunkRead)
            ii += 1
            int_valLocal = int_valLocal * 10**Chunk_lenRead + int_valChunk
            Read_len += Chunk_lenRead
        int_val = int_valLocal
    else:
        int_val = int(string_val)
    return int_val  # end of step1()


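# Step 2: Convert given integer to another base
def step2(n, b, convsteps):
    nList = []
    if convsteps == 3:  # Here the conversion is done in 3 steps
        expos = 10000, 300
        base_a = b ** expos[0]
        base_b = b ** expos[1]
        nList1 = numberToBase(n, base_a)  # That's the time killer in this part
        for ll in nList1:
            part = []  # base-b digits of one base_a digit
            for mm in numberToBase(ll, base_b):
                sub = numberToBase(mm, b)
                # Zero-pad each base_b digit to exactly expos[1] base-b digits
                part.extend([0] * (expos[1] - len(sub)) + sub)
            # Force exactly expos[0] digits: pad on the left, or cut surplus leading zeros
            nList.extend([0] * (expos[0] - len(part)) + part[-expos[0]:])
        # Remove the leading zeros of the complete number
        first = next((i for i, d in enumerate(nList) if d), len(nList))
        nList = nList[first:]
    else: # Do conversion in one bulk
        nList = numberToBase(n, b)
    return nList  # end of step2()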



if __name__ == '__main__':

    # Calculate the string of digits
    numlen = 1500000  # number of digits = length of sequence
    string_value = step0(numlen)

    # Calculate the integer value of the string_value
    int_value = step1(string_value, option_chunk=True)

    # Convert int_value to list of numbers of the given bases
    convsteps = 3  # value of '3' makes step2() 50-60 times faster than value '1'

    b = 3
    numList = step2(int_value, b, convsteps)
    print('3**1: numList begin:', numList[:10])  # Expect: [2, 0, 1, 0, 0, 1, 1, 0, 2, 1]

Some ideas: Could the chunk in step 1 have a different size? Could the two big bases for the intermediate conversions be better balanced? Or could the conversion from a string of decimal digits to a list of base-3 digits be made more directly?
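One way to explore the step 1 idea is to let recursion choose the chunk sizes instead of fixing one size. The following is only a minimal sketch, assuming the input contains nothing but decimal digits; `decimal_to_int`, `_pow10` and the `leaf` parameter are hypothetical names, not part of the code above. Keeping `leaf` below 4300 digits also stays clear of the default limit that recent CPython versions place on int() for very long strings.

```python
_POW10 = {}  # cache: exponent -> 10**exponent


def _pow10(e):
    """Return 10**e, computing each distinct exponent only once."""
    p = _POW10.get(e)
    if p is None:
        p = _POW10[e] = 10 ** e
    return p


def decimal_to_int(s, leaf=4000):
    """Parse a string of decimal digits by recursive halving.

    Pieces of at most `leaf` characters are handed to int(); longer
    pieces are split in the middle and recombined with one big
    multiplication per split.
    """
    if leaf < 1:
        raise ValueError("leaf must be at least 1")
    if len(s) <= leaf:
        return int(s) if s else 0
    mid = len(s) // 2
    high = decimal_to_int(s[:mid], leaf)
    low = decimal_to_int(s[mid:], leaf)
    # The low part has len(s) - mid digits, so shift the high part by that much.
    return high * _pow10(len(s) - mid) + low
```

Whether this beats the fixed chunk size of 90000 has not been measured here; it would have to be timed on the 1.5 million digit input.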

Description: The algorithm in the Python code above works in 3 steps:

  • step 0: Get data. For test purposes, we create a sequence of 1.5 million decimal digits here. Normally, we will get this value as a random value from a file. The sequence is then stored as a string.
  • step 1: Convert that string to an integer (default is base 10).
  • step 2: Convert that integer to a list of digits in base b=3.

These three changes brought the biggest improvements (compared to the initial straightforward solution):

  1. The helper function numberToBase(n, b), which is used in step 2, converts the integer n to base b. The result is a list of integers, each in the range 0 to b-1; read in order, the list is the number in base b. The improvement was achieved by using the built-in function 'divmod' instead of the two statements n%b and n//=b within the while loop. This brought a performance boost by a factor of 2.

  2. Function step2(n, b, convsteps) converts the given integer n to base b (with b=3). Initially, we called the helper function numberToBase(n, b) once. Then we introduced intermediate steps in step2(), so n is no longer converted to the final base in one step, but in 3 steps. The intermediate bases are much bigger than the final base b. These intermediate base conversions made step 2 much quicker: 60 times.

  3. Function step1() was made 4 times faster by reading the string in chunks and doing the conversion for each chunk separately.
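The intermediate-base trick from item 2 can also be written recursively. The following is only a sketch, not the code from the question: `to_base_digits`, `power`, `emit` and the `leaf` parameter are names made up for illustration. It splits the number by a power of b near the middle, converts both halves, and emits every part with a fixed width, because zeros at the start of an inner block would otherwise be lost. Whether this is faster than the three-stage version depends on how the interpreter implements big-integer division, so it should be timed rather than assumed.

```python
def to_base_digits(n, b, leaf=64):
    """Return the base-b digits of n >= 0, most significant first.

    Like numberToBase, the result has no leading zeros ([] for n == 0).
    """
    if n < 0 or b < 2 or leaf < 1:
        raise ValueError("need n >= 0, b >= 2 and leaf >= 1")

    powers = {}  # cache: width -> b**width

    def power(width):
        p = powers.get(width)
        if p is None:
            p = powers[width] = b ** width
        return p

    out = []

    def emit(m, width):
        """Append exactly `width` digits of m (requires 0 <= m < b**width)."""
        if m == 0:
            out.extend([0] * width)
        elif width <= leaf:
            chunk = []
            for _ in range(width):  # fixed count, so leading zeros are kept
                m, r = divmod(m, b)
                chunk.append(r)
            out.extend(reversed(chunk))
        else:
            lo = width // 2
            q, r = divmod(m, power(lo))
            emit(q, width - lo)  # high part
            emit(r, lo)          # low part, zero-padded to lo digits

    # Find a width (leaf times a power of two) with b**width > n.
    width = leaf
    while power(width) <= n:
        width *= 2
    emit(n, width)

    # Only the zeros in front of the whole number are removed.
    first = next((i for i, d in enumerate(out) if d), len(out))
    return out[first:]
```

For the small test case from the code comments, `to_base_digits(777777777777, 81)` gives `[2, 61, 5, 18, 8, 53, 30]`.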

Any idea is welcome. Please test your ideas with time() and give a quantitative statement about the advantage. Other answers we checked here either did not use such a long sequence of decimal digits (in the string) or didn't focus on the performance of the base conversion.
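For the quantitative comparison, a small timing helper keeps the measurements consistent between variants. This is a sketch under assumptions: `timed` and its `repeat` parameter are hypothetical names, and it uses time.perf_counter() from the standard library rather than time.time(), since perf_counter() is intended for measuring short durations.

```python
import time


def timed(func, *args, repeat=3, **kwargs):
    """Call func(*args, **kwargs) `repeat` times.

    Returns (best_seconds, result_of_last_call). Taking the best of
    several runs reduces noise from other processes on the machine.
    """
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result


# Usage with the functions from the question would look like:
#   seconds, numList = timed(step2, int_value, 3, 3, repeat=1)
#   print('step2: %.2f sec' % seconds)
```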


Answer 1:


OK, I think this is the solution for getting from base 3 to base 9:

base3to9={
   "00":"0",
   "01":"1",
   "02":"2",
   "10":"3",
   "11":"4",
   "12":"5",
   "20":"6",
   "21":"7",
   "22":"8",   
}
def convert_base3_to_base9(s):
    s = '0'*(len(s)%2) + s # ensure that the string is the right length
    return "".join(base3to9[s[i:i+2]] for i in range(0,len(s),2))

print(convert_base3_to_base9("12012120121010"))
# 5176533

Then you can extend it to base 27:

base3to27 = {
    "000":"0",
    "001":"1",
    ...
    "222":"Q"
}
def convert_base3_to_base27(s):
    s = '0'*(-len(s)%3) + s # left-pad with zeros so the length is a multiple of 3
    return "".join(base3to27[s[i:i+3]] for i in range(0,len(s),3))

Basically there is no math to do at all, just O(1) dict lookups per group of digits, so it should be really quite fast.
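Since step2() in the question returns a list of integers rather than a string, the same grouping idea can be applied to that list directly, for any k. A minimal sketch; `regroup_digits` is a hypothetical name, and it assumes the digits are given most significant first, as numberToBase returns them.

```python
def regroup_digits(digits, k, b=3):
    """Combine base-b digits (most significant first) into base-b**k digits."""
    if k < 1:
        raise ValueError("k must be at least 1")
    pad = -len(digits) % k              # zeros needed in front
    padded = [0] * pad + list(digits)
    out = []
    for i in range(0, len(padded), k):
        value = 0
        for d in padded[i:i + k]:       # Horner evaluation of one group
            value = value * b + d
        out.append(value)
    return out


# The small test case from the question: 777777777777 in base 3
base3 = [int(c) for c in "2202100120200002212221010"]
print(regroup_digits(base3, 2))  # [2, 6, 7, 0, 5, 2, 0, 0, 8, 5, 8, 3, 3]
print(regroup_digits(base3, 3))  # [2, 20, 9, 15, 18, 2, 23, 25, 3]
print(regroup_digits(base3, 4))  # [2, 61, 5, 18, 8, 53, 30]
```

This covers only the minor task (bases 3**k from an existing base-3 result); the expensive conversion from decimal to base 3 still has to be done first.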



Source: https://stackoverflow.com/questions/53530091/how-to-calculate-as-quick-as-possible-the-base-3-value-of-an-integer-which-is-gi
