Question
We got this task from our professor. Prerequisites are:
- Use Python 3 and only built-in functions (no numpy).
- Main task: Find and store the result within 5 sec.
- Minor task, just nice to have: Find not only the value for base b=3, but also for the bases b=3**k (with k = 2,3,4).
Compared to our first straightforward solution, we achieved an improvement by a factor of 96 (almost 100 times faster), but it still doesn't meet the 5 sec limit (currently we are at 25 sec on an i7 laptop). [Our prof also has no solution in pure Python, so it's a bit of a research task.]
The complete code (including test calls) is below. Overall, it shows an improvement from originally 2400 sec (= 40 min) to 25 sec. However, we need another performance improvement by a factor of 5. Does anyone have ideas and can help?
# -*- coding: utf-8 -*-
#
# Convert a long random sequence of base-10 digits to integers base 3**k with k=1,2,3,4
#
# Task for phdgroupA: length of sequence is 1.5*(10**6)
# time < 5 sec
# Use Python 3 (standard libraries only, no numpy) !
#
# Testcase with a very small sequence, made purely of the digit 7:
# (see sagemath or www.math.com/tables/general/base_conv.htm)
# numlen = 12 --> 777777777777_base10
# = 2202100120200002212221010_base3
# = 2670520085833_base9
# = 2k9fi2np3_base27 ("digits": 0123456789ab...pq)
# [2, 20, 9, 15, 18, 2, 23, 25, 3]
# = 2[61]5[18]8[53][30]_base81
# [2, 61, 5, 18, 8, 53, 30]
#
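import sys

# Python 3.11+ (and some security releases of older versions) limit int <-> str
# conversions to 4300 digits by default; 0 removes the limit, so that int() on
# the 90000-digit chunks in step1() works.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)

# Convert the integer n to a list of digits (ints in the range 0 to b-1), most significant first.
# Returns [] for n == 0.
# With divmod, it's ca. 1/3 faster than using n%b and then n//=b.
def numberToBase(n, b):
    digits = []
    while n:
        n, rem = divmod(n, b)
        digits.append(rem)
    return digits[::-1]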
# Step 0: Create string of nlen digits
def step0(nlen):
    rd = 7  # which digit to repeat
    string_val = "".join(str(rd) for i in range(nlen))
    return string_val  # end of step0()
# Step 1: Convert string to int (the string contains only decimal digits)
def step1(string_val, option_chunk=True):
    if option_chunk:
        string_val_len = len(string_val)
        Chunk_len = 90000
        Read_len = 0
        int_valChunk = 0
        int_valLocal = 0
        ii = 0
        while Read_len < string_val_len:
            string_val_ChunkRead = string_val[ii*Chunk_len:(ii+1)*Chunk_len]
            Chunk_lenRead = len(string_val_ChunkRead)
            int_valChunk = int(string_val_ChunkRead)
            ii += 1
            int_valLocal = int_valLocal * 10**Chunk_lenRead + int_valChunk
            Read_len += Chunk_lenRead
        int_val = int_valLocal
    else:
        int_val = int(string_val)
    return int_val  # end of step1()
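# Step 2: Convert given integer to another base
def step2(n, b, convsteps):
    nList = []
    if convsteps == 3:  # Here the conversion is done in 3 steps
        expos = 10000, 300
        base_a = b ** expos[0]
        base_b = b ** expos[1]
        nList1 = numberToBase(n, base_a)  # That's the time killer in this part
        nList2 = [numberToBase(ll, base_b) for ll in nList1]
        # Caveat: the inner results are not zero-padded to a fixed width, so leading
        # zeros inside every chunk after the first one are dropped. The first digits
        # are correct, but the complete list can come out too short. (Padding would
        # also need expos[0] to be a multiple of expos[1]; 10000 is not a multiple of 300.)
        nList3 = [numberToBase(mm, b) for ll in nList2 for mm in ll]
        nList = [mm for ll in nList3 for mm in ll]
    else:  # Do conversion in one bulk
        nList = numberToBase(n, b)
    return nList  # end of step2()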
if __name__ == '__main__':
    # Calculate the string of digits
    numlen = 1500000  # number of digits = length of sequence
    string_value = step0(numlen)
    # Calculate the integer value of the string_value
    int_value = step1(string_value, option_chunk=True)
    # Convert int_value to list of numbers of the given bases
    convsteps = 3  # value of '3' makes step2() 50-60 times faster than value '1'
    b = 3
    numList = step2(int_value, b, convsteps)
    print('3**1: numList begin:', numList[:10])  # Expect: [2, 0, 1, 0, 0, 1, 1, 0, 2, 1]
Possible ideas: could the chunk in step 1 have a different size? Could the two large bases for the intermediate conversions be better balanced? Or could the conversion from a string of decimal digits to a list of base-3 digits be done more directly?
Description: The algorithm in the Python code above works in three steps:
- step 0: Get data. For test purposes, we create a sequence of 1.5 million decimal digits here. Normally this sequence is a random value read from a file. The sequence is then stored as a string.
- step 1: Convert that string to an integer (int() defaults to base 10).
- step 2: Convert that integer to base b=3, i.e. to the list of its base-3 digits.
These three changes brought the largest improvements (compared to the initial straightforward solution):
The helper function numberToBase(n, b), which is used in step 2, converts the integer n to base b. The result is a list of ints in the range 0 to b-1, one per base-b digit, most significant first; read as a sequence, the list is the number written in base b. The improvement was achieved by using the built-in function divmod instead of the two operations n%b and n//=b within the while loop. This brought a performance boost of factor 2.
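The divmod claim is easy to re-measure on one's own machine. The following is a minimal sketch; the function names and the 2000-digit test size are illustrative assumptions, and the measured ratio will depend on the Python version and the hardware.

```python
import timeit


def digits_divmod(n, b):
    """Base-b digits of n, most significant first, with one divmod per step."""
    digits = []
    while n:
        n, rem = divmod(n, b)
        digits.append(rem)
    return digits[::-1]


def digits_mod_floordiv(n, b):
    """The same digits, computed with a separate n % b and n //= b per step."""
    digits = []
    while n:
        digits.append(n % b)
        n //= b
    return digits[::-1]


if __name__ == "__main__":
    # Hypothetical test size, far below the 1.5 million digits of the task
    n = int("7" * 2000)
    assert digits_divmod(n, 3) == digits_mod_floordiv(n, 3)
    for fn in (digits_divmod, digits_mod_floordiv):
        seconds = timeit.timeit(lambda f=fn: f(n, 3), number=5)
        print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```

Both variants do the same big-integer division; divmod simply avoids computing the quotient twice.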
Function step2(n, b, convsteps) converts the given integer n to base b (with b=3). Initially, we called the helper function numberToBase(n, b) once. Then we introduced intermediate steps in step2(), so n is not converted to the final base in one step but in 3 steps. The intermediate bases (3**10000 and 3**300) are much bigger than the final base b. These intermediate base conversions made step 2 about 60 times faster.
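The intermediate-base idea can be pushed further into a recursive divide-and-conquer conversion. This is a swapped-in technique, not the code above: split n with divmod by b**(2**j), split each half again, and convert only small pieces with a plain loop. The sketch below also pads every lower half to its exact width with leading zeros, which step2() does not do for its inner chunks. The name to_base_digits and the cutoff of 64 digits are illustrative assumptions. Whether it beats the tuned 10000/300 split has to be measured; the large divmod calls at the top of the recursion are likely to dominate, just as the first numberToBase call does in step2().

```python
def to_base_digits(n, b, cutoff=64):
    """Base-b digits of n >= 0, most significant first, by divide and conquer.

    cutoff (>= 1) is an illustrative tuning value: pieces of at most `cutoff`
    digits are converted with a plain divmod loop.
    """
    if n < 0 or cutoff < 1:
        raise ValueError("need n >= 0 and cutoff >= 1")
    if n < b:
        return [n]
    # powers[j] == b ** (2 ** j); grow until the last entry exceeds n
    # (that last, largest power only serves as the stopping bound)
    powers = [b]
    while powers[-1] <= n:
        powers.append(powers[-1] * powers[-1])
    out = []

    def emit(m, j):
        # Append exactly 2**j digits of m to out (requires 0 <= m < b ** (2 ** j)),
        # so lower halves keep their leading zeros.
        width = 1 << j
        if width <= cutoff:
            chunk = [0] * width
            for i in range(width - 1, -1, -1):
                m, chunk[i] = divmod(m, b)
            out.extend(chunk)
        else:
            high, low = divmod(m, powers[j - 1])
            emit(high, j - 1)
            emit(low, j - 1)

    emit(n, len(powers) - 1)
    # Only the top-level padding can produce leading zeros; strip them
    first = next(i for i, d in enumerate(out) if d)
    return out[first:]


if __name__ == "__main__":
    print(to_base_digits(777777777777, 3))
```

The recursion depth is only about log2 of the number of digits, so Python's recursion limit is not a concern here.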
Function step1() was made 4 times faster by reading the string in chunks and converting each chunk separately.
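The chunked loop in step1() still combines the chunks left to right, so every step multiplies an ever-growing int by 10**90000. A balanced alternative splits the string in half recursively and recombines high * 10**len(low) + low, so the large multiplications are between numbers of similar size. The following is a minimal sketch under the assumption that this pays off at the task's size (to be checked with a timer); str_to_int and cutoff=2000 are illustrative choices. Keeping pieces at or below 2000 characters also stays under the 4300-digit limit that recent Python versions apply to int() by default.

```python
def str_to_int(s, cutoff=2000):
    """Convert a string of decimal digits to an int by recursive halving.

    Pieces of at most `cutoff` characters go straight to int(); larger ranges
    are split in the middle and recombined as high * 10**len(low) + low.
    """
    pow10 = {}  # only a few distinct piece lengths occur, so cache 10**k

    def ten_to(k):
        if k not in pow10:
            pow10[k] = 10 ** k
        return pow10[k]

    def convert(lo, hi):
        if hi - lo <= cutoff:
            return int(s[lo:hi])
        mid = (lo + hi) // 2
        return convert(lo, mid) * ten_to(hi - mid) + convert(mid, hi)

    return convert(0, len(s))
```

In the question's script it could be tried in place of step1(string_value, option_chunk=True), e.g. int_value = str_to_int(string_value).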
Any idea is welcome. Please test your ideas with time() so that you can also give a quantitative statement about their advantage. Other answers we checked here did not use such a long sequence of decimal digits (in the string) or did not focus on the performance of the base conversion.
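A small timing helper along these lines might look like the sketch below. timed() is a hypothetical helper; it uses time.perf_counter(), which the standard library documents as the highest-resolution clock for measuring short durations, instead of time.time(), and it reports the best of several runs to reduce noise from other processes.

```python
import time


def timed(label, fn, *args, repeat=1):
    """Call fn(*args) `repeat` times, print the best wall-clock time, return the result."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    print(f"{label}: {best:.3f} s (best of {repeat})")
    return result


if __name__ == "__main__":
    # Hypothetical small input; the task itself uses 1.5 million digits
    digits = "7" * 4000
    n = timed("int() on 4000 digits", int, digits, repeat=3)
    assert n % 10 == 7
```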
Answer 1:
OK, I think this is the solution:
base3to9={
"00":"0",
"01":"1",
"02":"2",
"10":"3",
"11":"4",
"12":"5",
"20":"6",
"21":"7",
"22":"8",
}
def convert_base3_to_base9(s):
    s = '0'*(len(s)%2) + s  # ensure that the string is the right length
    return "".join(base3to9[s[i:i+2]] for i in range(0,len(s),2))
print(convert_base3_to_base9("12012120121010"))
# 5176533
Then you can extend the same idea to base 27:
base3to27 = {
"000":"0",
"001":"1",
...
"222":"Q"
}
def convert_base3_to_base27(s):
    s = '0'*(-len(s)%3) + s  # left-pad to a multiple of 3 (len(s)%3 zeros would be wrong)
    return "".join(base3to27[s[i:i+3]] for i in range(0,len(s),3))
Basically there is no math to do at all, just O(1) dict lookups, so it should be quite fast.
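The lookup-table idea only applies once the base-3 digits exist, i.e. after step 2 of the question; from there it covers the minor task (bases 3**k with k = 2, 3, 4). A minimal sketch, assuming the digits come as a list of ints like the output of step2(), and building the table programmatically instead of writing out all entries; make_table and regroup are hypothetical names.

```python
from itertools import product


def make_table(k, b=3):
    """Map every k-tuple of base-b digits to its value as one base-b**k digit."""
    table = {}
    for combo in product(range(b), repeat=k):
        value = 0
        for d in combo:
            value = value * b + d
        table[combo] = value
    return table


def regroup(digits, k, b=3):
    """Turn a most-significant-first list of base-b digits into base-b**k digits."""
    table = make_table(k, b)
    # Left-pad with zeros to a multiple of k: -len % k zeros (len % k is wrong for k > 2)
    padded = [0] * (-len(digits) % k) + list(digits)
    return [table[tuple(padded[i:i + k])] for i in range(0, len(padded), k)]


if __name__ == "__main__":
    d3 = [int(c) for c in "2202100120200002212221010"]  # 777777777777 in base 3
    for k in (2, 3, 4):
        print(f"base {3 ** k}:", regroup(d3, k))
    # base 9: [2, 6, 7, 0, 5, 2, 0, 0, 8, 5, 8, 3, 3]
    # base 27: [2, 20, 9, 15, 18, 2, 23, 25, 3]
    # base 81: [2, 61, 5, 18, 8, 53, 30]
```

The results match the test case in the question's header comment; for k = 4 the table has 81 entries.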
Source: https://stackoverflow.com/questions/53530091/how-to-calculate-as-quick-as-possible-the-base-3-value-of-an-integer-which-is-gi