I\'m creating a program that will create a file and save it to the directory with the filename sample.xml. Once the file is saved when i try to run the program again it over
Sequentially checking each file name to find the next available one works fine with small numbers of files, but quickly becomes slower as the number of files increases.
Here is a version that finds the next available file name in log(n) time:
import os
def next_path(path_pattern):
"""
Finds the next free path in an sequentially named list of files
e.g. path_pattern = 'file-%s.txt':
file-1.txt
file-2.txt
file-3.txt
Runs in log(n) time where n is the number of existing files in sequence
"""
i = 1
# First do an exponential search
while os.path.exists(path_pattern % i):
i = i * 2
# Result lies somewhere in the interval (i/2..i]
# We call this interval (a..b] and narrow it down until a + 1 = b
a, b = (i // 2, i)
while a + 1 < b:
c = (a + b) // 2 # interval midpoint
a, b = (c, b) if os.path.exists(path_pattern % c) else (a, c)
return path_pattern % b
To measure the speed improvement I wrote a small test function that creates 10,000 files:
for i in range(1,10000):
with open(next_path('file-%s.foo'), 'w'):
pass
And implemented the naive approach:
def next_path_naive(path_pattern):
"""
Naive (slow) version of next_path
"""
i = 1
while os.path.exists(path_pattern % i):
i += 1
return path_pattern % i
And here are the results:
Fast version:
real 0m2.132s
user 0m0.773s
sys 0m1.312s
Naive version:
real 2m36.480s
user 1m12.671s
sys 1m22.425s
Finally, note that either approach is susceptible to race conditions if multiple actors are trying to create files in the sequence at the same time.