Question
I wrote the code below to test the cache feature of numba:
import numba
import numpy as np
import time

@numba.njit(cache=True)
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i, j]
    return result

a = np.random.random((1000, 100))
print(time.time())
sum2d(a)
print(time.time())
print(time.time())
sum2d(a)
print(time.time())
Although some cache files are generated in the __pycache__ folder, the timing is always the same, for example:
1576855294.8787484
1576855295.5378428
1576855295.5378428
1576855295.5388253
no matter how many times I run this script. This suggests that the first call of sum2d still spends much more time on compilation in every run. So what is the use of the cache files in the __pycache__ folder?
Answer 1:
The following script illustrates the point of cache=True. It first calls a non-cached dummy function that absorbs the time it takes to initialize numba. It then calls the sum2d function twice without caching and twice with caching.
import numba
import numpy as np
import time

@numba.njit
def dummy():
    return None

@numba.njit
def sum2d_nocache(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i, j]
    return result

@numba.njit(cache=True)
def sum2d_cache(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i, j]
    return result

start = time.time()
dummy()
end = time.time()
print(f'Dummy timing {end - start}')

a = np.random.random((1000, 100))
start = time.time()
sum2d_nocache(a)
end = time.time()
print(f'No cache 1st timing {end - start}')

a = np.random.random((1000, 100))
start = time.time()
sum2d_nocache(a)
end = time.time()
print(f'No cache 2nd timing {end - start}')

a = np.random.random((1000, 100))
start = time.time()
sum2d_cache(a)
end = time.time()
print(f'Cache 1st timing {end - start}')

a = np.random.random((1000, 100))
start = time.time()
sum2d_cache(a)
end = time.time()
print(f'Cache 2nd timing {end - start}')
Output after 1st run:
Dummy timing 0.10361385345458984
No cache 1st timing 0.08893513679504395
No cache 2nd timing 0.00020122528076171875
Cache 1st timing 0.08929300308227539
Cache 2nd timing 0.00015544891357421875
Output after 2nd run:
Dummy timing 0.08973526954650879
No cache 1st timing 0.0809786319732666
No cache 2nd timing 0.0001163482666015625
Cache 1st timing 0.0016787052154541016
Cache 2nd timing 0.0001163482666015625
What does this output tell us?
- The time to initialize numba is not negligible.
- During the first run, the first calls of the cached and non-cached versions both take longer because of compilation time.
- In this example, the creation of the cache file doesn't make much of a difference.
- In the second run, the first call to the cached function is much faster (this is what cache=True is for).
- The subsequent calls to the cached and non-cached functions take approximately the same time.
The point of using cache=True is to avoid repeating the compilation of large and complex functions at each run of a script. In this example the function is simple, so the saving is small in absolute terms (the first call to the cached function drops from about 0.089 s in the first run to about 0.0017 s in the second). For a script with a number of more complex functions, using the cache can significantly reduce the run time.
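The measurement pattern repeated in the script above (record a start time, call the function, print the difference) can be wrapped in a small helper. The following is only a sketch: time_call is a hypothetical name, not part of numba, and it uses time.perf_counter instead of the time.time used in the answer, since perf_counter is intended for measuring short intervals. The sum2d here is a pure-Python stand-in so that the sketch runs without numba installed.

```python
import time


def time_call(label, func, *args):
    """Call func(*args) once, print the elapsed time, and return (result, seconds).

    Hypothetical helper for illustration; not part of numba.
    """
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    print(f'{label}: {elapsed:.6f} s')
    return result, elapsed


def sum2d(rows):
    # Pure-Python stand-in for the jitted function in the answer.
    result = 0.0
    for row in rows:
        for value in row:
            result += value
    return result


# 1000 x 100 grid of ones, matching the array shape used in the answer.
data = [[1.0] * 100 for _ in range(1000)]

first_result, first_elapsed = time_call('1st call', sum2d, data)
second_result, second_elapsed = time_call('2nd call', sum2d, data)
```

With numba installed, passing sum2d_nocache or sum2d_cache and a NumPy array to the same helper should reproduce the comparison from the answer: the label for the first call then includes compilation (or, on a later run with cache=True, loading from the cache).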
Source: https://stackoverflow.com/questions/59427775/numba-cache-true-has-no-effect