Question
I'm running several machine learning algorithms with sklearn in a for loop and want to see how long each of them takes. The problem is that I also need to return a value, and I don't want to run anything more than once because each algorithm takes so long. Is there a way to capture the return value 'clf' using Python's timeit module (or a similar one) with a function like this:
from sklearn import ensemble

def RandomForest(train_input, train_output):
    clf = ensemble.RandomForestClassifier(n_estimators=10)
    clf.fit(train_input, train_output)
    return clf
when I time the call like this:
from timeit import Timer

t = Timer(lambda: RandomForest(trainX, trainy))
print(t.timeit(number=1))
P.S. I also don't want to set a global 'clf', because I might want to use multithreading or multiprocessing later.
Answer 1:
The problem boils down to timeit._template_func not returning the function's return value:
def _template_func(setup, func):
    """Create a timer function. Used if the "statement" is a callable."""
    def inner(_it, _timer, _func=func):
        setup()
        _t0 = _timer()
        for _i in _it:
            _func()
        _t1 = _timer()
        return _t1 - _t0
    return inner
We can bend timeit to our will with a bit of monkey-patching:
import timeit
import time

# Only effective where timeit still uses _template_func (Python 3.4 and earlier)
def _template_func(setup, func):
    """Create a timer function. Used if the "statement" is a callable."""
    def inner(_it, _timer, _func=func):
        setup()
        _t0 = _timer()
        for _i in _it:
            retval = _func()
        _t1 = _timer()
        return _t1 - _t0, retval
    return inner

timeit._template_func = _template_func

def foo():
    time.sleep(1)
    return 42

t = timeit.Timer(foo)
print(t.timeit(number=1))
prints something like
(1.0010340213775635, 42)
The first value is the timeit result (in seconds); the second value is the function's return value.
Note that the monkey-patch above only affects the behavior of timeit when a callable is passed to timeit.Timer. If you pass a string statement, then you'd have to (similarly) monkey-patch the timeit.template string.
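Because _template_func is a private detail of older timeit versions, a sketch that works on any Python 3 without monkey-patching is to time a small wrapper that keeps the return value in a closure. The helper name timeit_with_result is hypothetical, and the extra wrapper call adds a tiny, usually negligible, overhead to the measurement.

```python
import time
import timeit


def timeit_with_result(func, *args, number=1, **kwargs):
    """Time func(*args, **kwargs) with timeit; return (seconds, last_result)."""
    result = None

    def call():
        # nonlocal stores the value in the enclosing scope, not in a global
        nonlocal result
        result = func(*args, **kwargs)

    seconds = timeit.Timer(call).timeit(number=number)
    return seconds, result


def foo():
    time.sleep(0.01)
    return 42


print(timeit_with_result(foo))
```

For the question's case this would be seconds, clf = timeit_with_result(RandomForest, trainX, trainy), with no global clf involved.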
Answer 2:
For Python 3.5 and later, you can override the value of timeit.template instead:
import timeit

timeit.template = """
def inner(_it, _timer{init}):
    {setup}
    _t0 = _timer()
    for _i in _it:
        retval = {stmt}
    _t1 = _timer()
    return _t1 - _t0, retval
"""
unutbu's answer works for Python 3.4 but not for 3.5, because the _template_func function was removed in 3.5.
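Since timeit.template is read when a Timer is constructed (inside Timer.__init__), the override can be limited to a single Timer and then restored, so other timeit users are unaffected. A minimal sketch under that assumption; RETURNING_TEMPLATE and time_with_result are hypothetical names.

```python
import timeit

# Same modified template as above: the statement's value is kept in retval
RETURNING_TEMPLATE = """
def inner(_it, _timer{init}):
    {setup}
    _t0 = _timer()
    for _i in _it:
        retval = {stmt}
    _t1 = _timer()
    return _t1 - _t0, retval
"""


def time_with_result(func, number=1):
    """Return (elapsed_seconds, last_return_value) for func()."""
    original = timeit.template
    timeit.template = RETURNING_TEMPLATE
    try:
        # The template is formatted and compiled here, in Timer.__init__
        timer = timeit.Timer(func)
    finally:
        timeit.template = original  # restore the stock template right away
    return timer.timeit(number=number)


def foo():
    return 42


elapsed, value = time_with_result(foo)
print(elapsed, value)
```

Only timeit() is safe to call on such a Timer: autorange() compares the result of timeit() with a float, which fails once it returns a tuple.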
Answer 3:
Funnily enough, I'm also doing machine learning, and have a similar requirement ;-)
I solved it by writing a function that:
- runs your function
- prints the running time, along with the name of your function
- returns the results
Let's say you want to time:
clf = RandomForest(train_input, train_output)
Then do:
clf = time_fn(RandomForest, train_input, train_output)
Stdout will show something like:
mymodule.RandomForest: 0.421609s
Code for time_fn:
import time

def time_fn(fn, *args, **kwargs):
    # time.clock() was removed in Python 3.8; perf_counter() is its replacement
    start = time.perf_counter()
    results = fn(*args, **kwargs)
    end = time.perf_counter()
    fn_name = fn.__module__ + "." + fn.__name__
    print(fn_name + ": " + str(end - start) + "s")
    return results
Answer 4:
If I understand correctly, since Python 3.5 you can pass a globals namespace to each Timer instance (the globals parameter), so you don't have to define those names inside the timed code. I am not sure whether it would have the same issues with parallelization.
My approach would be something like:
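from timeit import Timer
from sklearn import ensemble

clf = ensemble.RandomForestClassifier(n_estimators=10)
myGlobals = globals().copy()    # a copy, so the module's real globals are not modified
myGlobals.update({'clf': clf})  # dict literals use ':' rather than '='
t = Timer(stmt='clf.fit(trainX, trainy)', globals=myGlobals)
print(t.timeit(number=1))
print(clf)  # fit() trains clf in place, so the fitted model is available here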
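When the value you need is a return value rather than an object modified in place (as fit() does here), one option is to pass a small private namespace through the globals parameter and let the statement write into a mutable dict inside it. A minimal sketch; slow_square, namespace and results are hypothetical names.

```python
import timeit


def slow_square(x):
    # Hypothetical stand-in for an expensive call such as RandomForest(...)
    return x * x


# Private namespace for the timed statement instead of the module globals;
# "results" is a dict that the statement stores its return value into.
namespace = {"slow_square": slow_square, "x": 7, "results": {}}

t = timeit.Timer(stmt="results['value'] = slow_square(x)", globals=namespace)
elapsed = t.timeit(number=1)

print(elapsed, namespace["results"]["value"])
```

Each Timer gets its own namespace dict here, so nothing is written to module-level globals.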
Answer 5:
The approach I use is to append the running time to the results of the timed function. For that I wrote a very simple decorator using the time module:
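import functools
import time

def timed(func):
    @functools.wraps(func)
    def func_wrapper(*args, **kwargs):
        # time.clock() was removed in Python 3.8; perf_counter() replaces it
        s = time.perf_counter()
        result = func(*args, **kwargs)
        e = time.perf_counter()
        # assumes func returns a tuple; the elapsed time becomes its last element
        return result + (e - s,)
    return func_wrapper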
Then I apply the decorator to the function I want to time. Note that result + (e - s,) only works when the timed function returns a tuple.
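A variant sketch that works for any return value, including a single object such as the clf returned by RandomForest, returns a (result, seconds) pair instead of extending a tuple. The names timed_pair and train are hypothetical; functools.wraps keeps the decorated function's name and docstring.

```python
import functools
import time


def timed_pair(func):
    """Decorator: call func and return (its_return_value, elapsed_seconds)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed
    return wrapper


@timed_pair
def train(n):
    # Hypothetical stand-in for an expensive training call
    time.sleep(0.01)
    return {"n_estimators": n}


model, seconds = train(10)
print(model, seconds)
```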
Answer 6:
For Python 3.5 and later I use this approach:
import timeit

# Redefining the default Timer template to make 'timeit' return
# the test's execution time and the function's return value
new_template = """
def inner(_it, _timer{init}):
    {setup}
    _t0 = _timer()
    for _i in _it:
        ret_val = {stmt}
    _t1 = _timer()
    return _t1 - _t0, ret_val
"""
timeit.template = new_template
Source: https://stackoverflow.com/questions/24812253/how-can-i-capture-return-value-with-python-timeit-module