Question
How can I optimally search a parameter space using Dask (without cross-validation)?
Here is the code (no Dask yet):
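import numpy as np
from operator import itemgetter

def build(ntries, param, niter, func, score, train, test):
    # Draw one parameter vector per try; the try index seeds the draw.
    res = []
    for i in range(ntries):
        cparam = param.rvs(size=niter, random_state=i)
        res.append(func(cparam, train, test, score))
    return res

def score(test, correct):
    # Euclidean distance between the two arrays.
    return np.linalg.norm(test - correct)

def compute_optimal(res):
    # Sort the (parameter, score) pairs by score, lowest score first.
    return sorted(res, key=itemgetter(1))

def func(c, train, test, score):
    # Subtract each parameter value in turn, scaled by the step size dt.
    dt = 1.0 / len(c)
    for cc in c:
        train = train - cc * dt
    return (c, score(train, test))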
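As a sanity check on what func computes: since dt = 1/len(c), the loop subtracts the mean of c from train in total. With train = np.arange(N) + 0.5 and test = np.arange(N), the score should therefore reduce to sqrt(N) * |0.5 - mean(c)|. The sketch below checks this numerically; the three-element vector c is an arbitrary illustrative value, not a draw from the distribution used in the question.

```python
import numpy as np


def score(test, correct):
    # Euclidean distance between the two arrays.
    return np.linalg.norm(test - correct)


def func(c, train, test, score):
    # Same logic as in the question: subtract each value of c scaled by dt.
    dt = 1.0 / len(c)
    for cc in c:
        train = train - cc * dt
    return (c, score(train, test))


train = np.arange(1000) + 0.5
test = np.arange(1000)

# Arbitrary illustrative parameter vector (an assumption for this check only).
c = np.array([0.1, 0.2, 0.3])

_, observed = func(c, train, test, score)

# Closed form: every element of (train - mean(c) - test) equals 0.5 - mean(c).
expected = np.sqrt(len(train)) * abs(0.5 - c.mean())

print(observed, expected)
```

So the score depends on c only through its mean, and the best candidates are those whose mean is closest to 0.5.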
Here is how I use it:
from dask import delayed
from distributed import LocalCluster, Client
cluster=LocalCluster(n_workers=4, threads_per_worker=1)
cli=Client(cluster)
from scipy.stats import uniform
import numpy as np
niter=500
loc=1.0e-09
scale=1.0
ntries=1000
sched=uniform(loc=loc,scale=scale)
train=np.arange(1000)+0.5
test=np.arange(1000)
# Dask enters here: func is wrapped in delayed, so build() returns a list of lazy tasks
graph=build(ntries,sched,niter,delayed(func),score,train,test)
# The part my questions are about:
# I bring all the values back to the client so that I can search for the optimal (parameter, score) pair by score
res=[cli.compute(g) for g in graph]
results=[r.result() for r in res]
# Actual search for the optimal pair
optimal=compute_optimal(results)
best,worst=optimal[0],optimal[-1]
The questions are:
- Am I using Dask correctly here?
- Am I fetching data back to the client correctly? Are there more efficient ways to do this?
- Is there a way to search for the optimal pair on the workers?
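To make the last question concrete, the kind of worker-side search I have in mind might look like the sketch below: the list of delayed results is passed to one more delayed call that picks the pair with the lowest score, so only a single (parameter, score) pair comes back. This is an untuned sketch, not a confirmed recommendation. The names candidates and rng and the small array sizes are illustrative assumptions, and scheduler="synchronous" is used only so the snippet runs without a cluster.

```python
from operator import itemgetter

import numpy as np
from dask import delayed


def score(test, correct):
    # Euclidean distance between the two arrays.
    return np.linalg.norm(test - correct)


def func(c, train, test, score):
    # Same logic as in the question.
    dt = 1.0 / len(c)
    for cc in c:
        train = train - cc * dt
    return (c, score(train, test))


# Small illustrative inputs (assumptions, much smaller than in the question).
train = np.arange(10) + 0.5
test = np.arange(10)
rng = np.random.default_rng(0)
candidates = [rng.uniform(0.0, 1.0, size=5) for _ in range(20)]

# One lazy task per candidate parameter vector.
graph = [delayed(func)(c, train, test, score) for c in candidates]

# One more lazy task that reduces all results to the lowest-scoring pair.
best = delayed(min)(graph, key=itemgetter(1))

# Synchronous scheduler here only to keep the sketch self-contained.
best_param, best_score = best.compute(scheduler="synchronous")
print(best_score)
```

With the distributed client from my setup, the last step would presumably become cli.compute(best).result(), which should transfer only the winning pair to the client.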
P.S. I recently posted a related question about a different issue (thread.lock during custom parameter search class using Dask distributed). I have solved it and will post an answer shortly, then close that question.
Source: https://stackoverflow.com/questions/44991053/parameter-search-using-dask