joblib

Joblib UserWarning while trying to cache results

百般思念 submitted on 2020-01-13 08:40:48
Question: I get the following UserWarning when trying to cache results using joblib:

    from tempfile import mkdtemp
    cachedir = mkdtemp()

    from joblib import Memory
    memory = Memory(cachedir=cachedir, verbose=0)

    @memory.cache
    def get_nc_var3d(path_nc, var, year):
        """
        Get value from netcdf for variable var for year
        :param path_nc:
        :param var:
        :param year:
        :return:
        """
        try:
            hndl_nc = open_or_die(path_nc)
            val = hndl_nc.variables[var][int(year), :, :]
        except:
            val = numpy.nan
            logger.info('Error in getting var ' +
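A common trigger for warnings around this kind of setup is the ageing Memory API combined with broad exception handling in the cached function. Below is a minimal sketch of the same caching pattern, assuming a newer joblib (where the cachedir keyword was renamed to location) and assuming netCDF4 as the reader behind the question's open_or_die helper; neither assumption comes from the question itself.

    from tempfile import mkdtemp

    import numpy
    import netCDF4  # assumption: the question's open_or_die wraps a netCDF reader
    from joblib import Memory

    # joblib >= 0.12 uses `location`; `cachedir` is the older, deprecated name.
    memory = Memory(location=mkdtemp(), verbose=0)

    @memory.cache
    def get_nc_var3d(path_nc, var, year):
        """Read the slice for `year` of variable `var` from a netCDF file."""
        try:
            with netCDF4.Dataset(path_nc) as hndl_nc:
                return hndl_nc.variables[var][int(year), :, :]
        except (IOError, KeyError, IndexError):
            # Catching specific exceptions instead of a bare `except:` keeps
            # real failures (typos, wrong paths) from being cached as NaN.
            return numpy.nan

Only plain, hashable arguments (strings, ints) reach the cache key here, which keeps Memory from having to persist anything exotic.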

How to serialize a CountVectorizer with a custom tokenize function with joblib

拈花ヽ惹草 submitted on 2020-01-04 11:04:23
Question: I use a CountVectorizer with a custom tokenize method. When I serialize it and then unserialize it, I get the following error message:

    AttributeError: module '__main__' has no attribute 'tokenize'

How can I "serialize" the tokenize method? Here is a small example:

    import nltk
    from nltk.stem.snowball import FrenchStemmer

    stemmer = FrenchStemmer()

    def stem_tokens(tokens, stemmer):
        stemmed = []
        for item in tokens:
            stemmed.append(stemmer.stem(item))
        return stemmed

    def tokenize(text):
        tokens =
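The usual fix is to move the tokenizer out of __main__ and into an importable module, because pickle stores functions by qualified name and must re-import that module on load. A minimal sketch under that assumption, with my_tokenizer.py as a hypothetical module name:

    # my_tokenizer.py -- lives on the import path, not in __main__
    from nltk import word_tokenize
    from nltk.stem.snowball import FrenchStemmer

    stemmer = FrenchStemmer()

    def tokenize(text):
        # Tokenize, then stem each token, mirroring the question's intent.
        return [stemmer.stem(tok) for tok in word_tokenize(text)]

    # main.py
    import joblib
    from sklearn.feature_extraction.text import CountVectorizer
    from my_tokenizer import tokenize  # imported by name, so pickle can find it

    vect = CountVectorizer(tokenizer=tokenize)
    vect.fit(["le petit chat", "les petits chats"])
    joblib.dump(vect, "vectorizer.joblib")
    vect2 = joblib.load("vectorizer.joblib")  # also works in a fresh process

The load succeeds in any process where my_tokenizer is importable; the error in the question appears precisely when the name only exists in __main__.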

Similar errors in multiprocessing: mismatched number of arguments to function

ε祈祈猫儿з submitted on 2020-01-04 05:49:44
Question: I couldn't find a better way to describe the error I'm facing, but this error seems to come up every time I try to add multiprocessing to a loop call. I've used both sklearn.externals.joblib and multiprocessing.Process, and the errors are similar, though different. This is the original loop I want to parallelize, where one iteration is executed in a single thread/process:

    for dd in final_col_dates:
        idx1 = final_col_dates.tolist().index(dd)
        dataObj = GetPrevDataByDate(d1, a, dd, self
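A pattern that tends to avoid argument-count errors with joblib is to lift the loop body into a top-level function and make the delayed(...) call mirror its signature exactly. The sketch below uses stand-ins for the question's GetPrevDataByDate and surrounding variables, since their definitions are not shown:

    import numpy as np
    from joblib import Parallel, delayed

    def get_prev_data_by_date(d1, a, dd):
        """Stand-in for the question's GetPrevDataByDate (definition not shown)."""
        return (d1, a, dd)

    def process_date(dd, final_col_dates, d1, a):
        # One loop iteration as a top-level function, so it pickles cleanly.
        idx1 = final_col_dates.tolist().index(dd)
        return idx1, get_prev_data_by_date(d1, a, dd)

    final_col_dates = np.array(['2019-01-01', '2019-01-02', '2019-01-03'])
    d1, a = 'data', 42

    # delayed(f)(...) must supply exactly the arguments f expects; a
    # mismatch here is what raises "takes N positional arguments" errors.
    results = Parallel(n_jobs=2)(
        delayed(process_date)(dd, final_col_dates, d1, a)
        for dd in final_col_dates
    )

Note that the question's loop references self: bound methods are another classic source of pickling trouble with process-based backends, which is one more reason to prefer a free function here.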

Printed output not displayed when using joblib in Jupyter notebook

落花浮王杯 submitted on 2020-01-03 10:56:26
Question: I am using joblib to parallelize some code, and I noticed that I couldn't print things when using it inside a Jupyter notebook. I tried the same example in IPython and it worked perfectly. Here is a minimal (not) working example to write in a Jupyter notebook cell:

    from joblib import Parallel, delayed
    Parallel(n_jobs=8)(delayed(print)(i) for i in range(10))

I am getting the output as [None, None, None, None, None, None, None, None, None, None], but nothing is printed. Actually,
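The likely cause is that joblib's worker processes have their own stdout, which is not connected to the notebook frontend; their prints go to the terminal that launched the kernel, if anywhere. A workaround that is independent of backend details is to return values from the workers and print in the parent, as in this sketch:

    from joblib import Parallel, delayed

    def work(i):
        # Compute in the worker; leave the printing to the parent process.
        return i * i

    results = Parallel(n_jobs=8)(delayed(work)(i) for i in range(10))
    for r in results:
        print(r)  # executes in the notebook kernel, so the output is shown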

Multiprocessing backed parallel loops cannot be nested below threads

僤鯓⒐⒋嵵緔 submitted on 2019-12-28 16:51:54
Question: What is the reason for this issue in joblib?

    'Multiprocessing backed parallel loops cannot be nested below threads, setting n_jobs=1'

What should I do to avoid this issue? I need to implement an XMLRPC server which runs heavy computation in a background thread and reports the current progress through polling from a UI client. It uses scikit-learn, which is based on joblib.

P.S.: I've simply changed the name of the thread to "MainThread" to avoid the warning, and everything looks to be working well (run in
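Rather than renaming the thread, joblib lets the caller pin a backend for a region of code; forcing the threading (or sequential) backend inside the background thread avoids nesting process-based workers below it. A minimal sketch, assuming the heavy computation routes its parallelism through joblib as scikit-learn does:

    import threading
    from math import sqrt
    from joblib import Parallel, delayed, parallel_backend

    def heavy_computation():
        # Inside a non-main thread, force a thread-based backend so joblib
        # does not try to start multiprocessing workers below this thread.
        with parallel_backend('threading', n_jobs=2):
            return Parallel()(delayed(sqrt)(i) for i in range(100))

    worker = threading.Thread(target=heavy_computation)
    worker.start()
    worker.join()

Scikit-learn estimators called inside the with block inherit the backend choice, since they route their parallelism through joblib.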

Joblib: simple parallel example slower than the non-parallel version

淺唱寂寞╮ submitted on 2019-12-25 16:54:06
Question:

    from math import sqrt
    from joblib import Parallel, delayed
    import time

    if __name__ == '__main__':
        st = time.time()
        # [sqrt(i ** 2) for i in range(100000)]  # this part is the non-parallel version
        Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(100000))
        print time.time() - st

The non-parallel part runs in 0.4 s, while the parallel part runs for 18 s. I am confused why this would happen.

Answer 1: Parallel processes (which joblib creates) require copying data. Imagine it this way: you have two people who
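The gap comes from per-call overhead: each tiny sqrt is pickled, shipped to a worker, and its result shipped back, which dwarfs the arithmetic itself. Making each dispatched task cover a whole chunk of the range amortizes that cost; the sketch below shows manual chunking (joblib's Parallel also accepts a batch_size parameter, 'auto' by default, which batches small tasks for the same reason):

    import time
    from math import sqrt
    from joblib import Parallel, delayed

    def sqrt_chunk(lo, hi):
        # One task now covers a whole range, so worker startup and IPC
        # are paid a handful of times instead of 100000 times.
        return [sqrt(i ** 2) for i in range(lo, hi)]

    if __name__ == '__main__':
        n, n_jobs = 100000, 2
        step = n // n_jobs
        st = time.time()
        chunks = Parallel(n_jobs=n_jobs)(
            delayed(sqrt_chunk)(lo, min(lo + step, n))
            for lo in range(0, n, step)
        )
        print(time.time() - st)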

Printing Parallel Function Outputs in True Order with Python

断了今生、忘了曾经 submitted on 2019-12-24 19:29:29
Question: I am looking to print everything in order for a parallelized Python script. Note that c3 is printed prior to b2, i.e. out of order. Is there any way to give the function below a wait feature? If you rerun, sometimes the print order is correct for shorter batches; however, I am looking for a reproducible solution to this issue.

    from joblib import Parallel, delayed, parallel_backend
    import multiprocessing

    testFrame = [['a', 1], ['b', 2], ['c', 3]]

    def testPrint(letr, numbr):
        print(letr + str(numbr))
        return
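Parallel already returns its results in input order, regardless of which worker finishes first; only the prints emitted inside the workers interleave. A reproducible fix is therefore to build the strings in the workers and print them afterwards in the parent, as in this sketch of the question's example:

    from joblib import Parallel, delayed

    testFrame = [['a', 1], ['b', 2], ['c', 3]]

    def testPrint(letr, numbr):
        # Return the line instead of printing it inside the worker.
        return letr + str(numbr)

    lines = Parallel(n_jobs=2)(delayed(testPrint)(l, n) for l, n in testFrame)
    for line in lines:  # results arrive in input order: a1, b2, c3
        print(line)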

Scoring returning a numpy.core.memmap instead of a numpy.Number in grid search

穿精又带淫゛_ submitted on 2019-12-24 18:02:45
Question: We are able (only within the context of our application at the moment) to reproduce the following problem on Ubuntu 15.04 and OS X with scikit-learn 0.17, when using GridSearchCV with a LogisticRegression on larger data sets:

    ...........................................................................
    /Users/samuelhopkins/.virtualenvs/cpml/lib/python2.7/site-packages/sklearn/pipeline.py in fit(self=Pipeline(steps=[('cpencoder', <cpml.whitebox.Lin...s', refit=True, scoring=u'roc_auc', verbose=1))]), X= Unnamed:
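The memmap comes from joblib memory-mapping large arrays when handing them to workers, so a scorer can end up computing on (and returning) a numpy.core.memmap instead of a scalar. One workaround for this class of problem, not taken from the question, is to wrap the scorer so it always returns a plain Python float:

    from sklearn.metrics import make_scorer, roc_auc_score

    def roc_auc_as_float(y_true, y_score):
        # Cast explicitly so grid search always receives a plain number,
        # even when the inputs arrived as memory-mapped arrays.
        return float(roc_auc_score(y_true, y_score))

    # Hypothetical drop-in replacement for scoring=u'roc_auc':
    scoring = make_scorer(roc_auc_as_float, needs_threshold=True)

Passing this scoring object to GridSearchCV behaves as before, minus the memmap leaking into the scores.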
