joblib

Efficient parallelization of operations on two-dimensional arrays in Python

耗尽温柔 submitted on 2021-02-18 17:49:49
Question: I'm trying to parallelize operations on a two-dimensional array using the joblib library in Python. Here is the code I have:

    from joblib import Parallel, delayed
    import multiprocessing
    import numpy as np

    # The code below just aggregates the base_array to form a new two-dimensional array
    base_array = np.ones((2**12, 2**12), dtype=np.uint8)

    def compute_average(i, j):
        return np.uint8(np.mean(base_array[i*4: (i+1)*4, j*4: (j+1)*4]))

    num_cores = multiprocessing.cpu_count()
    new_array = np.array(Parallel(n
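The excerpt breaks off in the middle of the Parallel call. As a minimal sketch of how such a block-averaging job could be expressed with joblib (an assumption based on the code above, not the poster's original continuation):

    from joblib import Parallel, delayed
    import multiprocessing
    import numpy as np

    base_array = np.ones((2**12, 2**12), dtype=np.uint8)

    def compute_average(i, j):
        # Mean of one non-overlapping 4x4 block, as defined in the question.
        return np.uint8(np.mean(base_array[i*4:(i+1)*4, j*4:(j+1)*4]))

    num_cores = multiprocessing.cpu_count()
    n_blocks = base_array.shape[0] // 4  # 1024 blocks per axis

    # One plausible (assumed) shape of the full call: one task per 4x4 block.
    new_array = np.array(
        Parallel(n_jobs=num_cores)(
            delayed(compute_average)(i, j)
            for i in range(n_blocks)
            for j in range(n_blocks)
        ),
        dtype=np.uint8,
    ).reshape(n_blocks, n_blocks)

Note that each task here is only a 4x4 mean, so dispatch overhead tends to dominate; the same aggregation can be done serially with a single reshape, e.g. base_array.reshape(n_blocks, 4, n_blocks, 4).mean(axis=(1, 3)).astype(np.uint8).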

Efficient pairwise DTW calculation using numpy or cython

好久不见. submitted on 2021-02-18 10:11:33
Question: I am trying to calculate the pairwise distances between multiple time series contained in a numpy array. Please see the code below:

    print(type(sales))
    print(sales.shape)

    <class 'numpy.ndarray'>
    (687, 157)

So, sales contains 687 time series of length 157. I am using pdist to calculate the DTW distances between the time series:

    import fastdtw
    import scipy.spatial.distance as sd

    def my_fastdtw(sales1, sales2):
        return fastdtw.fastdtw(sales1, sales2)[0]

    distance_matrix = sd.pdist(sales, my_fastdtw)

--
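pdist with a Python callback evaluates all 687·686/2 pairs serially, so a common way to speed this up is to fan the pairs out with joblib. A rough sketch, assuming fastdtw is installed and using random placeholder data shaped like the question's sales array:

    import numpy as np
    import fastdtw
    from joblib import Parallel, delayed
    from scipy.spatial.distance import squareform

    def my_fastdtw(sales1, sales2):
        # DTW distance between two 1-D series, as in the question.
        return fastdtw.fastdtw(sales1, sales2)[0]

    def pairwise_dtw(series, n_jobs=-1):
        # Returns the condensed distance vector in the same layout pdist uses.
        n = len(series)
        pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
        dists = Parallel(n_jobs=n_jobs)(
            delayed(my_fastdtw)(series[i], series[j]) for i, j in pairs
        )
        return np.asarray(dists)

    sales = np.random.rand(687, 157)         # placeholder for the real data
    condensed = pairwise_dtw(sales)
    distance_matrix = squareform(condensed)  # full symmetric (687, 687) matrix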

Python joblib performance

∥☆過路亽.° submitted on 2021-02-10 16:03:35
Question: I need to run an embarrassingly parallel for loop. After a quick search, I found the joblib package for Python. I did a simple test as posted on the package's website. Here is the test:

    from math import sqrt
    from joblib import Parallel, delayed
    import multiprocessing

    %timeit [sqrt(i ** 2) for i in range(10)]

result: 3.89 µs ± 38.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    num_cores = multiprocessing.cpu_count()
    %timeit Parallel(n_jobs=num_cores)(delayed(sqrt)(i ** 2) for i in
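A result like this (the parallel version coming out far slower) is expected: each sqrt call is sub-microsecond work, while shipping it to a worker process costs pickling and scheduling overhead. A hedged sketch of the usual remedy, giving each task a large chunk of work instead of a single element (the chunking helper below is illustrative, not from the question):

    from math import sqrt
    from joblib import Parallel, delayed
    import multiprocessing

    def sqrt_chunk(values):
        # Process a whole chunk per task so the per-task overhead is amortized.
        return [sqrt(v ** 2) for v in values]

    num_cores = multiprocessing.cpu_count()
    data = list(range(1_000_000))
    chunk_size = len(data) // num_cores + 1
    chunks = [data[k:k + chunk_size] for k in range(0, len(data), chunk_size)]

    nested = Parallel(n_jobs=num_cores)(delayed(sqrt_chunk)(c) for c in chunks)
    results = [r for part in nested for r in part]

For a loop as small as range(10) the plain list comprehension will always win; parallelism only pays off once each task does substantially more work than the dispatch overhead.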

Joblib Parallel + Cython hanging forever

只愿长相守 submitted on 2021-02-10 15:44:12
Question: I have a very weird problem while creating a Python extension with Cython that uses joblib.Parallel. The following code works as expected:

    from joblib import Parallel, delayed
    from math import sqrt

    print(Parallel(n_jobs=4)(delayed(sqrt)(x) for x in range(4)))

The following code hangs forever:

    from joblib import Parallel, delayed

    def mult(x):
        return x*3

    print(Parallel(n_jobs=4)(delayed(mult)(x) for x in range(4)))

I have no clue why. I use the following setup.py:

    from distutils.core import
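The setup.py in the excerpt is cut off, so the cause cannot be confirmed from what is shown; a frequently cited difference between the two snippets is that sqrt lives in an importable module (math) that joblib's worker processes can re-import, while mult is defined in the module being compiled. A sketch of two commonly suggested workarounds (an assumption, not a verified diagnosis):

    from joblib import Parallel, delayed

    def mult(x):
        return x * 3

    if __name__ == "__main__":
        # Guard the entry point so worker processes that re-import this module
        # do not re-run the Parallel call itself.
        print(Parallel(n_jobs=4)(delayed(mult)(x) for x in range(4)))

        # If the hang persists inside the compiled extension, the thread-based
        # backend avoids spawning worker processes altogether:
        # print(Parallel(n_jobs=4, prefer="threads")(delayed(mult)(x) for x in range(4)))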

Exception in thread QueueManagerThread - scikit-learn

荒凉一梦 submitted on 2021-01-27 18:50:19
Question: When I set n_jobs=-1 I get an error, and the same happens when n_jobs is set to a large value (e.g. n_jobs=100), but with a smaller value (e.g. n_jobs=32) it works fine. I've tried reinstalling the scikit-learn and joblib packages, but to no avail. Also, n_jobs=-1 worked fine previously, but suddenly went wrong.

    from sklearn import datasets
    from sklearn.model_selection import cross_validate, StratifiedKFold
    from sklearn.linear_model import RidgeClassifier

    iris = datasets.load_iris()
    iris_X = iris.data
    iris_y = iris.target
    skf
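Until the underlying cause is tracked down, the workaround usually suggested for this symptom is to cap the worker count explicitly instead of using n_jobs=-1. A hedged sketch built around the same imports as the question (the fold and worker-count settings below are assumptions, since the excerpt stops at skf):

    from sklearn import datasets
    from sklearn.model_selection import cross_validate, StratifiedKFold
    from sklearn.linear_model import RidgeClassifier
    from joblib import parallel_backend

    iris = datasets.load_iris()
    iris_X, iris_y = iris.data, iris.target

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    clf = RidgeClassifier()

    # Cap the number of workers explicitly rather than relying on n_jobs=-1.
    with parallel_backend("loky", n_jobs=8):
        scores = cross_validate(clf, iris_X, iris_y, cv=skf, n_jobs=8)

    print(scores["test_score"])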

How to parallelize the for loop inside an async function and track its execution status?

荒凉一梦 submitted on 2021-01-20 20:13:30
Question: Recently, I asked a question about how to track the progress of a for loop inside a deployed API. Here's the link. The solution code that worked for me is:

    from fastapi import FastAPI, UploadFile
    from typing import List
    import asyncio
    import uuid

    context = {'jobs': {}}
    app = FastAPI()

    async def do_work(job_key, files=None):
        iter_over = files if files else range(100)
        for file, file_number in enumerate(iter_over):
            jobs = context['jobs']
            job_info = jobs[job_key]
            job_info['iteration'] =
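The loop above handles one item at a time; one way to run the items concurrently while still updating the same progress counter is to push each piece of blocking work onto a thread and gather the results. A minimal sketch assuming Python 3.9+ (for asyncio.to_thread) and a hypothetical process_file stand-in for the real per-item work; the context/job_key/iteration names follow the excerpt:

    import asyncio
    from fastapi import FastAPI

    context = {'jobs': {}}
    app = FastAPI()

    def process_file(item):
        # Hypothetical per-item work; replace with the real (blocking) processing.
        return item

    async def do_work(job_key, files=None):
        iter_over = files if files else list(range(100))
        job_info = context['jobs'][job_key]
        job_info['iteration'] = 0
        job_info['status'] = 'in-progress'

        async def run_one(item):
            # Run the blocking work in a worker thread so items overlap, then
            # bump the shared progress counter from the event-loop thread.
            result = await asyncio.to_thread(process_file, item)
            job_info['iteration'] += 1
            return result

        results = await asyncio.gather(*(run_one(item) for item in iter_over))
        job_info['status'] = 'done'
        return results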
