parallel-processing

Running several PHP processes in parallel

旧时模样 submitted on 2021-02-18 19:27:19
Question: We're working on an SEO-related script in PHP, and we need to run different modules (each of them a .php file) at the same time once we finish the crawling process. In other words, we need to execute more than 10 .php files in parallel. The application used to run in sequence: once one script finished, the user's browser was forwarded to the next one. Each of the scripts establishes a connection to the database and sends different HTTP packets to the …
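A common way to fan out independent scripts is to launch each one as its own OS process and wait on all of them. Below is a minimal sketch of that pattern in Python; the launcher language, the module file names, and the php CLI invocation are all assumptions for illustration, not details from the question:

```python
import subprocess

# Hypothetical module names; the real file names are not given in the question.
scripts = [f"module{i}.php" for i in range(1, 11)]

# Start every module as its own PHP CLI process so they all run concurrently.
procs = [subprocess.Popen(["php", script]) for script in scripts]

# Wait for all of them and report any failures.
for script, proc in zip(scripts, procs):
    if proc.wait() != 0:
        print(f"{script} exited with a non-zero status")
```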

How to parallelize computation on “big data” dictionary of lists?

守給你的承諾、 submitted on 2021-02-18 19:00:17
Question: I have a question about doing calculations on a Python dictionary. In this case the dictionary has millions of keys, and the lists are similarly long. There seems to be disagreement about whether one could use parallelization here, so I'll ask the question more explicitly. Here is the original question: Optimizing parsing of massive python dictionary, multi-threading. This is a toy (small) Python dictionary:

    example_dict1 = {'key1': [367, 30, 847, 482, 887, 654, 347, 504, 413, 821], …
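Because CPython's GIL serializes CPU-bound threads, per-key work on a dictionary this large is usually spread across processes instead. A minimal sketch of that approach; the stand-in data, the summing worker, and the chunksize are assumptions for illustration:

```python
from multiprocessing import Pool

# Stand-in for the real millions-of-keys dictionary of long lists.
example_dict1 = {f"key{i}": list(range(100)) for i in range(1, 1001)}

def process_item(item):
    # Any pure, per-key computation goes here; summing is only a placeholder.
    key, values = item
    return key, sum(values)

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # A generous chunksize amortizes inter-process overhead over many keys.
        results = dict(pool.imap_unordered(process_item,
                                           example_dict1.items(),
                                           chunksize=100))
    print(len(results))  # 1000
```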

Efficient parallelization of operations on a two-dimensional array in Python

耗尽温柔 submitted on 2021-02-18 17:49:49
Question: I'm trying to parallelize operations on a two-dimensional array using the joblib library in Python. Here is the code I have:

    from joblib import Parallel, delayed
    import multiprocessing
    import numpy as np

    # The code below just aggregates the base_array to form a new two dimensional array
    base_array = np.ones((2**12, 2**12), dtype=np.uint8)

    def compute_average(i, j):
        return np.uint8(np.mean(base_array[i*4: (i+1)*4, j*4: (j+1)*4]))

    num_cores = multiprocessing.cpu_count()
    new_array = np.array(Parallel(n…
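Scheduling one joblib task per 4x4 block creates over a million tiny tasks, so dispatch overhead dwarfs the actual work. A common fix is to parallelize over coarser units, e.g. one task per block-row; the row-wise split below is my assumption, not something stated in the question:

```python
from joblib import Parallel, delayed
import multiprocessing
import numpy as np

base_array = np.ones((2**12, 2**12), dtype=np.uint8)

def compute_row(i):
    # Average every 4x4 block in block-row i in one task, not one task per block.
    return [np.uint8(np.mean(base_array[i*4:(i+1)*4, j*4:(j+1)*4]))
            for j in range(base_array.shape[1] // 4)]

num_cores = multiprocessing.cpu_count()
new_array = np.array(Parallel(n_jobs=num_cores)(
    delayed(compute_row)(i) for i in range(base_array.shape[0] // 4)))
print(new_array.shape)  # (1024, 1024)
```

For this particular aggregation, a pure-NumPy reshape such as base_array.reshape(1024, 4, 1024, 4).mean(axis=(1, 3)) typically beats any process pool, since the whole computation is a single vectorized pass.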

Optimisation tips for finding which triangle a point belongs to

和自甴很熟 submitted on 2021-02-18 17:49:03
Question: I'm having some trouble optimising my algorithm. I have a disk (centered at 0, with radius 1) filled with triangles (not necessarily of the same area/length). There can be a huge number of triangles (say, from 1k to 300k). My goal is to find as quickly as possible which triangle a point belongs to, and the operation has to be repeated a large number of times (around 10k). For now the algorithm I'm using is: I compute the barycentric coordinates of the point in each …
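For reference, the test the asker describes amounts to checking that all three barycentric coordinates of the point are non-negative; the usual optimisation is then a spatial index (a uniform grid or k-d tree over the triangles) so only a few candidates are tested per query. A minimal sketch of the barycentric test itself, with a made-up triangle and points for illustration:

```python
import numpy as np

def barycentric_coords(p, a, b, c):
    # Solve p = a + u*(b - a) + v*(c - a) for (u, v); w = 1 - u - v completes the triple.
    m = np.column_stack((b - a, c - a))
    u, v = np.linalg.solve(m, p - a)
    return 1.0 - u - v, u, v

def point_in_triangle(p, a, b, c, eps=1e-12):
    # Inside (or on an edge) iff all three coordinates are non-negative.
    w, u, v = barycentric_coords(p, a, b, c)
    return w >= -eps and u >= -eps and v >= -eps

a, b, c = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(point_in_triangle(np.array([0.25, 0.25]), a, b, c))  # True
print(point_in_triangle(np.array([0.9, 0.9]), a, b, c))    # False
```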

GPU computing for bootstrapping using “boot” package

寵の児 submitted on 2021-02-18 17:11:32
Question: I would like to do a large analysis using bootstrapping. I saw that the speed of bootstrapping is increased by using parallel computing, as in the following code:

Parallel computing:

    # detect number of cpus
    library(parallel)
    detectCores()
    library(boot)

    # boot function --> mean
    bt.mean <- function(dat, d) {
      x <- dat[d]
      m <- mean(x)
      return(m)
    }

    # obtain confidence intervals
    # use parallel computing with 4 cpus
    x <- mtcars$mpg
    bt <- boot(x, bt.mean, R = 1000, parallel = "snow", ncpus = 4)
    quantile(bt$t…
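The R snippet draws R = 1000 resamples and evaluates bt.mean on each one across 4 worker processes. For comparison, the same percentile-interval idea in Python (a NumPy stand-in replaces mtcars$mpg; the GPU aspect of the question is not addressed here):

```python
import numpy as np
from multiprocessing import Pool

rng = np.random.default_rng(0)
x = rng.normal(20, 6, size=32)  # stand-in for mtcars$mpg

def boot_mean(seed):
    # One bootstrap replicate: resample with replacement, return its mean.
    r = np.random.default_rng(seed)
    return r.choice(x, size=x.size, replace=True).mean()

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        t = np.array(pool.map(boot_mean, range(1000)))
    print(np.quantile(t, [0.025, 0.975]))  # percentile confidence interval
```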

How to apply a function to multiple columns of a Dask DataFrame in parallel?

最后都变了- submitted on 2021-02-18 17:00:20
Question: I have a Dask DataFrame for which I would like to compute the skewness of a list of columns and, if this skewness exceeds a certain threshold, correct it using a log transformation. I am wondering whether there is a more efficient way of making the correct_skewness() function work on multiple columns in parallel, by removing the for loop in the correct_skewness() function below:

    import dask
    import dask.array as da
    from scipy import stats

    # Create a dataframe
    df = dask.datasets.timeseries()
    df.head()…
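Dask's usual answer to removing such a loop is to build the lazy per-column expressions first and evaluate them with a single dask.compute() call, so every column shares one task graph and runs in parallel. A sketch under that assumption; the moment-based skewness formula, the 1.0 threshold, and the shift-then-log1p transform are illustrative choices, not the asker's code:

```python
import dask
import numpy as np

df = dask.datasets.timeseries()  # demo frame with numeric columns x and y
cols = ["x", "y"]

def skewness(s):
    # Lazy moment-based skewness: E[(s - mu)^3] / sigma^3; nothing computes yet.
    mu, sigma = s.mean(), s.std()
    return ((s - mu) ** 3).mean() / sigma ** 3

# One compute() call evaluates the skewness of all columns in parallel.
skews = dict(zip(cols, dask.compute(*[skewness(df[c]) for c in cols])))

for c, sk in skews.items():
    if abs(sk) > 1.0:                          # illustrative threshold
        df[c] = np.log1p(df[c] - df[c].min())  # shift to >= 0, then log-transform
print(skews)
```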

GNU Parallel: split file into children

泪湿孤枕 submitted on 2021-02-18 11:34:12
Question:

Goal: Use GNU Parallel to split a large .gz file into children. Since the server has 16 CPUs, create 16 children. Each child should contain, at most, N lines. Here, N = 104,214,420 lines. Children should be in .gz format.

Input:
    file name: file1.fastq.gz
    size: 39 GB
    line count: 1,667,430,708 (uncompressed)

Hardware:
    36 GB memory
    16 CPUs
    HPCC environment (I'm not admin)

Code (version 1):

    zcat "${input_file}" | parallel --pipe -N 104214420 --joblog split_log.txt --resume-failed "gzip > ${input_file}…
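The pipeline streams the decompressed file into GNU Parallel, which cuts the stream into 104,214,420-line records and hands each record to its own gzip process. For readers who prefer to see the chunking logic spelled out, here is an illustrative single-threaded Python equivalent (the file name and N come from the question; everything else is my own sketch, and it forgoes GNU Parallel's per-chunk gzip parallelism):

```python
import gzip

N = 104_214_420  # maximum lines per child, taken from the question

def split_gz(path, lines_per_chunk=N):
    # Stream the source so memory stays flat no matter how large the chunks are.
    with gzip.open(path, "rt") as src:
        part, count, dst = 0, 0, None
        for line in src:
            if dst is None:
                dst = gzip.open(f"{path}.part{part:02d}.gz", "wt")
            dst.write(line)
            count += 1
            if count == lines_per_chunk:
                dst.close()
                dst, count, part = None, 0, part + 1
        if dst is not None:
            dst.close()

split_gz("file1.fastq.gz")
```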