parallel-processing

Running several PHP processes in parallel

旧时模样 submitted on 2021-02-18 19:27:19
Question: We're working on an SEO-related script in PHP, and we need to run different modules (each of them a .php file) at the same time once we finish the crawling process. In other words, we need to execute more than 10 .php files in parallel. The application used to run in sequence: once one script finished, the user's browser was forwarded to the next one. Each of the scripts establishes a connection to the database and sends different HTTP packets to the …
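A common way to fan out independent scripts is to launch each one as its own OS process and wait on all of them. Below is a minimal sketch of that pattern in Python; the launcher language, the module file names, and the php CLI invocation are all assumptions for illustration, not details from the question:

```python
import subprocess

# Hypothetical module names; the real file names are not given in the question.
scripts = [f"module{i}.php" for i in range(1, 11)]

# Start every module as its own PHP CLI process so they all run concurrently.
procs = [subprocess.Popen(["php", script]) for script in scripts]

# Wait for all of them and report any failures.
for script, proc in zip(scripts, procs):
    if proc.wait() != 0:
        print(f"{script} exited with a non-zero status")
```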

How to parallelize computation on “big data” dictionary of lists?

守給你的承諾、 submitted on 2021-02-18 19:00:17
Question: I have a question about doing calculations on a Python dictionary. In this case the dictionary has millions of keys, and the lists are similarly long. There seems to be disagreement about whether one could use parallelization here, so I'll ask the question more explicitly. Here is the original question: Optimizing parsing of massive python dictionary, multi-threading. This is a toy (small) Python dictionary:

    example_dict1 = {'key1': [367, 30, 847, 482, 887, 654, 347, 504, 413, 821], …
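Because CPython's GIL serializes CPU-bound threads, per-key work on a dictionary this large is usually spread across processes instead. A minimal sketch of that approach; the stand-in data, the summing worker, and the chunksize are assumptions for illustration:

```python
from multiprocessing import Pool

# Stand-in for the real millions-of-keys dictionary of long lists.
example_dict1 = {f"key{i}": list(range(100)) for i in range(1, 1001)}

def process_item(item):
    # Any pure, per-key computation goes here; summing is only a placeholder.
    key, values = item
    return key, sum(values)

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # A generous chunksize amortizes inter-process overhead over many keys.
        results = dict(pool.imap_unordered(process_item,
                                           example_dict1.items(),
                                           chunksize=100))
    print(len(results))  # 1000
```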

Efficient parallelization of operations on a two-dimensional array in Python

耗尽温柔 submitted on 2021-02-18 17:49:49
Question: I'm trying to parallelize operations on a two-dimensional array using the joblib library in Python. Here is the code I have:

    from joblib import Parallel, delayed
    import multiprocessing
    import numpy as np

    # The code below just aggregates the base_array to form a new two dimensional array
    base_array = np.ones((2**12, 2**12), dtype=np.uint8)

    def compute_average(i, j):
        return np.uint8(np.mean(base_array[i*4: (i+1)*4, j*4: (j+1)*4]))

    num_cores = multiprocessing.cpu_count()
    new_array = np.array(Parallel(n…
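Scheduling one joblib task per 4x4 block creates over a million tiny tasks, so dispatch overhead dwarfs the actual work. A common fix is to parallelize over coarser units, e.g. one task per block-row; the row-wise split below is my assumption, not something stated in the question:

```python
from joblib import Parallel, delayed
import multiprocessing
import numpy as np

base_array = np.ones((2**12, 2**12), dtype=np.uint8)

def compute_row(i):
    # Average every 4x4 block in block-row i in one task, not one task per block.
    return [np.uint8(np.mean(base_array[i*4:(i+1)*4, j*4:(j+1)*4]))
            for j in range(base_array.shape[1] // 4)]

num_cores = multiprocessing.cpu_count()
new_array = np.array(Parallel(n_jobs=num_cores)(
    delayed(compute_row)(i) for i in range(base_array.shape[0] // 4)))
print(new_array.shape)  # (1024, 1024)
```

For this particular aggregation, a pure-NumPy reshape such as base_array.reshape(1024, 4, 1024, 4).mean(axis=(1, 3)) typically beats any process pool, since the whole computation is a single vectorized pass.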

Optimisation tips for finding which triangle a point belongs to

和自甴很熟 submitted on 2021-02-18 17:49:03
Question: I'm having some trouble optimising my algorithm. I have a disk (centered at 0, with radius 1) filled with triangles (not necessarily of the same area/length). There can be a huge number of triangles (say, from 1k to 300k). My goal is to find as quickly as possible which triangle a point belongs to, and the operation has to be repeated a large number of times (around 10k). For now the algorithm I'm using is: I compute the barycentric coordinates of the point in each …
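For reference, the test the asker describes amounts to checking that all three barycentric coordinates of the point are non-negative; the usual optimisation is then a spatial index (a uniform grid or k-d tree over the triangles) so only a few candidates are tested per query. A minimal sketch of the barycentric test itself, with a made-up triangle and points for illustration:

```python
import numpy as np

def barycentric_coords(p, a, b, c):
    # Solve p = a + u*(b - a) + v*(c - a) for (u, v); w = 1 - u - v completes the triple.
    m = np.column_stack((b - a, c - a))
    u, v = np.linalg.solve(m, p - a)
    return 1.0 - u - v, u, v

def point_in_triangle(p, a, b, c, eps=1e-12):
    # Inside (or on an edge) iff all three coordinates are non-negative.
    w, u, v = barycentric_coords(p, a, b, c)
    return w >= -eps and u >= -eps and v >= -eps

a, b, c = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(point_in_triangle(np.array([0.25, 0.25]), a, b, c))  # True
print(point_in_triangle(np.array([0.9, 0.9]), a, b, c))    # False
```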

GPU computing for bootstrapping using “boot” package

寵の児 submitted on 2021-02-18 17:11:32
Question: I would like to do a large analysis using bootstrapping. I saw that the speed of bootstrapping is increased by using parallel computing, as in the following code:

Parallel computing:

    # detect number of cpus
    library(parallel)
    detectCores()
    library(boot)

    # boot function --> mean
    bt.mean <- function(dat, d) {
      x <- dat[d]
      m <- mean(x)
      return(m)
    }

    # obtain confidence intervals
    # use parallel computing with 4 cpus
    x <- mtcars$mpg
    bt <- boot(x, bt.mean, R = 1000, parallel = "snow", ncpus = 4)
    quantile(bt$t…
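The R snippet draws R = 1000 resamples and evaluates bt.mean on each one across 4 worker processes. For comparison, the same percentile-interval idea in Python (a NumPy stand-in replaces mtcars$mpg; the GPU aspect of the question is not addressed here):

```python
import numpy as np
from multiprocessing import Pool

rng = np.random.default_rng(0)
x = rng.normal(20, 6, size=32)  # stand-in for mtcars$mpg

def boot_mean(seed):
    # One bootstrap replicate: resample with replacement, return its mean.
    r = np.random.default_rng(seed)
    return r.choice(x, size=x.size, replace=True).mean()

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        t = np.array(pool.map(boot_mean, range(1000)))
    print(np.quantile(t, [0.025, 0.975]))  # percentile confidence interval
```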

How to apply a function to multiple columns of a Dask DataFrame in parallel?

最后都变了- submitted on 2021-02-18 17:00:20
Question: I have a Dask DataFrame for which I would like to compute the skewness of a list of columns and, if this skewness exceeds a certain threshold, correct it using a log transformation. I am wondering whether there is a more efficient way of making the correct_skewness() function work on multiple columns in parallel, by removing the for loop in the correct_skewness() function below:

    import dask
    import dask.array as da
    from scipy import stats

    # Create a dataframe
    df = dask.datasets.timeseries()
    df.head()…
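Dask's usual answer to removing such a loop is to build the lazy per-column expressions first and evaluate them with a single dask.compute() call, so every column shares one task graph and runs in parallel. A sketch under that assumption; the moment-based skewness formula, the 1.0 threshold, and the shift-then-log1p transform are illustrative choices, not the asker's code:

```python
import dask
import numpy as np

df = dask.datasets.timeseries()  # demo frame with numeric columns x and y
cols = ["x", "y"]

def skewness(s):
    # Lazy moment-based skewness: E[(s - mu)^3] / sigma^3; nothing computes yet.
    mu, sigma = s.mean(), s.std()
    return ((s - mu) ** 3).mean() / sigma ** 3

# One compute() call evaluates the skewness of all columns in parallel.
skews = dict(zip(cols, dask.compute(*[skewness(df[c]) for c in cols])))

for c, sk in skews.items():
    if abs(sk) > 1.0:                          # illustrative threshold
        df[c] = np.log1p(df[c] - df[c].min())  # shift to >= 0, then log-transform
print(skews)
```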

GNU Parallel: split file into children

泪湿孤枕 submitted on 2021-02-18 11:34:12
Question:

Goal: Use GNU Parallel to split a large .gz file into children. Since the server has 16 CPUs, create 16 children. Each child should contain, at most, N lines. Here, N = 104,214,420 lines. Children should be in .gz format.

Input:
    file name: file1.fastq.gz
    size: 39 GB
    line count: 1,667,430,708 (uncompressed)

Hardware:
    36 GB memory
    16 CPUs
    HPCC environment (I'm not admin)

Code (version 1):

    zcat "${input_file}" | parallel --pipe -N 104214420 --joblog split_log.txt --resume-failed "gzip > ${input_file}…
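The pipeline streams the decompressed file into GNU Parallel, which cuts the stream into 104,214,420-line records and hands each record to its own gzip process. For readers who prefer to see the chunking logic spelled out, here is an illustrative single-threaded Python equivalent (the file name and N come from the question; everything else is my own sketch, and it forgoes GNU Parallel's per-chunk gzip parallelism):

```python
import gzip

N = 104_214_420  # maximum lines per child, taken from the question

def split_gz(path, lines_per_chunk=N):
    # Stream the source so memory stays flat no matter how large the chunks are.
    with gzip.open(path, "rt") as src:
        part, count, dst = 0, 0, None
        for line in src:
            if dst is None:
                dst = gzip.open(f"{path}.part{part:02d}.gz", "wt")
            dst.write(line)
            count += 1
            if count == lines_per_chunk:
                dst.close()
                dst, count, part = None, 0, part + 1
        if dst is not None:
            dst.close()

split_gz("file1.fastq.gz")
```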