parallel-processing

CasperJS, parallel browsing WITH the testing framework

Submitted by Deadly on 2020-03-13 07:39:07
Question: I would like to know if it's possible to do parallel browsing with the testing framework in a single script file, i.e. with the tester module and the casperjs test command. I've seen some people create two casper instances: CasperJS simultaneous requests and https://groups.google.com/forum/#!topic/casperjs/Scx4Cjqp7hE , but as the docs say, we can't create new casper instances in a test script. So I tried doing something similar (a simple example) with a casper testing script (just copy and …

Poor scaling of multiprocessing Pool.map() on a list of large objects: How to achieve better parallel scaling in python?

Submitted by 别等时光非礼了梦想. on 2020-03-13 06:29:12
Question: Let us define:

    from multiprocessing import Pool
    import numpy as np

    def func(x):
        for i in range(1000):
            i**2
        return 1

Notice that func() does something, and it always returns the small number 1. Then, I compare an 8-core parallel Pool.map() vs. the serial, Python built-in map():

    n = 10**3
    a = np.random.random(n).tolist()

    with Pool(8) as p:
        %timeit -r1 -n2 p.map(func, a)
    %timeit -r1 -n2 list(map(func, a))

This gives:

    38.4 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 2 loops each)
    200 ms ± 0 ns per …
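
The body is truncated above, but one standard lever is worth showing for work this cheap: Pool.map()'s chunksize argument, which batches items so that per-task inter-process communication is amortized. A minimal sketch (my code, not the asker's; the timing scheme is an illustration):

    from multiprocessing import Pool
    import time

    def func(x):
        # same cheap busy-work as in the question
        for i in range(1000):
            i**2
        return 1

    if __name__ == '__main__':
        a = list(range(10**3))
        with Pool(8) as p:
            t0 = time.perf_counter()
            p.map(func, a)                 # default chunking
            t1 = time.perf_counter()
            p.map(func, a, chunksize=125)  # 1000 items / 8 workers = 125 each
            t2 = time.perf_counter()
        print(f'default: {t1 - t0:.4f}s  chunksize=125: {t2 - t1:.4f}s')

With tasks this small, pickling arguments and results can dominate the runtime, which is why an 8-core pool rarely shows anywhere near an 8x speedup here.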

Multiprocessing so slow

Submitted by 丶灬走出姿态 on 2020-03-05 09:12:11
Question: I have a function that takes a file as input and does basic cleaning, extracts the required items from the file and writes them into a pandas dataframe, and finally converts the dataframe into a CSV written into a folder. This is the sample code:

    def extract_function(filename):
        with open(filename, 'r') as f:
            input_data = f.readlines()
        try:
            # some basic searching, pattern matching, extracting
            # dataframe creation with 10 columns, then extracted values are filled in empty …
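
The code above is cut off, so here is a minimal runnable sketch, under the assumption that the goal is to fan extract_function() out over many files with a process pool. The "extraction" body, output folder, and file names are placeholders, not the asker's code:

    from multiprocessing import Pool
    import os
    import pandas as pd

    OUT_DIR = 'output'  # hypothetical output folder

    def extract_function(filename):
        with open(filename, 'r') as f:
            input_data = f.readlines()
        # placeholder "extraction": one row per input line
        df = pd.DataFrame({'line': [s.rstrip('\n') for s in input_data]})
        out = os.path.join(OUT_DIR, os.path.basename(filename) + '.csv')
        df.to_csv(out, index=False)
        return out

    if __name__ == '__main__':
        os.makedirs(OUT_DIR, exist_ok=True)
        filenames = ['a.txt', 'b.txt']  # hypothetical inputs
        # One file per task; avoids sharing pandas objects across processes,
        # which is a frequent source of multiprocessing slowdowns.
        with Pool() as pool:
            for path in pool.imap_unordered(extract_function, filenames):
                print('wrote', path)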

parallel execution and file writing in Python

Submitted by 五迷三道 on 2020-02-27 09:10:22
Question: I have a very large dataset distributed across 10 big clusters, and the task is to do some computations for each cluster and write (append) the results line by line into 10 files, where each file contains the results for one of the 10 clusters. Each cluster can be computed independently, and I want to parallelize the code across ten CPUs (or threads) so that I can do the computations on all the clusters at once. A simplified pseudo-code for my task is the following:

    for …
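
The pseudo-code is truncated at the for loop, so here is a minimal sketch of the pattern that usually fits this task: one process per cluster, with each process appending only to its own file, so no file locking is needed. compute_cluster() is a stand-in for the real computation:

    from multiprocessing import Pool

    def compute_cluster(cluster_id):
        # placeholder computation: write one result line per "item"
        with open(f'results_{cluster_id}.txt', 'a') as f:
            for item in range(5):  # stand-in for the cluster's data
                result = cluster_id * 100 + item
                f.write(f'{result}\n')
        return cluster_id

    if __name__ == '__main__':
        # ten worker processes, one per cluster
        with Pool(processes=10) as pool:
            done = pool.map(compute_cluster, range(10))
        print('finished clusters:', done)

Giving each process its own output file sidesteps the classic pitfall of multiple processes appending to the same file concurrently.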

allow user to complete parallel / xargs command (function) after selecting files into array; quoting nuls correctly in printf script

Submitted by 你说的曾经没有我的故事 on 2020-02-25 04:06:50
Question: This is a follow-on question to this question. In that question, I could get the selected files into an array and pass them to a command/function (already exported). This question differs in that I would like the user to complete the command after selecting the files. The filenames can have spaces in them, hence the choice of null separation. I'm using FZF to select the files. It produces an array containing nul-terminated filenames, I think. But the first item that FZF produces is the name of a key …

Parallel downloads with Multiprocessing and PySftp

Submitted by 依然范特西╮ on 2020-02-24 11:17:07
Question: I'm trying to write code to download N files at the same time using the pysftp and multiprocessing libs. I took a basic Python training course, got pieces of code, and combined them into one, but I can't get it to work. I'd appreciate it if somebody could help me with that. The error occurs after the vFtp.close() command, in the part that is supposed to start the simultaneous downloads.

    from multiprocessing import Pool
    import pysftp
    import os

    vHost = '10.11.12.13'
    vLogin = 'admin'
    vPwd = 'pass1234'
    vFtpPath = '/export/home …
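
The snippet is cut off, but the usual fix for this class of error is to open a separate pysftp connection inside each worker, since an open connection object cannot be pickled into Pool workers. A minimal sketch under that assumption (the host and credentials come from the asker's snippet; the file list and hostkey handling are my placeholders):

    from multiprocessing import Pool
    import pysftp

    vHost = '10.11.12.13'
    vLogin = 'admin'
    vPwd = 'pass1234'

    def download_one(remote_path):
        cnopts = pysftp.CnOpts()
        cnopts.hostkeys = None  # assumption: skip host-key checking for the sketch
        # each worker process opens (and closes) its own connection
        with pysftp.Connection(vHost, username=vLogin, password=vPwd,
                               cnopts=cnopts) as sftp:
            local_path = remote_path.rsplit('/', 1)[-1]
            sftp.get(remote_path, local_path)
        return local_path

    if __name__ == '__main__':
        files = ['/remote/dir/file1', '/remote/dir/file2']  # hypothetical
        with Pool(4) as pool:
            print(pool.map(download_one, files))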

How to retrieve values from a function run in parallel processes?

Submitted by 你离开我真会死。 on 2020-02-21 11:09:32
Question: The multiprocessing module is quite confusing for Python beginners, especially for those who have just migrated from MATLAB and been made lazy by its parallel computing toolbox. I have the following code, which takes ~80 secs to run, and I want to shorten this time by using Python's multiprocessing module:

    from time import time

    xmax = 100000000

    start = time()
    for x in range(xmax):
        y = ((x + 5)**2 + x - 40)
        if y <= 0xf + 1:
            print('Condition met at: ', y, x)
    end = time()
    tt = end - start  # total time …
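
A minimal sketch of the standard answer to the title question: have each worker return its findings, and collect them from Pool.map()'s return value. The range-splitting scheme here is mine, not from the truncated post:

    from multiprocessing import Pool

    xmax = 100000000
    nproc = 8

    def search(bounds):
        lo, hi = bounds
        hits = []
        for x in range(lo, hi):
            y = ((x + 5)**2 + x - 40)
            if y <= 0xf + 1:
                hits.append((y, x))
        return hits  # returned values travel back to the parent process

    if __name__ == '__main__':
        step = xmax // nproc
        chunks = [(i * step, (i + 1) * step) for i in range(nproc)]
        chunks[-1] = (chunks[-1][0], xmax)  # cover any remainder
        with Pool(nproc) as pool:
            for hits in pool.map(search, chunks):
                for y, x in hits:
                    print('Condition met at:', y, x)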

Non-blocking / Asynchronous Execution in Perl

Submitted by 瘦欲@ on 2020-02-19 09:38:05
Question: Is there a way to implement non-blocking/asynchronous execution (without fork()'ing) in Perl? I was a Python developer for many years... Python has the really great 'Twisted' framework, which allows you to do so (using DEFERREDs). When I searched to see if there is anything in Perl that does the same, I came across the POE framework, which seemed "close" enough to what I was searching for. But... after spending some time reading the documentation and "playing" with the code, I came up against "the wall …
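
For readers unfamiliar with the model the asker is referring to, here is a minimal Python sketch (asyncio standing in for Twisted's deferreds, purely for brevity) of non-blocking execution in a single process, with no fork():

    import asyncio

    async def worker(name, delay):
        await asyncio.sleep(delay)  # suspends this task; others keep running
        print(f'{name} finished after {delay}s')
        return name

    async def main():
        # schedule both tasks concurrently and gather their results
        results = await asyncio.gather(worker('a', 1), worker('b', 2))
        print('results:', results)

    asyncio.run(main())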

spark sql : How to achieve parallel processing of dataframe at group level, but within each group we require sequential processing of rows

Submitted by 感情迁移 on 2020-02-07 05:24:06
Question: Apply grouping on the dataframe; let us say it results in 100 groups with 10 rows each. I have a function that has to be applied to each group. That can happen in parallel and in any order (i.e., it is up to Spark's discretion to choose any group, in any order, for execution). But within a group, I need a guarantee of sequential processing of the rows, because after processing each row in a group, I use its output in the processing of any of the rows remaining in the group. We took …
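
The post is truncated, but the shape of the requirement maps onto grouped-map pandas UDFs. A minimal PySpark sketch, assuming Spark >= 3.0 (where applyInPandas is available) and placeholder column names and logic: Spark parallelizes across groups in whatever order it likes, while inside the function each group's rows are walked sequentially, each row using the running output of the previous ones:

    import pandas as pd
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # 100 groups x 10 rows, mirroring the question's setup
    df = spark.createDataFrame(
        [(g, i, float(i)) for g in range(100) for i in range(10)],
        ['group_id', 'seq', 'value'])

    def process_group(pdf: pd.DataFrame) -> pd.DataFrame:
        pdf = pdf.sort_values('seq')  # enforce the within-group order
        state = 0.0
        out = []
        for _, row in pdf.iterrows():     # sequential over this group's rows
            state = state + row['value']  # placeholder: reuse prior output
            out.append((row['group_id'], row['seq'], state))
        return pd.DataFrame(out, columns=['group_id', 'seq', 'running'])

    result = df.groupBy('group_id').applyInPandas(
        process_group, schema='group_id long, seq long, running double')
    result.show()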