parallel-processing

Clear memory in python loop

Submitted by *爱你&永不变心* on 2020-01-16 08:48:30

Question: How can I clear memory in this Python loop?

    import concurrent.futures as futures

    with futures.ThreadPoolExecutor(max_workers=100) as executor:
        fs = [executor.submit(get_data, url) for url in link]
        for i, f in enumerate(futures.as_completed(fs)):
            x = f.result()
            results.append(x)
            del x
            del f

get_data is a simple function which uses requests.

Answer 1: My solution would be as such:

    import concurrent.futures as futures

    # split the original grand list into smaller batches
    batchurlList = [grandUrlList[x:x
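The answer's code is cut off above, but its batching idea can be sketched as follows. This is a hedged illustration, not the answerer's exact code: submit one batch of URLs at a time so that only one batch of futures and results is ever alive, letting the rest be garbage-collected between batches. `fetch` is a stand-in for the asker's `get_data`.

```python
import concurrent.futures as futures

def fetch(url):
    # stand-in for the asker's get_data(); returns something small per URL
    return len(url)

def process_in_batches(urls, batch_size=10, max_workers=5):
    results = []
    # handle one batch at a time so that at most batch_size futures
    # (and their results) are held in memory simultaneously
    for start in range(0, len(urls), batch_size):
        batch = urls[start:start + batch_size]
        with futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
            fs = [executor.submit(fetch, url) for url in batch]
            for f in futures.as_completed(fs):
                results.append(f.result())
        # fs and its completed futures go out of scope here, so the
        # memory they pin can be reclaimed before the next batch starts
    return results

urls = ["a" * n for n in range(1, 26)]
print(sorted(process_in_batches(urls)))  # each result is the URL's length
```

The explicit `del x` / `del f` in the question rarely helps, because the `fs` list itself keeps every future (and thus every result) reachable until the loop ends; scoping them per batch is what actually releases memory.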

Scala Parallel Collections: How to know and configure the number of threads

Submitted by 牧云@^-^@ on 2020-01-16 04:11:07

Question: I am using Scala parallel collections.

    val largeList = list.par.map(x => largeComputation(x)).toList

It is blazing fast, but I have a feeling that I may run into out-of-memory issues if we run too many "largeComputation" calls in parallel. Therefore, when testing, I would like to know how many threads the parallel collection is using and, if need be, how I can configure the number of threads for the parallel collections.

Answer 1: Here is a piece of scaladoc where they explain how to change the task
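The scaladoc answer is cut off above; from memory (hedged), the Scala-side knob is assigning a custom task support, for example a `ForkJoinTaskSupport` built over a pool of n threads, to the parallel collection. The underlying concern, capping how many `largeComputation` calls run at once, can be shown as a runnable Python sketch where the pool size is the explicit cap:

```python
from concurrent.futures import ThreadPoolExecutor
import threading

lock = threading.Lock()
running = 0
peak = 0

def large_computation(x):
    # stand-in for the real work, instrumented to record how many
    # calls are in flight at the same time
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    result = x * x
    with lock:
        running -= 1
    return result

# the pool size is the explicit, configurable cap on parallelism
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(large_computation, range(100)))

print(peak)  # never exceeds the 4 configured workers
```

The same reasoning applies in Scala: whatever thread pool backs the parallel collection bounds the number of simultaneous `largeComputation` invocations, and therefore the peak memory those invocations can consume.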

Parallelize or vectorize all-against-all operation on a large number of matrices?

Submitted by 放肆的年华 on 2020-01-16 04:08:12

Question: I have approximately 5,000 matrices with the same number of rows and varying numbers of columns (20 x ~200). Each of these matrices must be compared against every other in a dynamic programming algorithm. In this question, I asked how to perform the comparison quickly and was given an excellent answer involving a 2D convolution. Serially, iteratively applying that method, like so:

    list = who('data_matrix_prefix*')
    H = cell(numel(list), numel(list));
    for i = 1:numel(list)
        for j = 1:numel(list)
            if i
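The MATLAB snippet is cut off, but the scheduling idea behind an all-against-all comparison can be sketched in Python (a hedged analogy; in MATLAB one would typically reach for `parfor` over the pair list instead). The key point is to enumerate only the upper triangle of the pair matrix, since the comparison is symmetric; a dot product stands in for the convolution-based comparison here.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def compare(pair):
    # stand-in for the 2D-convolution comparison of two matrices;
    # here each "matrix" is just a list of numbers and the "score"
    # is their dot product
    (i, a), (j, b) = pair
    return i, j, sum(x * y for x, y in zip(a, b))

matrices = [[k, k + 1, k + 2] for k in range(5)]

# enumerate only the upper triangle: the comparison is symmetric,
# so (i, j) and (j, i) need not both be computed
pairs = list(combinations(enumerate(matrices), 2))

with ThreadPoolExecutor(max_workers=4) as pool:
    scores = {(i, j): s for i, j, s in pool.map(compare, pairs)}

print(len(scores))  # 10 distinct pairs from 5 matrices
```

For 5,000 matrices this halves the work from ~25M to ~12.5M comparisons before any parallelism is applied, which matters more than the choice of worker count.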

Spark map is only one task while it should be parallel (PySpark)

Submitted by 时光总嘲笑我的痴心妄想 on 2020-01-16 03:56:09

Question: I have an RDD with around 7M entries, each with 10 normalized coordinates. I also have a number of centers, and I'm trying to map every entry to the closest (Euclidean distance) center. The problem is that this only generates one task, which means it is not parallelizing. This is the form:

    def doSomething(point, centers):
        for center in centers.value:
            if distance(point, center) < 1:
                return center
        return None

    preppedData.map(lambda x: doSomething(x, centers)).take(5)

The preppedData RDD is cached
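A likely explanation for the single task: `take(5)` is an action that evaluates only as many partitions as it needs to produce five elements, so it often touches just one; a full action such as `count()` or `collect()` would schedule a task per partition. Separately, note that the function as written returns the *first* center within distance 1, not the closest. A standalone sketch of a closest-center version (hedged: the `distance` implementation and the threshold of 1.0 are assumptions carried over from the question):

```python
import math

def distance(p, q):
    # plain Euclidean distance between two coordinate tuples
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def closest_center(point, centers, threshold=1.0):
    # return the *nearest* center within `threshold`, or None;
    # the question's loop returns the first center under the
    # threshold, which need not be the nearest one
    best = min(centers, key=lambda c: distance(point, c))
    return best if distance(point, best) < threshold else None

centers = [(0.0, 0.0), (10.0, 10.0)]
print(closest_center((0.5, 0.0), centers))   # (0.0, 0.0)
print(closest_center((5.0, 5.0), centers))   # None: nothing within 1.0
```

Inside Spark this function would be used exactly as in the question, `preppedData.map(lambda x: closest_center(x, centers.value))`, with the broadcast variable dereferenced once per call.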

incorrect output with TBB pipeline

Submitted by 怎甘沉沦 on 2020-01-16 03:52:24

Question: I have written a C structure with different values (100 times) into text files such as 1.txt, 2.txt, ... 100.txt. I am using Intel TBB on Linux. I have created:

    InputFilter (serial_in_order mode)
    TransformFilter (serial_in_order mode)
    OutputFilter (serial_in_order mode)

The InputFilter reads a structure from a file and passes it to the TransformFilter. The TransformFilter updates the structure's values and passes it to the OutputFilter. The OutputFilter writes the new structure to disk. Basically, it is
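The question is cut off, but the three-stage flow it describes can be sketched in Python (a hedged analogy, not TBB). One detail worth modeling: each stage should hand a *fresh* object to the next, because reusing a single shared buffer for every in-flight item is a classic cause of incorrect output in pipelines like this once stages overlap.

```python
# A minimal in-order three-stage pipeline mirroring the
# InputFilter -> TransformFilter -> OutputFilter chain above.

def input_filter(n_items):
    # stand-in for reading one structure per file
    for i in range(1, n_items + 1):
        yield {"file": f"{i}.txt", "value": i}   # fresh dict per item

def transform_filter(items):
    for item in items:
        out = dict(item)           # copy; never mutate a shared buffer
        out["value"] = item["value"] * 10
        yield out

def output_filter(items):
    written = []
    for item in items:
        written.append((item["file"], item["value"]))  # stand-in for a disk write
    return written

written = output_filter(transform_filter(input_filter(3)))
print(written)  # [('1.txt', 10), ('2.txt', 20), ('3.txt', 30)]
```

Generators are serial, so this models the `serial_in_order` ordering guarantee rather than TBB's parallelism; the buffer-ownership point is what carries over to the C++ version.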

studying parallel programming python

Submitted by 旧街凉风 on 2020-01-16 01:18:11

Question:

    import multiprocessing
    from multiprocessing import Pool
    from source.RUN import *

    def func(r, grid, pos, h):
        return r, grid, pos, h

    p = multiprocessing.Pool()  # Creates a pool with as many workers as you have CPU cores
    results = []
    if __name__ == '__main__':
        for i in pos[-1] < 2:
            results.append(Pool.apply_async(LISTE, (r, grid, pos[i, :], h)))
        p.close()
        p.join()
        for result in results:
            print('liste', result.get())

I want to create a Pool for the (LISTE, (r, grid, pos[i,:], h)) process, and i is in pos, which is a variable

In message passing (MPI) mpi_send and recv “what waits”

Submitted by ↘锁芯ラ on 2020-01-15 23:21:50

Question: Consider the configuration to be: first, not buffered, blocking (synchronous). As I understand it, MPI is an API, so when we make the blocking mpi_send function call, does the sender function/program get blocked? Or does the MPI API function mpi_send get blocked, so that the program can continue its work until the message is sent? Second, a similar confusion: does mpi_recv get blocked, or does the function from which it was called get blocked? The reason for such a stupid question: it's parallel processing, so why
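There is no "either/or" here: a blocking MPI call simply does not return until its condition is met (for a synchronous send, until the matching receive has started), and since the caller is sitting inside that call, the caller's thread of execution waits too. A rough single-process analogy in Python (hedged: this models only the "call blocks its caller" behavior with threads and a queue, not MPI itself):

```python
import threading
import queue
import time

channel = queue.Queue(maxsize=1)
events = []

def receiver():
    events.append("recv-wait")
    msg = channel.get()            # blocks this thread until a message arrives,
    events.append("recv " + msg)   # like a blocking MPI_Recv blocks its caller

def sender():
    time.sleep(0.2)                # sender is not ready yet; receiver must wait
    events.append("send")
    channel.put("message")

t_recv = threading.Thread(target=receiver)
t_send = threading.Thread(target=sender)
t_recv.start(); t_send.start()
t_recv.join(); t_send.join()
print(events)                      # "recv-wait" happens first, "recv ..." last
```

The receiver thread makes no progress between `recv-wait` and `recv message`; it is blocked inside `get()` exactly the way a process is blocked inside `mpi_recv`. Other processes in the MPI job keep running; blocking is per caller, not per program.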

Perl fork queue for n-Core processor

Submitted by 时光总嘲笑我的痴心妄想 on 2020-01-15 23:03:07

Question: I am writing an application similar to what was suggested here. Essentially, I am using Perl to manage the execution of multiple CPU-intensive processes in parallel via fork and wait. However, I am running on a 4-core machine, and I have many more processes, all with very dissimilar expected run times which aren't known a priori. Ultimately, it would take more effort to estimate the run times and gang them appropriately than to simply use a queue system for each core. Ultimately I want

Distributed for loop in pyspark dataframe

Submitted by 大城市里の小女人 on 2020-01-15 12:17:28

Question: Context: my company is on Spark 2.2, so it's not possible to use pandas_udf for distributed column processing. I have dataframes that contain thousands of columns (features) and millions of records.

    df = spark.createDataFrame(
        [(1, "AB", 100, 200, 1),
         (2, "AC", 150, 200, 2),
         (3, "AD", 80, 150, 0)],
        ["Id", "Region", "Salary", "HouseHoldIncome", "NumChild"])

I would like to perform certain summaries and statistics on each column in a parallel manner and wonder what is the best way to achieve this.

    # The point is
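The snippet is cut off, but the usual Spark-side answer is to put all the per-column aggregate expressions into a single `agg()` call (or use `df.describe()`), so that one distributed job computes every column's statistics in a single pass over the rows, rather than launching one job per column. The one-pass idea, sketched in plain Python on the toy data from the example above (hedged: count, mean, and stddev only):

```python
import math

rows = [(1, "AB", 100, 200, 1),
        (2, "AC", 150, 200, 2),
        (3, "AD", 80, 150, 0)]
numeric_cols = {"Salary": 2, "HouseHoldIncome": 3, "NumChild": 4}

# accumulate count / sum / sum-of-squares for every column in ONE pass,
# mirroring the single-agg() approach where one job covers all columns
acc = {c: [0, 0.0, 0.0] for c in numeric_cols}
for row in rows:
    for col, idx in numeric_cols.items():
        v = row[idx]
        acc[col][0] += 1
        acc[col][1] += v
        acc[col][2] += v * v

stats = {}
for col, (n, s, s2) in acc.items():
    mean = s / n
    var = s2 / n - mean * mean          # population variance from the sums
    stats[col] = {"count": n, "mean": mean,
                  "stddev": math.sqrt(max(var, 0.0))}

print(stats["Salary"])
```

Because each column's statistic reduces to a few running sums, all of them can share one scan of the data; that is why one wide aggregation beats thousands of per-column jobs on a dataframe this shape.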