parallel-processing

High-performance implementation of an atomic minimum operation

Submitted by 泄露秘密 on 2020-01-14 06:01:30
Question: There is no atomic minimum operation in OpenMP, and no corresponding intrinsic in Intel MIC's instruction set. #pragma omp critical performs very poorly. Is there a high-performance implementation of an atomic minimum for Intel MIC? Answer 1: According to the OpenMP 4.0 specification (Section 2.12.6), there are a number of fast atomic operations you can perform by using the #pragma omp atomic construct in place of #pragma omp critical (and thereby avoid the huge overhead of its

Check if an adjacent slave process has ended in MPI

Submitted by 不打扰是莪最后的温柔 on 2020-01-14 04:50:30
Question: In my MPI program, I want to send information to and receive information from adjacent processes. But if a process ends and doesn't send anything, its neighbors will wait forever. How can I resolve this issue? Here is what I am trying to do: if (rank == 0) { // don't do anything until all slaves are done } else { while (condition) { // send info to rank-1 and rank+1 // if can receive info from rank-1, receive it, store received info locally // if cannot receive info from rank-1, use locally stored info // do
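The pseudocode above blocks in its receives. One common remedy (a sketch, untested here, with tag names of my own choosing) is to have each rank announce its exit with a sentinel message and to poll with MPI_Iprobe instead of posting a blocking MPI_Recv:

```c
/* Sketch: avoid blocking on a finished neighbor by (1) having each rank
 * send a DONE sentinel to its neighbors before it exits, and (2) polling
 * with MPI_Iprobe instead of calling MPI_Recv blindly. */
#include <mpi.h>

enum { TAG_DATA = 0, TAG_DONE = 1 };

void poll_neighbor(int nbr, double *local_info, int *nbr_done)
{
    int flag;
    MPI_Status st;
    MPI_Iprobe(nbr, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &st);
    if (!flag)
        return;                      /* nothing pending: reuse local_info */
    if (st.MPI_TAG == TAG_DONE) {
        int dummy;
        MPI_Recv(&dummy, 1, MPI_INT, nbr, TAG_DONE, MPI_COMM_WORLD, &st);
        *nbr_done = 1;               /* stop expecting data from this rank */
    } else {
        MPI_Recv(local_info, 1, MPI_DOUBLE, nbr, TAG_DATA,
                 MPI_COMM_WORLD, &st);
    }
}
```

Once *nbr_done is set, the main loop should skip both sends to and probes of that neighbor.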

How are handles distributed after MPI_Comm_split?

Submitted by 穿精又带淫゛_ on 2020-01-14 04:32:05
Question: Say I have 8 processes. When I do the following, the MPI_COMM_WORLD communicator will be split into two communicators: the processes with even ids will belong to one communicator and the processes with odd ids to the other. color=myid % 2; MPI_Comm_split(MPI_COMM_WORLD,color,myid,&NEW_COMM); MPI_Comm_rank( NEW_COMM, &new_id); My question is: where are the handles for these two communicators? After the split, the ids of the processes, which before were 0 1 2 3 4 5 6 7, will
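A sketch of the split described above (untested here). The key point is that there is no global pair of handles: each process only ever holds its own local NEW_COMM, referring to whichever communicator its color placed it in. With 8 ranks and color = myid % 2, old ranks 0 2 4 6 become ranks 0 1 2 3 in the "even" communicator and old ranks 1 3 5 7 become ranks 0 1 2 3 in the "odd" one:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int myid, new_id, color;
    MPI_Comm NEW_COMM;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    color = myid % 2;
    /* Using myid as the key preserves the original relative order. */
    MPI_Comm_split(MPI_COMM_WORLD, color, myid, &NEW_COMM);
    MPI_Comm_rank(NEW_COMM, &new_id);

    printf("world rank %d -> color %d, new rank %d\n", myid, color, new_id);

    MPI_Comm_free(&NEW_COMM);
    MPI_Finalize();
    return 0;
}
```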

Using MPI_Send/Recv to handle a chunk of a multi-dimensional array in Fortran 90

Submitted by 心已入冬 on 2020-01-14 04:07:08
Question: I have to send and receive (MPI) a chunk of a multi-dimensional array in Fortran 90. The line MPI_Send(x(2:5,6:8,1),12,MPI_Real,....) is not supposed to be used, according to the book "Using MPI..." by Gropp, Lusk, and Skjellum. What is the best way to do this? Do I have to create a temporary array and send it, or use MPI_Type_Create_Subarray or something like that? Answer 1: The reason not to use array sections with MPI_SEND is that the compiler has to create a temporary copy with some MPI
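A hedged sketch of the MPI_Type_create_subarray route mentioned in the question (untested here). The full extents of x below (10×10×10) are an assumption, since the question doesn't give x's declaration; note that starts are zero-based regardless of the Fortran lower bound:

```fortran
! Describe the x(2:5,6:8,1) block with a derived datatype so that no
! temporary copy is needed. Adjust 'sizes' to the real declaration of x.
integer :: subarr, ierr
integer :: sizes(3), subsizes(3), starts(3)

sizes    = (/ 10, 10, 10 /)   ! assumed full extent of x
subsizes = (/  4,  3,  1 /)   ! shape of x(2:5,6:8,1)
starts   = (/  1,  5,  0 /)   ! zero-based offsets of 2, 6, 1

call MPI_Type_create_subarray(3, sizes, subsizes, starts, &
                              MPI_ORDER_FORTRAN, MPI_REAL, subarr, ierr)
call MPI_Type_commit(subarr, ierr)
call MPI_Send(x, 1, subarr, dest, tag, MPI_COMM_WORLD, ierr)
call MPI_Type_free(subarr, ierr)
```

The matching receive can use the same committed type, or a plain contiguous buffer of 12 reals.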

Parallel.ForEach throws exception when processing extremely large sets of data

Submitted by 怎甘沉沦 on 2020-01-14 03:59:14
Question: My question centers on some Parallel.ForEach code that used to work without fail, and now that our database has grown to five times its size, it breaks almost regularly. Parallel.ForEach<Stock_ListAllResult>( lbStockList.SelectedItems.Cast<Stock_ListAllResult>(), SelectedStock => { ComputeTipDown( SelectedStock.Symbol ); } ); The ComputeTipDown() method gets all daily stock tick data for the symbol and iterates through each day, gets the previous day's data, does a few calculations, and then inserts

Linux batch jobs in parallel

Submitted by 故事扮演 on 2020-01-14 03:16:07
Question: I have seven licenses for a particular piece of software, so I want to start 7 jobs simultaneously. I can do that using '&'. However, the 'wait' command waits for all 7 of those processes to finish before spawning the next 7. I would like to write a shell script where, after I start the first seven, another job is started as soon as any one completes. This is because some of those 7 jobs might take very long while others finish really quickly. I don't want to waste
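One way to keep exactly seven license slots busy is bash's wait -n (bash 4.3+), which blocks until any single background job exits. In this sketch, run_job is a stand-in for the real licensed tool and the 20-item input list is invented for illustration:

```shell
# Keep at most 7 background jobs running; start the next one as soon as
# any slot frees up.
max_jobs=7
tmp=$(mktemp)

run_job() {                      # stand-in for the licensed tool
    sleep 0.05
    echo "done: $1" >> "$tmp"
}

for i in $(seq 1 20); do
    while (( $(jobs -rp | wc -l) >= max_jobs )); do
        wait -n                  # block until one job finishes
    done
    run_job "$i" &
done
wait                             # reap the stragglers
```

On older shells without wait -n, `xargs -P 7 -n 1 <command>` gives the same start-as-soon-as-one-finishes behavior.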

CUDA kernel for Conway's Game of Life

Submitted by 泪湿孤枕 on 2020-01-14 03:11:07
Question: I'm trying to calculate the number of transitions that would be made in a run of Conway's Game of Life on a p×q matrix over n iterations. For instance, given 1 iteration with the initial state being a single blinker (as below), there would be 5 transitions (2 births, 1 survival, 2 deaths from underpopulation). I've already got this working, but I'd like to convert this logic to run using CUDA. Below is what I want to port to CUDA. code: static void gol() // call this iterations x's { int[] tempGrid = new int

Parallel function from joblib running the whole script, not just the target function

Submitted by 瘦欲@ on 2020-01-14 03:04:28
Question: I am using the Parallel function from the joblib package in Python. I would like to use it to parallelize just one of my functions, but unfortunately the whole script is executed in parallel as well (everything apart from the function definitions). Example: from joblib import Parallel, delayed print ('I do not want this to be printed n times') def do_something(arg): some calculations(arg) Parallel(n_jobs=5)(delayed(do_something)(i) for i in range(0, n)) Answer 1: This is a common error to miss a design direction from
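Assuming the symptom comes from joblib's worker processes re-importing the script (the standard explanation for top-level code running once per worker), the usual fix is a __main__ guard so module-level statements execute only in the parent. A sketch, with do_something simplified to a placeholder calculation:

```python
from joblib import Parallel, delayed


def do_something(arg):
    return arg * arg          # stand-in for 'some calculations'


if __name__ == "__main__":
    # Runs once, in the parent only; workers re-import the module with
    # __name__ set to the module name, so they skip this block.
    print("I do not want this to be printed n times")
    results = Parallel(n_jobs=5)(delayed(do_something)(i) for i in range(10))
    print(results)
```

The same guard is required by multiprocessing on platforms that spawn (rather than fork) worker processes.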

Set seeds for a parallel random forest in caret for reproducible results

Submitted by 本小妞迷上赌 on 2020-01-13 19:57:05
Question: I wish to run random forest in parallel using the caret package, and I wish to set the seeds for reproducible results, as in "Fully reproducible parallel models using caret". However, I don't understand line 9 in the following code taken from the caret help: why do we sample 22 integers (plus the last model in line 12, making 23) when only 12 values for parameter k are evaluated? For information, I wish to run 5-fold CV to evaluate 584 values of the RF parameter 'mtry'. Any help is much appreciated. Thank you. ##
