Optimizing N-Queens with OpenMP

蹲街弑〆低调 submitted on 2019-12-07 13:32:07

Question


I am learning OpenMP and wrote the following code to solve the N-Queens problem.

//Full Code: https://github.com/Shafaet/Codes/blob/master/OPENMP/Parallel%20N-  Queen%20problem.cpp
#include <stdio.h>
#include <omp.h>

int n;  // board size, set in main() before each run

int call(int col,int rowmask,int dia1,int dia2)
{
    if(col==n) 
    {
        return 1;

    }
    int row,ans=0;
    for(row=0;row<n;row++)
    {
        if(!(rowmask & (1<<row)) & !(dia1 & (1<<(row+col))) & !(dia2 & (1<<((row+n-1)-col))))
        {           
            ans+=call(col+1,rowmask|1<<row,dia1|(1<<(row+col)), dia2|(1<<((row+n-1)-col)));
        }
    }
    return ans;

}

double parallel()
{
    double st=omp_get_wtime();
    int ans=0;
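    // rowmask, dia1 and dia2 are declared inside the loop so that every
    // iteration (and therefore every thread) works on its own copies.
    // The loop variable of a parallel for is private, so it must not be
    // listed in a shared clause.
    #pragma omp parallel for reduction(+:ans)
    for(int i=0;i<n;i++)
    {
        int rowmask=0,dia1=0,dia2=0;
        int col=0,row=i;
        ans+=call(1,rowmask|1<<row,dia1|(1<<(row+col)), dia2|(1<<((row+n-1)-col)));
    }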
    printf("Found %d configuration for n=%d\n",ans,n);
    double en=omp_get_wtime();
    printf("Time taken using openmp %lf\n",en-st);
    return en-st;

}
double serial()
{

    double st=omp_get_wtime();
    int ans=0;
    int i;
    int rowmask=0,dia1=0,dia2=0;
    for(i=0;i<n;i++)
    {
        rowmask=0;
        dia1=0,dia2=0;
        int col=0,row=i;
        ans+=call(1,rowmask|1<<row,dia1|(1<<(row+col)), dia2|(1<<((row+n-1)-col)));
    }
    printf("Found %d configuration for n=%d\n",ans,n);
    double en=omp_get_wtime();
    printf("Time taken without openmp %lf\n",en-st);
    return en-st;

}
int main()
{
    double average=0;
    int count=0;
    for(int i=2;i<=13;i++)
    {
        count++;
        n=i;

        double stime=serial();
        double ptime=parallel();
        printf("OpenMP is %lf times faster for n=%d\n",stime/ptime,n);
        average+=stime/ptime;
        puts("===============");
    }
    printf("On average OpenMP is %lf times faster\n",average/count);
    return 0;

}

The parallel code is already faster than the serial one, but I wonder how I can optimize it further using OpenMP pragmas. I want to know what I should do for better performance and what I should avoid.

Thanks in advance.

(Please don't suggest any optimizations that are unrelated to parallel programming.)


Answer 1:


Your code appears to use the classic recursive backtracking algorithm for N-Queens. It is not the fastest way to solve N-Queens, but its simplicity makes it a good vehicle for practicing the basics of parallelism. That said, because it is so simple, you should not expect it to showcase many advanced OpenMP features beyond a basic "parallel for" and a reduction.
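
For reference, the "parallel for" plus reduction pattern on the bitmask backtracking from the question could be sketched roughly as follows. The names `place` and `count_queens` are hypothetical, and seeding the search with the first two columns, `collapse(2)` and `schedule(dynamic)` are assumptions added here to create more, smaller work items. They are not part of the question's code, and whether they help on a given machine has to be measured.

```cpp
#include <cassert>

// Count all completions of a partial placement, column by column.
// rowmask/dia1/dia2 mark attacked rows and diagonals, as in the question.
static long long place(int n, int col, unsigned rowmask, unsigned dia1, unsigned dia2) {
    if (col == n) return 1;
    long long found = 0;
    for (int row = 0; row < n; ++row) {
        unsigned r  = 1u << row;
        unsigned d1 = 1u << (row + col);
        unsigned d2 = 1u << (row + n - 1 - col);
        if (!(rowmask & r) && !(dia1 & d1) && !(dia2 & d2))
            found += place(n, col + 1, rowmask | r, dia1 | d1, dia2 | d2);
    }
    return found;
}

// Hypothetical driver: fixes the queens of columns 0 and 1 in the parallel
// loop, so there are up to n*n independent work items instead of n.
long long count_queens(int n) {
    assert(n >= 1 && n <= 16);  // diagonals need 2n-1 bits of a 32-bit mask
    if (n == 1) return 1;
    long long total = 0;
    #pragma omp parallel for collapse(2) schedule(dynamic) reduction(+:total)
    for (int r0 = 0; r0 < n; ++r0) {
        for (int r1 = 0; r1 < n; ++r1) {
            // skip rows in column 1 that the queen in column 0 attacks
            if (r1 == r0 || r1 == r0 + 1 || r1 == r0 - 1) continue;
            unsigned rowmask = (1u << r0) | (1u << r1);
            unsigned dia1 = (1u << r0) | (1u << (r1 + 1));
            unsigned dia2 = (1u << (r0 + n - 1)) | (1u << (r1 + n - 2));
            total += place(n, 2, rowmask, dia1, dia2);
        }
    }
    return total;
}
```

Each thread accumulates into its own private copy of `total`, and OpenMP combines the copies once at the end of the loop, so the hot path needs no synchronization. Calling `count_queens(8)` yields the 92 solutions of the 8x8 board.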

Since you are learning parallelism, and probably want something clearer with a gentler learning curve, here is one more implementation (out of many possible). It uses the same algorithm but tends to be more readable and instructive:

// size (board dimension) and nrOfSolutions (shared solution counter)
// are assumed to be globals defined elsewhere.
void setQueen(int queens[], int row, int col) {
    // check all previously placed rows for attacks
    for(int i=0; i<row; i++) {
        // vertical attacks
        if (queens[i]==col) {
            return;
        }

        // diagonal attacks
        if (abs(queens[i]-col) == (row-i) ) {
            return;
        }
    }

    // column is ok, set the queen
    queens[row]=col;
    if(row==size-1) {
#pragma omp atomic
        nrOfSolutions++;  //Placed final queen, found a solution
    }
    else {
        // try to fill next row
        for(int i=0; i<size; i++) {
            setQueen(queens, row+1, i);
        }
    }
}

//Function to find all solutions for nQueens problem on size x size chessboard.
void solve() {
#pragma omp parallel for
    for(int i=0; i<size; i++) {
         // try all positions in first row
         int * queens = new int[size];  //array representing queens placed on a chess board.  Index is row position, value is column.
         setQueen(queens, 0, i);
         delete[](queens);
     }
}

This code is one of the Intel Advisor XE samples (available for both C++ and Fortran). The parallelization aspects of the sample are discussed in great detail in Chapter 10 of the Parallel Programming Book (in fact, that chapter uses N-Queens to demonstrate how to use tools to parallelize serial code in general).

The Advisor N-Queens sample uses essentially the same algorithm as yours, but it replaces the explicit reduction with a combination of a simple parallel for and an atomic. This code is expected to be less efficient, but it is more "procedural-style" and more "educational", since it demonstrates a "hidden" data race. If you download the sample code, you will actually find 4 equivalent parallel N-Queens implementations using TBB, Cilk Plus and OpenMP (the OpenMP version is available for C++ and Fortran).
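
To make the comparison concrete, here is a minimal sketch of the same row-by-row algorithm written with a reduction instead of the atomic counter. It is not taken from the Advisor sample: `countFrom`, `solveWithReduction` and the `boardSize` parameter are hypothetical names introduced for this illustration, and `std::vector` replaces the raw `new`/`delete` pair.

```cpp
#include <cstdlib>
#include <vector>

// Tries to put a queen at (row, col) and returns how many full solutions
// follow from that placement. No shared state is touched.
static long long countFrom(std::vector<int>& queens, int boardSize, int row, int col) {
    for (int i = 0; i < row; i++) {
        if (queens[i] == col) return 0;                      // vertical attack
        if (std::abs(queens[i] - col) == row - i) return 0;  // diagonal attack
    }
    queens[row] = col;
    if (row == boardSize - 1) return 1;  // placed the final queen
    long long found = 0;
    for (int i = 0; i < boardSize; i++)
        found += countFrom(queens, boardSize, row + 1, i);
    return found;
}

long long solveWithReduction(int boardSize) {
    long long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < boardSize; i++) {
        std::vector<int> queens(boardSize);  // private board per iteration
        total += countFrom(queens, boardSize, 0, i);
    }
    return total;
}
```

Because every recursion returns its count instead of incrementing a global, the only cross-thread combination is the one the reduction performs at the end of the loop. In the atomic version, the increment of the shared counter is exactly the spot where the data race would hide if the `#pragma omp atomic` were left out.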




Answer 2:


I know I am a little late to the party, but you can use task queueing for further optimization (about 7-10% faster results). I have no idea why. Here's the code that I am using:

#include <iostream>  // std::cout, cin, cerr ...
#include <iomanip>   // modify std::cout
#include <cstdio>    // printf
#include <cstdlib>   // abs, atoi
#include <omp.h>

using namespace std;

int nrOfSolutions=0;
int size=0;

void print(int queens[]) {
  cerr << "Solution " << nrOfSolutions << endl; 
  for(int row=0; row<size; row++) {
    for(int col=0; col<size; col++) {
      if(queens[row]==col) {
        cout << "Q";
      }
      else {
        cout << "-";
      }
    }
    cout << endl;
  }
}

void setQueen(int queens[], int row, int col, int id) {

  for(int i=0; i<row; i++) {
    // vertical attacks
    if (queens[i]==col) {
      return;
    }
    // diagonal attacks
    if (abs(queens[i]-col) == (row-i) ) {
      return;
    }
  }

  // column is ok, set the queen
  queens[row]=col;

  if(row==size-1) {
    // increasing the solution counter is not atomic, so guard it
#pragma omp critical
    nrOfSolutions++;
#ifdef _DEBUG
    // only one thread should be allowed to print at a time
#pragma omp critical
    print(queens);
#endif

  }
  else {
    // try to fill next row
    for(int i=0; i<size; i++) {
      setQueen(queens, row+1, i, id);
    }
  }
}

void solve() {
  int myid=0;

#pragma omp parallel
#pragma omp single
  {
    for(int i=0; i<size; i++) {
      /*
#ifdef _OPENMP
      myid = omp_get_thread_num();
#endif
#ifdef _DEBUG
      cout << "ThreadNum: " << myid << endl;
#endif
      */
      // try all positions in first row;
      // create a separate array for each recursion started here
#pragma omp task
      {
        int* queens = new int[size];
        setQueen(queens, 0, i, myid);
        delete[] queens;  // release the per-task board
      }
    }
  }
}

int main(int argc, char*argv[]) {

  if(argc !=2) {
    cerr << "Usage: nq-openmp-taskq boardSize.\n";
    return 0;
  }

  size = atoi(argv[1]);
  cout << "Starting OpenMP Task Queue solver for size " << size << "...\n";

  double st=omp_get_wtime();
  solve();
  double en=omp_get_wtime();
  printf("Time taken using openmp %lf\n",en-st);

  cout << "Number of solutions: " << nrOfSolutions << endl;

  return 0;
}
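
A variation on the task approach above, sketched under a few assumptions: each task counts its solutions locally and updates the shared total once, instead of entering a critical section for every solution found. The names `countBelow` and `solveWithTasks` and the `boardSize` parameter are hypothetical and not part of the code above. One plausible but unverified reason for the measured speedup is that tasks are handed to whichever thread is idle, which can balance uneven subtrees better than a statically divided loop.

```cpp
#include <cstdlib>
#include <vector>

// Tries to put a queen at (row, col) and returns how many full solutions
// follow from that placement.
static long long countBelow(std::vector<int>& queens, int boardSize, int row, int col) {
    for (int i = 0; i < row; i++) {
        if (queens[i] == col) return 0;                      // vertical attack
        if (std::abs(queens[i] - col) == row - i) return 0;  // diagonal attack
    }
    queens[row] = col;
    if (row == boardSize - 1) return 1;
    long long found = 0;
    for (int i = 0; i < boardSize; i++)
        found += countBelow(queens, boardSize, row + 1, i);
    return found;
}

long long solveWithTasks(int boardSize) {
    long long total = 0;
    #pragma omp parallel
    #pragma omp single
    {
        // one thread creates the tasks, the whole team executes them
        for (int i = 0; i < boardSize; i++) {
            #pragma omp task firstprivate(i) shared(total)
            {
                std::vector<int> queens(boardSize);  // private board per task
                long long local = countBelow(queens, boardSize, 0, i);
                #pragma omp atomic
                total += local;  // one synchronized update per task
            }
        }
    }  // all tasks have finished by the end of the parallel region
    return total;
}
```

With this layout the number of synchronized updates equals the number of tasks rather than the number of solutions, and the vector frees each board automatically when its task ends.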


Source: https://stackoverflow.com/questions/19078635/optimizing-n-queen-with-openmp
