GNU parallel --jobs option using multiple nodes on cluster with multiple cpus per node

一向 2020-12-31 18:24

I am using GNU parallel to launch code on a high-performance computing (HPC) cluster that has 2 CPUs per node. The cluster uses the TORQUE Portable Batch System (PBS). My questi

2 answers
  •  天涯浪人
    2020-12-31 18:46

    This is not an answer to the 3 primary questions, but I'd like to point out some other problems with the parallel statement in the first code block.

    parallel --env $PBS_O_WORKDIR --sshloginfile $PBS_NODEFILE \
      matlab -nodiplay -r "\"cd $PBS_O_WORKDIR,primes1({})\"" ::: 10 20 30 40
    

    The shell expands $PBS_O_WORKDIR before parallel is executed. Two things follow from this: (1) --env receives a directory path rather than the name of an environment variable, so it essentially does nothing, and (2) the variable is also expanded inside the command string, so the remote side never needs $PBS_O_WORKDIR to be passed to it, which is why there wasn't an error.
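
    The expansion order can be checked without parallel or a cluster. The following is a minimal sketch, not part of the original command: the path /home/user/project and the helper show_args are hypothetical stand-ins that only print the arguments a command would receive.

```shell
#!/bin/sh
# Hypothetical stand-in for the directory that PBS would normally set.
PBS_O_WORKDIR=/home/user/project
export PBS_O_WORKDIR

# show_args is a hypothetical helper: it prints each argument it receives
# in brackets, one per line, so the shell's expansion becomes visible.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# With the $ sign, the shell substitutes the value before the command runs,
# so the command sees a path, not a variable name.
expanded=$(show_args --env $PBS_O_WORKDIR)

# Without the $ sign, the command receives the variable's name,
# which is what --env expects.
by_name=$(show_args --env PBS_O_WORKDIR)

# Single quotes suppress expansion, so the text arrives unchanged.
literal=$(show_args 'cd $PBS_O_WORKDIR')

printf '%s\n' "$expanded" "$by_name" "$literal"
```

    The first call shows the path arriving where a variable name was intended, while the second shows the form that names the variable.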

    Version 20151022 of parallel, the latest at the time of writing, has a --workdir option (although the tutorial lists it as alpha testing), which is probably the easiest solution. The parallel command line would look something like:

    parallel --workdir "$PBS_O_WORKDIR" --sshloginfile "$PBS_NODEFILE" \
      matlab -nodisplay -r "primes1({})" ::: 10 20 30 40
    

    A final note: PBS_NODEFILE may list the same host multiple times if more than one processor per node is requested through qsub. This may have implications for the number of jobs run, etc.
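
    To see what the node file actually contains, the repeated entries can be counted or collapsed with standard tools. This is a rough sketch under stated assumptions: the host names node01 and node02 and the temporary file are made up to mimic a two-node, two-processor request, and the count/host lines follow the ncpus/sshlogin form described in the parallel manual, which is worth checking against the installed version before use.

```shell
#!/bin/sh
# Hypothetical node file: two hosts, each listed once per requested processor.
nodefile=$(mktemp)
printf '%s\n' node01 node01 node02 node02 > "$nodefile"

# One line per distinct host, duplicates removed.
unique_hosts=$(sort -u "$nodefile")

# Number of entries per host, written as count/host.
slots_per_host=$(sort "$nodefile" | uniq -c | awk '{print $1 "/" $2}')

printf 'unique hosts:\n%s\n' "$unique_hosts"
printf 'slots per host:\n%s\n' "$slots_per_host"

rm -f "$nodefile"
```

    Running the same two pipelines on the real $PBS_NODEFILE inside a job shows how many times each host is repeated before deciding on a --jobs value.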
