slurm

`srun` drop-in replacement

不羁的心 submitted on 2020-01-06 07:28:30
Question: I'm trying to create a function that serves as a drop-in replacement for SLURM's srun command. The need for this wrapper function is that I want to write a script that uses srun when started under SLURM control, but that can still run without SLURM. So far, I have this function:

```bash
srun_wrap() {
    if [ -z "$SLURM_JOB_ID" ]
    then
        # Not running under SLURM so start the code without srun
        "${@:2}"
    else
        # A SLURM job ID found, so use srun
        srun ${@:1:1} "${@:2}"
    fi
}
```

This allows me …
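A minimal sketch of the same idea with the srun options split out explicitly; the option string and program names below are illustrative, not taken from the question:

```bash
#!/bin/bash
# Hypothetical wrapper: first argument = srun options, remaining arguments = command.
srun_wrap() {
    local srun_opts=$1
    shift
    if [ -z "$SLURM_JOB_ID" ]; then
        # Not under SLURM: run the command directly.
        "$@"
    else
        # Under SLURM: forward the options to srun.
        # srun_opts is intentionally left unquoted so it word-splits into separate flags.
        srun $srun_opts "$@"
    fi
}

# Example call (illustrative names):
srun_wrap "--ntasks=1 --exclusive" ./my_program --input data.txt
```

Taking the option string with `shift` and then passing `"$@"` keeps arguments containing spaces intact and makes the calling convention a little easier to read than indexing with `${@:2}`.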

R parallel job hangs

拜拜、爱过 submitted on 2020-01-05 07:01:50
Question: I am running the snow-test.R script as written on the site https://hpcf.umbc.edu/other-packages/how-to-run-r-programs-on-maya/. The script is run on a cluster using SLURM with the command `mpirun -np 1 R CMD BATCH --no-save --no-restore snow-test.R out.dat`; the R version is 3.5.1. Although the script runs and displays the result, it hangs at the end, so I need to kill the process manually from the SLURM queue with the scancel command. Also, when I change the R version to 3.4.4 using the same …

How to get original location of script used for SLURM job?

a 夏天 submitted on 2020-01-04 13:35:10
Question: I'm starting a SLURM job with a script, and the script must act depending on its own location, which is obtained inside the script itself with `SCRIPT_LOCATION=$(realpath $0)`. But SLURM copies the script to the slurmd spool folder and starts the job from there, which breaks the subsequent steps. Is there any option to get the location of the script used for the SLURM job before it was moved/copied? The script is located in a network shared folder /storage/software_folder/software_name/scripts/this_script.sh and it must: get its …
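One commonly suggested approach, sketched here rather than quoted from an accepted answer, is to ask the controller for the job's original Command path via scontrol, since $0 inside the batch step points at the spooled copy:

```bash
#!/bin/bash
# Rough sketch: recover the path the script was submitted from.
# Assumes a batch job whose `scontrol show job` output contains a "Command=" line;
# if the command line carried arguments, only the first field is kept.
if [ -n "$SLURM_JOB_ID" ]; then
    SCRIPT_PATH=$(scontrol show job "$SLURM_JOB_ID" | awk -F= '/Command=/{print $2}' | awk '{print $1}')
else
    SCRIPT_PATH=$(realpath "$0")   # outside SLURM, $0 is the real file
fi
SCRIPT_LOCATION=$(dirname "$SCRIPT_PATH")
echo "script directory: $SCRIPT_LOCATION"
```

$SLURM_SUBMIT_DIR (the directory sbatch was invoked from) is another option when the script is always submitted from its own folder.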

After submitting a .m batch job with Slurm, can I edit my .m file without changing my original submission?

三世轮回 submitted on 2020-01-04 01:11:06
Question: Say I want to run a job on the cluster: job1.m. Slurm handles the batch jobs and I'm loading Mathematica to save the output file job1.csv. I submit job1.m and it is sitting in the queue. Now I edit job1.m to use different variables and parameters, and tell it to save its data to job1_edited.csv. Then I re-submit job1.m. Now I have two batch jobs in the queue. What will happen to my output files? Will job1.csv be data from the original job1.m file? And will job1_edited.csv be data from the edited …
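For completeness, a small hedged sketch (the file names and the `math -script` launcher are assumptions, not from the question) of how one might snapshot the .m file at submission time so later edits cannot influence a job that is still waiting in the queue:

```bash
#!/bin/bash
# Illustrative only: copy the input file under a unique name before submitting,
# so the queued job always reads the version that existed at submission time.
STAMP=$(date +%Y%m%d-%H%M%S)
cp job1.m "job1_${STAMP}.m"
sbatch --job-name="job1_${STAMP}" --wrap="math -script job1_${STAMP}.m"
```

The reason a snapshot helps is that sbatch copies the batch script itself at submission, but files the job merely reads at run time (such as a .m source loaded by Mathematica) are read as they exist when the job actually starts.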

Running multiple worker daemons SLURM

∥☆過路亽.° submitted on 2020-01-01 18:18:26
Question: I want to run multiple worker daemons on a single machine. According to damienfrancois's answer on "what is the minimum number of computers for a slurm cluster", this can be done. The problem is that currently I am able to run only one worker daemon on one machine. For example, when I run

```bash
sudo slurmd -N linux1 -cDvv
sudo slurmd -N linux2 -cDvv
```

linux1 goes down when I run linux2. Is it possible to run multiple worker daemons on one machine? Here is my slurm.conf file. Answer 1: As your intention seems to be just testing …
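For reference, a rough, unverified sketch of the usual recipe for several slurmd daemons on one host: a SLURM build configured with --enable-multiple-slurmd, plus one NodeName entry per daemon with its own Port. The node names, ports and CPU counts below are placeholders, not values from the poster's slurm.conf:

```bash
# slurm.conf fragment (illustrative values only):
#   NodeName=linux1 NodeHostname=localhost Port=17001 CPUs=1 State=UNKNOWN
#   NodeName=linux2 NodeHostname=localhost Port=17002 CPUs=1 State=UNKNOWN
#   PartitionName=debug Nodes=linux[1-2] Default=YES MaxTime=INFINITE State=UP

# Each daemon is then started with its own node name:
sudo slurmd -N linux1 -cDvv &
sudo slurmd -N linux2 -cDvv &
```

Without the multiple-slurmd build option and distinct ports, the two daemons contend for the same endpoint, which would be consistent with linux1 going down when linux2 is started.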

How to import a local python module when using the sbatch command in SLURM

六月ゝ 毕业季﹏ submitted on 2019-12-30 10:54:07
Question: I was using the cluster manager slurm and I was running a submission script with sbatch (with a Python interpreter). The sbatch submission imports one of my modules called main_nn.py. The module is located in the same place as my submission directory; however, Python fails to find it even though the file exists. I am having a hard time figuring out why this is happening. My Python file looks as follows:

```python
#!/usr/bin/env python
#SBATCH --job-name=Python
print('hi')
import main_nn
```

however the …
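One way around this, sketched under the assumption that the .py file and main_nn.py live together in the submission directory, is to submit a small bash wrapper instead of the Python file itself, so Python runs from the directory that contains the module:

```bash
#!/bin/bash
#SBATCH --job-name=python-wrapper
# Illustrative wrapper: the batch script is what sbatch copies away, while the
# Python code stays in place and keeps its sibling modules importable.
cd "$SLURM_SUBMIT_DIR"                      # directory sbatch was invoked from
export PYTHONPATH="$PWD:$PYTHONPATH"        # make sibling modules importable
python my_script.py                         # my_script.py is a placeholder name
```

The underlying issue is that sbatch spools a copy of the submitted script; when the Python file itself is the batch script, the copy that actually runs no longer sits next to main_nn.py, so the import fails.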

What happens if I am running more subjobs than the number of cores allocated

*爱你&永不变心* submitted on 2019-12-25 14:26:58
Question: So I have an sbatch (SLURM job scheduler) script in which I am processing a lot of data through 3 scripts: foo1.sh, foo2.sh and foo3.sh. foo1.sh and foo2.sh are independent and I want to run them simultaneously. foo3.sh needs the outputs of foo1.sh and foo2.sh, so I am building a dependency. And then I have to repeat it 30 times. Let's say:

```bash
## Resources config
#SBATCH --ntasks=30
#SBATCH --task-per-core=1

for i in {1..30}; do
    srun -n 1 --jobid=foo1_$i ./foo1.sh &
    srun -n 1 --jobid=foo2_$i ./foo2 …
```
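A rough sketch of one way to express this pattern (flag choices and structure are illustrative, not a verified answer to the question): background the two independent steps, wait for them, then run the dependent step, with each iteration itself backgrounded so the 30 repetitions overlap:

```bash
#!/bin/bash
#SBATCH --ntasks=30

for i in $(seq 1 30); do
    (
        srun --ntasks=1 ./foo1.sh &   # independent step
        srun --ntasks=1 ./foo2.sh &   # independent step
        wait                          # block until foo1 and foo2 of this iteration finish
        srun --ntasks=1 ./foo3.sh     # needs the outputs of the two steps above
    ) &
done
wait   # keep the batch script alive until every iteration is done
```

As for the original question: job steps launched with srun inside an allocation are not supposed to oversubscribe it. When more one-task steps are requested than there are tasks in the allocation, the extra sruns typically wait (reporting something like "Job step creation temporarily disabled, retrying") until resources free up, rather than all running at once.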
