slurm

one-to-one dependency between two job arrays in SLURM

好久不见. Submitted on 2019-12-21 21:59:56
Question: The server just switched from CONDOR to SLURM, so I am learning and trying to translate my submission script to SLURM. My question is the following: I have two job arrays, and the second depends on the first one. For the time being, I have something like the following:

events1=$(sbatch --job-name=events --array=1-3 --output=z-events-%a.stdout myfirst.sh)
jobid_events1=`echo ${events1} | sed -n -e 's/^.*job //p'`
echo "The job ID of the events is "${jobid_events1}
postevents1=$(sbatch --job-name
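A minimal sketch of one way to wire a one-to-one dependency between the two arrays, assuming a second script mysecond.sh (hypothetical name) and a Slurm version that supports the aftercorr dependency type (14.11 or later); --parsable makes sbatch print only the job ID, which avoids the sed step:

#!/bin/bash
# submit the first array and capture its job ID directly
jobid_events=$(sbatch --parsable --job-name=events --array=1-3 --output=z-events-%a.stdout myfirst.sh)
# each task i of the second array starts only after task i of the first completes successfully
sbatch --dependency=aftercorr:${jobid_events} --job-name=postevents --array=1-3 --output=z-post-%a.stdout mysecond.sh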

Slurm Multiprocessing Python Job

眉间皱痕 Submitted on 2019-12-21 20:29:00
Question: I have a 4-node Slurm cluster, each node with 6 cores. I would like to submit a test Python script (it spawns processes that print the hostname of the node it's being run on) using multiprocessing, as follows:

def print_something():
    print gethostname()

# number of processes allowed to run on the cluster at a given time
n_procs = int(environ['SLURM_JOB_CPUS_PER_NODE']) * int(environ['SLURM_JOB_NUM_NODES'])

# tell Python how many processes can run at a time
pool = Pool(n_procs)

# spawn an
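A minimal sketch of a submission script for this kind of test, assuming a hypothetical test_multiproc.py containing the Pool code; the point is that multiprocessing.Pool cannot fork across nodes, so the job is confined to one node and the pool should be sized from the cores granted to that single task rather than from the whole cluster:

#!/bin/bash
#SBATCH --job-name=mp-test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=6
# inside the script, size the pool from SLURM_CPUS_PER_TASK (6 here)
srun python test_multiproc.py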

Sbatch: pass job name as input argument

試著忘記壹切 Submitted on 2019-12-21 09:17:30
Question: I have the following script to submit a job with slurm:

#!/bin/bash
#SBATCH -J $3    # job_name
#SBATCH -n 1     # number of processors
#SBATCH -p CA
nwchem $1 > $2

The first argument ($1) is my input, the second ($2) is my output, and I would like the third ($3) to be my job name. If I do it like this, the job name is literally '$3'. How can I pass the job name as an argument of the script? Thanks

Answer 1: The SBATCH directives are seen as comments by the shell and it does not perform variable
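One hedged workaround, sketched below, is to move the job name out of the #SBATCH directive and onto the sbatch command line, where the shell does expand variables; submit.sh and job.sh are hypothetical names (job.sh is the original script with the -J line removed):

#!/bin/bash
# submit.sh (hypothetical wrapper): usage  ./submit.sh input.nw output.out myname
sbatch --job-name="$3" --output="$3".out job.sh "$1" "$2"

#!/bin/bash
# job.sh: the original batch script without the -J directive
#SBATCH -n 1
#SBATCH -p CA
nwchem $1 > $2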

seq uses comma as decimal separator

空扰寡人 Submitted on 2019-12-21 09:16:11
Question: I have noticed strange seq behavior on one of my computers (Ubuntu LTS 14.04): instead of using points as the decimal separator, it uses commas:

seq 0. 0.1 0.2
0,0
0,1
0,2

The same version of seq (8.21) on my other PC gives the normal points (also the same Ubuntu version). The strangest thing is that I observe the same ill behavior on a remote machine when I ssh into it from the first machine. Even a bash script submitted from the problematic machine to a job scheduler (slurm) on the remote
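A hedged fix, assuming the cause is a non-English LC_NUMERIC setting being forwarded over ssh and into the batch environment: force the C numeric locale for the command, or export it at the top of the batch script.

# override just the numeric locale for one command
LC_NUMERIC=C seq 0. 0.1 0.2
# should print 0.0, 0.1, 0.2 on separate lines

# or fix it for a whole batch script, since ssh/slurm can forward LC_* settings
export LC_NUMERIC=C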

Limit the number of running jobs in SLURM

☆樱花仙子☆ Submitted on 2019-12-18 04:54:32
Question: I am queuing multiple jobs in SLURM. Can I limit the number of jobs running in parallel in slurm? Thanks in advance!

Answer 1: If you are not the administrator, you can hold some jobs if you do not want them all to start at the same time, with scontrol hold <JOBID>, and you can delay the submission of some jobs with sbatch --begin=YYYY-MM-DD. Also, if it is a job array, you can limit the number of array tasks that run concurrently with, for instance, --array=1-100%25 to have 100 jobs
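A minimal sketch of the array throttle mentioned in the answer (myjob.sh is a placeholder script); the %N suffix caps how many array tasks run at once:

# 100 array tasks in total, but at most 25 running at any one time
sbatch --array=1-100%25 myjob.sh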

slurm exceeded job memory limit with python multiprocessing

北战南征 Submitted on 2019-12-13 19:25:29
Question: I'm using slurm to manage some of our calculations, but sometimes jobs get killed with an out-of-memory error even though this should not be the case. This strange issue has occurred with Python jobs using multiprocessing in particular. Here's a minimal example to reproduce the behavior:

#!/usr/bin/python
from time import sleep

nmem = int(3e7)  # this will amount to ~1GB of numbers
nprocs = 200     # will create this many workers later
nsleep = 5       # sleep seconds
array = list(range(nmem))  #
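A hedged way to compare what Slurm's accounting attributes to such a job against what was requested, which helps show whether the memory shared by the forked workers is being summed per process; <jobid> is a placeholder:

# while the job is running, see what accounting charges to it
sstat --format=JobID,MaxRSS,MaxVMSize -j <jobid>
# after it finishes, compare the peak against the request
sacct --format=JobID,ReqMem,MaxRSS,State -j <jobid>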

SLURM: Embarrassingly parallel program inside an embarrassingly parallel program

前提是你 Submitted on 2019-12-13 15:09:21
Question: I have a complex model written in Matlab. The model was not written by us and is best thought of as a "black box": fixing the relevant problems from the inside would require rewriting the entire model, which would take years. If I have an "embarrassingly parallel" problem, I can use an array to submit X variations of the same simulation with the option #SBATCH --array=1-X. However, clusters normally have a (frustratingly small) limit on the maximum array size. Whilst using a PBS
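One common workaround, sketched here under the assumption that the simulations are independent and indexable by an integer: fold several runs into each array task so the array stays under the site's MaxArraySize limit (run_model.sh is a hypothetical wrapper around the Matlab "black box"):

#!/bin/bash
#SBATCH --array=1-100
# each array task handles RUNS_PER_TASK consecutive simulations,
# so 100 tasks cover 1000 simulations without a 1000-element array
RUNS_PER_TASK=10
for i in $(seq 1 "$RUNS_PER_TASK"); do
    SIM_ID=$(( (SLURM_ARRAY_TASK_ID - 1) * RUNS_PER_TASK + i ))
    ./run_model.sh "$SIM_ID"
done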

On the semantics of `srun … >output_file` for parallel tasks

落爺英雄遲暮 Submitted on 2019-12-13 14:21:50
Question: Sorry, this question requires a lot of build-up, but in summary it's about the conditions under which many parallel instances of srun ... >output_file will or won't lead to one process/task clobbering the output produced by some other process/task.

CASE 0: bash only (no SLURM). Suppose that prog-0.sh is the following toy script:

#!/bin/bash
hostname >&2
if [[ $JOB_INDEX = 0 ]]
then
    date
fi

This script prints some output to stderr, and possibly prints the current date to stdout.
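For reference, a hedged alternative that sidesteps the clobbering question entirely: let srun name one output file per task through its filename pattern, rather than having every task share a single shell redirection:

# %j expands to the job ID and %t to the task rank, so no two tasks
# ever open (and truncate) the same file
srun --ntasks=4 --output=out-%j-%t.txt prog-0.sh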

SLURM slow for array job

风格不统一 Submitted on 2019-12-13 03:57:38
Question: I have a small cluster with nodes A, B, C and D. Each node has 80 GB RAM and 32 CPUs. I am using Slurm 17.11.7. I performed the following benchmark tests: if I run a particular Java command directly in a terminal on node A, I get a result in 2 minutes. If I run the same command as a "single" array job (#SBATCH --array=1-1), I again get a result in 2 minutes. If I run the same command with the same parameters as an array job on slurm, restricted to node A only, I get the output in 8 minutes, that is, it is
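A speculative sketch of a submission that pins the benchmark to node A with an explicit core count, to check whether core binding or a small default CPU allocation per task explains the slowdown; benchmark.jar and the values are placeholders:

#!/bin/bash
#SBATCH --array=1-1
#SBATCH --nodelist=A
#SBATCH --cpus-per-task=32
# give the task the same cores the interactive run could use
java -jar benchmark.jar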

Output log file to cluster option

ぐ巨炮叔叔 Submitted on 2019-12-13 03:00:53
Question: I'm submitting jobs to slurm/sbatch via snakemake. I'm trying to send the log from sbatch to a file in the same directory tree as the rule's output. For example, this works:

rm -rf foo
snakemake -s test.smk --jobs 1 --cluster "sbatch --output log.txt"

but it fails (i.e. the slurm job status is FAILED) if I try:

rm -rf foo
snakemake -s test.smk --jobs 1 --cluster "sbatch --output {output}.log"

presumably because {output} points to foo/bar/, which does not exist. But snakemake should have created
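One hedged workaround is to write the slurm logs into a directory that already exists at submission time, using snakemake's {rule} placeholder and sbatch's %j job-ID pattern instead of {output}; slurm_logs is an arbitrary name:

# create the log directory once, before submitting
mkdir -p slurm_logs
snakemake -s test.smk --jobs 1 --cluster "sbatch --output slurm_logs/{rule}.%j.log"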