slurm

one-to-one dependency between two job arrays in SLURM

好久不见. Submitted on 2019-12-21 21:59:56
Question: The server just switched from CONDOR to SLURM, so I am learning and trying to translate my submission script to SLURM. My question is the following: I have two job arrays, and the second depends on the first one. For the time being, I have something like the following:

events1=$(sbatch --job-name=events --array=1-3 --output=z-events-%a.stdout myfirst.sh)
jobid_events1=`echo ${events1} | sed -n -e 's/^.*job //p'`
echo "The job ID of the events is "${jobid_events1}
postevents1=$(sbatch --job-name
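A minimal sketch of one way to wire a one-to-one dependency between the two arrays, assuming a second script mysecond.sh (hypothetical name) and a Slurm version that supports the aftercorr dependency type (14.11 or later); --parsable makes sbatch print only the job ID, which avoids the sed step:

#!/bin/bash
# submit the first array and capture its job ID directly
jobid_events=$(sbatch --parsable --job-name=events --array=1-3 --output=z-events-%a.stdout myfirst.sh)
# each task i of the second array starts only after task i of the first completes successfully
sbatch --dependency=aftercorr:${jobid_events} --job-name=postevents --array=1-3 --output=z-post-%a.stdout mysecond.sh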

Slurm Multiprocessing Python Job

眉间皱痕 Submitted on 2019-12-21 20:29:00
Question: I have a 4-node Slurm cluster, each node with 6 cores. I would like to submit a test Python script (it spawns processes that print the hostname of the node it's being run on) using multiprocessing, as follows:

def print_something():
    print gethostname()

# number of processes allowed to run on the cluster at a given time
n_procs = int(environ['SLURM_JOB_CPUS_PER_NODE']) * int(environ['SLURM_JOB_NUM_NODES'])

# tell Python how many processes can run at a time
pool = Pool(n_procs)

# spawn an
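A minimal sketch of a submission script for this kind of test, assuming a hypothetical test_multiproc.py containing the Pool code; the point is that multiprocessing.Pool cannot fork across nodes, so the job is confined to one node and the pool should be sized from the cores granted to that single task rather than from the whole cluster:

#!/bin/bash
#SBATCH --job-name=mp-test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=6
# inside the script, size the pool from SLURM_CPUS_PER_TASK (6 here)
srun python test_multiproc.py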

Sbatch: pass job name as input argument

試著忘記壹切 Submitted on 2019-12-21 09:17:30
Question: I have the following script to submit a job with slurm:

#!/bin/bash
#SBATCH -J $3    # job_name
#SBATCH -n 1     # number of processors
#SBATCH -p CA
nwchem $1 > $2

The first argument ($1) is my input, the second ($2) is my output, and I would like the third ($3) to be my job name. If I do it like this, the job name is literally '$3'. How can I pass the job name as an argument of the script? Thanks

Answer 1: The SBATCH directives are seen as comments by the shell and it does not perform variable
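One hedged workaround, sketched below, is to move the job name out of the #SBATCH directive and onto the sbatch command line, where the shell does expand variables; submit.sh and job.sh are hypothetical names (job.sh is the original script with the -J line removed):

#!/bin/bash
# submit.sh (hypothetical wrapper): usage  ./submit.sh input.nw output.out myname
sbatch --job-name="$3" --output="$3".out job.sh "$1" "$2"

#!/bin/bash
# job.sh: the original batch script without the -J directive
#SBATCH -n 1
#SBATCH -p CA
nwchem $1 > $2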

seq uses comma as decimal separator

空扰寡人 Submitted on 2019-12-21 09:16:11
Question: I have noticed strange seq behavior on one of my computers (Ubuntu LTS 14.04): instead of using points as the decimal separator, it uses commas:

seq 0. 0.1 0.2
0,0
0,1
0,2

The same version of seq (8.21) on my other PC gives the normal points (also the same Ubuntu version). The strangest thing is that I observe the same ill behavior on a remote machine when I ssh into it from the first machine. Even a bash script submitted from the problematic machine to a job scheduler (slurm) on the remote
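A hedged fix, assuming the cause is a non-English LC_NUMERIC setting being forwarded over ssh and into the batch environment: force the C numeric locale for the command, or export it at the top of the batch script.

# override just the numeric locale for one command
LC_NUMERIC=C seq 0. 0.1 0.2
# should print 0.0, 0.1, 0.2 on separate lines

# or fix it for a whole batch script, since ssh/slurm can forward LC_* settings
export LC_NUMERIC=C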

Limit the number of running jobs in SLURM

☆樱花仙子☆ Submitted on 2019-12-18 04:54:32
Question: I am queuing multiple jobs in SLURM. Can I limit the number of jobs running in parallel in slurm? Thanks in advance!

Answer 1: If you are not the administrator, you can hold some jobs if you do not want them all to start at the same time, with scontrol hold <JOBID>, and you can delay the submission of some jobs with sbatch --begin=YYYY-MM-DD. Also, if it is a job array, you can limit the number of array tasks that run concurrently with, for instance, --array=1-100%25 to have 100 jobs
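A minimal sketch of the array throttle mentioned in the answer (myjob.sh is a placeholder script); the %N suffix caps how many array tasks run at once:

# 100 array tasks in total, but at most 25 running at any one time
sbatch --array=1-100%25 myjob.sh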

slurm exceeded job memory limit with python multiprocessing

北战南征 Submitted on 2019-12-13 19:25:29
Question: I'm using slurm to manage some of our calculations, but sometimes jobs get killed with an out-of-memory error even though this should not be the case. This strange issue has occurred with Python jobs using multiprocessing in particular. Here's a minimal example to reproduce the behavior:

#!/usr/bin/python
from time import sleep

nmem = int(3e7)  # this will amount to ~1GB of numbers
nprocs = 200     # will create this many workers later
nsleep = 5       # sleep seconds
array = list(range(nmem))  #
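A hedged way to compare what Slurm's accounting attributes to such a job against what was requested, which helps show whether the memory shared by the forked workers is being summed per process; <jobid> is a placeholder:

# while the job is running, see what accounting charges to it
sstat --format=JobID,MaxRSS,MaxVMSize -j <jobid>
# after it finishes, compare the peak against the request
sacct --format=JobID,ReqMem,MaxRSS,State -j <jobid>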

SLURM: Embarrassingly parallel program inside an embarrassingly parallel program

前提是你 Submitted on 2019-12-13 15:09:21
Question: I have a complex model written in Matlab. The model was not written by us and is best thought of as a "black box": fixing the relevant problems from the inside would require rewriting the entire model, which would take years. If I have an "embarrassingly parallel" problem, I can use an array to submit X variations of the same simulation with the option #SBATCH --array=1-X. However, clusters normally have a (frustratingly small) limit on the maximum array size. Whilst using a PBS
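One common workaround, sketched here under the assumption that the simulations are independent and indexable by an integer: fold several runs into each array task so the array stays under the site's MaxArraySize limit (run_model.sh is a hypothetical wrapper around the Matlab "black box"):

#!/bin/bash
#SBATCH --array=1-100
# each array task handles RUNS_PER_TASK consecutive simulations,
# so 100 tasks cover 1000 simulations without a 1000-element array
RUNS_PER_TASK=10
for i in $(seq 1 "$RUNS_PER_TASK"); do
    SIM_ID=$(( (SLURM_ARRAY_TASK_ID - 1) * RUNS_PER_TASK + i ))
    ./run_model.sh "$SIM_ID"
done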

On the semantics of `srun … >output_file` for parallel tasks

落爺英雄遲暮 Submitted on 2019-12-13 14:21:50
Question: Sorry, this question requires a lot of build-up, but in summary it's about the conditions under which many parallel instances of srun ... >output_file will or won't lead to one process/task clobbering the output produced by some other process/task.

CASE 0: bash only (no SLURM). Suppose that prog-0.sh is the following toy script:

#!/bin/bash
hostname >&2
if [[ $JOB_INDEX = 0 ]]
then
    date
fi

This script prints some output to stderr, and possibly prints the current date to stdout.
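For reference, a hedged alternative that sidesteps the clobbering question entirely: let srun name one output file per task through its filename pattern, rather than having every task share a single shell redirection:

# %j expands to the job ID and %t to the task rank, so no two tasks
# ever open (and truncate) the same file
srun --ntasks=4 --output=out-%j-%t.txt prog-0.sh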

SLURM slow for array job

风格不统一 Submitted on 2019-12-13 03:57:38
Question: I have a small cluster with nodes A, B, C and D. Each node has 80 GB RAM and 32 CPUs. I am using Slurm 17.11.7. I performed the following benchmark tests: if I run a particular Java command directly in a terminal on node A, I get a result in 2 minutes. If I run the same command as a "single" array job (#SBATCH --array=1-1), I again get a result in 2 minutes. If I run the same command with the same parameters as an array job on slurm, restricted to node A only, I get the output in 8 minutes, that is, it is
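A speculative sketch of a submission that pins the benchmark to node A with an explicit core count, to check whether core binding or a small default CPU allocation per task explains the slowdown; benchmark.jar and the values are placeholders:

#!/bin/bash
#SBATCH --array=1-1
#SBATCH --nodelist=A
#SBATCH --cpus-per-task=32
# give the task the same cores the interactive run could use
java -jar benchmark.jar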

Output log file to cluster option

ぐ巨炮叔叔 Submitted on 2019-12-13 03:00:53
Question: I'm submitting jobs to slurm/sbatch via snakemake. I'm trying to send the log from sbatch to a file in the same directory tree as the rule's output. For example, this works:

rm -rf foo
snakemake -s test.smk --jobs 1 --cluster "sbatch --output log.txt"

but it fails (i.e. the slurm job status is FAILED) if I try:

rm -rf foo
snakemake -s test.smk --jobs 1 --cluster "sbatch --output {output}.log"

presumably because {output} points to foo/bar/, which does not exist. But snakemake should have created
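One hedged workaround is to write the slurm logs into a directory that already exists at submission time, using snakemake's {rule} placeholder and sbatch's %j job-ID pattern instead of {output}; slurm_logs is an arbitrary name:

# create the log directory once, before submitting
mkdir -p slurm_logs
snakemake -s test.smk --jobs 1 --cluster "sbatch --output slurm_logs/{rule}.%j.log"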