sungridengine

Running a binary without a top level script in SLURM

Submitted by 心不动则不痛 on 2021-02-18 11:08:22
Question: In SGE/PBS, I can submit binary executables to the cluster just as I would run them locally. For example:

    qsub -b y -cwd echo hello

would submit a job named echo, which writes the word "hello" to its output file. How can I submit a similar job to SLURM? It expects the file to have a hash-bang interpreter on the first line. On SLURM I get:

    $ sbatch echo hello
    sbatch: error: This does not look like a batch script. The first
    sbatch: error: line must start with #! followed by the path to an interpreter.
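A hedged sketch of the usual workarounds (the --wrap option, reading the script from stdin, and srun are standard Slurm features; the job name is only illustrative):

    # Let sbatch generate the wrapper script around the command itself
    sbatch --job-name=echo --wrap="echo hello"

    # Or feed a minimal script to sbatch on stdin
    sbatch <<'EOF'
    #!/bin/bash
    echo hello
    EOF

    # Or run the binary directly under srun, which needs no batch script
    srun echo hello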

Ensuring one Job Per Node on StarCluster / SunGridEngine (SGE)

Submitted by 可紊 on 2021-02-09 02:57:51
Question: When qsub'ing jobs on a StarCluster / SGE cluster, is there an easy way to ensure that each node receives at most one job at a time? I am having issues where multiple jobs end up on the same node, leading to out-of-memory (OOM) problems. I tried using -l cpu=8, but I think that does not check the number of USED cores, just the number of cores on the box itself. I also tried -l slots=8, but then I get:

    Unable to run job: "job" denied: use parallel environments instead of requesting slots explicitly.
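A sketch of the approach SGE usually expects, hedged because it needs admin-side setup: define a parallel environment whose allocation_rule keeps all requested slots on one host, then request every core of a node so nothing else fits alongside the job (the PE name "bynode" and the 8-core node size are placeholders):

    # Admin side: create a PE that packs all requested slots onto a single host
    qconf -ap bynode
    #   pe_name            bynode
    #   slots              9999
    #   allocation_rule    $pe_slots
    # ...then attach the PE to the relevant queue's pe_list via: qconf -mq all.q

    # User side: request all 8 slots of a node, so at most one such job fits per node
    qsub -pe bynode 8 myjob.sh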

Spreading a job over different nodes of a cluster in Sun Grid Engine (SGE)

Submitted by 三世轮回 on 2021-02-07 22:40:47
Question: I'm trying to get Sun Grid Engine (SGE) to run the separate processes of an MPI job over all of the nodes of my cluster. What is happening is that each node has 12 processors, so SGE is assigning 12 of my 60 processes to each of 5 separate nodes. I'd like it to assign 2 processes to each of the 30 nodes available, because with 12 processes (DNA sequence alignments) running on each node, the nodes run out of memory. So I'm wondering if it's possible to explicitly get SGE to assign the processes to
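A hedged sketch of one way to do this: give the MPI job a parallel environment with a fixed allocation rule of 2, so SGE places exactly two slots on each host it uses (the PE name "mpi_rr2" and the script name are placeholders; the looser alternative is an allocation_rule of $round_robin, which deals slots out one host at a time):

    # Admin side: PE that allocates a fixed 2 slots per host
    qconf -ap mpi_rr2
    #   pe_name            mpi_rr2
    #   slots              999
    #   allocation_rule    2
    #   control_slaves     TRUE

    # User side: 60 slots spread as 2 per host over 30 hosts
    qsub -pe mpi_rr2 60 run_alignments.sh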

How to limit the number of jobs on a host using Sungrid?

Submitted by 杀马特。学长 韩版系。学妹 on 2021-01-28 18:34:01
Question: I am using Sun Grid Engine 6.2u5. I am trying to submit some jobs on 4 hosts; I need to run 50 jobs using all 4 hosts, but I want to tell SGE that only 5 jobs may run on the 4th host at any given time. How do I do that? I am new to Sun Grid Engine. Could anyone please point me to the SGE basics, I mean, where do I get started? I found "Beginner's Guide to Sun Grid Engine 6.2" by Daniel Templeton online, but apparently that is intended for system administrators; I am just a normal user who is
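As a sketch, and assuming administrator access (a plain user generally cannot change per-host limits): the slot count on the fourth execution host can be capped so that only 5 jobs run there at once (the hostname node04 is a placeholder):

    # Admin side: limit the consumable "slots" on the 4th host to 5
    qconf -mattr exechost complex_values slots=5 node04

    # Check the setting
    qconf -se node04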

Forcing SGE to use multiple servers

Submitted by 痴心易碎 on 2021-01-28 00:44:56
Question: TL;DR: Is there any way to get SGE to round-robin between servers when scheduling jobs, instead of allocating all jobs to the same server whenever it can?

Details: I have a large compute process that consists of many smaller jobs. I'm using SGE to distribute the work across multiple servers in a cluster. The process requires a varying number of tasks at different points in time (technically, it is a DAG of jobs). Sometimes the number of parallel jobs is very large (~1 per CPU in the cluster),
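A hedged sketch of the scheduler-side knobs usually involved (admin access required; the exact values are only illustrative): sort queue instances by load and add a load adjustment so a freshly placed job immediately makes its host look busier, which pushes the next job onto a different host.

    # Open the scheduler configuration for editing
    qconf -msconf

    # Fields commonly tuned to spread jobs instead of packing one host:
    #   queue_sort_method            load
    #   load_formula                 np_load_avg
    #   job_load_adjustments         np_load_avg=0.50
    #   load_adjustment_decay_time   0:7:30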

what is 'Gbytes seconds'?

Submitted by 六眼飞鱼酱① on 2020-02-24 04:20:30
Question: From the qstat (Sun Grid Engine) man page:

    mem: The current accumulated memory usage of the job in Gbytes seconds.

What does that mean?

Answer 1: I couldn't find better documentation than the man page where that description can be found. I think 1 Gbyte second is 1 Gbyte of memory used for one second. So if your code uses 1 GB for one minute and then 2 GB for two minutes, the accumulated memory usage is 1*60 + 2*120 = 300 GByte seconds.

Answer 2: The Gigabyte-second unit specifies the amount of memory
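The arithmetic from Answer 1, written out as a quick shell check (the durations are the ones used in the answer):

    # 1 GB for 60 seconds plus 2 GB for 120 seconds
    echo $(( 1*60 + 2*120 ))   # prints 300, i.e. 300 GByte seconds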

loading library on cluster

Submitted by 无人久伴 on 2020-02-05 08:02:07
Question: I successfully compiled a program in C++, with Boost, on a cluster we have here. I need to run an SGE script to run the simulation. The error I get is this:

    ./main: error while loading shared libraries: libboost_thread.so.1.45.0: cannot open shared object file: No such file or directory

Do I need to specify the name of the library when I launch the program? The script I used is below:

    #!/bin/sh
    # (c) 2008 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms.
    # This is a
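The usual fix is to make the Boost library directory visible to the job itself, for example by exporting LD_LIBRARY_PATH inside the SGE script or forwarding the submission environment with -V. This is only a sketch, and /opt/boost/1.45.0/lib is a placeholder for wherever libboost_thread.so.1.45.0 actually lives:

    #!/bin/sh
    #$ -cwd
    #$ -V    # forward the submission-time environment to the job
    # Placeholder path: point this at the directory containing libboost_thread.so.1.45.0
    export LD_LIBRARY_PATH=/opt/boost/1.45.0/lib:$LD_LIBRARY_PATH
    ./main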

Using Docker on Grid Engine / Sun Grid Engine / Son of Grid Engine

Submitted by 安稳与你 on 2020-02-03 10:24:07
Question: Does anyone have experience running Docker on Grid Engine / Sun Grid Engine / Son of Grid Engine and being able to monitor the resources used by the daemon? The issue is that when I qsub docker run ..., the actual process in the container is run by the Docker daemon rather than the Docker client, which means the process trees are different. Is there any way for SGE to track the resources of a process in a different tree (I would assume not)? Another option would be to qsub a script that first
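One hedged sketch along the lines the (truncated) question suggests: submit a wrapper script that runs the container in the foreground and mirrors the SGE resource request onto Docker's own limits, accepting that SGE still cannot account for work done inside the daemon's process tree. The image name, command, and 4G limit are placeholders; --rm, --memory, and --cpus are standard docker run flags:

    #!/bin/sh
    #$ -cwd
    #$ -l h_vmem=4G
    # Run in the foreground so the SGE job lives as long as the container does,
    # and cap the container with the same limits requested from SGE.
    docker run --rm --memory=4g --cpus=1 my_image:latest my_command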