NetLogo HPC CPU Percentage Use Increase

Question

I submit jobs to an HPC server using headless NetLogo with the following script:

#!/bin/bash
#$ -N r20p
#$ -q all.q
#$ -pe mpi 24
/home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \
    --model /home/abhishekb/models/corrected-rk4-20presults.nlogo \
    --experiment test \
    --table /home/abhishekb/csvresults/corrected-rk4-20presults.csv

Below is the snapshot of a cluster queue using:

qstat -g c

I would like to know whether I can increase the CQLOAD for my simulations, and what it signifies. I couldn't find a clear explanation online.

CPU USAGE CHECK:

qhost -u abhishekb

When I run BehaviorSpace on my PC through the GUI, assigning high priority to the task makes it use nearly 99% of the CPU, which makes it run faster. I wish to accomplish the same here.

Answer 1:

A typical HPC environment is designed to run only one MPI process (or OpenMP thread) per CPU core. Each process therefore has access to 100% of the CPU time, and this cannot be increased further. In contrast, on a classical desktop/server machine, a number of processes compete for CPU time, and it is indeed possible to increase the performance of one of them by setting the appropriate priorities with nice.

It appears that CQLOAD is the mean load average for that computing queue. If you are not using all the CPU cores in it, it is not a useful indicator. Besides, even the load average per core for your runs simply reflects the efficiency of the code on this HPC cluster. For instance, a value of 0.7 per core would mean that the code spends 70% of its time doing calculations, while the remaining 30% is probably spent waiting to communicate with the other computing nodes (which is also necessary).
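To make the per-core figure concrete, here is a minimal shell sketch that divides a load average by a core count. The function name load_per_core is hypothetical and the numbers are made up. Note also that a load average counts runnable processes, so it is only a rough proxy for the efficiency described above.

```bash
#!/bin/bash
# Hypothetical helper: normalise a load average by the number of cores.
# Usage: load_per_core <load_average> <cores>
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Made-up example: a load average of 16.8 on a 24-core node
load_per_core 16.8 24   # prints 0.70
```

On a Linux node you can log into, the same function could be fed live values, for example load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)".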

Bottom line, the only way you can increase the CPU percentage use on an HPC cluster is by optimising your code. Normally though, people are more concerned about the parallel scaling (i.e. how the time to solution decreases with the number of CPU cores) than with the CPU percentage use.

Answer 2:

1. CPU percentage load

I agree with @rth's answer regarding trying to use Linux job priority / renice to increase the CPU percentage: it's almost certain not to work and, as you've found, you're unlikely to be able to do it anyway, as you won't have superuser privileges on the nodes. (It's pretty unlikely you can even log into the worker nodes, probably only the head node.)

The CPU usage of your model as it runs is mainly a function of your code structure. If it runs at 100% CPU locally, it will probably run like that on the node while it's running.

Here are some answers to the more specific parts of your question:

2. CQLOAD

You ask

CQLOAD (what does it mean too?)

The docs for this are hard to find, but you link to the spec of your cluster, which tells us that its scheduling engine is Sun's "Grid Engine". Man pages are here (you can access them locally too, in particular by typing man qstat).

If you search through for qstat -g c, you will see the outputs described. In particular, the second column (CQLOAD) is described as:

OUTPUT FORMATS

...

an average of the normalized load average of all queue hosts. In order to reflect each hosts different significance the number of configured slots is used as a weighting factor when determining cluster queue load. Please note that only hosts with a np_load_value are considered for this value. When queue selection is applied only data about selected queues is considered in this formula. If the load value is not available at any of the hosts '-NA-' is printed instead of the value from the complex attribute definition.

This means that CQLOAD gives an indication of how utilized the processors in the queue are. Your output screenshot above shows 0.84, so the average load on (in-use) processors in all.q is 84%. This doesn't seem too low.
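As an illustration of the man page description, the following sketch computes a slot-weighted mean of per-host normalized loads. It is not Grid Engine's actual implementation: the function name weighted_queue_load, the input layout and the figures are all assumptions made for the example, and printing -NA- only when no host reports a load is my own simplification.

```bash
#!/bin/bash
# Illustrative only: slot-weighted mean of per-host normalised loads.
# Input on stdin: one host per line, "<np_load> <slots>".
# Hosts without a numeric load value (e.g. "-") are skipped.
weighted_queue_load() {
    awk '
        $1 ~ /^[0-9]*\.?[0-9]+$/ { sum += $1 * $2; slots += $2 }
        END {
            if (slots > 0) printf "%.2f\n", sum / slots
            else print "-NA-"
        }'
}

# Made-up figures: a 24-slot host at 0.9, an 8-slot host at 0.5,
# and one host that reports no load value.
printf '%s\n' '0.9 24' '0.5 8' '- 16' | weighted_queue_load   # prints 0.80
```

The 24-slot host pulls the result towards its own load, which is the weighting effect the man page describes.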

3. Number of nodes reserved

In a related question, you state colleagues are complaining that your processes are not using enough CPU. I'm not sure what that's based on, but I wonder whether the real problem here is that you're reserving a lot of nodes (even if just for a short time) for a job that they can see could work with fewer.

You might want to experiment with using fewer nodes (unless your results are very slow). That is achieved by altering the line #$ -pe mpi 24, for example by taking the number 24 down. You can work out roughly how many nodes you need by timing how long one model run takes on your computer and then using:

N = ((time to run 1 job) * number of runs in experiment) / (time you want the run to take)
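The rule of thumb above can be sketched as a small shell function. The name estimate_slots and the example figures are hypothetical. The sketch assumes whole-second inputs and runs that are independent and scale perfectly, which is optimistic.

```bash
#!/bin/bash
# Hypothetical helper implementing the rule of thumb above:
#   N = (seconds per run * number of runs) / target wall-clock seconds
# rounded up to a whole number. Integer inputs only.
estimate_slots() {
    local run_seconds=$1 n_runs=$2 target_seconds=$3
    echo $(( (run_seconds * n_runs + target_seconds - 1) / target_seconds ))
}

# Made-up example: 120 s per run, 500 runs, results wanted within 1 hour
estimate_slots 120 500 3600   # prints 17
```

With these made-up figures, the request line in the job script would become #$ -pe mpi 17 rather than #$ -pe mpi 24.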

Answer 3:

So you want to make your program run faster on Linux by giving it a higher priority than all other processes?

In that case you have to modify something called the program's niceness. This is normally done by invoking the command nice when you first start the program, or the command renice while the program is already running. A process can have a niceness from -20 to 19 (inclusive), where lower values give the process a higher priority. For security reasons, you can only decrease a process's niceness if you are the superuser (root).

So if you want to make a process run with higher priority then from within bash do

[abhishekb@hpc ~]$ start_process &
[abhishekb@hpc ~]$ jobs -x sudo renice -n -20 -p %+

Or just use the last command and replace the %+ with the process id of the process you want to increase the priority for.

Source: https://stackoverflow.com/questions/28628527/netlogo-hpc-cpu-percentage-use-increase

Tags

Linux

cloud

netlogo

hpc