To maximize CPU usage (I run things on a Debian Lenny instance in EC2) I have a simple script that launches jobs in parallel; do_something below is just a stand-in for the real processing command:
#!/bin/bash
for i in apache-200901*.log; do
    run_task "do_something $i" "$i.out" 4   # do_something is a placeholder for the real command
done
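Since run_task (defined below) backgrounds each command, the script should end with a plain wait after the loop, so it does not exit while the last jobs are still running:

wait    # block until all remaining background jobs have finished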
This is my crude solution:
function run_task {
    cmd=$1
    output=$2
    concurrency=$3
    if [ -f "${output}.done" ]; then
        # experiment already run
        echo "Command already run: $cmd. Found output $output"
        return
    fi
    count=$(jobs -p | wc -l)
    echo "New active task #$count: $cmd > $output"
    # $cmd stays unquoted on purpose so the command string is word-split;
    # the whole "run && mark done" list is sent to the background
    $cmd > "$output" && touch "${output}.done" &
    # $count was sampled before the new job was launched, hence >=
    stop=$((count >= concurrency))
    while [ $stop -eq 1 ]; do
        echo "Waiting for $count worker threads..."
        sleep 1
        # the fresh count includes the job launched above, hence > here
        count=$(jobs -p | wc -l)
        stop=$((count > concurrency))
    done
}
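One detail I am not entirely sure matters on every bash version: jobs also lists recently terminated jobs once before discarding them, so counting only running jobs may be slightly more accurate:

count=$(jobs -rp | wc -l)   # -r restricts the listing to running jobs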
The idea is to use "jobs" to see how many children are active in the background and wait until this number drops (a child exits). Once a child exits, the next task can be started.
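The sleep polling is a workaround for older shells; on bash 4.3 or newer (not the stock Lenny shell, as far as I know), wait -n blocks until any single background job exits, so the waiting loop could be sketched as:

while [ $(jobs -p | wc -l) -ge $concurrency ]; do
    wait -n    # returns as soon as one background job finishes (bash >= 4.3)
done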
As you can see, there is also a bit of extra logic to avoid running the same experiments/commands multiple times. It does the job for me. However, this logic could be either skipped or improved further (e.g., by checking file creation timestamps, input parameters, etc.).
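For instance, a staleness check could use the -nt ("newer than") test operator; here input is a hypothetical extra argument naming the experiment's input file:

function needs_rerun {
    output=$1
    input=$2
    # rerun when no marker exists, or when the input is newer than the marker
    [ ! -f "${output}.done" ] || [ "$input" -nt "${output}.done" ]
}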