Use more than one core in bash

Submitted by ☆樱花仙子☆ on 2021-01-27 13:57:17

Question


I have a Linux tool that (greatly simplifying) trims out the sequences specified in an IlluminaSeq file. I have 32 files to grind through. One file takes about 5 hours to process. I have a CentOS server with 128 cores.

I've found a few solutions, but each of them ends up using only one core. The last one seems to fire off 32 nohups, yet it still pushes everything through a single core.

My question is: does anyone have an idea how to use the server's potential? Every file can be processed independently; there are no dependencies between them.

This is the current version of the script, and I don't know why it only uses one core. I wrote it with the help of advice here on Stack Overflow and found on the Internet:

#!/bin/bash
FILES=/home/daw/raw/*
count=0

for f in $FILES
do
  base=${f##*/}
  echo "process $f file..."
  nohup /home/daw/scythe/scythe -a /home/daw/scythe/illumina_adapters.fa -o "OUT$base" "$f" &
  (( count++ ))
  if (( count == 31 )); then   # '==' compares; the original '=' assigned, which is always true, so the script waited after every single job
        wait
        count=0
  fi
done

I'm explaining: FILES is a list of files from the raw folder.

The "core" line executes nohup: the first path is the path to the tool; the -a path points to the file with the adapter patterns to cut; -o saves the output under the same name as the input file, prefixed with OUT. The last parameter is the input file to be processed.

Here is the tool's README: https://github.com/vsbuffalo/scythe

Does anybody know how to handle this?

P.S. I also tried moving nohup before the count, but it still uses one core. I have no limits set on the server.
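For comparison, the batching pattern the question attempts can also be written without fixed batches, so a new job starts as soon as any slot frees up. Below is a minimal pure-bash sketch (assumption: bash 4.3+ for `wait -n`); a stub `echo` stands in for the real scythe invocation:

```shell
#!/bin/bash
# Sketch of bounded parallelism in pure bash (assumes bash 4.3+ for `wait -n`).
# The subshell echo is a stand-in for: nohup scythe -a ... -o "OUT$base" "$f" &
MAXJOBS=4
outputs=$(
  for f in a b c d e f g h; do
    ( echo "processed $f" ) &                       # launch one job in the background
    # once MAXJOBS jobs are in flight, block until any one of them finishes
    while (( $(jobs -rp | wc -l) >= MAXJOBS )); do
      wait -n
    done
  done
  wait                                              # collect the remaining jobs
)
echo "$outputs"
```

Unlike waiting for a whole batch of 31, this keeps the worker count steady, which matters when individual files take very different times to process.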


Answer 1:


IMHO, the most likely solution is GNU Parallel, so you can run up to, say, 64 jobs in parallel, something like this:

parallel -j 64 /home/daw/scythe/scythe -a /home/daw/scythe/illumina_adapters.fa -o OUT{.} {} ::: /home/daw/raw/*

This has the benefit that jobs are not batched: it keeps 64 running at all times, starting a new one as each job finishes. That is better than waiting potentially 4.9 hours for all 32 of your jobs to finish before starting the last one, which then takes a further 5 hours. Note that I arbitrarily chose 64 jobs here; if you don't specify otherwise, GNU Parallel will run one job per CPU core you have.
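In the command above, `{}` is GNU Parallel's replacement string for each input path and `{.}` is the same path with its extension removed. The equivalent expansions can be reproduced with plain bash parameter expansion (the sample path below is hypothetical):

```shell
# What parallel's replacement strings expand to for one input file,
# illustrated with bash parameter expansion on a hypothetical path.
f=/home/daw/raw/sample1.fastq
whole=$f          # {}  -> the full input path
noext=${f%.*}     # {.} -> the input path with its extension stripped
echo "$whole"
echo "$noext"
```

So `-o OUT{.}` names each output after its input file, minus the extension, with OUT prepended.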

Useful additional parameters are:

  • parallel --bar ... gives a progress bar
  • parallel --dry-run ... does a dry run so you can see what it would do without actually doing anything

If you have multiple servers available, you can add them in a list and GNU Parallel will distribute the jobs amongst them too:

parallel -S server1,server2,server3 ...


来源:https://stackoverflow.com/questions/60812172/use-more-than-one-core-in-bash
