Question
I have a directory that contains other directories (the number of directories is arbitrary), like this:
Main_directory_samples/
- subdirectory_sample_1/
- subdirectory_sample_2/
- subdirectory_sample_3/
- subdirectory_sample_4/
I have a script that takes one directory as input at a time and takes about 1 hour to run per directory. To run the script over all subdirectories I use the following code:
opendir DIR, $maindirectory or die "Can't open directory!!";
while (my $dir = readdir DIR) {
    if ($dir ne '.' && $dir ne '..') {
        system("/bin/bash", "my_script.sh", $maindirectory.'/'.$dir);
    }
}
closedir DIR;
However, I want to run the script for different directories at the same time. For instance, subdirectory_sample_1/ and subdirectory_sample_2/ would run in one thread, and subdirectory_sample_3/ and subdirectory_sample_4/ in another. But I just can't find a way to do this.
Answer 1:
As you're just starting external processes and waiting for them, a non-threading option:
use strict;
use warnings;
use Path::Tiny;
use IO::Async::Loop;
use Future::Utils 'fmap_concat';

my $loop = IO::Async::Loop->new;
my $maindirectory = '/foo/bar';
my @subdirs = grep { -d } path($maindirectory)->children; # excludes . and ..

# runs this code to maintain up to 'concurrent' pending futures at once
my $main_future = fmap_concat {
    my $dir = shift;
    my $future = $loop->new_future;
    my $process = $loop->open_process(
        command => ['/bin/bash', 'my_script.sh', $dir],
        on_finish => sub { $future->done(@_) },
        on_exception => sub { $future->fail(@_) },
    );
    return $future;
} foreach => \@subdirs, concurrent => 2;

# run event loop until all futures are done or one fails, throw exception on failure
my @exit_codes = $main_future->get;
See the docs for IO::Async::Loop and Future::Utils.
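The on_finish callback above completes each future with the arguments it receives, i.e. the process object and its wait status, so the concatenated results alternate between the two. A minimal sketch of inspecting them afterwards (assuming the usual convention that the exit code sits in the high byte of the wait status):
# each pair is (IO::Async::Process object, wait status passed to on_finish)
while (my ($proc, $status) = splice @exit_codes, 0, 2) {
    warn "my_script.sh failed with exit code " . ($status >> 8) . "\n"
        if $status != 0;
}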
Answer 2:
One way is to fork, and have each child process a group of directories.
A basic example:
use warnings;
use strict;
use feature 'say';
use List::MoreUtils qw(natatime);
use POSIX qw(:sys_wait_h);  # for WNOHANG
use Time::HiRes qw(sleep);  # for fractional seconds

my @all_dirs = qw(d1 d2 d3 d4);
my $path = 'maindir';

my @procs;

# Get an iterator over groups (of 2)
my $it = natatime 2, @all_dirs;

while (my @dirs = $it->()) {
    my $pid = fork // do {
        warn "Can't fork for @dirs: $!";
        next;
    };
    if ($pid == 0) {
        foreach my $dir (@dirs) {
            my @cmd = ('/bin/bash', 'my_script.sh', "$path/$dir");
            say "in $$, \@cmd: (@cmd)";
            # system(@cmd) == 0 or do { inspect $? };
        }
        exit;
    }
    push @procs, $pid;
}

# Poll with non-blocking wait for processes (reap them)
my $gone;
while (($gone = waitpid -1, WNOHANG) > -1) {
    my $status = $?;
    say "Process $gone exited with $status" if $gone > 0;
    sleep 0.1;
}
See system and/or exec for details, in particular on error checking, as well as the $? variable. It can be unpacked to get more detail about how the child terminated; at the very least, print a warning and skip to the next item (which the code above does anyway).
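For instance, a minimal sketch of unpacking $? after a system(@cmd) call, following the idiom from the system documentation:
if ($? == -1) {
    warn "failed to execute: $!";
}
elsif ($? & 127) {
    warn "child died with signal ", ($? & 127);
}
else {
    my $exit_code = $? >> 8;
    warn "child exited with code $exit_code" if $exit_code != 0;
}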
The code above prints out the commands, and the PIDs with their exit status. To try it out, replace @cmd with a harmless test command and un-comment the system line.
Watch how many jobs you run at once. A basic rule of thumb is not to have more than two per core, at which point performance starts to suffer, but this depends on many details. Experiment to find the sweet spot for your case. I like to have one job per core and then keep at least one core free. To throttle the number of jobs, see the modules linked at the end.
To break all jobs (directories) into groups I used natatime from List::MoreUtils (n-at-a-time). If there are more specific criteria about how to group directories, adjust that.
See Forks::Super and Parallel::ForkManager for higher-level ways to work with forked processes.
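For example, a minimal sketch using Parallel::ForkManager to cap the number of concurrent children, reusing the directory list and script from above:
use strict;
use warnings;
use Parallel::ForkManager;

my $path     = 'maindir';
my @all_dirs = qw(d1 d2 d3 d4);

my $pm = Parallel::ForkManager->new(2);   # at most 2 children at a time

foreach my $dir (@all_dirs) {
    $pm->start and next;                  # parent: move on to the next directory
    system('/bin/bash', 'my_script.sh', "$path/$dir") == 0
        or warn "my_script.sh failed for $dir: $?";
    $pm->finish;                          # child exits here
}
$pm->wait_all_children;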
Source: https://stackoverflow.com/questions/52612742/perl-run-the-same-script-for-different-directories-at-the-same-time