Question
I have a Perl program that I maintain. It can fork off multiple processes (up to a specified limit), monitor them, and, as they exit, fork off additional processes (again, up to the limit) until the list of things to run is completed. It works fine, except that for some reason it doesn't appear to pick up the correct exit status from my child processes.
The code that doesn't work uses Perl's fork() and waitpid(), and the child processes use POSIX::_exit() to quit. Here are some excerpts of the relevant code:
Forking code:
# Initialize process if running in parallel mode
my $pid;
if ($options{'parallel'} > 0) {
    log_status("Waiting to fork test #".$curr_test{'id'}."...\n");
    # Here, wait for child processes to complete so we can fork off new ones without going over the specified limit
    while ( keys(%children) >= $options{'parallel'}) {
        my $kid = waitpid(-1, 0);
        my $kid_status = $?;
        if ($kid > 0) {
            log_status("Child process (PID ".$kid.", test ".$children{$kid}.") exited with status ".$kid_status.".\n");
            $error_status |= $kid_status;
            delete $children{$kid};
        }
    }
    $pid = fork();
    tdie("Unable to fork!\n") unless defined $pid;
    if ($pid != 0) {
        # I'm the parent
        $is_child = 0;
        log_status("Forked child process (PID ".$pid.").\n");
        $children{$pid} = $curr_test{'logstr'};
        next TEST_LOOP;
    }
    else {
        # I'm the child
        $is_child = 1;
        log_status("Starting test = ".$curr_test{'logstr'}."\n");
    }
}
Exit child process code:
### finish_child() ###
# Handles exiting the script, like the finish() function, but only when running as a child process in parallel mode.
# Parameters:
# - The error code to exit with
###
sub finish_child( $ ) {
    my ($error_status) = @_;
    # If running in parallel mode, exit this fork
    if ($options{'parallel'} > 0) {
        log_status("Entering: ".Cwd::abs_path("..")."\n");
        chdir "..";
        log_status("Exiting with status: ".$error_status."\n");
        POSIX::_exit($error_status);
    }
}
Here's where finish_child() is called in my example run:
# If build failed, log status and gracefully clean up logfiles, then continue to next test in list.
if ($test_status > 0) {
    $email_subject = "Build failed!";
    log_status("Build of ".$testline." FAILED.\n");
    tlog(1, "Build of ".$testline." FAILED.\n");
    log_status("Entering: ".Cwd::abs_path("..")."\n");
    chdir "..";
    log_report(\%curr_test, $test_status);
    # Print out pass/fail status for each test as it completes
    $quietmode = $options{'quiet'}; # Backup quiet mode setting
    $options{'quiet'} = 0;
    if ($test_status == 0) {
        log_status("Test ".$testline." PASSED.\n");
        tlog(0, "Test ".$testline." PASSED.\n");
    }
    else {
        log_status("Test ".$testline." FAILED.\n");
        tlog(1, "Test ".$testline." FAILED.\n");
    }
    $options{'quiet'} = $quietmode; # Restore quiet mode setting
    finish_logs();
    # Link logs to global area and rename if running multiple tests
    system("ln -sf ".$root_dir."/verify/".$curr_test{'id'}."/".$verify::logfile." ../".(($test_status > 0) ? "fail".$curr_test{'id'}.".log" : "pass".$curr_test{'id'}.".log" )) if (@tests > 1);
    if ($options{'parallel'} > 0 && $pid == 0) {
        # If we're in parallel mode and I'm a child process, I should exit, instead of continuing to loop.
        finish_child($test_status);
    }
    else {
        # If we're not in parallel mode, I should continue to loop.
        next TEST_LOOP;
    }
}
Here's the behavior that I'm seeing according to the log from a run I did:
<Parent> Waiting for all child processes to complete...
<Child> [PID 28657] Entering: <trimmed>
<Child> [PID 28657] Running user command: make --directory <trimmed> TARGET=build BUILD_DIR=<trimmed> RUN_DIR=<trimmed>
<Child> [PID 28657] User command finished with return code: 512
<Child> [PID 28657] Build step finished with return code 512
<Child> [PID 28657] Entering: <trimmed>
<Child> [PID 28657] Build of rx::basic(1) FAILED.
<Child> [PID 28657] Entering: <trimmed>
<Child> [PID 28657] Test rx::basic(1) FAILED.
<Child> [PID 28657] Closing log file.
<Child> [PID 28657] Closing error log file.
<Child> [PID 28657] Entering: <trimmed>
<Parent> Child process (PID 28657, test rx::basic(1)) exited with status 0.
I have code that uses Perl IPC to run commands (in lieu of the system() call, for more flexibility), and it picks up the exit code properly, as you can see in the "User command" lines from the log file.
What could I be doing wrong, here? Why wouldn't I be able to pick up the exit status from $? in this case? The examples I found online all seem to indicate that this should work fine.
For reference, I'm running Perl v5.10.1. This Perl tool is also open sourced on GitHub if you feel you need to look through the rest of the code: https://github.com/benrichards86/Verify/blob/master/verify.pl
Answer 1:
If $test_status is 512, are you calling POSIX::_exit(512)? That is incorrect.
A child process should call POSIX::_exit with an operand in the range 0 to 255, and the Perl parent process that reaps that child will get $? set to exit-status << 8.
POSIX::_exit(512) is equivalent to POSIX::_exit(512 % 256), or POSIX::_exit(0).
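To see the truncation directly, here is a minimal sketch, assuming a Unix-like system where fork() is available. The helper name run_child is hypothetical and not part of the original program; it forks a child that exits with the given value and returns the raw $? that the parent sees.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: fork a child that calls POSIX::_exit($code)
# and return the raw wait status ($?) observed by the parent.
sub run_child {
    my ($code) = @_;
    my $pid = fork();
    die "Unable to fork: $!\n" unless defined $pid;
    POSIX::_exit($code) if $pid == 0;   # child: exit immediately
    waitpid($pid, 0);                   # parent: reap this child
    return $?;
}

for my $code (2, 512) {
    my $status = run_child($code);
    printf "_exit(%d): \$? = %d, \$? >> 8 = %d\n",
        $code, $status, $status >> 8;
}
# On a typical Unix system this prints:
# _exit(2): $? = 512, $? >> 8 = 2
# _exit(512): $? = 0, $? >> 8 = 0
```

If $test_status really holds a raw wait status, as the "return code: 512" line in the log suggests, then passing $test_status >> 8 to finish_child() would keep the value within 0 to 255.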
Answer 2:
It seems you are doing what amounts to the following:
exit($?)
You mean to propagate the value the child passed to exit, but that's not what $? contains.
If the child was killed by a signal, $? & 0x7F contains the number of the signal that killed the process.
If the child wasn't killed by a signal, $? & 0x7F is zero, and $? >> 8 contains the value the process passed to exit.
So when the child does exit(1), you do exit(256), and that's out of range on Unix systems. The high bits are chopped off leaving you with zero (256 & 0xFF = 0).
I suggest that you do what bash does:
exit( ($? & 0x7F) ? ($? | 0x80) : ($? >> 8) );
When the child does exit(1), this does exit(1).
When the child is killed by, say, SIGTERM (15), this does exit(128 + 15).
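The same conversion can be wrapped in a small function, which makes it easy to check against the cases above. This is only a sketch: the name wait_status_to_exit_code is hypothetical, and it writes the signal case as 128 + signal, which yields the same result as $? | 0x80 in the expression above.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a raw wait status (the value of $?)
# into an exit code in the range 0..255, following the bash convention.
sub wait_status_to_exit_code {
    my ($status) = @_;
    my $signal = $status & 0x7F;         # low 7 bits: terminating signal
    return 128 + $signal if $signal;     # killed by a signal
    return $status >> 8;                 # normal exit: high byte
}

print wait_status_to_exit_code(256), "\n";   # child did exit(1)     -> 1
print wait_status_to_exit_code(512), "\n";   # child did exit(2)     -> 2
print wait_status_to_exit_code(15),  "\n";   # killed by SIGTERM(15) -> 143
```

In a child process, the result could then be passed to exit() or POSIX::_exit() instead of the raw $?.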
Answer 3:
Yes, that may be the explanation, but what intrigues me is that your test output doesn't show the exit status that the child actually uses. There's a log message in the code ("Exiting with status:...") but no corresponding line in the output.
So we can't really tell if anything's going wrong in this part of your code.
I first thought the use of POSIX::_exit might explain the logging problem (it would prevent final buffers from being flushed), but looking at your code again I see that you've turned logging off before calling finish_child.
As a first step, I'd recommend getting the logging to work right so that you can tell where the problem is. Why not move the log close and logfile renaming logic into the finish_child routine as the last thing done before exiting?
As for the exit status problem, I see three possible explanations, all in the code for the child process:
- the child isn't actually exiting through function finish_child
- the non-zero status you think is being passed to finish_child and then to exit actually isn't being passed
- as suggested above, your exit status is > 255
Is there any particular reason why you're using POSIX::_exit() instead of exit() and waitpid(-1) instead of wait()?
Source: https://stackoverflow.com/questions/18640737/why-arent-i-picking-up-the-exit-status-from-my-child-process