Do I have a rounding error? Perl

Question

My script is supposed to do the following. It takes an old list of scalars and makes a new, corresponding list of numbers. The old list is referred to as @oldMarkers and the new list as @newMarkers.

Sample input is like: chr1, chr2, IMP, chr3, IMP, IMP, IMP, chr4

Sample output is like: 1, 2, 2.1, 3, 3.1, 3.2, 3.3, 4

The point of the script is to read the list of @oldMarkers and output a list where for each instance of an element containing the letters "chr," an integer is pushed into the array @newMarkers. For each instance of IMP in @oldMarkers, a decimal number is added to @newMarkers. The new decimal number has the same "base integer" as the preceding number but has .1 added to it. In other words, multiple succeeding instances of "IMP" are supposed to have the same whole number as the most recently read "chr" entry, with a decimal value tacked on that counts the number of IMPs that correspond to that most recent "chr" entry.

The script below works almost 100% of the time, and it usually works even in the following case. In some places, @oldMarkers contains long runs of IMP entries. For such a run, every value pushed into @newMarkers is supposed to keep the same whole number, the one assigned to the most recently read "chr" entry in @oldMarkers, and each successive IMP adds 0.1 to the decimal part. When the decimal reaches .9, it is supposed to "start over" at .1 and count up again until the run of IMP entries ends.

For example, if @oldMarkers has a block of 13 "IMP"s and is: chr1, chr2, IMP, IMP, IMP, IMP, IMP, IMP, IMP, IMP, IMP, IMP, IMP, IMP, IMP, chr2

then @newMarkers should be: 1, 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 2.1, 2.2, 2.3, 2.4, 3

Summary of the script:

The original file contains multiple lines of two elements. The first element is not important, and so is skipped over in the code. The second element of each line is an ID, either something like "chr4" or "IMP". A while loop reads each line, adds the second element to the array @oldMarkers.

Then this array is read entry by entry. The script first asks whether the current entry of @oldMarkers is a "chr" or an "IMP". This is done with the outer if/else pair.

Next, for both conditions, the script asks whether the entry follows a "chr" or an "IMP" entry. This is done with the if/else pairs nested within the outer pair.

Then, depending on the conditions, new elements are computed and pushed into @newMarkers.

Like I said, this mostly works. Sometimes, however, when a run of IMPs is longer than 10, the script does not "recycle" the decimals. Instead, it adds .1 to the preceding value and outputs a new whole number. For other runs longer than 10, it works fine. The "error" is inconsistent.

Can you spot the problem?

my @oldMarkers = ();
my @newMarkers = ();

while ( my $line = <$FILE> )
    {
    chomp $line;
    my @entries = split( '\t', $line );
    push( @oldMarkers, $entries[ 1 ] ); 
    } ### end of while


for ( my $i = 0; $i < scalar @oldMarkers; $i++ )
{
    if ( $oldMarkers[ $i ] =~ m/chr/ ) ### is a marker
    {
        if ( $oldMarkers[ $i - 1 ] =~ m/IMP/ ) ### new marker comes after imputed site
        {
            push( @newMarkers, int( $newMarkers[ $i - 1 ] ) + 1 );
        }
        else ### is coming after a marker
        {
            push( @newMarkers, $newMarkers[ $i - 1 ] + 1 );
        }
    } ### if
    else ### is an imputed site
    {
        if ( $oldMarkers[ $i - 1 ] =~ m/IMP/ ) ### imputed site is after another imputed site
        {
            my $value = $newMarkers[ $i - 1 ] - int( $newMarkers[ $i - 1 ] );

            if ( $value < .9 )
            {
                push( @newMarkers, $newMarkers[ $i - 1 ] + .1 );
            }
            elsif ( $value > .9 )
            {
                push( @newMarkers, int( $newMarkers[ $i - 1 ] ) + .1 );
            }
        } ### if
        else ### imputed site is after a marker
        {
            push( @newMarkers, int( $newMarkers[ $i - 1 ] ) + .1 );
        }
    } ### else
} ### for


print $newMarkerfile join( "\t", @newMarkers);

Answer 1:

It would be easier and more reliable to do this using only integer arithmetic. Basically, keep track of two integer values: one for the number before the . and one for the digit after it. If the digit after the . reaches 10, reset it to 1:

my @newMarkers;
my $chrCount = 0;
my $impCount = 0;

foreach my $marker (@oldMarkers) {
    if ( $marker =~ /^chr\d+$/ ) {
        $chrCount++;
        $impCount = 0;
        push @newMarkers, $chrCount;
    } elsif ( $marker eq "IMP" ) {
        $impCount++;
        $impCount = 1 if $impCount == 10;
        push @newMarkers, "$chrCount.$impCount";
    } else {
        die "Unrecognized marker $marker";
    }
}

(demo on codepad.org)

Answer 2:

10 × 0.1 = 1, yet

>perl -E"$x=0; $x += 0.1 for 1..10; say sprintf('%0.16f', $x); say int($x);"
0.9999999999999999
0

You should always use some form of rounding or tolerance when dealing with floats.

Too many numbers are periodic in binary. You know how 1/3 is periodic in decimal? Well, 1/10 is periodic in binary. And so are 2/10, 3/10, 4/10, 6/10, 7/10, 8/10 and 9/10. None of these numbers can be represented without error by floats.
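As a minimal sketch of what a "tolerance" could look like, assuming Perl is built with ordinary 64-bit doubles: the helper name nearly_equal and the 1e-9 threshold below are made up for illustration and do not come from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: treat two floats as equal when they differ by
# less than a small tolerance (the 1e-9 default is an arbitrary choice).
sub nearly_equal {
    my ( $x, $y, $eps ) = @_;
    $eps = 1e-9 unless defined $eps;
    return abs( $x - $y ) < $eps;
}

my $sum = 0;
$sum += 0.1 for 1 .. 10;

# Print with enough digits to expose any accumulated error.
printf "%.17g\n", $sum;

print "exactly 1\n"    if $sum == 1;
print "close enough\n" if nearly_equal( $sum, 1 );
```

In the question's code, the same idea would mean comparing $value against .9 with a tolerance instead of a bare < or >, so that a value a hair above or below .9 is still treated as having reached it.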

Answer 3:

Seems to be working right:

my $imp_order = 0;
my $chr_order = 0;
for my $old (@oldMarkers) {   
  if ( $old =~ m/chr/ ) ### is a marker
  {

    $imp_order = 0;
    $chr_order++;

    push( @newMarkers,  $chr_order );    

  } ### if

  else    ### is an imputed site
  {
      $imp_order = 0 if $imp_order == 9;
      $imp_order++;
      push( @newMarkers, $chr_order + $imp_order / 10 );   

  } ### else   

} ### for

Answer 4:

As ikegami suggests, those int() calls are definitely causing your rounding issues. You could use POSIX and then use ceil() or floor() as appropriate to fix the problem.

See the docs here: http://perldoc.perl.org/perlfaq4.html#Does-Perl-have-a-round%28%29-function%3F-What-about-ceil%28%29-and-floor%28%29%3F-Trig-functions%3F

For example, I think the exact error you are describing could be fixed by replacing:

elsif ( $value > .9 )
    {
        push( @newMarkers, int( $newMarkers[ $i - 1 ] ) + .1  );   
    }

with:

elsif ( $value > .9 )
    {
        push( @newMarkers, ceil( $newMarkers[ $i - 1 ] ) + .1  );   
    }

You should probably replace all of those int() calls with the appropriate rounding function for each case.

Follow-up: I actually prefer the multiple solutions suggested that track "chr" count/order and "imp" count/order separately, rather than as a single float. But I'll leave this here as I think it is instructive to the poster regarding how to implement a solution with rounding.
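To illustrate the rounding idea in a self-contained way, here is a rough sketch that rounds the previous value to one decimal place with sprintf before deciding what comes next. Note that it uses sprintf rounding rather than ceil() or floor(), and the helper name next_imp is hypothetical, not something from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: given the previous value in @newMarkers, return the
# value for the next IMP entry. The previous value is rounded to one decimal
# place first, so accumulated float error cannot leak into the digit.
sub next_imp {
    my ($prev) = @_;
    my ( $whole, $digit ) = split /\./, sprintf( '%.1f', $prev );
    $digit = $digit >= 9 ? 1 : $digit + 1;    # recycle after .9
    return "$whole.$digit";
}

my @values = (2);    # the value pushed for the preceding "chr" entry
push @values, next_imp( $values[-1] ) for 1 .. 13;
print join( ', ', @values ), "\n";
# prints: 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 2.1, 2.2, 2.3, 2.4
```

Because the result is built as a string from two integers, no float error accumulates between calls, even if the value passed in was produced by repeated addition of .1.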

Answer 5:

If I understand you correctly then this is all that is necessary.

use strict;
use warnings;

my @old = do {
  open my $fh, '<', 'markers.txt' or die $!;
  map /([^\t]+)$/, <$fh>;
};

my @new;
my @marker;
my $chr = 0;

for (@old) {
  if ( /chr/ ) {
    @marker = (++$chr);
  }
  elsif ( @marker > 1 and $marker[1] == 9 ) {
    $marker[1] = 1;
  }
  else {
    $marker[1]++;
  }
  push @new, [@marker];
}

@new = map join('.', @$_), @new;

print join(', ', @new), "\n";

output

1, 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 2.1, 2.2, 2.3, 2.4, 3

Answer 6:

If the output for your second example should instead be: 1 2 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3 3.1 3.2 3.3 4

then use >= instead of >.

You then have two options: use int( $newMarkers[ $i - 1 ] ) + $value + .100000, or add 1 to the int value of $newMarkers[ $i - 1 ].

Source: https://stackoverflow.com/questions/14204125/do-i-have-a-rounding-error-perl

Tags

arrays

perl

loops

rounding

nested-loops