Need help with peak signal detection in Perl

Question

Hi everyone, I have intensity values from images of yeast colony plates, and I need to find the peak values among them. Below is an example image showing how the values look when graphed.

Example of some of the values

5.7
5.3
8.2
16.5
34.2
58.8
**75.4**
75
65.9
62.6
58.6
66.4
71.4
53.5
40.5
26.8
14.2
8.6
5.9
7.7
14.9
30.5
49.9
69.1
**75.3**
69.8
58.8
57.2
56.3
67.1
69
45.1
27.6
13.4
8
5

These values show two peaks, at 75.4 and 75.3; you can see that the values increase and then decrease. The amount of change from one value to the next is not always the same.

Graph of intensity values

http://lh4.ggpht.com/_aEDyS6ECO8s/THKTLgDPhaI/AAAAAAAAAio/HQW7Ut-HBhA/s400/peaks.png

From research

One of the things I am thinking of doing is to store each of the groups (i.e. the "mountains") in a hash and then look for the largest value in each group. One of the issues I see, though, is how to determine the boundaries of each group.

Here is a link to the code that I have so far: http://paste-it.net/public/y485822/

Here is a link to a complete data set: http://paste-it.net/public/ub121b4/

I am writing my code in Perl. Any help would be greatly appreciated. Thank you.

Answer 1:

You need to decide how local you want the peaks to be. The approach here finds peaks and troughs within broad regions of the data.

use strict;
use warnings;

my @data = (
    5.7, 5.3, 8.2, 16.5, 34.2, 58.8, 75.4, 75, 65.9, 62.6,
    58.6, 66.4, 71.4, 53.5, 40.5, 26.8, 14.2, 8.6, 5.9, 7.7,
    14.9, 30.5, 49.9, 69.1, 75.3, 69.8, 58.8, 57.2, 56.3, 67.1,
    69, 45.1, 27.6, 13.4, 8, 5,
);

# Determine mean. Or use Statistics::Descriptive.
my $sum;
$sum += $_ for @data;
my $mean = $sum / @data;

# Make a pass over the data to find contiguous runs of values
# that are either less than or greater than the mean. Also
# keep track of the mins and maxes within those groups.
my $group = -1;
my $gt_mean_prev = '';
my @mins_maxs;
my $i = -1;

for my $d (@data){
    $i ++;
    my $gt_mean = $d > $mean ? 1 : 0;

    unless ($gt_mean eq $gt_mean_prev){
        $gt_mean_prev = $gt_mean;
        $group ++;
        $mins_maxs[$group] = $d;
    }

    if ($gt_mean){
        $mins_maxs[$group] = $d if $d > $mins_maxs[$group];
    }
    else {
        $mins_maxs[$group] = $d if $d < $mins_maxs[$group];
    }

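    # $d is an alias for the element of @data, so this replaces each
    # plain number with a hash ref holding the value and its metadata.
    $d = {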
        i       => $i,
        val     => $d,
        group   => $group,
        gt_mean => $gt_mean,
    };
}

# A fun picture.
for my $d (@data){
    printf
        "%6.1f  %2d  %1s  %1d  %3s  %s\n",
        $d->{val},
        $d->{i},
        $d->{gt_mean} ? '+' : '-',
        $d->{group},
        $d->{val} == $mins_maxs[$d->{group}] ? '==>' : '',
        '.' x ($d->{val} / 2),
    ;

}

Output:

   5.7   0  -  0       ..
   5.3   1  -  0  ==>  ..
   8.2   2  -  0       ....
  16.5   3  -  0       ........
  34.2   4  -  0       .................
  58.8   5  +  1       .............................
  75.4   6  +  1  ==>  .....................................
  75.0   7  +  1       .....................................
  65.9   8  +  1       ................................
  62.6   9  +  1       ...............................
  58.6  10  +  1       .............................
  66.4  11  +  1       .................................
  71.4  12  +  1       ...................................
  53.5  13  +  1       ..........................
  40.5  14  -  2       ....................
  26.8  15  -  2       .............
  14.2  16  -  2       .......
   8.6  17  -  2       ....
   5.9  18  -  2  ==>  ..
   7.7  19  -  2       ...
  14.9  20  -  2       .......
  30.5  21  -  2       ...............
  49.9  22  +  3       ........................
  69.1  23  +  3       ..................................
  75.3  24  +  3  ==>  .....................................
  69.8  25  +  3       ..................................
  58.8  26  +  3       .............................
  57.2  27  +  3       ............................
  56.3  28  +  3       ............................
  67.1  29  +  3       .................................
  69.0  30  +  3       ..................................
  45.1  31  +  3       ......................
  27.6  32  -  4       .............
  13.4  33  -  4       ......
   8.0  34  -  4       ....
   5.0  35  -  4  ==>  ..
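
If only the peaks are needed, the same grouping idea can be reduced to a small routine. The following is a minimal sketch rather than part of the original answer: the sub name peaks_above_mean is hypothetical, and it assumes that each contiguous run of values above the mean contains exactly one peak of interest.

```perl
use strict;
use warnings;

# Hypothetical helper: for each contiguous run of values above the
# overall mean, return [index, value] of the largest value in that run.
sub peaks_above_mean {
    my @values = @_;
    return unless @values;

    my $sum = 0;
    $sum += $_ for @values;
    my $mean = $sum / @values;

    my @peaks;
    my $in_run = 0;
    for my $i (0 .. $#values) {
        if ($values[$i] > $mean) {
            if (!$in_run) {
                push @peaks, [$i, $values[$i]];    # first value of a new run
                $in_run = 1;
            }
            elsif ($values[$i] > $peaks[-1][1]) {
                $peaks[-1] = [$i, $values[$i]];    # larger value in the same run
            }
        }
        else {
            $in_run = 0;                           # run ended
        }
    }
    return @peaks;
}

my @data = (
    5.7, 5.3, 8.2, 16.5, 34.2, 58.8, 75.4, 75, 65.9, 62.6,
    58.6, 66.4, 71.4, 53.5, 40.5, 26.8, 14.2, 8.6, 5.9, 7.7,
    14.9, 30.5, 49.9, 69.1, 75.3, 69.8, 58.8, 57.2, 56.3, 67.1,
    69, 45.1, 27.6, 13.4, 8, 5,
);

printf "peak %.1f at index %d\n", $_->[1], $_->[0]
    for peaks_above_mean(@data);
```

For the sample data this reports 75.4 at index 6 and 75.3 at index 24, matching the two '+' groups marked with ==> in the output above. Noisier data, where a single mountain dips below the mean, would be split into several runs.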

Answer 2:

my @data = ...;

# filter out sequential duplicate values
my @orig_index = 0;
my @deduped = $data[0];
for my $index ( 1..$#data ) {
    if ( $data[$index] != $data[$index-1] ) {
        push @deduped, $data[$index];
        push @orig_index, $index;
    }
}

# add a sentinel that is lower than any real value (-9**9**9 is -inf);
# index -1 wraps around to it, so one sentinel guards both ends
push @deduped, -9**9**9;

my @local_maxima_indexes;
for my $index ( 0..$#deduped-1 ) {
    if ( $deduped[$index] > $deduped[$index-1] && $deduped[$index] > $deduped[$index+1] ) {
        push @local_maxima_indexes, $orig_index[$index];
    }
}

Note that this considers the first value a local maximum, and also the values 71.4 and 69. I'm not sure how you are distinguishing which ones you want included.
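
One possible way to drop minor maxima such as 71.4 and 69 is to require that a peak be the largest value within a fixed number of neighbours on each side. A minimal sketch follows; the sub name windowed_peaks and the half-width of 8 are assumptions picked for this sample data, not something the answer above specifies.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: return the indexes whose value equals the maximum
# of the window [i - half_width, i + half_width], clipped to the array.
sub windowed_peaks {
    my ($half_width, @values) = @_;
    my @peak_indexes;
    for my $i (0 .. $#values) {
        my $lo = max(0, $i - $half_width);
        my $hi = $i + $half_width;
        $hi = $#values if $hi > $#values;
        my $window_max = max(@values[$lo .. $hi]);
        push @peak_indexes, $i if $values[$i] == $window_max;
    }
    return @peak_indexes;
}

my @data = (
    5.7, 5.3, 8.2, 16.5, 34.2, 58.8, 75.4, 75, 65.9, 62.6,
    58.6, 66.4, 71.4, 53.5, 40.5, 26.8, 14.2, 8.6, 5.9, 7.7,
    14.9, 30.5, 49.9, 69.1, 75.3, 69.8, 58.8, 57.2, 56.3, 67.1,
    69, 45.1, 27.6, 13.4, 8, 5,
);

print "peak at index $_ ($data[$_])\n" for windowed_peaks(8, @data);
```

With a half-width of 8 only indexes 6 and 24 remain for the sample data. A smaller window lets the secondary bumps back in, and tied values inside one window are all reported, so plateaus may need the duplicate filtering shown above first.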

Answer 3:

Do you have a control data set? If so, I'd recommend normalizing your data using, say, a simple log ratio between the yeast intensities and the control images.
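
As a rough illustration of the log-ratio step only (not of ChIPOTle itself), here is a minimal sketch. The sub name log2_ratios, the choice of base 2, and the control values are all assumptions made for the example; the question does not include control data.

```perl
use strict;
use warnings;

# Hypothetical helper: element-wise log2(signal / control).
# Both arguments are array refs of equal length with positive values.
sub log2_ratios {
    my ($signal, $control) = @_;
    die "signal and control must have the same length\n"
        unless @$signal == @$control;

    my @ratios;
    for my $i (0 .. $#$signal) {
        die "values must be positive (index $i)\n"
            unless $signal->[$i] > 0 && $control->[$i] > 0;
        push @ratios, log($signal->[$i] / $control->[$i]) / log(2);
    }
    return @ratios;
}

# First few intensities from the question, with made-up control values.
my @yeast   = (5.7, 5.3, 8.2, 16.5, 34.2, 58.8, 75.4);
my @control = (6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0);   # hypothetical

printf "%6.1f  %6.3f\n", $yeast[$_], (log2_ratios(\@yeast, \@control))[$_]
    for 0 .. $#yeast;
```

Values below the control come out negative and values above it come out positive, which is the form the background model described below relies on.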

You could then use the Perl port of ChiPOTle to grab the significant peaks, which sounds way more robust than searching for local/global maxima, etc.

ChiPOTle "is a peak-finding algorithm used to analyze ChIP-chip microarray data", but I've used it successfully in many other applications (like ChIP-seq, which admittedly is closer to its original purpose than your case is).

The negative values of the resulting log(yeast/control) would be used to build a Gaussian background model for significance estimation. The algorithm then uses the false-discovery rate for multiple-testing correction.

Here's the original paper.

Source: https://stackoverflow.com/questions/3549205/need-help-with-peak-signal-detection-in-perl

Tags

perl

signal-processing

bioinformatics