What's the most efficient way to check for duplicates in an array of data using Perl?

花落未央 2020-12-05 14:26

I need to see if there are duplicates in an array of strings. What's the most time-efficient way of doing it?

7 answers

夕颜 (thread starter)

2020-12-05 15:15
Turning the array into a hash is the fastest way [O(n) time], though it's memory-inefficient. Using a for loop is a bit faster than grep, but I'm not sure why.
```
#!/usr/bin/perl

use strict;
use warnings;

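# Sample input for illustration only; replace with your own data.
my @array = qw(a b c a b);

my %count;   # how many times each string has been seen so far
my %dups;    # strings that occur more than once
for (@array) {
    $dups{$_}++ if $count{$_}++;
}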
```
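If all you need is a yes/no answer, as the question asks, the same hash idea can stop at the first repeat instead of counting everything. Here is a minimal sketch; `has_duplicates` and the sample data are names I made up for illustration, not something from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 as soon as any string is seen a second time, 0 otherwise.
# (has_duplicates is a hypothetical helper name.)
sub has_duplicates {
    my %seen;
    for my $item (@_) {
        return 1 if $seen{$item}++;   # post-increment: true only on a repeat
    }
    return 0;
}

my @array = qw(apple pear apple fig);   # illustrative data
print has_duplicates(@array) ? "duplicates found\n" : "all unique\n";
# prints "duplicates found"
```

On input where a repeat shows up early this avoids walking the rest of the array; in the worst case (no duplicates) it still costs O(n) time and memory, like the full count.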
A more memory-efficient way is to sort the array in place and iterate through it, looking for adjacent entries that are equal.
```
# not exactly sort in place, but Perl does a decent job optimizing it
@array = sort @array;

my $last;
my %dups;
for my $entry (@array) {
    $dups{$entry}++ if defined $last and $entry eq $last;
    $last = $entry;
}
```
This runs in O(n log n) time because of the sort, but it only needs to store the duplicates rather than a second copy of the data in %count. Worst-case memory usage is still O(n) (when everything is duplicated), but if your array is large and there aren't a lot of duplicates, you'll win.

Theory aside, benchmarking shows the latter starts to lose on large arrays (like over a million) with a high percentage of duplicates.
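If you want to check this on your own data, something like the following sketch with the core `Benchmark` module should do. The sub names `dups_hash` and `dups_sort`, the data size, and the duplicate ratio are all my assumptions; tune them to match your real input. Note that `dups_sort` here sorts a copy rather than the array in place, so it measures time only, not the memory behavior discussed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hash-based approach: O(n) time, one hash entry per distinct string.
sub dups_hash {
    my (%count, %dups);
    for (@_) {
        $dups{$_}++ if $count{$_}++;
    }
    return sort keys %dups;
}

# Sort-based approach: O(n log n) time, stores only the duplicates.
sub dups_sort {
    my ($last, %dups);
    for my $entry (sort @_) {
        $dups{$entry}++ if defined $last and $entry eq $last;
        $last = $entry;
    }
    return sort keys %dups;
}

# Hypothetical test data: 100_000 strings drawn from 50_000 possible values.
my @data = map { "item" . int(rand 50_000) } 1 .. 100_000;

# Run each approach 5 times and print a comparison table.
cmpthese(5, {
    hash => sub { my @d = dups_hash(@data) },
    sort => sub { my @d = dups_sort(@data) },
});
```

The numbers will vary with array size, string length, and how many duplicates there are, so treat any single run as a rough indication only.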