I need to see if there are duplicates in an array of strings. What's the most time-efficient way of doing it?
Turning the array into a hash is the fastest way [O(n)], though it's memory-inefficient. Using a for loop is a bit faster than grep, but I'm not sure why.
#!/usr/bin/perl
use strict;
use warnings;
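my @array = qw(a b a c b);   # example data; substitute your own list
my %count;                   # how many times each string has been seen so far
my %dups;                    # strings seen more than once (value = number of repeats)
for (@array) {
    $dups{$_}++ if $count{$_}++;
}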
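For comparison, the grep variant mentioned above might look something like the sketch below. The sample data and the @repeats name are made up for illustration; only the grep/hash idiom itself comes from the answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data, made up for illustration.
my @array = qw(apple pear apple fig pear apple);

my %count;
# grep keeps every element whose count was already non-zero,
# i.e. every occurrence after the first.
my @repeats = grep { $count{$_}++ } @array;

# Collapse the repeats down to the distinct duplicated values.
my %dups = map { $_ => 1 } @repeats;

print join(", ", sort keys %dups), "\n";   # prints "apple, pear"
```

Note that grep has to build the @repeats list before anything else can happen, whereas the for loop only touches the two hashes.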
A more memory-efficient way is to sort the array in place and iterate through it looking for equal, adjacent entries.
# not exactly sort in place, but Perl does a decent job optimizing it
@array = sort @array;
my $last;
my %dups;
for my $entry (@array) {
    $dups{$entry}++ if defined $last and $entry eq $last;
    $last = $entry;
}
This runs in O(n log n) time because of the sort, but it only needs to store the duplicates rather than a second copy of the data in %count. Worst-case memory usage is still O(n) (when everything is duplicated), but if your array is large and there aren't many duplicates you'll win.
Theory aside, benchmarking shows the sort-based approach starts to lose on large arrays (over a million elements or so) with a high percentage of duplicates.
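If you want to check this against your own data, here is a rough sketch using the core Benchmark module. The sub names dups_hash and dups_sort, the data size, and the iteration count are all assumptions picked for illustration. The sort version here also sorts a copy rather than sorting in place, so it only compares speed, not memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 100_000 strings drawn from 50_000 possible values,
# so a fair share are duplicates. Adjust both numbers to resemble your data.
my @data = map { "item" . int(rand 50_000) } 1 .. 100_000;

# Hash approach: count everything, record anything seen more than once.
sub dups_hash {
    my (%count, %dups);
    for (@_) {
        $dups{$_}++ if $count{$_}++;
    }
    return \%dups;
}

# Sort approach: sort a copy, then compare adjacent entries.
sub dups_sort {
    my @sorted = sort @_;
    my ($last, %dups);
    for my $entry (@sorted) {
        $dups{$entry}++ if defined $last and $entry eq $last;
        $last = $entry;
    }
    return \%dups;
}

# Run each sub 20 times and print a comparison table.
cmpthese(20, {
    'hash' => sub { dups_hash(@data) },
    'sort' => sub { dups_sort(@data) },
});
```

Varying the array size and the range passed to rand should show where the crossover happens on your machine.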