Fast alternative to grep -f

前端未结

关注

 8  1277

春和景丽 2020-12-11 16:00

file.contain.query.txt

ENST001

ENST002

ENST003

file.to.search.in.txt

ENST001  90

ENST002  80

ENST004  50

8条回答

不思量自难忘° (楼主)

2020-12-11 16:40

If you want a pure Perl option, read your query file keys into a hash table, then check standard input against those keys:

#!/usr/bin/env perl
use strict;
use warnings;

# build hash table of keys
my $keyring;
open KEYS, "< file.contain.query.txt";
while () {
    chomp $_;
    $keyring->{$_} = 1;
}
close KEYS;

# look up key from each line of standard input
while () {
    chomp $_;
    my ($key, $value) = split("\t", $_); # assuming search file is tab-delimited; replace delimiter as needed
    if (defined $keyring->{$key}) { print "$_\n"; }
}

You'd use it like so:

lookup.pl < file.to.search.txt

A hash table can take a fair amount of memory, but searches are much faster (hash table lookups are in constant time), which is handy since you have 10-fold more keys to lookup than to store.

0 讨论(0)

查看其它8个回答