Fast alternative to grep -f

春和景丽 2020-12-11 16:00

file.contain.query.txt

ENST001
ENST002
ENST003

file.to.search.in.txt

ENST001  90
ENST002  80
ENST004  50
8 Answers
  •  不思量自难忘°
    2020-12-11 16:40

    If you want a pure Perl option, read your query file keys into a hash table, then check standard input against those keys:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    
    # build hash table of keys
    my $keyring;
    open(my $keys_fh, '<', 'file.contain.query.txt') or die "Cannot open query file: $!";
    while (<$keys_fh>) {
        chomp $_;
        $keyring->{$_} = 1;
    }
    close $keys_fh;
    
    # look up key from each line of standard input
    while (<STDIN>) {
        chomp $_;
        my ($key, $value) = split("\t", $_); # assuming search file is tab-delimited; replace delimiter as needed
        # skip empty lines, then print the line if its key was in the query file
        if (defined $key && exists $keyring->{$key}) { print "$_\n"; }
    }
    

    You'd use it like so:

    ./lookup.pl < file.to.search.in.txt
    

    A hash table can take a fair amount of memory, but searches are much faster (hash table lookups take constant time on average), which is handy since you have 10-fold more keys to look up than to store.
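
    For comparison, here is a minimal sketch of the same hash-lookup idea in Python, using the sample data from the question. The function names `load_keys` and `filter_lines` are hypothetical, and the sketch assumes the ID is the first whitespace-delimited field of each line. Note that it matches the whole first field exactly, whereas `grep -f` matches patterns anywhere in the line, so the results can differ if an ID appears in another column.

```python
import io


def load_keys(lines):
    """Build a set of query IDs, one per line, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def filter_lines(keys, lines):
    """Yield each line whose first whitespace-delimited field is in keys."""
    for line in lines:
        fields = line.split()
        # Set membership is a hash lookup, constant time on average.
        if fields and fields[0] in keys:
            yield line.rstrip("\n")


# Sample data from the question, held in memory instead of real files.
query = io.StringIO("ENST001\nENST002\nENST003\n")
search = io.StringIO("ENST001  90\nENST002  80\nENST004  50\n")

keys = load_keys(query)
matches = list(filter_lines(keys, search))
print("\n".join(matches))
```

    With real files you would pass open file handles instead of the `io.StringIO` objects. Only the query IDs are held in memory; the file being searched is streamed line by line.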
