Fast alternative to grep -f

春和景丽 2020-12-11 16:00

file.contain.query.txt:

    ENST001
    ENST002
    ENST003

file.to.search.in.txt:

    ENST001  90
    ENST002  80
    ENST004  50
8 Answers
  • 2020-12-11 17:00

    If your query terms are fixed strings rather than regular expressions, use grep -F -f. Fixed-string matching is significantly faster than regex search.
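    A minimal sketch of this with the file names from the question, assuming a GNU or BSD grep. The script recreates the sample data in a temporary directory so it can run on its own. One caveat: with -F alone a query string matches anywhere in a line, so ENST001 would also match a line for ENST0010; adding -w restricts matches to whole words.

```shell
#!/bin/sh
# Work in a scratch directory and recreate the sample files from the question.
cd "$(mktemp -d)" || exit 1
printf 'ENST001\nENST002\nENST003\n' > file.contain.query.txt
printf 'ENST001  90\nENST002  80\nENST004  50\n' > file.to.search.in.txt

# -F: treat each line of the query file as a literal string, not a regex
# -w: only accept whole-word matches (ENST001 will not match ENST0010)
# -f: read the patterns from the query file
grep -F -w -f file.contain.query.txt file.to.search.in.txt
```

    With the sample data this prints the ENST001 and ENST002 lines.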

  • 2020-12-11 17:01

    If you are using Perl 5.10 or newer, you can join the query terms into a single regular expression, separated by the pipe (alternation) character, e.g. ENST001|ENST002|ENST003. Perl compiles such a literal alternation into a 'trie', so, as with a hash, the lookup cost does not grow with the number of query terms. It should run about as fast as the solution using a lookup hash. This is just to show another way to do it.

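    #!/usr/bin/perl
    use strict;
    use warnings;
    use Inline::Files;    # CPAN module: the __QUERY__ and __RAW__ sections become the QUERY and RAW filehandles

    # Strip newlines, drop blank lines (an empty alternative would match any
    # line starting with whitespace), escape regex metacharacters, join with "|".
    # Perl 5.10+ compiles this literal alternation into a trie.
    my $query = join "|", map { quotemeta $_ } grep { length $_ } map { chomp; $_ } <QUERY>;
    my $re    = qr/^(?:$query)\s/;    # compile once; the ID must be followed by whitespace

    # Print each line whose first field is one of the query IDs.
    while (<RAW>) {
        print if /$re/;
    }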
    
    __QUERY__
    ENST001
    ENST002
    ENST003
    __RAW__
    ENST001  90
    ENST002  80
    ENST004  50
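
    The answer compares the trie to a lookup hash. For comparison, that hash approach can be sketched in the shell with awk; this is a substitute technique, not the answer's Perl code. It assumes whitespace-separated columns with the ID in the first field, and, like the regex above, it only matches the ID exactly in that first field. The script recreates the sample data in a temporary directory.

```shell
#!/bin/sh
# Work in a scratch directory and recreate the sample files from the question.
cd "$(mktemp -d)" || exit 1
printf 'ENST001\nENST002\nENST003\n' > file.contain.query.txt
printf 'ENST001  90\nENST002  80\nENST004  50\n' > file.to.search.in.txt

# While reading the first file (NR == FNR), store each query ID as a key of
# the associative array ids. For the second file, print a line only if its
# first field is one of those keys (a hash lookup per line).
awk 'NR == FNR { ids[$1]; next } $1 in ids' file.contain.query.txt file.to.search.in.txt
```

    With the sample data this prints the ENST001 and ENST002 lines.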
    