Obtain patterns in one file from another using ack or awk or better way than grep?

不知归路 2020-12-06 07:16

Is there a way to obtain patterns in one file (a list of patterns) from another file using ack as the -f option in grep? I see there i

5 Answers
  •  旧巷少年郎
    2020-12-06 08:09

    Here's a Perl one-liner that uses a hash to hold the set of wanted keys from file1 (tkeys below), giving amortized O(1) lookups for each line of file2. It therefore runs in O(m+n) time, where m is the number of lines in your key set and n is the number of lines in the file you're testing.

    perl -ne'BEGIN{open K,shift@ARGV;chomp(@a=<K>);@hash{@a}=()}m/^(\p{alpha}+)\s/&&exists$hash{$1}&&print' tkeys file2

    The key set will be held in memory while file2 is tested line by line against the keys.

    Here's the same thing using Perl's -a command line option:

    perl -ane'BEGIN{open G,shift@ARGV;chomp(@a=<G>);@h{@a}=();}exists$h{$F[0]}&&print' tkeys file2

    The second version is probably a little easier on the eyes. ;)
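
Since the question also mentions awk, the same hash-lookup idea can be sketched there. This is a rough awk equivalent and not part of the Perl answer above; the sample file contents and the `workdir` and `matches` names are made up for illustration, and it assumes the key is the first whitespace-separated field and that tkeys is not empty.

```shell
# Hypothetical sample data; tkeys and file2 are the file names used in the answer.
workdir=$(mktemp -d)
printf 'apple\ncherry\n' > "$workdir/tkeys"
printf 'apple 1\nbanana 2\ncherry 3\napplepie 4\n' > "$workdir/file2"

# NR==FNR holds only while awk reads the first file: store each key, then skip ahead.
# For the second file, print every line whose first field is a stored key.
matches=$(awk 'NR==FNR { keys[$0]; next } $1 in keys' "$workdir/tkeys" "$workdir/file2")
printf '%s\n' "$matches"

rm -rf "$workdir"
```

The match is on the whole first field, so `applepie 4` is not printed even though it starts with a key.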

    One thing to remember here is that you're more likely to be IO bound than processor bound, so the goal should be to minimize IO. Here the entire lookup key set is held in a hash that offers amortized O(1) lookups. Some (slower) solutions have to run through your key file (file1) once for each line of file2. That sort of solution is O(m*n), where m is the size of your key file and n is the size of file2, whereas the hash approach is O(m+n). With 1,000 keys and 1,000,000 lines, for example, that is on the order of 10^9 comparisons versus roughly 10^6 operations. The hash approach benefits by eliminating linear searches through the key set, and benefits further by reading the keys via IO only once.
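
To make the O(m*n) versus O(m+n) contrast concrete, here is a small sketch that runs both patterns on the same made-up data. The slow loop is only a stand-in for the kind of solution described above, and the hash version uses awk rather than the answer's Perl; the `workdir`, `slow` and `fast` names and the file contents are assumptions for illustration.

```shell
# Hypothetical sample data; tkeys and file2 are the file names used in the answer.
workdir=$(mktemp -d)
printf 'apple\ncherry\n' > "$workdir/tkeys"
printf 'apple 1\nbanana 2\ncherry 3\n' > "$workdir/file2"

# Slow pattern: tkeys is scanned again (by a new grep process) for every line of file2.
slow=$(while read -r first rest; do
    if grep -qxF -- "$first" "$workdir/tkeys"; then
        printf '%s %s\n' "$first" "$rest"
    fi
done < "$workdir/file2")

# Hash pattern: tkeys is read once into an awk array, then file2 is streamed once.
fast=$(awk 'NR==FNR { keys[$0]; next } $1 in keys' "$workdir/tkeys" "$workdir/file2")

printf 'slow:\n%s\nfast:\n%s\n' "$slow" "$fast"
rm -rf "$workdir"
```

Both produce the same lines; the difference is only in how many times the key file is read, which is what dominates on large inputs.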
