Grep for a word, and if found print 10 lines before and 10 lines after the pattern match

温柔的废话 2021-01-01 05:32

I am processing a huge file. I want to search each line for a word and, when it is found, print the 10 lines before and the 10 lines after the match. How can I do it in P

5 Answers
  • 2021-01-01 06:11

    The easiest solution is grep with the -C (context) option, which prints the given number of lines before and after each match:

    grep -C 10 'what_to_search' file.txt
    
  • 2021-01-01 06:18

    How about a short Python script like this for context grepping:

    $ cat file2
    abcd
    xyz
    print this 1
    print this 2
    line having pattern
    print this 1
    print this 2
    abcd
    fgg
    $ cat p.py 
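    import re
    num_lines_cnt = 2
    with open('file2') as f:
        lines = f.readlines()
    # Clamp the start index at 0 so a match near the top of the file does not produce a negative slice index.
    print([lines[max(0, i - num_lines_cnt):i + num_lines_cnt + 1] for i in range(len(lines)) if re.search('pattern', lines[i]) is not None])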
    $ python3 p.py 
    [['print this 1\n', 'print this 2\n', 'line having pattern\n', 'print this 1\n', 'print this 2\n']]
    $
    
  • 2021-01-01 06:25

    This can be done without importing any package:

    string_to_search=input("Enter the String: ")
    before=int(input("How many lines to print before string match ? "))
    after=int(input("How many lines to print after string match ? "))
    file_to_search=input("Enter the file to search: ")
    
    def search_string(string_to_search, before, after, file_to_search):
        with open(file_to_search) as f:
            all_lines = f.readlines()
            last_line_number=len(all_lines)
            for current_line_no, current_line in enumerate(all_lines):
                if string_to_search in current_line:
                    start_line_no=max(current_line_no - before, 0)
                    end_line_no=min(last_line_number, current_line_no+after+1)
                    # Note: each stored line already ends with '\n', so print() emits an extra
                    # blank line after it; pass end='' to print() to avoid that.
                    for i in range(start_line_no, end_line_no):
                        print(all_lines[i])
                    break
    search_string(string_to_search, before, after, file_to_search)
    

    Explanation:

    string_to_search: word/pattern that you want to grep
    before: number of lines that you want to print before the pattern match
    after: number of lines that you want to print after the pattern match
    file_to_search: the file that contains the word/pattern/string (my_file.txt in the sample run)

    current_line_no holds the zero-based index of the line that contains the pattern

    Sample File Content:

    $ cat my_file.txt
    this is line 1
    this is line 2
    this is line 3
    this is line 4
    this is line 5 my pattern is here
    this is line 6
    this is line 7
    this is line 8
    this is line 9
    this is line 10
    

    Sample Execution and Output:

    $ python grep_3.py
    Enter the String: my pattern
    How many lines to print before string match ? 2
    How many lines to print after string match ? 1000
    Enter the file to search: my_file.txt
    this is line 3
    
    this is line 4
    
    this is line 5 my pattern is here
    
    this is line 6
    
    this is line 7
    
    this is line 8
    
    this is line 9
    
    this is line 10
    

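    The code above is roughly equivalent to the Unix grep command below, except that it stops at the first match and print() adds a blank line after every line of output:

    $ grep -A 1000 -B 2 'my pattern' my_file.txt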
    this is line 3
    this is line 4
    this is line 5 my pattern is here
    this is line 6
    this is line 7
    this is line 8
    this is line 9
    this is line 10
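
    If you want every match rather than only the first, and no doubled newlines, the same slicing idea could be sketched roughly as follows. grep_context is a hypothetical helper name, not something from the answer above; it assumes the lines are already in memory and, unlike grep, it does not print a "--" separator between separate groups.

```python
def grep_context(lines, needle, before=10, after=10):
    """Return the lines around every match (hypothetical helper, in-memory only)."""
    selected = []
    last_end = 0  # index just past the last line already selected
    for i, line in enumerate(lines):
        if needle in line:
            # Never go back before what was already selected, so
            # overlapping context windows are not emitted twice.
            start = max(i - before, last_end)
            end = min(i + after + 1, len(lines))
            selected.extend(lines[start:end])
            last_end = max(last_end, end)
    return selected


if __name__ == "__main__":
    import sys

    # Same shape as the sample file shown above.
    sample = [f"this is line {n}\n" for n in range(1, 11)]
    sample[4] = "this is line 5 my pattern is here\n"
    # writelines() does not add newlines, so nothing is doubled.
    sys.stdout.writelines(grep_context(sample, "my pattern", before=2, after=1))
```

    With before=2 and after=1 this writes lines 3 to 6 of the sample, once each.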
    
  • 2021-01-01 06:30
    import collections
    import itertools
    import sys
    
    with open('huge-file') as f:
        before = collections.deque(maxlen=10)
        for line in f:
            if 'word' in line:
                sys.stdout.writelines(before)
                sys.stdout.write(line)
                sys.stdout.writelines(itertools.islice(f, 10))
                break
            before.append(line)
    

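    This uses collections.deque(maxlen=10) to keep at most the 10 lines preceding the match, and itertools.islice to read the next 10 lines after it. The file is read line by line, so it never has to fit in memory; because of the break, only the first match is reported.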
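
    If every match should be reported, not just the first, the deque idea can be extended along these lines. This is only a sketch: grep_context_stream is a hypothetical name, and unlike grep it does not print a "--" separator between groups.

```python
import collections


def grep_context_stream(lines, needle, before=10, after=10):
    """Yield each matching line with its context from any iterable of lines.

    Hypothetical helper: only `before` lines are buffered at any time,
    so the input can be a file object over a very large file.
    """
    history = collections.deque(maxlen=before)  # lines seen but not yet emitted
    remaining_after = 0  # trailing-context lines still owed after the last match
    for line in lines:
        if needle in line:
            yield from history  # leading context that has not been emitted yet
            history.clear()
            yield line
            remaining_after = after
        elif remaining_after > 0:
            yield line  # trailing context of an earlier match
            remaining_after -= 1
        else:
            history.append(line)  # the deque silently drops the oldest line

# Possible usage (file name taken from the answer above):
#     with open('huge-file') as f:
#         sys.stdout.writelines(grep_context_stream(f, 'word'))
```

    Lines already written as trailing context are never put back into the deque, so overlapping windows are not printed twice.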


    UPDATE: To exclude lines that contain an IP or MAC address:

    import collections
    import itertools
    import re  # <---
    import sys
    
    addr_pattern = re.compile(
        r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b|'
        r'\b[\da-f]{2}:[\da-f]{2}:[\da-f]{2}:[\da-f]{2}:[\da-f]{2}:[\da-f]{2}\b',
        flags=re.IGNORECASE
    )  # <--
    
    with open('huge-file') as f:
        before = collections.deque(maxlen=10)
        for line in f:
            if addr_pattern.search(line):  # <---
                continue                   # <---
            if 'word' in line:
                sys.stdout.writelines(before)
                sys.stdout.write(line)
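                # Filter the trailing lines too, so address lines are also excluded after the match.
                sys.stdout.writelines(itertools.islice(
                    (ln for ln in f if not addr_pattern.search(ln)), 10))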
                break
            before.append(line)
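
    To see what the address pattern does and does not catch, a quick check like the one below may help. The pattern is copied from the code above; has_address and the sample lines are made up for illustration.

```python
import re

# Same pattern as above: a dotted quad, or six colon-separated hex pairs.
addr_pattern = re.compile(
    r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b|'
    r'\b[\da-f]{2}:[\da-f]{2}:[\da-f]{2}:[\da-f]{2}:[\da-f]{2}:[\da-f]{2}\b',
    flags=re.IGNORECASE
)


def has_address(line):
    """Return True if the line contains something shaped like an IP or MAC address."""
    return addr_pattern.search(line) is not None


# Illustrative sample lines (not from the original question).
checks = [
    "eth0 inet 192.168.0.1 up",
    "hwaddr 00:1A:2b:3C:4d:5E",
    "version 1.2.3 released",
    "bogus 999.999.999.999",
]
for line in checks:
    print(has_address(line), line)
```

    This prints True, True, False, True. The last case shows that the pattern only checks the shape of the address: octets are not range-checked, so 999.999.999.999 is treated as an IP address as well.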
    
  • 2021-01-01 06:31

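    Try this. It shells out to grep, so a grep binary must be on the PATH:

    #!/usr/bin/env python3
    import subprocess

    filename = "any filename"
    string_to_search = "What you want to search"

    # The commands module was removed in Python 3; subprocess.run replaces it.
    # Passing the arguments as a list avoids shell quoting problems.
    result = subprocess.run(
        ["grep", "-C", "10", "--", string_to_search, filename],
        capture_output=True, text=True,
    )
    extract = result.stdout

    print(extract)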
    