Merging and sorting log files in Python

佛祖请我去吃肉 2021-01-02 02:58

I am completely new to Python and have a serious problem that I cannot solve.

I have a few log files with identical structure:

[timestamp] [level]         


        
5 Answers
  • 2021-01-02 03:18

    All of the other answers here read every log into memory before printing the first line, which can be very slow and can even fail outright if the logs are too big.

    This solution uses a regex and a strptime format, like the other solutions, but it "merges" the logs as it goes.

    That means you can pipe the output to "head" or "less" and expect it to be snappy.

    import re
    import time
    import typing
    from dataclasses import dataclass
    
    
    t_fmt = "%Y%m%d.%H%M%S.%f"      # format of time stamps
    t_pat = re.compile(r"([^ ]+)")  # pattern to extract timestamp
    
    def get_time(line, prev_t):
        # uses the prev time if the time isn't found
        res = t_pat.search(line)
        if not res:
            return prev_t
        try:
            cur = time.strptime(res.group(1), t_fmt)
        except ValueError:
            return prev_t   
        return cur
    
    def print_sorted(files):
        @dataclass
        class FInfo:
            path: str
            fh: typing.TextIO
            cur_l = ""
            cur_t = None
    
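            def __read(self):
                line = self.fh.readline()
                if not line:
                    # eof found: drop any unfinished text and set a
                    # time one day ahead so this file is sorted last
                    self.cur_l = ""
                    self.cur_t = time.localtime(time.time() + 86400)
                else:
                    self.cur_l += line
                    # parse only the new line, not the accumulated text
                    self.cur_t = get_time(line, self.cur_t)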
    
            def read(self):
                # clear out the current line, and read
                self.cur_l = ""
                self.__read()
                while self.cur_t is None:
                    self.__read()
    
        finfos = []
        for f in files:
            try:
                fh = open(f, "r")
            except FileNotFoundError:
                continue
            fi = FInfo(f, fh)
            fi.read()
            finfos.append(fi)
    
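        while finfos:
            # get file with the earliest pending log entry
            fi = min(finfos, key=lambda x: x.cur_t)
            if not fi.cur_l:
                # the earliest entry is an eof marker, so every file is done
                break
            print(fi.cur_l, end="")
            fi.read()

        for fi in finfos:
            fi.fh.close()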
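
    The same streaming idea can also be sketched with the standard library's heapq.merge, which lazily merges inputs that are each already sorted. This is a different technique from the code above, not a rewrite of it. The names keyed_lines and merge_logs are made up for illustration, the timestamp format is the one assumed in this answer, and each file is assumed to be in chronological order already.

```python
import heapq
from datetime import datetime

T_FMT = "%Y%m%d.%H%M%S.%f"  # timestamp format assumed in this answer


def keyed_lines(lines):
    """Yield (timestamp, line) pairs; a line without a parsable
    timestamp inherits the timestamp of the line before it."""
    prev = datetime.min
    for line in lines:
        parts = line.split(None, 1)
        try:
            prev = datetime.strptime(parts[0], T_FMT)
        except (IndexError, ValueError):
            pass  # continuation or blank line: keep the previous timestamp
        yield prev, line


def merge_logs(streams):
    """Lazily merge already-sorted line streams in timestamp order."""
    merged = heapq.merge(*(keyed_lines(s) for s in streams),
                         key=lambda pair: pair[0])
    for _, line in merged:
        yield line
```

    Passing open file objects as the streams and printing each yielded line with end="" should give output that can be piped to "head" in the same way, since only one pending line per file is held at a time.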
    
    
  • 2021-01-02 03:25

    As for the critical sorting function:

    def sort_key(line):
        return datetime.strptime(line.split(']')[0], '[%a %b %d %H:%M:%S %Y')
    

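    This should be passed as the key argument to sort or sorted, not as cmp. It is faster this way, since each line is parsed once rather than on every comparison, and Python 3 no longer accepts cmp at all.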

    Oh, and you should have

    from datetime import datetime
    

    in your code to make this work.
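
    A minimal sketch of how the key function might be wired into sorted, assuming each line starts with a bracketed timestamp such as [Wed Jul 20 19:20:12 2011]. The sample lines are invented for illustration.

```python
from datetime import datetime


def sort_key(line):
    # Parse the text before the first ']' as the timestamp
    return datetime.strptime(line.split(']')[0], '[%a %b %d %H:%M:%S %Y')


# Invented sample lines, deliberately out of order
lines = [
    "[Wed Jul 20 19:20:12 2011] [error] second\n",
    "[Tue Jul 19 08:00:00 2011] [notice] first\n",
]
ordered = sorted(lines, key=sort_key)
```

    Because sort_key returns datetime objects, the comparison is chronological rather than alphabetical, which matters for formats like this one where the weekday name comes first.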

  • 2021-01-02 03:29

    You can do this:

    import fileinput
    import re
    from time import strptime
    
    f_names = ['1.log', '2.log'] # names of log files
    lines = list(fileinput.input(f_names))
    t_fmt = '%a %b %d %H:%M:%S %Y' # format of time stamps
    t_pat = re.compile(r'\[(.+?)\]') # pattern to extract timestamp
    for line in sorted(lines, key=lambda l: strptime(t_pat.search(l).group(1), t_fmt)):
        print(line, end='')  # each line already ends with a newline
    
  • 2021-01-02 03:30

    First off, you will want to use the fileinput module to read data from multiple files, like:

    import fileinput

    data = fileinput.FileInput()
    for line in data:
        print(line, end='')
    

    This prints all of the lines from all files together. You also want to sort them, which you can do with the built-in sorted function.

    If your lines had started with a timestamp like [2011-07-20 19:20:12], you would be golden: that zero-padded, year-first format sorts chronologically under plain string comparison, so you could do:

    data = fileinput.FileInput()
    for line in sorted(data):
        print(line, end='')
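
    To illustrate why that works: with zero padding and the most significant field first, string order and chronological order coincide. A small check, using invented sample lines and a hypothetical helper named parsed:

```python
from datetime import datetime

# Invented sample lines with zero-padded, most-significant-first timestamps
lines = [
    "[2011-07-20 19:20:12] [error] c\n",
    "[2011-07-20 09:05:00] [info] b\n",
    "[2010-12-31 23:59:59] [info] a\n",
]


def parsed(line):
    # Characters 1..19 hold the timestamp between the brackets
    return datetime.strptime(line[1:20], "%Y-%m-%d %H:%M:%S")


by_text = sorted(lines)              # plain string comparison
by_time = sorted(lines, key=parsed)  # chronological order via datetime
print(by_text == by_time)
```

    The two orderings agree for this layout, which is why no parsing is needed in that case.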
    

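    However, your timestamp format is more complex, so you need a custom comparison:

    import fileinput
    from datetime import datetime
    from functools import cmp_to_key

    def parse_date(line):
        # Assumption: lines start like "[Wed Jul 20 19:20:12 2011]",
        # the format used in the other answers; adjust to your logs
        return datetime.strptime(line.split(']')[0], '[%a %b %d %H:%M:%S %Y')

    def compareDates(line1, line2):
        # Python 3 has no cmp(), so compute -1, 0 or 1 by hand
        d1, d2 = parse_date(line1), parse_date(line2)
        return (d1 > d2) - (d1 < d2)

    data = fileinput.FileInput()
    for line in sorted(data, key=cmp_to_key(compareDates)):
        print(line, end='')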
    

    For bonus points, you can even do

    data = fileinput.FileInput(openhook=fileinput.hook_compressed)
    

    which will enable you to read in gzipped log files.

    The usage would then be:

    $ python yourscript.py access.log.1 access.log.*.gz
    

    or similar.
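
    If you would rather not rely on the open hook, a similar effect can be sketched by hand with gzip.open in text mode. The helper names open_log and read_all_lines are hypothetical, and detecting compression by the .gz suffix is an assumption about how your rotated logs are named.

```python
import gzip


def open_log(path):
    """Open a plain or gzip-compressed log file for reading as text."""
    if path.endswith(".gz"):
        # "rt" makes gzip decode bytes to str, like a normal text file
        return gzip.open(path, "rt")
    return open(path, "r")


def read_all_lines(paths):
    """Collect the lines of every given log file into one list."""
    lines = []
    for path in paths:
        with open_log(path) as fh:
            lines.extend(fh)
    return lines
```

    The resulting list can then be sorted with whichever key or comparison function suits your timestamp format.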

  • 2021-01-02 03:39

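    Read the lines of both files into one list (which merges them), provide a user-defined compare function that converts each timestamp to seconds since the epoch, sort with that compare function, and write the lines to the merged file.

    import time
    from functools import cmp_to_key

    def to_epoch(line):
        # Assumption: lines start like "[Wed Jul 20 19:20:12 2011]",
        # the format used in the other answers; adjust to your logs
        stamp = line.split(']')[0].lstrip('[')
        return time.mktime(time.strptime(stamp, '%a %b %d %H:%M:%S %Y'))

    def compare_func(line1, line2):
        # negative, zero or positive, like Python 2's cmp()
        a, b = to_epoch(line1), to_epoch(line2)
        return (a > b) - (a < b)

    lst = []
    for name in ("file_1.log", "file_2.log"):
        with open(name, "r") as fh:
            lst.extend(fh)  # both files end up in one list

    # Python 3 removed sort(cmp=...); cmp_to_key adapts the compare function
    lst.sort(key=cmp_to_key(compare_func))

    Something like that should do it.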
