Restructuring pyparsing parse results of multithreaded log file

Question

I have a log file of a multithreaded process which looks like this:

<timestamp_in> <first_function_call_input> <thread:1>
    input_parameter_1:     value
    input_parameter_2:     value

<timestamp_in> <another_function_call_input> <thread:2>
    input_parameters:      values
<timestamp_out> <another_function_call_output> <thread:2>
    output_parameters:     values

<timestamp_out> <first_function_call_output> <thread:1>
    output_parameters:     values

In my parse results variable I would like to have the input and output information of one function call paired together, for example like this:

>>> print(parse_results.dump())
  -[0]:
       -function: first_function
       -thread: 1
       -timestamp_in: ...
       -timestamp_out: ...
       -input_parameters:
             [0]:
                  -parameter_name: input_parameter_1
                  -parameter_value: value
             [1]:
                  -parameter_name: input_parameter_2
                  -parameter_value: value
       -output_parameters:
             [0]: ...
             ...
  -[1]:
       -function: another_function
       -thread: 2
       ...

Is there a way to restructure the parse_results directly while parsing, so I don't have to restructure the results afterwards? Maybe with some parse actions? Or would it be way easier to just parse the input-parts and the output-parts by themselves, then sort them by thread, timestamp, and function and stitch the input-parts and output-parts together in a new object?

Thanks for your help!

Edit:
I'm going to parse the input parts and output parts separately and sort them afterwards, since that seems much easier. However, I am still wondering whether and how a parse results instance can be restructured. Say I have the following grammar and test string:

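from pyparsing import (Group, OneOrMore, ParserElement, Suppress, Word,
                       alphas, nums, printables)

# Treat bare string literals such as ':' as suppressed tokens.
ParserElement.inlineLiteralsUsing(Suppress)

# Lines with an alphabetic key and a numeric value.
key_val_lines = OneOrMore(Group(Word(alphas)('key') + ':' + Word(nums)('val')))('parameters')

# Lines with a key made of any printable characters and an alphabetic value.
special_key_val_lines = OneOrMore(Group(Word(printables)('key') + ':' + Word(alphas)('val')))('special_parameters')

log = OneOrMore(Group(key_val_lines | special_key_val_lines))('contents').setDebug()

test_string = '''
foo             : 1
bar             : 2
special_key1!   : wow
another_special : abc
normalAgain     : 3'''

# Keep the ParseResults object itself so it can be indexed later.
parse_results = log.parseString(test_string)
print(parse_results.dump())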

This outputs the following:

- contents: [[['foo', '1'], ['bar', '2']], [['special_key1!', 'wow'], ['another_special', 'abc']], [['normalAgain', '3']]]
  [0]:
    [['foo', '1'], ['bar', '2']]
    - parameters: [['foo', '1'], ['bar', '2']]
      [0]:
        ['foo', '1']
        - key: 'foo'
        - val: '1'
      [1]:
        ['bar', '2']
        - key: 'bar'
        - val: '2'
  [1]:
    [['special_key1!', 'wow'], ['another_special', 'abc']]
    - special_parameters: [['special_key1!', 'wow'], ['another_special', 'abc']]
      [0]:
        ['special_key1!', 'wow']
        - key: 'special_key1!'
        - val: 'wow'
      [1]:
        ['another_special', 'abc']
        - key: 'another_special'
        - val: 'abc'
  [2]:
    [['normalAgain', '3']]
    - parameters: [['normalAgain', '3']]
      [0]:
        ['normalAgain', '3']
        - key: 'normalAgain'
        - val: '3'

How can I modify the grammar of my parser in such a way that parse_results.contents[2].parameters[0] will instead become parse_results.contents[0].parameters[3]?

Answer 1:

Purely a judgment call on where to draw the line on this, and I have written parsers in both styles.

In this particular case, my intuition tells me that it will make for clearer code if you focus your parser and parse actions on grouping, converting, and naming the parts of the individual log entries, and then use a separate method to reorganize them based on your various grouping strategies. My reasoning is that the log message structure is already somewhat complex, and so your parser will have enough work to do to pull out each message into a unified form. Also, your grouping strategies may evolve a bit (need to gather items that are within some small time window, not just exact timestamp matches), and doing this in a separate post-processing method would localize these changes.

From a testing perspective, this would also allow you to test the restructuring code separately from the parsing code, perhaps with a list of dicts or namedtuples that would simulate the parsed results from the separate log records.

tl;dr - For this situation, I'd go with the post-processing method for the final sorting/reorganizing of your parsed log records.
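As a rough illustration of that post-processing step, here is a minimal sketch in plain Python. It assumes the parser has already produced one flat record per log entry; the `LogRecord` namedtuple, its field names, and the `pair_calls` helper are hypothetical stand-ins for whatever the real parser emits. Inputs are matched to outputs per (thread, function) key, with a stack so that nested calls on the same thread pair up innermost first.

```python
from collections import namedtuple

# Hypothetical record shape standing in for one parsed log entry.
LogRecord = namedtuple("LogRecord", "timestamp function thread direction parameters")


def pair_calls(records):
    """Pair each 'in' record with the matching 'out' record of the same thread and function."""
    pending = {}  # (thread, function) -> stack of calls still waiting for their output
    calls = []    # paired calls, in the order their inputs appeared
    for rec in records:
        key = (rec.thread, rec.function)
        if rec.direction == "in":
            call = {
                "function": rec.function,
                "thread": rec.thread,
                "timestamp_in": rec.timestamp,
                "timestamp_out": None,
                "input_parameters": rec.parameters,
                "output_parameters": None,
            }
            pending.setdefault(key, []).append(call)
            calls.append(call)
        else:
            open_calls = pending.get(key)
            if open_calls:  # outputs without a matching input are skipped
                call = open_calls.pop()
                call["timestamp_out"] = rec.timestamp
                call["output_parameters"] = rec.parameters
    return calls


# Simulated records in log order, mirroring the interleaving in the question.
records = [
    LogRecord(1, "first_function", 1, "in", {"input_parameter_1": "value"}),
    LogRecord(2, "another_function", 2, "in", {"input_parameters": "values"}),
    LogRecord(3, "another_function", 2, "out", {"output_parameters": "values"}),
    LogRecord(4, "first_function", 1, "out", {"output_parameters": "values"}),
]
paired = pair_calls(records)
```

Because `pair_calls` only sees plain records, it can be tested without running the parser, and a different matching rule (for example a time window) would only change this one function.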

EDIT: To modify the parse results in place, define a parse action that takes a single argument, which I typically name tokens, and modify in place using typical list or dict mutators:

def rearrange(tokens):
    # mutate tokens in place
    tokens.contents[0].parameters.append(tokens.contents[2].parameters[0])

log.addParseAction(rearrange)

If you return None (as in this example), then the tokens structure that was passed in is retained as the token structure to be returned. If you return a non-None value, then the new return value replaces the given tokens in the parser output. This is how integer parsers convert the parsed string to actual integers, or date/time parsers convert the parsed strings to Python datetimes.
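The return-value rule can be seen in a small sketch of the integer case mentioned above. The names `integer` and `to_int` are illustrative only and are not part of the grammar in the question; the example assumes pyparsing is installed.

```python
from pyparsing import Word, nums


def to_int(tokens):
    # Returning a non-None value replaces the matched tokens in the output.
    return int(tokens[0])


integer = Word(nums)
integer.setParseAction(to_int)

result = integer.parseString("42")
value = result[0]  # an int rather than the string '42'
```

A parse action that only mutates `tokens` and returns nothing, like `rearrange` above, leaves the (modified) tokens in place instead.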

Source: https://stackoverflow.com/questions/50958427/restructuring-pyparsing-parse-results-of-multithreaded-log-file

Tags

python

multithreading

parsing

pyparsing

logfile