Python regular expression across multiple lines

鱼传尺愫 2020-12-03 23:53

I'm gathering some info from some Cisco devices using Python and pexpect, and have had a lot of success with REs to extract pesky little items. I'm afraid I've hit the wall on

2 Answers
  • 2020-12-04 00:34
    x="""Top Assembly Part Number        : 800-25858-06
    Top Assembly Revision Number    : A0
    Version ID                      : V08
    CLEI Code Number                : COMDE10BRA
    Hardware Board Revision Number  : 0x01
    
    
    Switch   Ports  Model              SW Version              SW Image
    ------   -----  -----              ----------              ----------
    *    1   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
         2   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
         3   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
         4   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
    
    
    Switch 02
    ---------
    Switch Uptime                   : 11 weeks, 2 days, 16 hours, 27 minutes
    Base ethernet MAC Address       : 00:26:52:96:2A:80
    Motherboard assembly number     : 73-9675-15"""
    
    >>> import re
    >>> re.findall(r"^\*?\s*(\d)\s*\d+\s*([A-Z\d-]+)", x, re.MULTILINE)
    [('1', 'WS-C3750-48P'), ('2', 'WS-C3750-48P'), ('3', 'WS-C3750-48P'), ('4', 'WS-C3750-48P')]
    

    UPDATE: the OP edited the question, so the asterisk is now captured as well. Thanks to Tom for pointing out that + should be used instead of *.

    >>> re.findall(r"^(\*?)\s+(\d)\s+\d+\s+([A-Z\d-]+)", x, re.MULTILINE)
    [('*', '1', 'WS-C3750-48P'), ('', '2', 'WS-C3750-48P'), ('', '3', 'WS-C3750-48P'), ('', '4', 'WS-C3750-48P')]
    >>>
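
The same extraction can also be written as a standalone script. The following is only a sketch: it uses two rows copied from the table above, accepts multi-digit switch numbers with \d+ (a small change from the pattern in this answer), and the group names star, switch and model are illustrative choices, not anything the device output defines.

```python
import re

# Two rows copied from the table above; the rest of the output is omitted.
rows = """*    1   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     2   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M"""

# re.VERBOSE allows whitespace and comments inside the pattern.
# The group names are illustrative, not taken from the question.
ROW = re.compile(r"""
    ^(?P<star>\*?)           # optional leading asterisk
    \s+(?P<switch>\d+)       # switch number
    \s+\d+                   # port count (not captured)
    \s+(?P<model>[A-Z\d-]+)  # model string
    """, re.MULTILINE | re.VERBOSE)

# One dict per matching row.
switches = [m.groupdict() for m in ROW.finditer(rows)]
print(switches)
```

Named groups make each result self-describing, which may be easier to work with than positional tuples once more columns are captured.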
    
  • 2020-12-04 00:46

    To have . match any character, including a newline, compile your RE with re.DOTALL among the options (if you have multiple options, combine them with |, the bitwise-or operator).
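
As a minimal sketch of the two points above, the snippet below shows the effect of re.DOTALL on . and how several flags are combined with |. The two-line sample string is made up for illustration and is not taken from the question.

```python
import re

# Hypothetical two-line sample, not taken from the question.
text = "Model:\nWS-C3750-48P"

# Without re.DOTALL, "." does not match the newline, so the search fails.
print(re.search(r"Model:.WS-\S+", text))

# With re.DOTALL, "." also matches the newline.
m = re.search(r"Model:.WS-\S+", text, re.DOTALL)
print(m.group(0) if m else None)

# Several flags are combined with the bitwise-or operator.
pattern = re.compile(r"^model:.(WS-\S+)",
                     re.DOTALL | re.IGNORECASE | re.MULTILINE)
print(pattern.findall(text))
```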

    In this case I'm not sure you actually do need this. Why not something like the following, where text stands for the string holding the device output:

    re.findall(r'(\d+)\s+\d+\s+(WS-\S+)', text)
    

    assuming for example that the way you identify a "model" is that it starts with WS-? The fact that there will be newlines between one result of findall and the next one is not a problem here. Can you explain exactly how you identify a "model" and why "multiline" is an issue? Maybe you want the re.MULTILINE to make ^ match at each start-of-line, to grab your data with some reference to the start of the lines...?
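
If anchoring to the start of each line is what you are after, the following sketch shows what re.MULTILINE changes. It assumes a trimmed copy of the switch table from the other answer, and it uses [ \t]+ rather than \s+ between columns so that a match cannot run across a newline; that is a choice made for this example, not a requirement.

```python
import re

# Trimmed sample based on the switch table shown in the other answer.
text = """Hardware Board Revision Number  : 0x01

Switch   Ports  Model              SW Version              SW Image
------   -----  -----              ----------              ----------
*    1   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     2   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
"""

# Anchored pattern: optional asterisk, switch number, port count, model.
pattern = r"^\*?[ \t]+(\d+)[ \t]+\d+[ \t]+(WS-\S+)"

# Without re.MULTILINE, ^ matches only at the very start of the string.
print(re.findall(pattern, text))

# With re.MULTILINE, ^ matches at the start of every line.
print(re.findall(pattern, text, re.MULTILINE))
```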
