I\'m gathering some info from some cisco devices using python and pexpect, and had a lot of success with REs to extract pesky little items. I\'m afraid i\'ve hit the wall on
x="""Top Assembly Part Number : 800-25858-06
Top Assembly Revision Number : A0
Version ID : V08
CLEI Code Number : COMDE10BRA
Hardware Board Revision Number : 0x01
Switch Ports Model SW Version SW Image
------ ----- ----- ---------- ----------
* 1 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
2 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
3 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
4 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
Switch 02
---------
Switch Uptime : 11 weeks, 2 days, 16 hours, 27 minutes
Base ethernet MAC Address : 00:26:52:96:2A:80
Motherboard assembly number : 73-9675-15"""
>>> import re
>>> re.findall("^\*?\s*(\d)\s*\d+\s*([A-Z\d-]+)",x,re.MULTILINE)
[('1', 'WS-C3750-48P'), ('2', 'WS-C3750-48P'), ('3', 'WS-C3750-48P'), ('4', 'WS-C3750-48P')]
UPDATE: because OP edited question, and Thanks Tom for pointing out for +
>>> re.findall("^(\*?)\s+(\d)\s+\d+\s+([A-Z\d-]+)",x,re.MULTILINE)
[('*', '1', 'WS-C3750-48P'), ('', '2', 'WS-C3750-48P'), ('', '3', 'WS-C3750-48P'), ('', '4', 'WS-C3750-48P')]
>>>
To have .
match any character, including a newline, compile your RE with re.DOTALL among the options (remember, if you have multiple options, use |
, the bit-or operator, between them, in order to combine them).
In this case I'm not sure you actually do need this -- why not something like
re.findall(r'(\d+)\s+\d+\s+(WS-\S+)')
assuming for example that the way you identify a "model" is that it starts with WS-
? The fact that there will be newlines between one result of findall
and the next one is not a problem here. Can you explain exactly how you identify a "model" and why "multiline" is an issue? Maybe you want the re.MULTILINE to make ^
match at each start-of-line, to grab your data with some reference to the start of the lines...?