Question
Esteemed colleagues, I have a raw data format, detailed below, in which each record normally consists of three lines: a line starting with dn:, followed by an ftpuser line and a description line. In some records the third line, description, is missing, so only the first two lines are present. I'm using a multiline regex to match these records: the file contents are read into my data variable and passed to re.findall. I then loop over matchObj and split each match into a list, new_str, so that I can pick out only the desired indexes.
Below is the data file:
dn: uid=ac002,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
description: file transfer|12/31/2010|file transfer
dn: uid=ab02,ou=ftpusers,ou=applications,o=regg.com
ftpuser: disabled_5Mar07
description: Remedy Tkt 01239399 regg move
dn: uid=mela,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
description: ROYALS|none|customer account
dn: uid=aa01,ou=ftpusers,ou=applications,o=regg.com
ftpuser: T
dn: uid=aa02,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
dn: uid=aa03,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
dn: uid=bb01,ou=ftpusers,ou=applications,o=regg.com
ftpuser: T
dn: uid=bb02,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
dn: uid=bb03,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
dn: uid=bb05,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
dn: uid=ab01,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
description:: VGVzdGluZyA=
dn: uid=tt@regg.com,ou=ftpusers,ou=applications,o=regg.com
ftpuser: T
description: REG-JP|7-31-05|REG-JP
Below is the code I tried. The problem is that it only picks up records that have all three lines (dn:, ftpuser, description); records that have only two lines (dn:, ftpuser) are not retrieved. I would like to know how to get those records into the same output as well, appending Description: null wherever the description is missing.
#!/usr/bin/python3
# ./dataparse.py
from __future__ import print_function
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
import re
with open('test2', 'r') as f:
    for line in f:
        line = line.strip()
        data = f.read()
        regex = (r"dn:(.*?)\nftpuser: (.*)\ndescription:* (.*)")
        matchObj = re.findall(regex, data)
        for index in matchObj:
            #print(index)
            index_str = ' '.join(index)
            new_str = re.sub(r'[=,]', ' ', index_str)
            new_str = new_str.split()
            print("{0:<30}{1:<20}{2:<50}".format(new_str[1],new_str[8],new_str[9]))
Resulting output:
$ ./dataparse.py
ab02                          disabled_5Mar07     Remedy
mela                          Y                   ROYALS|none|customer
ab01                          Y                   VGVzdGluZyA
tt@regg.com                   T                   REG-JP|7-31-05|REG-JP
As a Python beginner, I would appreciate any help or suggestions.
Answer 1:
Simply make description optional in your regex pattern. Change it to:
r"dn:(.*?)\nftpuser: (.*)\n(?:description:* (.*))?"
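A minimal sketch of how this pattern might be used end to end, assuming the records are newline-separated as in the data file above. re.findall returns an empty string for a group that did not take part in the match, so records without a description line need a fallback value (here "null") in place of the new_str[9] lookup, which would raise IndexError for them. The names SAMPLE, RECORD and parse_records are hypothetical, and unlike the script in the question this version keeps the whole description rather than only its first word.

```python
import re

# Hypothetical sample in the same layout as the data file in the question.
# The second record has no description line.
SAMPLE = """\
dn: uid=ac002,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
description: file transfer|12/31/2010|file transfer
dn: uid=aa01,ou=ftpusers,ou=applications,o=regg.com
ftpuser: T
dn: uid=ab01,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
description:: VGVzdGluZyA=
"""

# Same pattern as in the answer: the description line sits in an
# optional non-capturing group, so two-line records still match.
RECORD = re.compile(r"dn:(.*?)\nftpuser: (.*)\n(?:description:* (.*))?")


def parse_records(data):
    """Return (uid, ftpuser, description) tuples; a missing description becomes 'null'."""
    rows = []
    for dn, ftpuser, description in RECORD.findall(data):
        # dn looks like " uid=ac002,ou=ftpusers,..."; keep only the uid value.
        uid = dn.strip().split(",")[0].split("=", 1)[1]
        # findall yields '' when the optional group did not match.
        rows.append((uid, ftpuser.strip(), description.strip() or "null"))
    return rows


for uid, ftpuser, description in parse_records(SAMPLE):
    print("{0:<30}{1:<20}{2:<50}".format(uid, ftpuser, description))
```

In the script itself, data would come from a single f.read() placed directly under the with statement. Calling f.read() inside for line in f lets the loop consume the first dn: line before the read happens, which would explain why ac002 is absent from the output shown in the question.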
Source: https://stackoverflow.com/questions/51106466/search-patterns-from-the-text-file-and-if-pattern-missing-place-a-value-null