问题
For the output, need to replace the brackets contain a digits with periods '.'. Also remove the brackets at the beginning and end of the domain.
Can we use re.sub for this and if so how?
code
import re
log = ["4/19/2020 11:59:09 PM 2604 PACKET 0000014DE1921330 UDP Rcv 192.168.1.28 f975 Q [0001 D NOERROR] A (7)pagead2(17)googlesyndication(3)com(0)",
"4/19/2020 11:59:09 PM 0574 PACKET 0000014DE18C4720 UDP R cv 192.168.2.54 9c63 Q [0001 D NOERROR] A (2)pg(3)cdn(5)viber(3)com(0)"]
rx_dict = { 'query': re.compile(r'(?P<query>[\S]*)$') }
for item in log:
for key, r_exp in rx_dict.items():
print(f"{r_exp.search(item).group(1)}")
output
(7)pagead2(17)googlesyndication(3)com(0)
(2)pg(3)cdn(5)viber(3)com(0)
preferred output
pagead2.googlesyndication.com
pg.cdn.viber.com
回答1:
Pragmatic python usage:
log = ["4/19/2020 11:59:09 PM 2604 PACKET 0000014DE1921330 UDP Rcv 192.168.1.28 f975 Q [0001 D NOERROR] A (7)pagead2(17)googlesyndication(3)com(0)",
"4/19/2020 11:59:09 PM 0574 PACKET 0000014DE18C4720 UDP R cv 192.168.2.54 9c63 Q [0001 D NOERROR] A (2)pg(3)cdn(5)viber(3)com(0)"]
import re
urls = [re.sub(r'\(\d+\)','.',t.split()[-1]).strip('.') for t in log]
print (urls)
Output:
['pagead2.googlesyndication.com', 'pg.cdn.viber.com']
Dictionary refinement via rules:
If you want to apply consecutive rules via a dictionary, go lambda all the way:
import re
rules = {"r0": lambda x: x.split()[-1],
"r1": lambda x: re.sub(r'\(\d+\)','.',x),
"r2": lambda x: x.strip(".")}
result = []
for value in log:
result.append(value)
for r in rules:
result[-1] = rules[r](result[-1])
print(result)
Output:
['pagead2.googlesyndication.com', 'pg.cdn.viber.com']
回答2:
Yes, you can use re.sub. I assume you have this dictionary so you can extract multiple pieces from the log. You can do something like this for dispatch:
ops = {
"query": lambda e: (
re.sub(r"\(\d+\(", ".", (
re.search(r"(?P<query>[\S]*)$", e).group(1),
)
),
...
}
And then apply the functions to all log entires
log_results = {op_name: op(l) for op_name, op in ops.items() for l in log}
来源:https://stackoverflow.com/questions/61336295/python-regex-substitution-using-a-dictionary-to-clean-up-domain-names