Extract occurrence of text between brackets from a text file Python

时光总嘲笑我的痴心妄想 submitted on 2020-05-07 03:50:07

Question


Log file:

INFO:werkzeug:127.0.0.1 - - [20/Sep/2018 19:40:00] "GET /socket.io/?polling HTTP/1.1" 200 -
INFO:engineio: Received packet MESSAGE, ["key",{"data":{"tag1":12,"tag2":13,"tag3": 14"...}}]

I'm interested in extracting only the text within the brackets that contain the keyword "key", not every occurrence that matches the regex pattern below.

Here is what I have tried so far:

import re
with open('logfile.log', 'r') as text_file:
    matches = re.findall(r'\[([^\]]+)', text_file.read())
    with open('output.txt', 'w') as out:
        out.write('\n'.join(matches))

This outputs every occurrence that matches the regex. The desired content of output.txt would look like this:

"key",{"data":{"tag1":12,"tag2":13,"tag3": 14"...}}

Answer 1:


Text within square brackets that cannot itself contain [ or ], but must contain some other text, can be matched with the negated character class [^][], which matches any single character other than ] and [.

That is, you can match the whole text within square brackets with \[[^][]*]. To require some text inside, put that text after [^][]* and then add another [^][]* before the closing ].

You may use

re.findall(r'\[([^][]*"key"[^][]*)]', text_file.read()) 

See the Python demo:

import re
s = '''INFO:werkzeug:127.0.0.1 - - [20/Sep/2018 19:40:00] "GET /socket.io/?polling HTTP/1.1" 200 - 
INFO:engineio: Received packet MESSAGE, ["key",{"data":{"tag1":12,"tag2":13,"tag3": 14"...}}]'''
print(re.findall(r'\[([^][]*"key"[^][]*)]', s)) 

Output:

['"key",{"data":{"tag1":12,"tag2":13,"tag3": 14"...}}']
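To tie this back to the file-based code in the question, here is a minimal sketch that wraps the same pattern in a function and writes the matches to a file. The names extract_bracketed and filter_log are hypothetical helpers introduced only for illustration, and the keyword parameter with re.escape is an assumption added so the search text can be changed; neither is part of the original answer.

```python
import re


def extract_bracketed(text, keyword='"key"'):
    """Return the contents of every [...] group in text that contains keyword.

    Hypothetical helper: the pattern is the one from the answer, with the
    literal "key" replaced by an escaped, caller-supplied keyword.
    """
    # re.escape keeps regex metacharacters in the keyword literal.
    pattern = r'\[([^][]*' + re.escape(keyword) + r'[^][]*)]'
    return re.findall(pattern, text)


def filter_log(in_path, out_path, keyword='"key"'):
    """Read in_path, write one matching bracket group per line to out_path."""
    with open(in_path, 'r') as text_file:
        matches = extract_bracketed(text_file.read(), keyword)
    with open(out_path, 'w') as out:
        out.write('\n'.join(matches))
    return matches


if __name__ == '__main__':
    # File names taken from the question.
    filter_log('logfile.log', 'output.txt')
```

Because [^][] excludes both bracket characters, a bracketed group whose payload itself contains [ or ] (for example a JSON array) will not be matched as a whole by this pattern.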


Source: https://stackoverflow.com/questions/52447842/extract-occurrence-of-text-between-brackets-from-a-text-file-python
