Regex to match whatsapp chat log

Submitted by 99封情书 on 2019-12-22 17:58:01

Question


I've been trying to write a regex for a WhatsApp chat log.

So far, I've been able to achieve this:

Click Here for the test link

by writing the following regex:

(?P<datetime>\d{2}\/\d{2}\/\d{4},\s\d(?:\d)?:\d{2} [pa].m.)\s-\s(?P<name>[^:]*):(?P<message>.*)

The problem with this regex is that it cannot match long messages that span multiple lines (messages containing line breaks). You can see the issue in the link provided above.

Help would be appreciated.

Thank you.
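A minimal Python sketch that reproduces the problem with the question's pattern, assuming Python's re module and a hypothetical three-message log in the format the pattern expects. Because . stops at a newline, the second line of Bob's message is never captured:

```python
import re

# The pattern from the question, unchanged (the dots in "[pa].m." are
# unescaped, so they match any character, not only a literal ".").
PATTERN = re.compile(
    r"(?P<datetime>\d{2}\/\d{2}\/\d{4},\s\d(?:\d)?:\d{2} [pa].m.)"
    r"\s-\s(?P<name>[^:]*):(?P<message>.*)"
)

# Hypothetical sample log; Bob's message spans two lines.
LOG = (
    "12/05/2018, 9:30 p.m. - Alice: Hi\n"
    "12/05/2018, 9:31 p.m. - Bob: First line\n"
    "second line\n"
    "12/05/2018, 9:32 p.m. - Alice: Bye\n"
)

# ".*" in the message group stops at the first "\n".
messages = [m.group("message") for m in PATTERN.finditer(LOG)]
print(messages)
```

The line "second line" is simply skipped: it is not part of Bob's match and cannot start a match of its own.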


Answer 1:


There you go:

^
(?P<datetime>\d{2}/\d{2}/\d{4}[^-]+)\s+-\s+
(?P<name>[^:]+):\s+
(?P<message>[\s\S]+?)
(?=^\d{2}|\Z)

See your modified demo on regex101.com.


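Essentially, I added anchors, simplified your datetime part, and inserted [\s\S]+?, which means: match anything lazily (including newlines) up to the condition that follows, which is a lookahead. The lookahead makes sure there are either two more digits at the start of the next line (this could be tightened!) or the very end of the string. The pattern needs the multiline (m) flag so that ^ matches at the start of every line, and the verbose (x) flag because it is written across several lines.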
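A sketch of Answer 1's pattern in Python's re module, where re.MULTILINE and re.VERBOSE stand in for regex101's m and x flags. The sample log is hypothetical:

```python
import re

# Answer 1's pattern, unchanged. It relies on two flags:
#   re.VERBOSE:   line breaks and indentation inside the pattern are ignored
#   re.MULTILINE: "^" also matches right after every "\n", not only at position 0
PATTERN = re.compile(
    r"""
    ^
    (?P<datetime>\d{2}/\d{2}/\d{4}[^-]+)\s+-\s+
    (?P<name>[^:]+):\s+
    (?P<message>[\s\S]+?)
    (?=^\d{2}|\Z)
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical sample log; Bob's message spans two lines.
LOG = (
    "12/05/2018, 9:30 p.m. - Alice: Hi\n"
    "12/05/2018, 9:31 p.m. - Bob: First line\n"
    "second line\n"
    "12/05/2018, 9:32 p.m. - Alice: Bye\n"
)

# [\s\S]+? also swallows the "\n" before the next stamp, so strip it for display.
rows = [
    (m.group("datetime"), m.group("name"), m.group("message").strip())
    for m in PATTERN.finditer(LOG)
]
for row in rows:
    print(row)
```

Bob's row now carries both lines of his message. Note that [^-]+ in the datetime group would break if a name contained a hyphen before the " - " separator.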


Answer 2:


The dot does not match newline characters, which is why only the first line gets matched. A regex engine's matching behavior can usually be changed with flags.

On the regex101 page, you can click Set Regex Options (the flag right next to the regular expression input field) and enable Single line; then the dot will also match \n.
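The same switch exists in Python's re module as re.DOTALL (inline form (?s)). A small check on a hypothetical two-line message:

```python
import re

text = "Alice: line one\nline two"

# Default: "." does not match "\n", so the capture ends with the first line.
without_s = re.match(r"Alice:(?P<message>.*)", text).group("message")

# re.DOTALL (regex101's "single line" / s flag) lets "." match "\n" as well.
with_s = re.match(r"Alice:(?P<message>.*)", text, re.DOTALL).group("message")

# The flag can also be set inline with (?s) at the start of the pattern.
inline = re.match(r"(?s)Alice:(?P<message>.*)", text).group("message")

print(repr(without_s))   # ' line one'
print(repr(with_s))      # ' line one\nline two'
print(inline == with_s)  # True
```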

But then you have to modify your expression so that it detects the start of the next message; otherwise, everything will be interpreted as one message.
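One way to follow this advice, sketched in Python: keep DOTALL, but make the message lazy (.*?) and end it with a lookahead for the next line that begins with a full dd/dd/dddd, date stamp. The date-stamp lookahead, the escaped dots in a.m./p.m., \d{1,2} in place of \d(?:\d)?, and the sample log are assumptions about the export format, not part of the answer:

```python
import re

# Hypothetical sample log: the second line of Bob's message starts with digits.
LOG = (
    "12/05/2018, 9:30 p.m. - Alice: Hi\n"
    "12/05/2018, 9:31 p.m. - Bob: Meet at\n"
    "10 a.m. tomorrow\n"
    "12/05/2018, 9:32 p.m. - Alice: OK\n"
)

# Question's pattern with DOTALL only: the greedy ".*" now runs to the end of
# the string, so everything after "Alice:" lands in a single match.
greedy = re.compile(
    r"(?P<datetime>\d{2}\/\d{2}\/\d{4},\s\d(?:\d)?:\d{2} [pa].m.)"
    r"\s-\s(?P<name>[^:]*):(?P<message>.*)",
    re.DOTALL,
)
print(len(greedy.findall(LOG)))  # 1

# Lazy ".*?" that stops at the next line starting with a full date stamp
# ("dd/dd/dddd,") or at the end of the string. MULTILINE makes "^" match after
# each "\n"; the "\s*" before the lookahead keeps the newline out of the message.
fixed = re.compile(
    r"^(?P<datetime>\d{2}/\d{2}/\d{4},\s\d{1,2}:\d{2}\s[ap]\.m\.)\s-\s"
    r"(?P<name>[^:]+):\s(?P<message>.*?)\s*(?=^\d{2}/\d{2}/\d{4},|\Z)",
    re.DOTALL | re.MULTILINE,
)
parsed = [(m.group("name"), m.group("message")) for m in fixed.finditer(LOG)]
for name, message in parsed:
    print(name, repr(message))
```

With a lookahead on two digits alone, as in Answer 1's (?=^\d{2}|\Z), Bob's message would end before "10 a.m. tomorrow" and that line would be dropped; checking for the whole date is one way to do the tightening Answer 1 hints at.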



Source: https://stackoverflow.com/questions/50280191/regex-to-match-whatsapp-chat-log
