Regex: How to match sequence of key-value pairs at end of string

ⅰ亾dé卋堺 submitted on 2019-11-30 05:41:34

The negative zero-width lookahead is (?!pattern).

It's documented in the regular expression syntax section of the re module documentation page.

(?!...)

Matches if ... doesn’t match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it’s not followed by 'Asimov'.
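
As a quick check of the documentation's example, here is a minimal sketch (the variable names are only illustrative):

```python
import re

# The example from the re documentation: 'Isaac ' matches only when
# 'Asimov' does not come next.
pattern = re.compile(r'Isaac (?!Asimov)')

followed_by_other = pattern.search('Isaac Newton')
followed_by_asimov = pattern.search('Isaac Asimov')

# The lookahead is zero-width, so only 'Isaac ' is consumed.
print(repr(followed_by_other.group()))  # 'Isaac '
print(followed_by_asimov)               # None
```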

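So you can use it to match any number of words after a key while stopping before the next key, with something like (?!\S+:)\S+. The lookahead (?!\S+:) rejects any word that has a colon after its first character, which is how a key is recognized here.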
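
To see which words that sub-pattern accepts, here is a rough sketch that tests it against whole tokens. The sample string and the use of fullmatch on split tokens are assumptions made for illustration, not part of the original answer:

```python
import re

value_word = re.compile(r'(?!\S+:)\S+')

# Hypothetical input, split on whitespace so each token is tested on its own.
sample = 'key1: some words key2: more'
tokens = sample.split()

# A token is a value word when the whole token matches; keys are rejected
# because the lookahead sees a colon after the first character.
values = [t for t in tokens if value_word.fullmatch(t)]
keys = [t for t in tokens if not value_word.fullmatch(t)]

print(values)  # ['some', 'words', 'more']
print(keys)    # ['key1:', 'key2:']
```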

And the complete code would look like this:

import re

regex = re.compile(r'''
    \S+:                  # a key (any word ending in a colon)
    (?:
        \s                # then whitespace in between
        (?!\S+:)\S+       # then a value word (one that does not look like a key)
    )+                    # match multiple value words if present
    ''', re.VERBOSE)

matches = regex.findall(my_str)

Which gives:

['key1: val1-words', 'key2: val2-words', 'key3: val3-words']
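
For a version that runs on its own, here is a sketch with a made-up my_str (the real string comes from the question and is not shown here). It assumes some free text before the key-value pairs and multi-word values:

```python
import re

# Hypothetical input: free text followed by key-value pairs at the end.
my_str = 'some intro text key1: val1 words key2: val2 words'

regex = re.compile(r'''
    \S+:                  # a key
    (?:\s(?!\S+:)\S+)+    # one or more value words, none of which is a key
    ''', re.VERBOSE)

# The only group is non-capturing, so findall returns the full matches.
matches = regex.findall(my_str)
print(matches)  # ['key1: val1 words', 'key2: val2 words']
```

The leading free text is skipped because none of its words ends in a colon, so the pattern cannot start a match there.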

If you print the key/values using:

for match in matches:
    print(match)

It will print:

key1: val1-words
key2: val2-words
key3: val3-words

Or using your updated example, it would print:

Thème: O sombres héros 
Contraintes: sous titrés 
Author: nicoalabdou 
Tags: wakatanka productions court métrage kino session humour cantat bertrand noir désir sombres héros mer medine marie trintignant femme droit des femmes nicoalabdou pute soumise 
Posted: 06 June 2009 
Rating: 1.3 
Votes: 3

You could turn the key/value pairs into a dictionary using something like this:

pairs = dict(match.split(':', 1) for match in matches)
pairs = {key: value.strip() for key, value in pairs.items()}  # drop the space after each colon

which would make it easier to look up only the keys (and values) you want.
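
As an illustration of the lookup step, here is a small sketch. It assumes matches holds strings like a few of the lines printed above, and the conversions to float and int are just one possible way to use the values:

```python
# A few matches copied from the printed output above, hard-coded so the
# sketch runs without the original input string.
matches = [
    'Author: nicoalabdou',
    'Posted: 06 June 2009',
    'Rating: 1.3',
    'Votes: 3',
]

pairs = {}
for match in matches:
    # Split on the first colon only, so a colon inside a value would survive.
    key, value = match.split(':', 1)
    pairs[key] = value.strip()

posted = pairs['Posted']
rating = float(pairs['Rating'])
votes = int(pairs['Votes'])
print(posted, rating, votes)  # 06 June 2009 1.3 3
```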


