I would like to parse the parameter keywords and values from URIs/URLs in a text file. Parameters without values should also be included. Python is fine, but I am open to suggestions.
I would use a regular expression like this (first code then explanation):
import re
# s holds the text read from the file
pairs = re.findall(r'(\w+)=(.*?)(?:\n|&)', s, re.S)
for k, v in pairs:
    print('{0} = {1}'.format(k, v))
The re.findall line is where the action happens. The regular expression finds every occurrence of a word followed by an equal sign and then a string terminated by either a & or a newline character. The returned pairs is a list of tuples, where each tuple contains the word (the keyword) and the value. I didn't capture the = sign; instead, I print it in the loop.
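To make the shape of the result concrete, here is a small run of the snippet above. The sample string s is made up for illustration (two URLs, one per line, each line ending in a newline); it is not taken from the asker's file.

```python
import re

# Hypothetical sample input: one URL per line, each line ends with "\n".
s = "http://example.com/a?x=1&y=2\nhttp://example.com/b?name=foo&flag\n"

pairs = re.findall(r'(\w+)=(.*?)(?:\n|&)', s, re.S)
print(pairs)  # [('x', '1'), ('y', '2'), ('name', 'foo')]

for k, v in pairs:
    print('{0} = {1}'.format(k, v))
```

Note that flag does not show up in the output: it has no = sign, so this pattern does not match it.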
Explaining the regex:
\w+ means one or more word chars. The parentheses around it capture it, so its value is returned as part of the result.
= - the equal sign that must follow the word
.*? - zero or more chars matched in a non-greedy manner, that is, as few as possible, stopping at the first newline or & sign, which is what \n|& designates. The (?:...) group is non-capturing, so the \n or & is matched but not returned.
Since we capture 2 things in the regex - the keyword and everything after the = sign, a list of 2-tuples is returned.
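A quick way to see what the non-capturing group buys you is to compare it with a plain capturing group. This is only an illustration; the input string text is made up.

```python
import re

text = "a=1&b=2&"

# Terminator in a non-capturing group: only keyword and value come back.
non_capturing = re.findall(r'(\w+)=(.*?)(?:\n|&)', text)
print(non_capturing)  # [('a', '1'), ('b', '2')]

# Terminator in a capturing group: every tuple gains a third element.
capturing = re.findall(r'(\w+)=(.*?)(\n|&)', text)
print(capturing)  # [('a', '1', '&'), ('b', '2', '&')]
```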
The re.S flag tells the regex engine to let the match-all character . also match the newline character, that is, to let a match span multiple lines (which is not the default behavior). Because .*? here stops at the first \n anyway, the flag does not change the result for this particular pattern.
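The question also asks for parameters without values, which the pattern above skips because it requires an = sign; it also needs a & or a newline after the last pair. Here is a rough sketch of one way to cover both cases. The pattern PARAM_RE, the helper extract_params and the sample text are my own assumptions, not something from the asker's data: a keyword is taken to be a run of word chars introduced by ? or &, and the value is optional.

```python
import re

# Assumed pattern: a keyword introduced by "?" or "&", optionally followed
# by "=" and a value that runs up to the next "&" or newline.
PARAM_RE = re.compile(r'[?&](\w+)(?:=([^&\n]*))?')

def extract_params(text):
    """Return (keyword, value) pairs; value is '' when none is given."""
    return PARAM_RE.findall(text)

# Hypothetical sample: the last parameter has no value and no trailing newline.
sample = "http://example.com/a?x=1&y=2\nhttp://example.com/b?name=foo&flag"
for k, v in extract_params(sample):
    print('{0} = {1}'.format(k, v))
```

With this sketch a bare flag and an explicit flag= both come back with an empty string as the value, and keywords containing characters outside \w (hyphens, percent-escapes) are not matched. If that matters, urllib.parse.parse_qsl with keep_blank_values=True from the standard library is probably a safer choice for the query part of each URL.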