What is the simplest way to convert a string of keyword=values to a dictionary, for example the following string:
name="John Smith", age=34, height=173.2,
Here is a more verbose approach to the problem using pyparsing. Note the parse actions which do the automatic conversion of types from strings to ints or floats. Also, the QuotedString class implicitly strips the quotation marks from the quoted value. Finally, the Dict class takes each 'key = val' group in the comma-delimited list, and assigns results names using the key and value tokens.
from pyparsing import *
key = Word(alphas)
EQ = Suppress('=')
real = Regex(r'[+-]?\d+\.\d+').setParseAction(lambda t:float(t[0]))
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
qs = QuotedString('"')
value = real | integer | qs
dictstring = Dict(delimitedList(Group(key + EQ + value)))
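The order of the alternatives in value matters: pyparsing's | operator tries them left to right and takes the first one that matches, so real has to come before integer. As a rough illustration only, the stdlib sketch below mimics that first-match behaviour with re to show what would go wrong the other way round. The names REAL, INTEGER and first_match are made up for this example and are not part of pyparsing.

```python
import re

# (pattern, converter) pairs mirroring the real and integer expressions above
REAL = (re.compile(r'[+-]?\d+\.\d+'), float)
INTEGER = (re.compile(r'[+-]?\d+'), int)

def first_match(text, alternatives):
    """Return (converted value, unparsed remainder) for the first alternative that matches."""
    for pattern, convert in alternatives:
        m = pattern.match(text)
        if m:
            return convert(m.group()), text[m.end():]
    raise ValueError('no alternative matched: %r' % text)

print(first_match('173.2', [REAL, INTEGER]))   # (173.2, '')
print(first_match('173.2', [INTEGER, REAL]))   # (173, '.2')
```

With integer first, only "173" is consumed and ".2" is left behind, which in the grammar above would make the following comma fail to match.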
Now to parse your original text string, storing the results in dd. Pyparsing returns an object of type ParseResults, but this class has many dict-like features (support for keys(), items(), in, etc.), or can emit a true Python dict by calling asDict(). Calling dump() shows all of the tokens in the original parsed list, plus all of the named items. The last two examples show how to access named items within a ParseResults as if they were attributes of a Python object.
text = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
dd = dictstring.parseString(text)
print dd.keys()
print dd.items()
print dd.dump()
print dd.asDict()
print dd.name
print dd.avatar
Prints:
['age', 'location', 'name', 'avatar', 'height']
[('age', 34), ('location', 'US'), ('name', 'John Smith'), ('avatar', ':,=)'), ('height', 173.19999999999999)]
[['name', 'John Smith'], ['age', 34], ['height', 173.19999999999999], ['location', 'US'], ['avatar', ':,=)']]
- age: 34
- avatar: :,=)
- height: 173.2
- location: US
- name: John Smith
{'age': 34, 'height': 173.19999999999999, 'location': 'US', 'avatar': ':,=)', 'name': 'John Smith'}
John Smith
:,=)
The following code produces the correct behavior, but is just a bit long! I've added a space in the avatar to show that it deals well with commas and spaces and equal signs inside the string. Any suggestions to shorten it?
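import hashlib
string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":, =)"'
strings = {}
def simplify(value):
    # convert a numeric token to int if possible, otherwise to float
    try:
        return int(value)
    except ValueError:
        return float(value)
# replace each quoted substring with its md5 hex digest, so that commas,
# spaces and equal signs inside it cannot confuse the split below
while True:
    try:
        p1 = string.index('"')
        p2 = string.index('"', p1+1)
        substring = string[p1+1:p2]
        key = hashlib.md5(substring).hexdigest()
        strings[key] = substring
        string = string[:p1] + key + string[p2+1:]
    except ValueError:
        # index() raises ValueError once no quotes are left
        break
d = {}
for pair in string.split(', '):
    key, value = pair.split('=')
    if value in strings:
        # the value is a placeholder: restore the original quoted text
        d[key] = strings[value]
    else:
        d[key] = simplify(value)
print d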
This works for me:
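import re
s = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
# get all the items: quoted values first, then numeric ones
matches = re.findall(r'\w+=".+?"', s) + re.findall(r'\w+=[\d.]+', s)
# partition each match at the first '=' (findall returns plain strings)
matches = [m.split('=', 1) for m in matches]
# use results to make a dict; values stay strings, quoted ones keep their quotes
d = dict(matches)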
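The two findall calls leave every value as a string, quotes included. If you also want the type conversion from the question, one possible variation is a single pattern that captures either a quoted string or a bare token. This is only a sketch: PAIR, convert and parse_pairs are names I made up, and it assumes values never contain escaped double quotes.

```python
import re

# key, '=', then either a double-quoted string or a bare token up to the next comma/space
PAIR = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|([^,\s]+))')

def convert(token):
    """Turn a bare token into int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token

def parse_pairs(text):
    result = {}
    for m in PAIR.finditer(text):
        key, quoted, bare = m.groups()
        # quoted is None when the bare alternative matched (an empty "" stays a string)
        result[key] = quoted if quoted is not None else convert(bare)
    return result

s = 'name="John Smith", age=34, height=173.2, location="US", avatar=":, =)"'
print(parse_pairs(s))
```

Because each quoted value is consumed as a whole, commas, spaces and equal signs inside it are never scanned for further key=value pairs.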
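I would suggest a lazy way of doing this: wrap the string in a dict(...) call and let Python's own parser do the work via eval. Only do this with input you trust, since eval executes whatever expression the string contains.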
test_string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
eval("dict({})".format(test_string))
{'age': 34, 'location': 'US', 'avatar': ':,=)', 'name': 'John Smith', 'height': 173.2}
Hope this helps someone!
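If the input is not fully trusted, the same "let Python parse it" idea can be kept without executing anything by using the ast module. The following is a minimal sketch, assuming every value is a plain Python literal (string, number and so on); parse_kwargs is a hypothetical helper name, not a library function.

```python
import ast

def parse_kwargs(text):
    """Parse 'k1=v1, k2=v2' where each value is a Python literal; no code is executed."""
    # Wrap the text in a call so Python's own parser handles quoting and commas.
    tree = ast.parse('dict({})'.format(text), mode='eval')
    call = tree.body
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)) or call.args:
        raise ValueError('expected only keyword=value pairs')
    result = {}
    for kw in call.keywords:
        if kw.arg is None:  # a '**mapping' argument, not a keyword=value pair
            raise ValueError('expected only keyword=value pairs')
        # literal_eval accepts an AST node and rejects anything that is not a literal
        result[kw.arg] = ast.literal_eval(kw.value)
    return result

test_string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
print(parse_kwargs(test_string))
```

A value such as __import__("os").getcwd() is rejected with a ValueError instead of being run, and a trailing comma after the last pair is accepted just as it is in a normal function call.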