Simple way to convert a string to a dictionary

我在风中等你 2020-12-11 17:36

What is the simplest way to convert a string of keyword=value pairs to a dictionary, for example the following string:

name="John Smith", age=34, height=173.2,


        
  • 2020-12-11 18:16

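    Do it step by step:

    d = {}
    mystring = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
    # Split into key=value items on ", " (safe here only because no value contains ", ")
    for item in mystring.split(", "):
        # maxsplit=1 keeps any "=" inside the value intact
        key, value = item.split("=", 1)
        d[key] = value  # values stay strings, quotes included
    print(d)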
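    The loop above leaves every value as a string, quotes included (d['age'] is '34'). To get typed values (str, int, float), a small converter can be applied to each value. This is a minimal sketch: convert is a hypothetical helper name, and it assumes strings are always double-quoted and numbers are plain ints or decimals.

```python
def convert(value):
    """Hypothetical helper: strip surrounding double quotes from strings,
    otherwise try int, then float, falling back to the raw text."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


mystring = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
d = {}
for item in mystring.split(", "):
    key, value = item.split("=", 1)  # split on the first "=" only
    d[key] = convert(value)
print(d)
```

    Trying int before float matters: int('173.2') raises ValueError, so decimals fall through to float, while '34' stays an int.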
    
  • 2020-12-11 18:18

    Here's a somewhat more robust version of the regexp solution:

    import re
    
    keyval_re = re.compile(r'''
       \s*                                  # Leading whitespace is ok.
       (?P<key>\w+)\s*=\s*(                 # Search for a key followed by..
           (?P<str>"[^"]*"|\'[^\']*\')|     #   a quoted string; or
           (?P<float>\d+\.\d+)|             #   a float; or
           (?P<int>\d+)                     #   an int.
       )\s*,?\s*                            # Handle comma & trailing whitespace.
       |(?P<garbage>.+)                     # Complain if we get anything else!
       ''', re.VERBOSE)
    
    def handle_keyval(match):
        if match.group('garbage'):
            raise ValueError("Parse error: unable to parse: %r" %
                             match.group('garbage'))
        key = match.group('key')
        if match.group('str') is not None:
            return (key, match.group('str')[1:-1]) # strip quotes
        elif match.group('float') is not None:
            return (key, float(match.group('float')))
        elif match.group('int') is not None:
            return (key, int(match.group('int')))
    

    It automatically converts floats and ints to the right type, handles single and double quotes, tolerates extraneous whitespace in various places, and complains if a badly formatted string is supplied:

    >>> s = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
    >>> print(dict(handle_keyval(m) for m in keyval_re.finditer(s)))
    {'name': 'John Smith', 'age': 34, 'height': 173.2, 'location': 'US', 'avatar': ':,=)'}
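    To see the error handling in action, the pattern and handler can be wrapped in a function and fed both a messy but valid string and a malformed one. A sketch, assuming Python 3: parse_keyvals is a hypothetical name, and the pattern is copied from above with its comments trimmed so the snippet runs on its own.

```python
import re

# Same pattern as above (comments trimmed), repeated so this runs on its own.
keyval_re = re.compile(r"""
   \s*
   (?P<key>\w+)\s*=\s*(
       (?P<str>"[^"]*"|'[^']*')|
       (?P<float>\d+\.\d+)|
       (?P<int>\d+)
   )\s*,?\s*
   |(?P<garbage>.+)
   """, re.VERBOSE)


def handle_keyval(match):
    # Any text the key=value alternative could not consume is reported.
    if match.group('garbage'):
        raise ValueError("Parse error: unable to parse: %r" % match.group('garbage'))
    key = match.group('key')
    if match.group('str') is not None:
        return key, match.group('str')[1:-1]  # strip quotes
    if match.group('float') is not None:
        return key, float(match.group('float'))
    return key, int(match.group('int'))


def parse_keyvals(s):
    """Hypothetical wrapper: build the dict, letting ValueError propagate."""
    return dict(handle_keyval(m) for m in keyval_re.finditer(s))


print(parse_keyvals("name='Jane Doe' , age = 41"))
try:
    parse_keyvals('name=John Smith, age=34')  # unquoted string value
except ValueError as exc:
    print(exc)
```

    The first call shows single quotes and stray spaces around "=" and "," being accepted; the second fails because the unquoted John Smith matches none of the value alternatives, so the garbage group swallows the rest of the input.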
    
  • 2020-12-11 18:18

    If it is always comma-separated, use the csv module to split the line into parts (untested):

    import csv
    import io
    
    # string_to_parse: the line you want to split
    parts = next(csv.reader(io.StringIO(string_to_parse)))
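    A quick check of what csv.reader actually does with the example string, assuming Python 3 and the default dialect. A double quote only starts a quoted field at the very beginning of a field, so the quotes after name= are kept as ordinary characters and the comma inside the avatar value still splits the field.

```python
import csv
import io

line = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
# Default dialect: delimiter ",", quotechar '"', no skipping of leading spaces
parts = next(csv.reader(io.StringIO(line)))
print(parts)
```

    The last two parts, ' avatar=":' and '=)"', show the avatar value cut in half, so csv alone is not enough for this input.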
    
  • 2020-12-11 18:19

    Here is an approach with eval. I consider it unreliable, but it works for your example.

    >>> import re
    >>>
    >>> s = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
    >>>
    >>> eval("{" + re.sub(r'(\w+)=("[^"]+"|[\d.]+)', r'"\1":\2', s) + "}")
    {'name': 'John Smith', 'age': 34, 'height': 173.2, 'location': 'US', 'avatar': ':,=)'}
    >>>
    

    Update:

    Better to use the one pointed out by Chris Lutz in the comments. I believe it's more reliable, because it may still work even when there are (single or double) quotes in the dict values.
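    The same rewrite can be evaluated without eval's risk of running arbitrary code by swapping in ast.literal_eval, which only accepts Python literals such as dicts, strings and numbers. A minimal sketch, assuming Python 3 and values that are either double-quoted strings without embedded quotes or plain numbers (the cases the regex above handles):

```python
import ast
import re

s = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
# Same rewrite as above: key=value becomes "key":value, giving a dict literal
literal = "{" + re.sub(r'(\w+)=("[^"]+"|[\d.]+)', r'"\1":\2', s) + "}"
# literal_eval parses literals only; function calls and names are rejected
d = ast.literal_eval(literal)
print(d)
```

    If the input ever contains something like __import__("os"), literal_eval raises ValueError instead of executing it.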

  • 2020-12-11 18:20

    Edit: since the csv module doesn't deal as desired with quotes inside fields, it takes a bit more work to implement this functionality:

    import re
    
    quoted = re.compile(r'"[^"]*"')
    
    class QuoteSaver:
        """Swap each double-quoted string for a short placeholder ("0", "1", ...)
        and remember the mapping so the original can be restored later."""
    
        def __init__(self):
            self.saver = {}     # original quoted string -> placeholder
            self.reverser = {}  # placeholder -> original quoted string
    
        def preserve(self, mo):
            s = mo.group()
            if s not in self.saver:
                self.saver[s] = '"%d"' % len(self.saver)
                self.reverser[self.saver[s]] = s
            return self.saver[s]
    
        def expand(self, mo):
            return self.reverser[mo.group()]
    
    x = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
    
    qs = QuoteSaver()
    y = quoted.sub(qs.preserve, x)  # commas and "=" inside quotes are now hidden
    kvs_strings = y.split(',')
    kvs_pairs = [kv.split('=', 1) for kv in kvs_strings]
    kvs_restored = [(k, quoted.sub(qs.expand, v)) for k, v in kvs_pairs]
    
    def converter(v):
        if v.startswith('"'):
            return v.strip('"')
        try:
            return int(v)
        except ValueError:
            return float(v)
    
    thedict = dict((k.strip(), converter(v)) for k, v in kvs_restored)
    for k in thedict:
        print("%-8s %s" % (k, thedict[k]))
    print(thedict)
    

    I'm emitting thedict twice to show exactly how and why it differs from the required result. When this answer was written (Python 2.6 or earlier, where dict order is arbitrary), the output was:

    age      34
    location US
    name     John Smith
    avatar   :,=)
    height   173.2
    {'age': 34, 'location': 'US', 'name': 'John Smith', 'avatar': ':,=)',
     'height': 173.19999999999999}
    

    As you see, the floating-point value is displayed as requested when printed directly, but not when the whole dict is printed. Printing a dict always uses repr on its keys and values, and 173.2 cannot be stored exactly in binary floating point, so the repr of that era, which used 17 significant digits, showed 173.19999999999999. Since Python 2.7 and 3.1, repr returns the shortest string that round-trips to the same float, so on current versions the whole dict shows 173.2 as well. On older versions, if this is indeed a requirement, you could define a dict subclass that overrides __str__ to special-case floating-point values.
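    The repr difference is easy to check on a current interpreter (assumed here to be Python 3). The old repr formatted floats with 17 significant digits, which '%.17g' reproduces; the modern repr picks the shortest string that converts back to the same float, and both strings denote the same binary value.

```python
x = 173.2
shortest = repr(x)        # shortest string that converts back to x
seventeen = '%.17g' % x   # 17 significant digits, the format the old repr used
print(shortest)           # 173.2
print(seventeen)          # 173.19999999999999
print(float(seventeen) == x)  # True: both strings name the same double
```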

    But I hope this distraction doesn't obscure the core idea. As long as the double quotes are properly balanced (and there are no double quotes inside double quotes), this code keeps the "special characters" inside double quotes (commas and equal signs, in this case) from being taken in their normal sense, even when the double quotes start in the middle of a "field" rather than at its beginning (csv only handles the latter case). Insert a few intermediate prints if the way the code works is not obvious: first it changes every double-quoted string into a specially simple form ("0", "1" and so on), while separately recording the actual contents each simple form stands for; at the end, the simple forms are changed back into the original contents. Finally, the simple converter function strips the double quotes from strings and turns the unquoted values into integers or floats.

  • 2020-12-11 18:24

    I think you just need to set maxsplit=1; for instance, the following should work:

    string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":, =)"'
    # maxsplit=1 splits each item on its first "=" only
    newDict = dict(item.split("=", 1) for item in string.split(", "))
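    For this particular string, whose avatar value contains ", ", it is worth checking what the one-liner actually returns (assuming Python 3): the avatar value is cut at that separator, and the leftover becomes a bogus entry under an empty key. The values also stay strings with their quotes.

```python
text = 'name="John Smith", age=34, height=173.2, location="US", avatar=":, =)"'
naive = dict(item.split("=", 1) for item in text.split(", "))
print(naive)
```

    Here naive['avatar'] is '":' and naive[''] is ')"', which is the problem the edit below addresses.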
    

    Edit (see comment):

    I didn't notice that ", " appears inside the avatar value. The best approach would be to escape ", " wherever you generate the data; even better would be a format like JSON ;). However, as an alternative to a regexp, you could try shlex, which I think produces cleaner-looking code.

    import shlex
    
    string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":, =)"'
    lex = shlex.shlex(string)  # non-POSIX mode: quoted values keep their quotes
    lex.whitespace += ","      # Default whitespace doesn't include commas
    lex.wordchars += "."       # Word chars should include . to catch decimals
    words = list(lex)          # key, "=", value, key, "=", value, ...
    newDict = dict(zip(words[0::3], words[2::3]))
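    The shlex version still returns every value as a string, with the quotes kept in non-POSIX mode. One way to get typed values, not part of the original answer, is to pass each value token through ast.literal_eval, which turns '"John Smith"' into 'John Smith', '34' into 34 and '173.2' into 173.2. A sketch assuming Python 3: parse_with_shlex is a hypothetical helper, and the token-count check is an added safeguard.

```python
import ast
import shlex


def parse_with_shlex(text):
    """Hypothetical helper: shlex tokenising as in the answer, then literal values."""
    lex = shlex.shlex(text)  # default non-POSIX mode keeps quotes on string tokens
    lex.whitespace += ","    # commas separate pairs
    lex.wordchars += "."     # keep 173.2 as a single token
    words = list(lex)
    keys, equals, values = words[0::3], words[1::3], words[2::3]
    # Added safeguard: expect exactly key, "=", value triples.
    if len(words) % 3 or any(e != "=" for e in equals):
        raise ValueError("expected key=value tokens, got %r" % words)
    # '"John Smith"' -> 'John Smith', '34' -> 34, '173.2' -> 173.2
    return {k: ast.literal_eval(v) for k, v in zip(keys, values)}


print(parse_with_shlex('name="John Smith", age=34, height=173.2, '
                       'location="US", avatar=":, =)"'))
```

    Keeping non-POSIX mode is deliberate here: the retained quotes are what let literal_eval tell the string "34" apart from the number 34.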
    