Simple way to convert a string to a dictionary

后端未结

关注

 10  2374

What is the simplest way to convert a string of keyword=values to a dictionary, for example the following string:

name=\"John Smith\", age=34, height=173.2,


                      
              相关标签:


      
      
        
          10条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  猫巷女王i        
                
              
                            
                2020-12-11 18:16
              
            
            
                                                                       
do it step by step

d={}
mystring='name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"';
s = mystring.split(", ")
for item in s:
    i=item.split("=",1)
    d[i[0]]=i[-1]
print d

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  日久生厌        
                
              
                            
                2020-12-11 18:18
              
            
            
                                                                       
Here's a somewhat more robust version of the regexp solution:

import re

keyval_re = re.compile(r'''
   \s*                                  # Leading whitespace is ok.
   (?P<key>\w+)\s*=\s*(                 # Search for a key followed by..
       (?P<str>"[^"]*"|\'[^\']*\')|     #   a quoted string; or
       (?P<float>\d+\.\d+)|             #   a float; or
       (?P<int>\d+)                     #   an int.
   )\s*,?\s*                            # Handle comma & trailing whitespace.
   |(?P<garbage>.+)                     # Complain if we get anything else!
   ''', re.VERBOSE)

def handle_keyval(match):
    if match.group('garbage'):
        raise ValueError("Parse error: unable to parse: %r" %
                         match.group('garbage'))
    key = match.group('key')
    if match.group('str') is not None:
        return (key, match.group('str')[1:-1]) # strip quotes
    elif match.group('float') is not None:
        return (key, float(match.group('float')))
    elif match.group('int') is not None:
        return (key, int(match.group('int')))


It automatically converts floats & ints to the right type; handles single and double quotes; handles extraneous whitespace in various locations; and complains if a badly formatted string is supplied

>>> s='name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
>>> print dict(handle_keyval(m) for m in keyval_re.finditer(s))
{'age': 34, 'location': 'US', 'name': 'John Smith', 'avatar': ':,=)', 'height': 173.19999999999999}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  迷失自我        
                
              
                            
                2020-12-11 18:18
              
            
            
                                                                       
Always comma separated?  Use the CSV module to split the line into parts (not checked):

import csv
import cStringIO

parts=csv.reader(cStringIO.StringIO(<string to parse>)).next()

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  后悔当初        
                
              
                            
                2020-12-11 18:19
              
            
            
                                                                       
Here is a approach with eval, I considered it is as unreliable though, but its works for your example.

>>> import re
>>>
>>> s='name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
>>>
>>> eval("{"+re.sub('(\w+)=("[^"]+"|[\d.]+)','"\\1":\\2',s)+"}")
{'age': 34, 'location': 'US', 'name': 'John Smith', 'avatar': ':,=)', 'height': 173.19999999999999}
>>>


Update:

Better use the one pointed by Chris Lutz in the comment, I believe Its more reliable, because even there is (single/double) quotes in dict values, it might works.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  无人及你        
                
              
                            
                2020-12-11 18:20
              
            
            
                                                                       
Edit: since the csv module doesn't deal as desired with quotes inside fields, it takes a bit more work to implement this functionality:

import re
quoted = re.compile(r'"[^"]*"')

class QuoteSaver(object):

  def __init__(self):
    self.saver = dict()
    self.reverser = dict()

  def preserve(self, mo):
    s = mo.group()
    if s not in self.saver:
      self.saver[s] = '"%d"' % len(self.saver)
      self.reverser[self.saver[s]] = s
    return self.saver[s]

  def expand(self, mo):
    return self.reverser[mo.group()]

x = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'

qs = QuoteSaver()
y = quoted.sub(qs.preserve, x)
kvs_strings = y.split(',')
kvs_pairs = [kv.split('=') for kv in kvs_strings]
kvs_restored = [(k, quoted.sub(qs.expand, v)) for k, v in kvs_pairs]

def converter(v):
  if v.startswith('"'): return v.strip('"')
  try: return int(v)
  except ValueError: return float(v)

thedict = dict((k.strip(), converter(v)) for k, v in kvs_restored)
for k in thedict:
  print "%-8s %s" % (k, thedict[k])
print thedict


I'm emitting thedict twice to show exactly how and why it differs from the required result; the output is:

age      34
location US
name     John Smith
avatar   :,=)
height   173.2
{'age': 34, 'location': 'US', 'name': 'John Smith', 'avatar': ':,=)',
 'height': 173.19999999999999}


As you see, the output for the floating point value is as requested when directly emitted with print, but it isn't and cannot be (since there IS no floating point value that would display 173.2 in such a case!-) when the print is applied to the whole dict (because that inevitably uses repr on the keys and values -- and the repr of 173.2 has that form, given the usual issues about how floating point values are stored in binary, not in decimal, etc, etc).  You might define a dict subclass which overrides __str__ to specialcase floating-point values, I guess, if that's indeed a requirement.

But, I hope this distraction doesn't interfere with the core idea -- as long as the doublequotes are properly balanced (and there are no doublequotes-inside-doublequotes), this code does perform the required task of preserving "special characters" (commas and equal signs, in this case) from being taken in their normal sense when they're inside double quotes, even if the double quotes start inside a "field" rather than at the beginning of the field (csv only deals with the latter condition).  Insert a few intermediate prints if the way the code works is not obvious -- first it changes all "double quoted fields" into a specially simple form ("0", "1" and so on), while separately recording what the actual contents corresponding to those simple forms are; at the end, the simple forms are changed back into the original contents.  Double-quote stripping (for strings) and transformation of the unquoted strings into integers or floats is finally handled by the simple converter function.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  死守一世寂寞        
                
              
                            
                2020-12-11 18:24
              
            
            
                                                                       
I think you just need to set maxsplit=1, for instance the following should work.

string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":, =)"'
newDict = dict(map( lambda(z): z.split("=",1), string.split(", ") ))


Edit (see comment): 

I didn't notice that ", " was a value under avatar, the best approach would be to escape ", " wherever you are generating data. Even better would be something like JSON ;). However, as an alternative to regexp, you could try using shlex, which I think produces cleaner looking code. 

import shlex

string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":, =)"'
lex = shlex.shlex ( string ) 
lex.whitespace += "," # Default whitespace doesn't include commas
lex.wordchars += "."  # Word char should include . to catch decimal 
words = [ x for x in iter( lex.get_token, '' ) ]
newDict = dict ( zip( words[0::3], words[2::3]) )

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复