python: extracting variables from string templates

大城市里の小女人 submitted on 2019-12-10 03:47:07

Question


I am familiar with the ability to insert variables into a string using Templates, like this:

from string import Template
Template('value is between $min and $max').substitute(min=5, max=10)

What I now want to know is if it is possible to do the reverse. I want to take a string, and extract the values from it using a template, so that I have some data structure (preferably just named variables, but a dict is fine) that contains the extracted values. For example:

>>> string = 'value is between 5 and 10'
>>> d = Backwards_template('value is between $min and $max').extract(string)
>>> print(d)
{'min': '5', 'max': '10'}

Is this possible?


Answer 1:


This is what regular expressions are for:

import re
string = 'value is between 5 and 10'
m = re.match(r'value is between (.*) and (.*)', string)
print(m.group(1), m.group(2))

Output:

5 10

Update 1. Names can be given to groups:

m = re.match(r'value is between (?P<min>.*) and (?P<max>.*)', string)
print(m.group('min'), m.group('max'))

This feature is not used often, though, because the harder problem is usually capturing exactly what you want. In this particular case that's not a big deal, but even here: what if the string is 'value is between 1 and 2 and 3'? Should the string be accepted, and if so, what are min and max?
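To make that ambiguity concrete, here is a small sketch comparing three patterns on the problematic string. The names greedy, lazy and strict are illustrative only, and restricting values to digits in the last pattern is an assumption about the data, not something the question states.

```python
import re

s = 'value is between 1 and 2 and 3'

# Greedy groups: the first group grabs as much as it can.
greedy = re.match(r'value is between (?P<min>.*) and (?P<max>.*)', s)
print(greedy.groupdict())   # {'min': '1 and 2', 'max': '3'}

# Non-greedy first group: it stops at the first ' and '.
lazy = re.match(r'value is between (?P<min>.*?) and (?P<max>.*)', s)
print(lazy.groupdict())     # {'min': '1', 'max': '2 and 3'}

# Assumption: values are plain integers and the whole string must match.
# Under that assumption the ambiguous string is simply rejected.
strict = re.fullmatch(r'value is between (?P<min>\d+) and (?P<max>\d+)', s)
print(strict)               # None
```

Which of the three behaviours is right depends on what the values are allowed to contain, so the choice has to come from the data rather than from the regex.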


Update 2. Rather than making a precise regex, it's sometimes easier to combine regular expressions and "regular" code like this:

m = re.match(r'value is between (?P<min>.*) and (?P<max>.*)', string)
try:
    value_min = float(m.group('min'))
    value_max = float(m.group('max'))
except (AttributeError, ValueError):  # no match or failed conversion
    value_min = None
    value_max = None

This combined approach is especially worth remembering when your text consists of many chunks to be processed (for example, phrases in quotes of different types). In tricky cases it's harder to define a single regex that handles both the delimiters and the contents of the chunks than to work in several steps: text.split(), optional merging of chunks, and independent processing of each chunk (using regexes and other means).
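A minimal sketch of that split-then-process idea, assuming the chunks are separated by semicolons and each chunk should look like the range phrase from the question. The parse_ranges helper, the separator, and the choice to record None for bad chunks are all hypothetical.

```python
import re

# One simple regex per chunk; \S+ assumes the values contain no whitespace.
RANGE = re.compile(r'value is between (?P<min>\S+) and (?P<max>\S+)$')

def parse_ranges(text, sep=';'):
    """Split text into chunks and parse each chunk independently."""
    results = []
    for chunk in text.split(sep):
        m = RANGE.match(chunk.strip())
        try:
            results.append((float(m.group('min')), float(m.group('max'))))
        except (AttributeError, ValueError):  # no match or failed conversion
            results.append(None)
    return results

print(parse_ranges('value is between 5 and 10; value is between a and b; nothing'))
# [(5.0, 10.0), None, None]
```

Because every chunk is handled on its own, one malformed chunk does not affect the others, and the per-chunk regex stays simple.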




Answer 2:


It's not possible to perfectly reverse the substitution. The problem is that some strings are ambiguous, for example

value is between 5 and 7 and 10

would have two possible solutions: either min = "5", max = "7 and 10", or min = "5 and 7", max = "10".

However, you might be able to achieve useful results with regex:

import re

string = 'value is between 5 and 10'
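template = 'value is between $min and $max'

# Escape the template so its literal text is matched verbatim;
# re.escape also turns each '$' into '\$'.
pattern = re.escape(template)
# Replace every escaped $name with a named capture group.
pattern = re.sub(r'\\\$(\w+)', r'(?P<\1>.*)', pattern)
match = re.match(pattern, string)
print(match.groupdict())  # output: {'min': '5', 'max': '10'}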
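The same idea can be wrapped into a reusable helper. The sketch below is one possible variant, not the answer's exact code: it splits the template on placeholders instead of post-processing the escaped string, uses non-greedy groups, and requires the whole string to match. The extract name is hypothetical, and the sketch assumes every placeholder is a valid Python identifier that appears only once in the template.

```python
import re

PLACEHOLDER = re.compile(r'\$(\w+)')

def extract(template, string):
    """Return a dict of placeholder values, or None if string does not fit."""
    # With one capturing group, split() alternates literal text and names:
    # ['value is between ', 'min', ' and ', 'max', '']
    parts = PLACEHOLDER.split(template)
    pattern = ''
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)           # literal text, matched verbatim
        else:
            pattern += '(?P<%s>.+?)' % part      # named, non-greedy capture
    m = re.fullmatch(pattern, string)
    return m.groupdict() if m else None

print(extract('value is between $min and $max', 'value is between 5 and 10'))
# {'min': '5', 'max': '10'}
print(extract('value is between $min and $max', 'value is between 5 and 7 and 10'))
# {'min': '5', 'max': '7 and 10'}
```

The non-greedy groups resolve the ambiguous case by giving the earliest placeholder the shortest possible value; that is a design choice, and the other reading is equally valid.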



Answer 3:


The behave module for Behavior-Driven Development provides a few different mechanisms for specifying and parsing templates.

Depending on the complexity of your templates, and the other needs of your app, you might find one or the other most useful. (Plus, you can steal their pre-written code.)




Answer 4:


You can use the difflib module to compare the two strings and pull out the information you want.

https://docs.python.org/3.6/library/difflib.html

For example:

import difflib

def backwards_template(my_string, template):
    my_lib = {}
    entry = ''
    value = ''

    for s in difflib.ndiff(my_string, template):
        if s[0]==' ':
            if entry != '' and value != '':
                my_lib[entry] = value 
                entry = ''
                value = ''   
        elif s[0]=='-':
            value += s[2]
        elif s[0]=='+':
            if s[2] != '$':
                entry += s[2]

    # check ending if non-empty
    if entry != '' and value != '':
        my_lib[entry] = value

    return my_lib

my_string = 'value is between 5 and 10'
template = 'value is between $min and $max'     

print(backwards_template(my_string, template))

Gives: {'min': '5', 'max': '10'}



Source: https://stackoverflow.com/questions/42536406/python-extracting-variables-from-string-templates
