partial string formatting

Is it possible to do partial string formatting with the advanced string formatting methods, similar to the string template safe_substitute() function?

For example:

s = '{foo} {bar}'
s.format(foo='FOO') #Problem: raises KeyError 'bar'

You can trick it into partial formatting by overwriting the mapping:

import string

class FormatDict(dict):
    def __missing__(self, key):
        return "{" + key + "}"

s = '{foo} {bar}'
formatter = string.Formatter()
mapping = FormatDict(foo='FOO')
print(formatter.vformat(s, (), mapping))

printing

FOO {bar}

Of course this basic implementation only works correctly for basic cases.

If you know in what order you're formatting things:

s = '{foo} {{bar}}'

Use it like this:

ss = s.format(foo='FOO') 
print ss 
>>> 'FOO {bar}'

print ss.format(bar='BAR')
>>> 'FOO BAR'

You can't specify foo and bar at the same time - you have to do it sequentially.

Saikiran Yerram

You could use the partial function from functools which is short, most readable and also describes best the coder's intention:

from functools import partial

s = partial("{foo} {bar}".format, foo="FOO")
print s(bar="BAR")
# FOO BAR

This limitation of .format() - the inability to do partial substitutions - has been bugging me.

After evaluating writing a custom Formatter class as described in many answers here and even considering using third-party packages such as lazy_format, I discovered a much simpler inbuilt solution: Template strings

It provides similar functionality but also provides partial substitution thorough safe_substitute() method. The template strings need to have a $ prefix (which feels a bit weird - but the overall solution I think is better).

import string
template = string.Template('${x} ${y}')
try:
  template.substitute({'x':1}) # raises KeyError
except KeyError:
  pass

# but the following raises no error
partial_str = template.safe_substitute({'x':1}) # no error

# partial_str now contains a string with partial substitution
partial_template = string.Template(partial_str)
substituted_str = partial_template.safe_substitute({'y':2}) # no error
print substituted_str # prints '12'

Formed a convenience wrapper based on this:

class StringTemplate(object):
    def __init__(self, template):
        self.template = string.Template(template)
        self.partial_substituted_str = None

    def __repr__(self):
        return self.template.safe_substitute()

    def format(self, *args, **kws):
        self.partial_substituted_str = self.template.safe_substitute(*args, **kws)
        self.template = string.Template(self.partial_substituted_str)
        return self.__repr__()


>>> s = StringTemplate('${x}${y}')
>>> s
'${x}${y}'
>>> s.format(x=1)
'1${y}'
>>> s.format({'y':2})
'12'
>>> print s
12

Similarly a wrapper based on Sven's answer which uses the default string formatting:

class StringTemplate(object):
    class FormatDict(dict):
        def __missing__(self, key):
            return "{" + key + "}"

    def __init__(self, template):
        self.substituted_str = template
        self.formatter = string.Formatter()

    def __repr__(self):
        return self.substituted_str

    def format(self, *args, **kwargs):
        mapping = StringTemplate.FormatDict(*args, **kwargs)
        self.substituted_str = self.formatter.vformat(self.substituted_str, (), mapping)

Not sure if this is ok as a quick workaround, but how about

s = '{foo} {bar}'
s.format(foo='FOO', bar='{bar}')

? :)

If you define your own Formatter which overrides the get_value method, you could use that to map undefined field names to whatever you wanted:

http://docs.python.org/library/string.html#string.Formatter.get_value

For instance, you could map bar to "{bar}" if bar isn't in the kwargs.

However, that requires using the format() method of your Formatter object, not the string's format() method.

Pengfei.X

>>> 'fd:{uid}:{{topic_id}}'.format(uid=123)
'fd:123:{topic_id}'

Try this out.

gatto

Thanks to Amber's comment, I came up with this:

import string

try:
    # Python 3
    from _string import formatter_field_name_split
except ImportError:
    formatter_field_name_split = str._formatter_field_name_split


class PartialFormatter(string.Formatter):
    def get_field(self, field_name, args, kwargs):
        try:
            val = super(PartialFormatter, self).get_field(field_name, args, kwargs)
        except (IndexError, KeyError, AttributeError):
            first, _ = formatter_field_name_split(field_name)
            val = '{' + field_name + '}', first
        return val

For me this was good enough:

>>> ss = 'dfassf {} dfasfae efaef {} fds'
>>> nn = ss.format('f1', '{}')
>>> nn
'dfassf f1 dfasfae efaef {} fds'
>>> n2 = nn.format('whoa')
>>> n2
'dfassf f1 dfasfae efaef whoa fds'

My suggestion would be the following (tested with Python3.6):

class Lazymap(object):
       def __init__(self, **kwargs):
           self.dict = kwargs

       def __getitem__(self, key):
           return self.dict.get(key, "".join(["{", key, "}"]))


s = '{foo} {bar}'

s.format_map(Lazymap(bar="FOO"))
# >>> '{foo} FOO'

s.format_map(Lazymap(bar="BAR"))
# >>> '{foo} BAR'

s.format_map(Lazymap(bar="BAR", foo="FOO", baz="BAZ"))
# >>> 'FOO BAR'

Update: An even more elegant way (subclassing dict and overloading __missing__(self, key)) is shown here: https://stackoverflow.com/a/17215533/333403

Assuming you won't use the string until it's completely filled out, you could do something like this class:

class IncrementalFormatting:
    def __init__(self, string):
        self._args = []
        self._kwargs = {}
        self._string = string

    def add(self, *args, **kwargs):
        self._args.extend(args)
        self._kwargs.update(kwargs)

    def get(self):
        return self._string.format(*self._args, **self._kwargs)

Example:

template = '#{a}:{}/{}?{c}'
message = IncrementalFormatting(template)
message.add('abc')
message.add('xyz', a=24)
message.add(c='lmno')
assert message.get() == '#24:abc/xyz?lmno'

There is one more way to achieve this i.e by using format and % to replace variables. For example:

>>> s = '{foo} %(bar)s'
>>> s = s.format(foo='my_foo')
>>> s
'my_foo %(bar)s'
>>> s % {'bar': 'my_bar'}
'my_foo my_bar'

A very ugly but the simplest solution for me is to just do:

tmpl = '{foo}, {bar}'
tmpl.replace('{bar}', 'BAR')
Out[3]: '{foo}, BAR'

This way you still can use tmpl as regular template and perform partial formatting only when needed. I find this problem too trivial to use a overkilling solution like Mohan Raj's.

After testing the most promising solutions from here and there, I realized that none of them really met the following requirements:

strictly adhere to the syntax recognized by str.format_map() for the template;
being able to retain complex formatting, i.e. fully supporting the Format Mini-Language

So, I wrote my own solution, which satisfies the above requirements. (EDIT: now the version by @SvenMarnach -- as reported in this answer -- seems to handle the corner cases I needed).

Basically, I ended up parsing the template string, finding matching nested {.*?} groups (using a find_all() helper function) and building the formatted string progressively and directly using str.format_map() while catching any potential KeyError.

def find_all(
        text,
        pattern,
        overlap=False):
    """
    Find all occurrencies of the pattern in the text.

    Args:
        text (str|bytes|bytearray): The input text.
        pattern (str|bytes|bytearray): The pattern to find.
        overlap (bool): Detect overlapping patterns.

    Yields:
        position (int): The position of the next finding.
    """
    len_text = len(text)
    offset = 1 if overlap else (len(pattern) or 1)
    i = 0
    while i < len_text:
        i = text.find(pattern, i)
        if i >= 0:
            yield i
            i += offset
        else:
            break

def matching_delimiters(
        text,
        l_delim,
        r_delim,
        including=True):
    """
    Find matching delimiters in a sequence.

    The delimiters are matched according to nesting level.

    Args:
        text (str|bytes|bytearray): The input text.
        l_delim (str|bytes|bytearray): The left delimiter.
        r_delim (str|bytes|bytearray): The right delimiter.
        including (bool): Include delimeters.

    yields:
        result (tuple[int]): The matching delimiters.
    """
    l_offset = len(l_delim) if including else 0
    r_offset = len(r_delim) if including else 0
    stack = []

    l_tokens = set(find_all(text, l_delim))
    r_tokens = set(find_all(text, r_delim))
    positions = l_tokens.union(r_tokens)
    for pos in sorted(positions):
        if pos in l_tokens:
            stack.append(pos + 1)
        elif pos in r_tokens:
            if len(stack) > 0:
                prev = stack.pop()
                yield (prev - l_offset, pos + r_offset, len(stack))
            else:
                raise ValueError(
                    'Found `{}` unmatched right token(s) `{}` (position: {}).'
                        .format(len(r_tokens) - len(l_tokens), r_delim, pos))
    if len(stack) > 0:
        raise ValueError(
            'Found `{}` unmatched left token(s) `{}` (position: {}).'
                .format(
                len(l_tokens) - len(r_tokens), l_delim, stack.pop() - 1))

def safe_format_map(
        text,
        source):
    """
    Perform safe string formatting from a mapping source.

    If a value is missing from source, this is simply ignored, and no
    `KeyError` is raised.

    Args:
        text (str): Text to format.
        source (Mapping|None): The mapping to use as source.
            If None, uses caller's `vars()`.

    Returns:
        result (str): The formatted text.
    """
    stack = []
    for i, j, depth in matching_delimiters(text, '{', '}'):
        if depth == 0:
            try:
                replacing = text[i:j].format_map(source)
            except KeyError:
                pass
            else:
                stack.append((i, j, replacing))
    result = ''
    i, j = len(text), 0
    while len(stack) > 0:
        last_i = i
        i, j, replacing = stack.pop()
        result = replacing + text[j:last_i] + result
    if i > 0:
        result = text[0:i] + result
    return result

(This code is also available in FlyingCircus -- DISCLAIMER: I am the main author of it.)

The usage for this code would be:

print(safe_format_map('{a} {b} {c}', dict(a=-A-)))
# -A- {b} {c}

Let's compare this to the my favourite solution (by @SvenMarnach who kindly shared his code here and there):

import string


class FormatPlaceholder:
    def __init__(self, key):
        self.key = key
    def __format__(self, spec):
        result = self.key
        if spec:
            result += ":" + spec
        return "{" + result + "}"
    def __getitem__(self, index):
        self.key = "{}[{}]".format(self.key, index)
        return self
    def __getattr__(self, attr):
        self.key = "{}.{}".format(self.key, attr)
        return self


class FormatDict(dict):
    def __missing__(self, key):
        return FormatPlaceholder(key)


def safe_format_alt(text, source):
    formatter = string.Formatter()
    return formatter.vformat(text, (), FormatDict(source))

Here are a couple of tests:

test_texts = (
    '{b} {f}',  # simple nothing useful in source
    '{a} {b}',  # simple
    '{a} {b} {c:5d}',  # formatting
    '{a} {b} {c!s}',  # coercion
    '{a} {b} {c!s:>{a}s}',  # formatting and coercion
    '{a} {b} {c:0{a}d}',  # nesting
    '{a} {b} {d[x]}',  # dicts (existing in source)
    '{a} {b} {e.index}',  # class (existing in source)
    '{a} {b} {f[g]}',  # dict (not existing in source)
    '{a} {b} {f.values}',  # class (not existing in source)

)
source = dict(a=4, c=101, d=dict(x='FOO'), e=[])

and the code to make it running:

funcs = safe_format_map, safe_format_alt

n = 18
for text in test_texts:
    full_source = {**dict(b='---', f=dict(g='Oh yes!')), **source}
    print('{:>{n}s} :   OK   : '.format('str.format_map', n=n) + text.format_map(full_source))
    for func in funcs:
        try:
            print(f'{func.__name__:>{n}s} :   OK   : ' + func(text, source))
        except:
            print(f'{func.__name__:>{n}s} : FAILED : {text}')

resulting in:

    str.format_map :   OK   : --- {'g': 'Oh yes!'}
   safe_format_map :   OK   : {b} {f}
   safe_format_alt :   OK   : {b} {f}
    str.format_map :   OK   : 4 ---
   safe_format_map :   OK   : 4 {b}
   safe_format_alt :   OK   : 4 {b}
    str.format_map :   OK   : 4 ---   101
   safe_format_map :   OK   : 4 {b}   101
   safe_format_alt :   OK   : 4 {b}   101
    str.format_map :   OK   : 4 --- 101
   safe_format_map :   OK   : 4 {b} 101
   safe_format_alt :   OK   : 4 {b} 101
    str.format_map :   OK   : 4 ---  101
   safe_format_map :   OK   : 4 {b}  101
   safe_format_alt :   OK   : 4 {b}  101
    str.format_map :   OK   : 4 --- 0101
   safe_format_map :   OK   : 4 {b} 0101
   safe_format_alt :   OK   : 4 {b} 0101
    str.format_map :   OK   : 4 --- FOO
   safe_format_map :   OK   : 4 {b} FOO
   safe_format_alt :   OK   : 4 {b} FOO
    str.format_map :   OK   : 4 --- <built-in method index of list object at 0x7f7a485666c8>
   safe_format_map :   OK   : 4 {b} <built-in method index of list object at 0x7f7a485666c8>
   safe_format_alt :   OK   : 4 {b} <built-in method index of list object at 0x7f7a485666c8>
    str.format_map :   OK   : 4 --- Oh yes!
   safe_format_map :   OK   : 4 {b} {f[g]}
   safe_format_alt :   OK   : 4 {b} {f[g]}
    str.format_map :   OK   : 4 --- <built-in method values of dict object at 0x7f7a485da090>
   safe_format_map :   OK   : 4 {b} {f.values}
   safe_format_alt :   OK   : 4 {b} {f.values}

as you can see, the updated version now seems to handle well the corner cases where the earlier version used to fail.

Timewise, they are within approx. 50% of each other, depending on the actual text to format (and likely the actual source), but safe_format_map() seems to have an edge in most of the tests I performed (whatever they mean, of course):

for text in test_texts:
    print(f'  {text}')
    %timeit safe_format(text * 1000, source)
    %timeit safe_format_alt(text * 1000, source)

  {b} {f}
3.93 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.35 ms ± 51.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  {a} {b}
4.37 ms ± 57.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.2 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  {a} {b} {c:5d}
7.15 ms ± 91.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.76 ms ± 69.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  {a} {b} {c!s}
7.04 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.56 ms ± 161 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  {a} {b} {c!s:>{a}s}
8.91 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.5 ms ± 181 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  {a} {b} {c:0{a}d}
8.84 ms ± 147 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.2 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  {a} {b} {d[x]}
7.01 ms ± 197 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.35 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  {a} {b} {e.index}
11 ms ± 68.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.78 ms ± 405 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  {a} {b} {f[g]}
6.55 ms ± 88.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.12 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  {a} {b} {f.values}
6.61 ms ± 55.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.92 ms ± 98.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

You could wrap it in a function that takes default arguments:

def print_foo_bar(foo='', bar=''):
    s = '{foo} {bar}'
    return s.format(foo=foo, bar=bar)

print_foo_bar(bar='BAR') # ' BAR'

来源：https://stackoverflow.com/questions/11283961/partial-string-formatting

标签

python

string-formatting