partial string formatting

野的像风 2020-11-28 04:30

Is it possible to do partial string formatting with the advanced string formatting methods, similar to the string template safe_substitute() function?
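For reference, here is a minimal sketch of the behaviour being compared: `Template.safe_substitute()` leaves unknown placeholders in place, while `str.format()` raises `KeyError` as soon as a field is missing. The variable names are illustrative only.

```python
from string import Template

# safe_substitute() fills what it can and keeps the rest untouched.
partly_filled = Template('$foo $bar').safe_substitute(foo='FOO')
print(partly_filled)

# str.format() has no such mode: a missing field raises KeyError.
try:
    '{foo} {bar}'.format(foo='FOO')
except KeyError as exc:
    missing_field = exc.args[0]
    print('missing field:', missing_field)
```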


21 answers
  • 2020-11-28 05:03

    If you'd like to unpack a dictionary to pass arguments to format, as in this related question, you could use the following method.

    First assume the string s is the same as in this question:

    s = '{foo} {bar}'
    

    and the values are given by the following dictionary:

    replacements = {'foo': 'FOO'}
    

    Clearly this won't work:

    s.format(**replacements)
    #---------------------------------------------------------------------------
    #KeyError                                  Traceback (most recent call last)
    #<ipython-input-29-ef5e51de79bf> in <module>()
    #----> 1 s.format(**replacements)
    #
    #KeyError: 'bar'
    

    However, you could first get a set of all of the named arguments from s and create a dictionary that maps the argument to itself wrapped in curly braces:

    from string import Formatter
    # Skip chunks without a field name (trailing literal text yields None)
    args = {x[1]: '{' + x[1] + '}' for x in Formatter().parse(s) if x[1]}
    print(args)
    #{'foo': '{foo}', 'bar': '{bar}'}
    

    Now use the args dictionary to fill in the missing keys in replacements. For Python 3.5+, you can do this in a single expression:

    new_s = s.format(**{**args, **replacements})
    print(new_s)
    #FOO {bar}
    

    For older versions of Python, you could call update:

    args.update(replacements)
    print(s.format(**args))
    #FOO {bar}
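The steps above can be wrapped into a single helper. This is only a sketch: the name `partial_format` is made up here, and it assumes plain named fields, so attribute or index lookups such as `{foo.x}` or `{foo[0]}` are not handled, and any format spec on a missing field is not written back into the result.

```python
from string import Formatter


def partial_format(template, **replacements):
    """Hypothetical helper: fill known fields, keep unknown ones as '{name}'."""
    defaults = {
        name: '{' + name + '}'
        for _, name, _, _ in Formatter().parse(template)
        if name  # skip literal-only chunks (None) and positional fields ('')
    }
    # Supplied values override the self-referencing defaults.
    return template.format(**{**defaults, **replacements})


print(partial_format('{foo} {bar}', foo='FOO'))
print(partial_format('x {foo} y {bar} z', foo='FOO'))
```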
    
  • 2020-11-28 05:06

    After testing the most promising solutions from here and there, I realized that none of them really met the following requirements:

    1. strictly adhere to the syntax recognized by str.format_map() for the template;
    2. retain complex formatting, i.e. fully support the Format Mini-Language.
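To illustrate why the second requirement is the hard one, here is a sketch of the common `__missing__` trick (the class name `KeepMissing` is made up for this example). It works for bare fields, but the format spec of a missing field is applied to the placeholder text instead of being kept, and a numeric spec fails outright.

```python
class KeepMissing(dict):
    """Hypothetical helper: return '{key}' for keys that are not present."""

    def __missing__(self, key):
        return '{' + key + '}'


# A bare missing field survives as expected.
simple = '{a} {b}'.format_map(KeepMissing(a=4))
print(repr(simple))

# The spec '>8' is consumed: the placeholder is padded and the spec is lost.
padded = '{a} {b:>8}'.format_map(KeepMissing(a=4))
print(repr(padded))

# A numeric spec cannot be applied to the placeholder string at all.
try:
    '{a} {c:5d}'.format_map(KeepMissing(a=4))
except ValueError as exc:
    numeric_error = exc
    print('ValueError:', exc)
```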

    So, I wrote my own solution, which satisfies the above requirements. (EDIT: now the version by @SvenMarnach -- as reported in this answer -- seems to handle the corner cases I needed).

    Basically, I ended up parsing the template string, finding matching nested {.*?} groups (using a find_all() helper function) and building the formatted string progressively and directly using str.format_map() while catching any potential KeyError.

    def find_all(
            text,
            pattern,
            overlap=False):
        """
        Find all occurrences of the pattern in the text.
    
        Args:
            text (str|bytes|bytearray): The input text.
            pattern (str|bytes|bytearray): The pattern to find.
            overlap (bool): Detect overlapping patterns.
    
        Yields:
            position (int): The position of the next finding.
        """
        len_text = len(text)
        offset = 1 if overlap else (len(pattern) or 1)
        i = 0
        while i < len_text:
            i = text.find(pattern, i)
            if i >= 0:
                yield i
                i += offset
            else:
                break
    
    def matching_delimiters(
            text,
            l_delim,
            r_delim,
            including=True):
        """
        Find matching delimiters in a sequence.
    
        The delimiters are matched according to nesting level.
    
        Args:
            text (str|bytes|bytearray): The input text.
            l_delim (str|bytes|bytearray): The left delimiter.
            r_delim (str|bytes|bytearray): The right delimiter.
            including (bool): Include delimiters.
    
        Yields:
            result (tuple[int]): The matching delimiters.
        """
        l_offset = len(l_delim) if including else 0
        r_offset = len(r_delim) if including else 0
        stack = []
    
        l_tokens = set(find_all(text, l_delim))
        r_tokens = set(find_all(text, r_delim))
        positions = l_tokens.union(r_tokens)
        for pos in sorted(positions):
            if pos in l_tokens:
                stack.append(pos + 1)
            elif pos in r_tokens:
                if len(stack) > 0:
                    prev = stack.pop()
                    yield (prev - l_offset, pos + r_offset, len(stack))
                else:
                    raise ValueError(
                        'Found `{}` unmatched right token(s) `{}` (position: {}).'
                            .format(len(r_tokens) - len(l_tokens), r_delim, pos))
        if len(stack) > 0:
            raise ValueError(
                'Found `{}` unmatched left token(s) `{}` (position: {}).'
                    .format(
                    len(l_tokens) - len(r_tokens), l_delim, stack.pop() - 1))
    
    def safe_format_map(
            text,
            source):
        """
        Perform safe string formatting from a mapping source.
    
        If a value is missing from source, this is simply ignored, and no
        `KeyError` is raised.
    
        Args:
            text (str): Text to format.
            source (Mapping): The mapping to use as source.
    
        Returns:
            result (str): The formatted text.
        """
        stack = []
        for i, j, depth in matching_delimiters(text, '{', '}'):
            if depth == 0:
                try:
                    replacing = text[i:j].format_map(source)
                except KeyError:
                    pass
                else:
                    stack.append((i, j, replacing))
        result = ''
        i, j = len(text), 0
        while len(stack) > 0:
            last_i = i
            i, j, replacing = stack.pop()
            result = replacing + text[j:last_i] + result
        if i > 0:
            result = text[0:i] + result
        return result
    

    (This code is also available in FlyingCircus -- DISCLAIMER: I am the main author of it.)


    The usage for this code would be:

    print(safe_format_map('{a} {b} {c}', dict(a='-A-')))
    # -A- {b} {c}
    

    Let's compare this to my favourite solution (by @SvenMarnach who kindly shared his code here and there):

    import string
    
    
    class FormatPlaceholder:
        def __init__(self, key):
            self.key = key
        def __format__(self, spec):
            result = self.key
            if spec:
                result += ":" + spec
            return "{" + result + "}"
        def __getitem__(self, index):
            self.key = "{}[{}]".format(self.key, index)
            return self
        def __getattr__(self, attr):
            self.key = "{}.{}".format(self.key, attr)
            return self
    
    
    class FormatDict(dict):
        def __missing__(self, key):
            return FormatPlaceholder(key)
    
    
    def safe_format_alt(text, source):
        formatter = string.Formatter()
        return formatter.vformat(text, (), FormatDict(source))
    

    Here are a couple of tests:

    test_texts = (
        '{b} {f}',  # simple nothing useful in source
        '{a} {b}',  # simple
        '{a} {b} {c:5d}',  # formatting
        '{a} {b} {c!s}',  # coercion
        '{a} {b} {c!s:>{a}s}',  # formatting and coercion
        '{a} {b} {c:0{a}d}',  # nesting
        '{a} {b} {d[x]}',  # dicts (existing in source)
        '{a} {b} {e.index}',  # class (existing in source)
        '{a} {b} {f[g]}',  # dict (not existing in source)
        '{a} {b} {f.values}',  # class (not existing in source)
    
    )
    source = dict(a=4, c=101, d=dict(x='FOO'), e=[])
    

    and the code to run them:

    funcs = safe_format_map, safe_format_alt
    
    n = 18
    for text in test_texts:
        full_source = {**dict(b='---', f=dict(g='Oh yes!')), **source}
        print('{:>{n}s} :   OK   : '.format('str.format_map', n=n) + text.format_map(full_source))
        for func in funcs:
            try:
                print(f'{func.__name__:>{n}s} :   OK   : ' + func(text, source))
            except Exception:
                print(f'{func.__name__:>{n}s} : FAILED : {text}')
    

    resulting in:

        str.format_map :   OK   : --- {'g': 'Oh yes!'}
       safe_format_map :   OK   : {b} {f}
       safe_format_alt :   OK   : {b} {f}
        str.format_map :   OK   : 4 ---
       safe_format_map :   OK   : 4 {b}
       safe_format_alt :   OK   : 4 {b}
        str.format_map :   OK   : 4 ---   101
       safe_format_map :   OK   : 4 {b}   101
       safe_format_alt :   OK   : 4 {b}   101
        str.format_map :   OK   : 4 --- 101
       safe_format_map :   OK   : 4 {b} 101
       safe_format_alt :   OK   : 4 {b} 101
        str.format_map :   OK   : 4 ---  101
       safe_format_map :   OK   : 4 {b}  101
       safe_format_alt :   OK   : 4 {b}  101
        str.format_map :   OK   : 4 --- 0101
       safe_format_map :   OK   : 4 {b} 0101
       safe_format_alt :   OK   : 4 {b} 0101
        str.format_map :   OK   : 4 --- FOO
       safe_format_map :   OK   : 4 {b} FOO
       safe_format_alt :   OK   : 4 {b} FOO
        str.format_map :   OK   : 4 --- <built-in method index of list object at 0x7f7a485666c8>
       safe_format_map :   OK   : 4 {b} <built-in method index of list object at 0x7f7a485666c8>
       safe_format_alt :   OK   : 4 {b} <built-in method index of list object at 0x7f7a485666c8>
        str.format_map :   OK   : 4 --- Oh yes!
       safe_format_map :   OK   : 4 {b} {f[g]}
       safe_format_alt :   OK   : 4 {b} {f[g]}
        str.format_map :   OK   : 4 --- <built-in method values of dict object at 0x7f7a485da090>
       safe_format_map :   OK   : 4 {b} {f.values}
       safe_format_alt :   OK   : 4 {b} {f.values}
    

    As you can see, the updated version now handles the corner cases where the earlier version used to fail.


    Timing-wise, the two are within roughly 50% of each other, depending on the text being formatted (and probably on the source too), but safe_format_map() comes out ahead in most of the tests I ran (for whatever such micro-benchmarks are worth):

    for text in test_texts:
        print(f'  {text}')
        %timeit safe_format_map(text * 1000, source)
        %timeit safe_format_alt(text * 1000, source)
    
      {b} {f}
    3.93 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    6.35 ms ± 51.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      {a} {b}
    4.37 ms ± 57.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    5.2 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      {a} {b} {c:5d}
    7.15 ms ± 91.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    7.76 ms ± 69.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      {a} {b} {c!s}
    7.04 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    7.56 ms ± 161 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      {a} {b} {c!s:>{a}s}
    8.91 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    10.5 ms ± 181 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      {a} {b} {c:0{a}d}
    8.84 ms ± 147 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    10.2 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      {a} {b} {d[x]}
    7.01 ms ± 197 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    7.35 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      {a} {b} {e.index}
    11 ms ± 68.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    8.78 ms ± 405 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      {a} {b} {f[g]}
    6.55 ms ± 88.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    9.12 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      {a} {b} {f.values}
    6.61 ms ± 55.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    9.92 ms ± 98.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
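The `%timeit` lines above are IPython magics. Outside IPython, a comparable measurement can be sketched with the standard `timeit` module. The function `safe_format_sketch` below is a hypothetical stand-in, not one of the two functions benchmarked above, so the numbers it prints are not comparable to the ones listed.

```python
import timeit


class KeepMissing(dict):
    """Hypothetical helper: return '{key}' for keys that are not present."""

    def __missing__(self, key):
        return '{' + key + '}'


def safe_format_sketch(text, source):
    # Stand-in for the function under test; swap in the real one to compare.
    return text.format_map(KeepMissing(source))


text = '{a} {b}' * 1000
source = {'a': 4}

# 7 runs of 20 calls each; report the best run as milliseconds per call.
timings = timeit.repeat(lambda: safe_format_sketch(text, source),
                        repeat=7, number=20)
best_ms = min(timings) / 20 * 1e3
print(f'best of 7: {best_ms:.3f} ms per call')
```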
    
  • 2020-11-28 05:08

    If you know in what order you're formatting things:

    s = '{foo} {{bar}}'
    

    Use it like this:

    ss = s.format(foo='FOO')
    print(ss)
    # FOO {bar}
    
    print(ss.format(bar='BAR'))
    # FOO BAR
    

    You can't specify foo and bar at the same time - you have to do it sequentially.
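The same escaping idea extends to more than two stages: each extra pair of doubled braces defers a field by one more `format()` call. A small sketch, with a made-up template and field names:

```python
# 'greeting' is filled first, 'name' second, 'count' third.
template = '{greeting}, {{name}}! You have {{{{count}}}} messages.'

stage1 = template.format(greeting='Hello')
stage2 = stage1.format(name='Ada')
stage3 = stage2.format(count=3)

print(stage1)
print(stage2)
print(stage3)
```

The drawback is that the template has to encode the order of substitution up front, which is why this only fits cases where that order is known.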

  • 2020-11-28 05:08

    I like @sven-marnach's answer. My answer is simply an extended version of it. It allows non-keyword formatting and ignores extra keys. Here are some usage examples (the function name is a nod to Python 3.6 f-string formatting):

    # partial string substitution by keyword
    >>> f('{foo} {bar}', foo="FOO")
    'FOO {bar}'
    
    # partial string substitution by argument
    >>> f('{} {bar}', 1)
    '1 {bar}'
    
    >>> f('{foo} {}', 1)
    '{foo} 1'
    
    # partial string substitution with arguments and keyword mixed
    >>> f('{foo} {} {bar} {}', '|', bar='BAR')
    '{foo} | BAR {}'
    
    # partial string substitution with extra keyword
    >>> f('{foo} {bar}', foo="FOO", bro="BRO")
    'FOO {bar}'
    
    # you can simply 'pour out' your dictionary to format function
    >>> kwargs = {'foo': 'FOO', 'bro': 'BRO'}
    >>> f('{foo} {bar}', **kwargs)
    'FOO {bar}'
    

    And here is my code:

    from string import Formatter
    
    
    class FormatTuple(tuple):
        def __getitem__(self, key):
            if key + 1 > len(self):
                return "{}"
            return tuple.__getitem__(self, key)
    
    
    class FormatDict(dict):
        def __missing__(self, key):
            return "{" + key + "}"
    
    
    def f(string, *args, **kwargs):
        """
        String safe substitute format method.
        If you pass extra keys they will be ignored.
        If you pass incomplete substitute map, missing keys will be left unchanged.
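        :param string: the template string
        :param args: positional values for `{}` fields
        :param kwargs: keyword values for named fields
        :return: the formatted string, with unresolved fields left in place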
    
        >>> f('{foo} {bar}', foo="FOO")
        'FOO {bar}'
        >>> f('{} {bar}', 1)
        '1 {bar}'
        >>> f('{foo} {}', 1)
        '{foo} 1'
        >>> f('{foo} {} {bar} {}', '|', bar='BAR')
        '{foo} | BAR {}'
        >>> f('{foo} {bar}', foo="FOO", bro="BRO")
        'FOO {bar}'
        """
        formatter = Formatter()
        args_mapping = FormatTuple(args)
        mapping = FormatDict(kwargs)
        return formatter.vformat(string, args_mapping, mapping)
    
  • 2020-11-28 05:09

    You could use the partial function from functools, which is short, readable, and makes the coder's intention explicit:

    from functools import partial
    
    s = partial("{foo} {bar}".format, foo="FOO")
    print(s(bar="BAR"))
    # FOO BAR
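As a sketch of how this composes: a `partial` can itself be wrapped in another `partial`, so values can be supplied in several steps. Note that the intermediate result is a callable rather than a string, so it cannot be printed or stored as a partially filled template. The three-field template here is made up for illustration.

```python
from functools import partial

# Bind one field at a time; each step returns a new callable.
fmt = partial('{foo} {bar} {baz}'.format, foo='FOO')
fmt = partial(fmt, bar='BAR')

# Only the final call, with the last missing field, yields a string.
result = fmt(baz='BAZ')
print(result)
```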
    
  • 2020-11-28 05:09

    For me this was good enough:

    >>> ss = 'dfassf {} dfasfae efaef {} fds'
    >>> nn = ss.format('f1', '{}')
    >>> nn
    'dfassf f1 dfasfae efaef {} fds'
    >>> n2 = nn.format('whoa')
    >>> n2
    'dfassf f1 dfasfae efaef whoa fds'
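One caveat with passing `'{}'` through like this: every later `format()` call re-parses the whole string, so a substituted value that itself contains braces will be treated as a field. A sketch of the problem and one possible workaround (the `escape_braces` helper is hypothetical, not part of the standard library):

```python
def escape_braces(value):
    """Hypothetical helper: double the braces so format() treats them as text."""
    return value.replace('{', '{{').replace('}', '}}')


template = 'a {} b {}'

# The value '{x}' is inserted verbatim and becomes a field in the next pass.
stage1 = template.format('{x}', '{}')
try:
    stage1.format('whoa')
except KeyError as exc:
    second_pass_error = exc

# Escaping the value first keeps it literal through the second pass.
safe_stage1 = template.format(escape_braces('{x}'), '{}')
final = safe_stage1.format('whoa')
print(final)
```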
    