Is it possible to do partial string formatting with the advanced string formatting methods, similar to the string template safe_substitute() function?
If you'd like to unpack a dictionary to pass arguments to format, as in this related question, you could use the following method.
First assume the string s is the same as in this question:
s = '{foo} {bar}'
and the values are given by the following dictionary:
replacements = {'foo': 'FOO'}
Clearly this won't work:
s.format(**replacements)
#---------------------------------------------------------------------------
#KeyError Traceback (most recent call last)
#<ipython-input-29-ef5e51de79bf> in <module>()
#----> 1 s.format(**replacements)
#
#KeyError: 'bar'
However, you could first get a set of all of the named arguments from s and create a dictionary that maps the argument to itself wrapped in curly braces:
from string import Formatter
args = {x[1]: '{' + x[1] + '}' for x in Formatter().parse(s) if x[1] is not None}
print(args)
#{'foo': '{foo}', 'bar': '{bar}'}
Now use the args dictionary to fill in the missing keys in replacements. For Python 3.5+, you can do this in a single expression:
new_s = s.format(**{**args, **replacements})
print(new_s)
#FOO {bar}
For older versions of Python, you could call update:
args.update(replacements)
print(s.format(**args))
#FOO {bar}
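The steps above can be wrapped into one function. The following is only a sketch: the name partial_format is hypothetical, and it assumes the template uses simple named fields (no positional fields, no attribute or index lookups, and no format specs on the fields that stay unfilled).

```python
from string import Formatter


def partial_format(template, **replacements):
    """Fill the known fields and leave unknown named fields untouched.

    Hypothetical helper; assumes simple named fields only.
    """
    placeholders = {
        name: '{' + name + '}'
        for _, name, _, _ in Formatter().parse(template)
        if name  # skip None (trailing literal text) and '' (auto-numbered fields)
    }
    # Values passed by the caller override the self-referencing placeholders.
    return template.format(**{**placeholders, **replacements})


print(partial_format('{foo} {bar} tail', foo='FOO'))
# FOO {bar} tail
```

Because the result still contains {bar}, it can be passed to partial_format (or str.format) again once bar is known.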
After testing the most promising solutions from here and there, I realized that none of them really met the following requirements:
str.format_map() for the template;
So, I wrote my own solution, which satisfies the above requirements. (EDIT: now the version by @SvenMarnach -- as reported in this answer -- seems to handle the corner cases I needed).
Basically, I ended up parsing the template string, finding matching nested {.*?} groups (using a find_all() helper function) and building the formatted string progressively and directly using str.format_map() while catching any potential KeyError.
def find_all(
        text,
        pattern,
        overlap=False):
    """
    Find all occurrences of the pattern in the text.
    Args:
        text (str|bytes|bytearray): The input text.
        pattern (str|bytes|bytearray): The pattern to find.
        overlap (bool): Detect overlapping patterns.
    Yields:
        position (int): The position of the next finding.
    """
    len_text = len(text)
    offset = 1 if overlap else (len(pattern) or 1)
    i = 0
    while i < len_text:
        i = text.find(pattern, i)
        if i >= 0:
            yield i
            i += offset
        else:
            break
def matching_delimiters(
        text,
        l_delim,
        r_delim,
        including=True):
    """
    Find matching delimiters in a sequence.
    The delimiters are matched according to nesting level.
    Args:
        text (str|bytes|bytearray): The input text.
        l_delim (str|bytes|bytearray): The left delimiter.
        r_delim (str|bytes|bytearray): The right delimiter.
        including (bool): Include delimiters.
    Yields:
        result (tuple[int]): The matching delimiters.
    """
    l_offset = len(l_delim) if including else 0
    r_offset = len(r_delim) if including else 0
    stack = []
    l_tokens = set(find_all(text, l_delim))
    r_tokens = set(find_all(text, r_delim))
    positions = l_tokens.union(r_tokens)
    for pos in sorted(positions):
        if pos in l_tokens:
            stack.append(pos + 1)
        elif pos in r_tokens:
            if len(stack) > 0:
                prev = stack.pop()
                yield (prev - l_offset, pos + r_offset, len(stack))
            else:
                raise ValueError(
                    'Found `{}` unmatched right token(s) `{}` (position: {}).'
                    .format(len(r_tokens) - len(l_tokens), r_delim, pos))
    if len(stack) > 0:
        raise ValueError(
            'Found `{}` unmatched left token(s) `{}` (position: {}).'
            .format(
                len(l_tokens) - len(r_tokens), l_delim, stack.pop() - 1))
def safe_format_map(
        text,
        source):
    """
    Perform safe string formatting from a mapping source.
    If a value is missing from source, this is simply ignored, and no
    `KeyError` is raised.
    Args:
        text (str): Text to format.
        source (Mapping): The mapping to use as source.
    Returns:
        result (str): The formatted text.
    """
    # Collect the replacements for all top-level `{...}` groups.
    stack = []
    for i, j, depth in matching_delimiters(text, '{', '}'):
        if depth == 0:
            try:
                replacing = text[i:j].format_map(source)
            except KeyError:
                pass
            else:
                stack.append((i, j, replacing))
    # Rebuild the text from right to left, keeping unformatted parts as-is.
    result = ''
    i, j = len(text), 0
    while len(stack) > 0:
        last_i = i
        i, j, replacing = stack.pop()
        result = replacing + text[j:last_i] + result
    if i > 0:
        result = text[0:i] + result
    return result
(This code is also available in FlyingCircus -- DISCLAIMER: I am the main author of it.)
The usage for this code would be:
print(safe_format_map('{a} {b} {c}', dict(a='-A-')))
# -A- {b} {c}
Let's compare this to my favourite solution (by @SvenMarnach, who kindly shared his code here and there):
import string
class FormatPlaceholder:
    def __init__(self, key):
        self.key = key
    def __format__(self, spec):
        result = self.key
        if spec:
            result += ":" + spec
        return "{" + result + "}"
    def __getitem__(self, index):
        self.key = "{}[{}]".format(self.key, index)
        return self
    def __getattr__(self, attr):
        self.key = "{}.{}".format(self.key, attr)
        return self
class FormatDict(dict):
    def __missing__(self, key):
        return FormatPlaceholder(key)
def safe_format_alt(text, source):
    formatter = string.Formatter()
    return formatter.vformat(text, (), FormatDict(source))
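To see why a placeholder object is returned for missing keys instead of a plain string, here is a minimal sketch of the simpler variant. The class name KeepMissing is hypothetical and not part of the code above; it returns the string '{key}' directly, which works for bare fields but breaks as soon as the missing field carries an index lookup or a type-specific format spec.

```python
class KeepMissing(dict):
    """Hypothetical helper: a missing key formats back to '{key}'."""

    def __missing__(self, key):
        return '{' + key + '}'


# Bare fields work: the missing one is put back unchanged.
print('{foo} {bar}'.format_map(KeepMissing(foo='FOO')))
# FOO {bar}

# An index lookup or an integer format spec is applied to the *string*
# '{f}' or '{c}', which fails.
failures = {}
for template in ('{f[g]}', '{c:5d}'):
    try:
        template.format_map(KeepMissing())
    except (TypeError, ValueError) as exc:
        failures[template] = type(exc).__name__
print(failures)
# {'{f[g]}': 'TypeError', '{c:5d}': 'ValueError'}
```

FormatPlaceholder avoids both failures by implementing __getitem__, __getattr__ and __format__ itself, so the whole field expression can be reassembled.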
Here are a couple of tests:
test_texts = (
'{b} {f}', # simple nothing useful in source
'{a} {b}', # simple
'{a} {b} {c:5d}', # formatting
'{a} {b} {c!s}', # coercion
'{a} {b} {c!s:>{a}s}', # formatting and coercion
'{a} {b} {c:0{a}d}', # nesting
'{a} {b} {d[x]}', # dicts (existing in source)
'{a} {b} {e.index}', # class (existing in source)
'{a} {b} {f[g]}', # dict (not existing in source)
'{a} {b} {f.values}', # class (not existing in source)
)
source = dict(a=4, c=101, d=dict(x='FOO'), e=[])
and the code to run them:
funcs = safe_format_map, safe_format_alt
n = 18
for text in test_texts:
    full_source = {**dict(b='---', f=dict(g='Oh yes!')), **source}
    print('{:>{n}s} : OK : '.format('str.format_map', n=n) + text.format_map(full_source))
    for func in funcs:
        try:
            print(f'{func.__name__:>{n}s} : OK : ' + func(text, source))
        except Exception:
            print(f'{func.__name__:>{n}s} : FAILED : {text}')
resulting in:
    str.format_map : OK : --- {'g': 'Oh yes!'}
   safe_format_map : OK : {b} {f}
   safe_format_alt : OK : {b} {f}
    str.format_map : OK : 4 ---   101
   safe_format_map : OK : 4 {b}   101
   safe_format_alt : OK : 4 {b}   101
    str.format_map : OK : 4 --- 101
   safe_format_map : OK : 4 {b} 101
   safe_format_alt : OK : 4 {b} 101
    str.format_map : OK : 4 ---  101
   safe_format_map : OK : 4 {b}  101
   safe_format_alt : OK : 4 {b}  101
    str.format_map : OK : 4 --- 0101
   safe_format_map : OK : 4 {b} 0101
   safe_format_alt : OK : 4 {b} 0101
    str.format_map : OK : 4 --- FOO
   safe_format_map : OK : 4 {b} FOO
   safe_format_alt : OK : 4 {b} FOO
    str.format_map : OK : 4 --- <built-in method index of list object at 0x7f7a485666c8>
   safe_format_map : OK : 4 {b} <built-in method index of list object at 0x7f7a485666c8>
   safe_format_alt : OK : 4 {b} <built-in method index of list object at 0x7f7a485666c8>
    str.format_map : OK : 4 --- Oh yes!
   safe_format_map : OK : 4 {b} {f[g]}
   safe_format_alt : OK : 4 {b} {f[g]}
    str.format_map : OK : 4 --- <built-in method values of dict object at 0x7f7a485da090>
   safe_format_map : OK : 4 {b} {f.values}
   safe_format_alt : OK : 4 {b} {f.values}
As you can see, the updated version now seems to handle the corner cases where the earlier version used to fail.
Timewise, they are within approx. 50% of each other, depending on the actual text to format (and likely the actual source), but safe_format_map() seems to have an edge in most of the tests I performed (whatever they mean, of course):
for text in test_texts:
    print(f' {text}')
    %timeit safe_format_map(text * 1000, source)
    %timeit safe_format_alt(text * 1000, source)
{b} {f}
3.93 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.35 ms ± 51.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{a} {b}
4.37 ms ± 57.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.2 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{a} {b} {c:5d}
7.15 ms ± 91.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.76 ms ± 69.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{a} {b} {c!s}
7.04 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.56 ms ± 161 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{a} {b} {c!s:>{a}s}
8.91 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.5 ms ± 181 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{a} {b} {c:0{a}d}
8.84 ms ± 147 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.2 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{a} {b} {d[x]}
7.01 ms ± 197 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.35 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{a} {b} {e.index}
11 ms ± 68.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.78 ms ± 405 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{a} {b} {f[g]}
6.55 ms ± 88.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.12 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{a} {b} {f.values}
6.61 ms ± 55.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.92 ms ± 98.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
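The %timeit lines above are IPython magics. Outside IPython, a comparable measurement can be sketched with the standard timeit module. The helper name time_formatter and the stand-in function baseline are assumptions made for this illustration; in practice you would pass safe_format_map or safe_format_alt instead of baseline.

```python
import timeit


def time_formatter(func, text, source, repeat=5, number=20):
    """Return the best observed time per call, in seconds (hypothetical helper)."""
    timings = timeit.repeat(
        lambda: func(text, source), repeat=repeat, number=number)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings) / number


def baseline(text, source):
    """Stand-in for the functions under test: plain str.format_map()."""
    return text.format_map(source)


best = time_formatter(baseline, '{a} {c:5d}' * 1000, dict(a=4, c=101))
print(f'{best * 1e3:.3f} ms per call')
```

The absolute numbers depend on the machine and Python version, so only the relative ordering of the functions is meaningful.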
If you know in what order you're formatting things:
s = '{foo} {{bar}}'
Use it like this:
ss = s.format(foo='FOO')
print(ss)
# FOO {bar}
print(ss.format(bar='BAR'))
# FOO BAR
You can't specify foo and bar at the same time - you have to do it sequentially.
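The same idea extends to more than two passes: every call to format() removes one level of doubled braces, so a field that should be filled in the n-th pass needs its braces doubled n - 1 times. A small sketch (the field names are only illustrative):

```python
# foo is filled in pass 1, bar in pass 2, baz in pass 3.
s = '{foo} {{bar}} {{{{baz}}}}'

pass1 = s.format(foo='FOO')
print(pass1)
# FOO {bar} {{baz}}

pass2 = pass1.format(bar='BAR')
print(pass2)
# FOO BAR {baz}

pass3 = pass2.format(baz='BAZ')
print(pass3)
# FOO BAR BAZ
```

The drawback is that the template has to encode the order of substitution up front.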
I like @sven-marnach's answer. My answer is simply an extended version of it. It allows non-keyword formatting and ignores extra keys. Here are examples of usage (the function's name is a reference to Python 3.6 f-string formatting):
# partial string substitution by keyword
>>> f('{foo} {bar}', foo="FOO")
'FOO {bar}'
# partial string substitution by argument
>>> f('{} {bar}', 1)
'1 {bar}'
>>> f('{foo} {}', 1)
'{foo} 1'
# partial string substitution with arguments and keyword mixed
>>> f('{foo} {} {bar} {}', '|', bar='BAR')
'{foo} | BAR {}'
# partial string substitution with extra keyword
>>> f('{foo} {bar}', foo="FOO", bro="BRO")
'FOO {bar}'
# you can simply 'pour out' your dictionary to format function
>>> kwargs = {'foo': 'FOO', 'bro': 'BRO'}
>>> f('{foo} {bar}', **kwargs)
'FOO {bar}'
And here is my code:
from string import Formatter
class FormatTuple(tuple):
    def __getitem__(self, key):
        if key + 1 > len(self):
            return "{}"
        return tuple.__getitem__(self, key)
class FormatDict(dict):
    def __missing__(self, key):
        return "{" + key + "}"
def f(string, *args, **kwargs):
    """
    String safe substitute format method.
    If you pass extra keys they will be ignored.
    If you pass incomplete substitute map, missing keys will be left unchanged.
    :param string:
    :param kwargs:
    :return:
    >>> f('{foo} {bar}', foo="FOO")
    'FOO {bar}'
    >>> f('{} {bar}', 1)
    '1 {bar}'
    >>> f('{foo} {}', 1)
    '{foo} 1'
    >>> f('{foo} {} {bar} {}', '|', bar='BAR')
    '{foo} | BAR {}'
    >>> f('{foo} {bar}', foo="FOO", bro="BRO")
    'FOO {bar}'
    """
    formatter = Formatter()
    args_mapping = FormatTuple(args)
    mapping = FormatDict(kwargs)
    return formatter.vformat(string, args_mapping, mapping)
You could use the partial function from functools, which is short, readable, and also describes the coder's intention:
from functools import partial
s = partial("{foo} {bar}".format, foo="FOO")
print(s(bar="BAR"))
# FOO BAR
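Partials can also be stacked when the values arrive in more than two steps. The sketch below uses an illustrative three-field template; note that the intermediate results are callables, not strings, so the partially filled text cannot be printed or inspected along the way.

```python
from functools import partial

template = '{foo} {bar} {baz}'

# Each step binds one more keyword argument to str.format.
step1 = partial(template.format, foo='FOO')
step2 = partial(step1, bar='BAR')

result = step2(baz='BAZ')
print(result)
# FOO BAR BAZ
```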
For me this was good enough:
>>> ss = 'dfassf {} dfasfae efaef {} fds'
>>> nn = ss.format('f1', '{}')
>>> nn
'dfassf f1 dfasfae efaef {} fds'
>>> n2 = nn.format('whoa')
>>> n2
'dfassf f1 dfasfae efaef whoa fds'
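The same trick works with named fields: pass the field's own placeholder text as its value for the fields you want to keep. A minimal sketch with illustrative field names:

```python
template = '{greeting}, {name}!'

# Fill greeting now and re-insert the placeholder for name.
stage1 = template.format(greeting='Hello', name='{name}')
print(stage1)
# Hello, {name}!

stage2 = stage1.format(name='World')
print(stage2)
# Hello, World!
```

One caveat: any braces contained in a substituted value are interpreted as fields on the next pass, so this only works when the values themselves are free of curly braces.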