I have some preprocessing to do with some existing .yml files - however, some of them have Jinja template syntax embedded in them:
A:
  B:
  - ip: 1.2.3.4
  - m
In their current form, your .yml files are Jinja templates, which will not be valid YAML until they have been rendered. This is because the Jinja placeholder syntax conflicts with YAML syntax: braces ({ and }) denote flow mappings in YAML.
>>> import yaml
>>> yaml.safe_load('foo: {{ bar }}')
Traceback (most recent call last):
  ...
yaml.constructor.ConstructorError: while constructing a mapping
  in "<string>", line 1, column 6:
    foo: {{ bar }}
         ^
found unacceptable key (unhashable type: 'dict')
  in "<string>", line 1, column 7:
    foo: {{ bar }}
          ^
One way to work around this is to replace the Jinja placeholders with something else, process the file as YAML, then reinstate the placeholders.
$ cat test.yml
A:
  B:
  - ip: 1.2.3.4
  - myArray:
    - {{ jinja_variable }}
    - val1
    - val2
Open the file as a text file:
>>> with open('test.yml') as f:
...     text = f.read()
...
>>> print(text)
A:
  B:
  - ip: 1.2.3.4
  - myArray:
    - {{ jinja_variable }}
    - val1
    - val2
The regular expression r'{{\s*(?P<jinja>[a-zA-Z_][a-zA-Z0-9_]*)\s*}}' will match any Jinja placeholder in the text that consists of a single variable name; the named group jinja captures the variable name. The identifier part of the expression is the same pattern that Jinja2 uses to match variable names.
The re.sub function can reference named groups in its replacement string using the \g<name> syntax. We can use this to replace the Jinja delimiters with something that does not conflict with YAML syntax and does not already appear in the files you are processing. For example, replace {{ ... }} with << ... >>:
>>> import re
>>> yml_text = re.sub(r'{{\s*(?P<jinja>[a-zA-Z_][a-zA-Z0-9_]*)\s*}}', r'<<\g<jinja>>>', text)
>>> print(yml_text)
A:
  B:
  - ip: 1.2.3.4
  - myArray:
    - <<jinja_variable>>
    - val1
    - val2
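The substitution is only safe if the stand-in delimiters do not already occur in the file (YAML merge keys, written `<<:`, are one place they might). The following is a minimal, deliberately conservative sketch of such a check. It assumes `<<` and `>>` are the chosen stand-ins, and the function name `find_delimiter_conflicts` is hypothetical, not part of any library.

```python
# Assumed stand-in delimiters; pick different ones if your files already use these.
OPEN, CLOSE = '<<', '>>'


def find_delimiter_conflicts(text):
    """Return (line_number, line) pairs that already contain a stand-in delimiter."""
    conflicts = []
    for number, line in enumerate(text.splitlines(), start=1):
        if OPEN in line or CLOSE in line:
            conflicts.append((number, line))
    return conflicts


# Hypothetical sample: line 4 uses a YAML merge key, which contains '<<'.
sample = "base: &base\n  x: 1\nA:\n  <<: *base\n  ip: {{ jinja_variable }}\n"
conflicts = find_delimiter_conflicts(sample)
print(conflicts)
```

If the list is non-empty, inspect the reported lines and, if necessary, choose a different pair of stand-in delimiters before running re.sub.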
Now load the text as YAML:
>>> yml = yaml.safe_load(yml_text)
>>> yml
{'A': {'B': [{'ip': '1.2.3.4'}, {'myArray': ['<<jinja_variable>>', 'val1', 'val2']}]}}
Add the new value:
>>> yml['A']['B'][1]['myArray'].append('val3')
>>> yml
{'A': {'B': [{'ip': '1.2.3.4'}, {'myArray': ['<<jinja_variable>>', 'val1', 'val2', 'val3']}]}}
Serialise back to a yaml string:
>>> new_text = yaml.dump(yml, default_flow_style=False)
>>> print(new_text)
A:
  B:
  - ip: 1.2.3.4
  - myArray:
    - <<jinja_variable>>
    - val1
    - val2
    - val3
Now reinstate the Jinja syntax:
>>> new_yml = re.sub(r'<<(?P<jinja>[a-zA-Z_][a-zA-Z0-9_]*)>>', r'{{ \g<jinja> }}', new_text)
>>> print(new_yml)
A:
  B:
  - ip: 1.2.3.4
  - myArray:
    - {{ jinja_variable }}
    - val1
    - val2
    - val3
And write the YAML to disk:
>>> with open('test.yml', 'w') as f:
...     f.write(new_yml)
...
$ cat test.yml
A:
  B:
  - ip: 1.2.3.4
  - myArray:
    - {{ jinja_variable }}
    - val1
    - val2
    - val3
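The two substitutions can be wrapped in a pair of helpers so that they are always applied with matching patterns. This is only a sketch: the names `mask_placeholders` and `unmask_placeholders` are hypothetical, and the `<<name>>` stand-in format is the same assumption as in the steps above.

```python
import re

# Identifier pattern taken from the regular expressions used in this answer.
NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'
JINJA_RE = re.compile(r'{{\s*(?P<jinja>' + NAME + r')\s*}}')
MASKED_RE = re.compile(r'<<(?P<jinja>' + NAME + r')>>')


def mask_placeholders(text):
    """Replace {{ name }} with <<name>> so the text can be parsed as YAML."""
    return JINJA_RE.sub(r'<<\g<jinja>>>', text)


def unmask_placeholders(text):
    """Turn <<name>> back into {{ name }} after the YAML has been dumped."""
    return MASKED_RE.sub(r'{{ \g<jinja> }}', text)


original = "- {{ jinja_variable }}\n- {{jinja_variable}}\n- {{ user.name }}\n"
masked = mask_placeholders(original)
restored = unmask_placeholders(masked)
print(masked)
print(restored)
```

Between the two calls you would load the masked text, modify the data, and dump it again as shown earlier. Two limitations are visible in the sample: the spacing inside the braces is normalised to `{{ name }}` on the way back, and `{{ user.name }}` is left untouched because the pattern only covers bare variable names, so a placeholder with attribute access or filters would still prevent the file from loading as YAML.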