I'd like to get PyYAML's loader to load mappings (and ordered mappings) into the Python 2.7+ OrderedDict type, instead of the vanilla dict and the list of pairs it currently uses.
What's the best way to do that?
Update: On Python 3.6+ you probably don't need anything special, because the new dict implementation preserves insertion order (an implementation detail of CPython 3.6, and a language guarantee from Python 3.7 onward).
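To make the trade-off concrete, here is a minimal standard-library sketch (no PyYAML involved, and the sample keys are made up) of what a plain dict gives you on Python 3.7+ and where OrderedDict still behaves differently:

```python
from collections import OrderedDict

pairs = [("zebra", 1), ("apple", 2), ("mango", 3)]

plain = dict(pairs)
ordered = OrderedDict(pairs)

# On Python 3.7+ both containers iterate in insertion order.
print(list(plain))
print(list(ordered))

# OrderedDict equality is order-sensitive; plain dict equality is not.
reversed_pairs = list(reversed(pairs))
print(plain == dict(reversed_pairs))            # True
print(ordered == OrderedDict(reversed_pairs))   # False

# OrderedDict also has reordering helpers that dict lacks.
ordered.move_to_end("zebra")
print(list(ordered))                            # ['apple', 'mango', 'zebra']
```

So if you only need the keys to come back in file order, a plain dict may be enough; if you rely on order-sensitive comparison or reordering, OrderedDict is still the type to load into.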
I like @James' solution for its simplicity. However, it changes the default global yaml.Loader class, which can lead to troublesome side effects. This is a bad idea especially when writing library code. Also, it doesn't work directly with yaml.safe_load().
Fortunately, the solution can be improved without much effort:
    import yaml
    from collections import OrderedDict

    def ordered_load(stream, Loader=yaml.Loader, object_pairs_hook=OrderedDict):
        class OrderedLoader(Loader):
            pass
        def construct_mapping(loader, node):
            loader.flatten_mapping(node)
            return object_pairs_hook(loader.construct_pairs(node))
        OrderedLoader.add_constructor(
            yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
            construct_mapping)
        return yaml.load(stream, OrderedLoader)

    # usage example:
    ordered_load(stream, yaml.SafeLoader)
For serialization, I don't know an obvious generalization, but at least this shouldn't have any side effects:
    def ordered_dump(data, stream=None, Dumper=yaml.Dumper, **kwds):
        class OrderedDumper(Dumper):
            pass
        def _dict_representer(dumper, data):
            return dumper.represent_mapping(
                yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
                data.items())
        OrderedDumper.add_representer(OrderedDict, _dict_representer)
        return yaml.dump(data, stream, OrderedDumper, **kwds)

    # usage:
    ordered_dump(data, Dumper=yaml.SafeDumper)
The yaml module allows you to specify custom 'representers' to convert Python objects to text and 'constructors' to reverse the process.
    import collections
    import yaml

    _mapping_tag = yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG

    def dict_representer(dumper, data):
        # pass key/value pairs so the insertion order is kept
        return dumper.represent_dict(data.items())

    def dict_constructor(loader, node):
        return collections.OrderedDict(loader.construct_pairs(node))

    # note: these register on the module-wide default Dumper and Loader
    yaml.add_representer(collections.OrderedDict, dict_representer)
    yaml.add_constructor(_mapping_tag, dict_constructor)
I doubt very much that this is the best way to do it, but this is the way I came up with, and it does work. Also available as a gist.
    import yaml
    import yaml.constructor

    try:
        # included in standard lib from Python 2.7
        from collections import OrderedDict
    except ImportError:
        # try importing the backported drop-in replacement
        # it's available on PyPI
        from ordereddict import OrderedDict

    class OrderedDictYAMLLoader(yaml.Loader):
        """
        A YAML loader that loads mappings into ordered dictionaries.
        """

        def __init__(self, *args, **kwargs):
            yaml.Loader.__init__(self, *args, **kwargs)

            self.add_constructor(u'tag:yaml.org,2002:map', type(self).construct_yaml_map)
            self.add_constructor(u'tag:yaml.org,2002:omap', type(self).construct_yaml_map)

        def construct_yaml_map(self, node):
            data = OrderedDict()
            yield data
            value = self.construct_mapping(node)
            data.update(value)

        def construct_mapping(self, node, deep=False):
            if isinstance(node, yaml.MappingNode):
                self.flatten_mapping(node)
            else:
                raise yaml.constructor.ConstructorError(None, None,
                    'expected a mapping node, but found %s' % node.id, node.start_mark)

            mapping = OrderedDict()
            for key_node, value_node in node.value:
                key = self.construct_object(key_node, deep=deep)
                try:
                    hash(key)
                except TypeError as exc:
                    raise yaml.constructor.ConstructorError('while constructing a mapping',
                        node.start_mark, 'found unacceptable key (%s)' % exc, key_node.start_mark)
                value = self.construct_object(value_node, deep=deep)
                mapping[key] = value
            return mapping
    import sys
    import ruamel.yaml as yaml

    yaml_str = """\
    3: abc
    conf:
        10: def
        3: gij     # h is missing
    more:
    - what
    - else
    """

    data = yaml.load(yaml_str, Loader=yaml.RoundTripLoader)
    data['conf'][10] = 'klm'
    data['conf'][3] = 'jig'
    yaml.dump(data, sys.stdout, Dumper=yaml.RoundTripDumper)
will give you:
    3: abc
    conf:
      10: klm
      3: jig       # h is missing
    more:
    - what
    - else
data is of type CommentedMap, which functions like a dict but carries extra information that is kept around until it is dumped (including the preserved comment!).
This was done using ruamel.yaml of which I am the author. It is a fork and superset of PyYAML.
I've just found a Python library (https://pypi.python.org/pypi/yamlordereddictloader/0.1.1) which was created based on answers to this question and is quite simple to use:
    import yaml
    import yamlordereddictloader

    datas = yaml.load(open('myfile.yml'), Loader=yamlordereddictloader.Loader)
On my PyYAML installation for Python 2.7, I updated __init__.py, constructor.py, and loader.py. The load commands now support an object_pairs_hook option. A diff of the changes I made is below.
    __init__.py

    $ diff __init__.py Original
    64c64
    def load(stream, Loader=Loader):
    69c69
    loader = Loader(stream)
    75c75
    def load_all(stream, Loader=Loader):
    80c80
    loader = Loader(stream)

    constructor.py

    $ diff constructor.py Original
    20,21c20
    def __init__(self):
    27,29d25
    self.constructed_objects = {}
    > self.recursive_objects = {}
    129c125
    mapping = {}
    400c396
    data = {}
    595c591
    dictitems = {}
    602c598
    dictitems = value.get('dictitems', {})

    loader.py

    $ diff loader.py Original
    13c13
    def __init__(self, stream):
    18c18
    BaseConstructor.__init__(self)
    23c23
    def __init__(self, stream):
    28c28
    SafeConstructor.__init__(self)
    33c33
    def __init__(self, stream):
    38c38
    Constructor.__init__(self)
oyaml is a drop-in replacement for PyYAML which preserves dict ordering. Both Python 2 and Python 3 are supported. Just pip install oyaml, and import as shown below:
import oyaml as yaml
You'll no longer be annoyed by screwed-up mappings when dumping/loading.
Note: I'm the author of oyaml.
There is a PyYAML ticket on the subject opened 5 years ago. It contains some relevant links, including the link to this very question :) I personally grabbed gist 317164 and modified it a little bit to use OrderedDict from Python 2.7, not the included implementation (just replaced the class with from collections import OrderedDict).
Here's a simple solution that also checks for duplicated top-level keys in your map.
    import yaml
    import re
    from collections import OrderedDict

    def yaml_load_od(fname):
        "load a yaml file as an OrderedDict"
        # detects any duped keys (fail on this) and preserves order of top level keys
        with open(fname, 'r') as f:
            lines = f.read().splitlines()
        top_keys = []
        duped_keys = []
        for line in lines:
            m = re.search(r'^([A-Za-z0-9_]+) *:', line)
            if m:
                if m.group(1) in top_keys:
                    duped_keys.append(m.group(1))
                else:
                    top_keys.append(m.group(1))
        if duped_keys:
            raise Exception('ERROR: duplicate keys: {}'.format(duped_keys))
        # 2nd pass to set up the OrderedDict
        with open(fname, 'r') as f:
            # safe_load: a bare yaml.load(f) needs an explicit Loader in PyYAML 6+
            d_tmp = yaml.safe_load(f)
        return OrderedDict([(key, d_tmp[key]) for key in top_keys])
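The regular-expression pass only sees top-level keys that match that pattern. As a hedged alternative sketch: a hook that receives the (key, value) pairs of each mapping can reject duplicates at any nesting level while keeping order. The name unique_ordered is hypothetical; it is written so that it could be passed as object_pairs_hook to the ordered_load helper from the earlier answer, which calls the hook with a list of pairs.

```python
from collections import OrderedDict

def unique_ordered(pairs):
    """Build an OrderedDict from (key, value) pairs, rejecting duplicate keys."""
    result = OrderedDict()
    duplicates = []
    for key, value in pairs:
        if key in result:
            duplicates.append(key)
        else:
            result[key] = value
    if duplicates:
        raise ValueError('duplicate keys: {}'.format(duplicates))
    return result

# Order of the incoming pairs is preserved.
print(list(unique_ordered([('b', 1), ('a', 2)])))   # ['b', 'a']

# A repeated key is reported instead of silently overwritten.
try:
    unique_ordered([('a', 1), ('b', 2), ('a', 3)])
except ValueError as exc:
    print(exc)   # duplicate keys: ['a']
```

With that helper, usage would look like ordered_load(stream, yaml.SafeLoader, object_pairs_hook=unique_ordered). One caveat to verify for your own files: documents that use YAML merge keys (<<) to override inherited values may be reported as duplicates, because the merged pairs are flattened into the same mapping before the hook runs.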