In Python, how can you load YAML mappings as OrderedDicts?

Anonymous (unverified), submitted on 2019-12-03 01:58:03

Question:

I'd like to get PyYAML's loader to load mappings (and ordered mappings) into the Python 2.7+ OrderedDict type, instead of the vanilla dict and the list of pairs it currently uses.

What's the best way to do that?
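For context, the default behaviour described in the question can be reproduced with a short sketch. It assumes PyYAML is installed and uses yaml.safe_load; the sample documents are made up for illustration.

```python
from collections import OrderedDict

import yaml

# A regular mapping comes back as a built-in dict.
plain = yaml.safe_load("b: 1\na: 2\n")

# An explicitly tagged ordered mapping (!!omap) comes back as a list of pairs.
pairs = yaml.safe_load("--- !!omap\n- b: 1\n- a: 2\n")

# The list of pairs converts directly into an OrderedDict.
ordered = OrderedDict(pairs)

print(type(plain).__name__)
print(pairs)
print(list(ordered))
```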

Answer 1:

Update: For Python 3.6+ you probably don't need anything special, because the new dict implementation preserves insertion order (an implementation detail of CPython 3.6, guaranteed by the language from Python 3.7 onward).
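A minimal sketch of what that looks like in practice, assuming Python 3.7+ and PyYAML 5.1 or newer (the sort_keys argument is not available in older releases). The sample document is made up for illustration.

```python
import yaml

doc = "b: 1\na: 2\nc: 3\n"

# Loading: keys are inserted into the plain dict in document order.
data = yaml.safe_load(doc)
print(list(data))

# Dumping: PyYAML sorts keys by default, so ordering has to be requested.
sorted_text = yaml.safe_dump(data)
ordered_text = yaml.safe_dump(data, sort_keys=False)
print(sorted_text)
print(ordered_text)
```

Note that loading keeps the order without any extra work, while dumping only keeps it when sort_keys=False is passed.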

I like @James' solution for its simplicity. However, it changes the default global yaml.Loader class, which can lead to troublesome side effects; this is an especially bad idea when writing library code. It also doesn't work directly with yaml.safe_load().

Fortunately, the solution can be improved without much effort:

    import yaml
    from collections import OrderedDict

    def ordered_load(stream, Loader=yaml.Loader, object_pairs_hook=OrderedDict):
        class OrderedLoader(Loader):
            pass
        def construct_mapping(loader, node):
            loader.flatten_mapping(node)
            return object_pairs_hook(loader.construct_pairs(node))
        OrderedLoader.add_constructor(
            yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
            construct_mapping)
        return yaml.load(stream, OrderedLoader)

    # usage example:
    ordered_load(stream, yaml.SafeLoader)

For serialization, I don't know an obvious generalization, but at least this shouldn't have any side effects:

    def ordered_dump(data, stream=None, Dumper=yaml.Dumper, **kwds):
        class OrderedDumper(Dumper):
            pass
        def _dict_representer(dumper, data):
            return dumper.represent_mapping(
                yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
                data.items())
        OrderedDumper.add_representer(OrderedDict, _dict_representer)
        return yaml.dump(data, stream, OrderedDumper, **kwds)

    # usage:
    ordered_dump(data, Dumper=yaml.SafeDumper)
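As a quick check, the two helpers can be exercised together in a round trip. The definitions are repeated here only so that the snippet runs on its own; the sample document is invented for illustration, and PyYAML is assumed to be installed.

```python
from collections import OrderedDict

import yaml


def ordered_load(stream, Loader=yaml.Loader, object_pairs_hook=OrderedDict):
    class OrderedLoader(Loader):
        pass

    def construct_mapping(loader, node):
        loader.flatten_mapping(node)
        return object_pairs_hook(loader.construct_pairs(node))

    OrderedLoader.add_constructor(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, construct_mapping)
    return yaml.load(stream, OrderedLoader)


def ordered_dump(data, stream=None, Dumper=yaml.Dumper, **kwds):
    class OrderedDumper(Dumper):
        pass

    def _dict_representer(dumper, data):
        # Passing the items (not the mapping) keeps PyYAML from sorting keys.
        return dumper.represent_mapping(
            yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, data.items())

    OrderedDumper.add_representer(OrderedDict, _dict_representer)
    return yaml.dump(data, stream, OrderedDumper, **kwds)


doc = "z: 1\na:\n  y: 2\n  b: 3\n"

# Load with the safe loader; nested mappings become OrderedDicts as well.
data = ordered_load(doc, yaml.SafeLoader)

# Dump with the safe dumper; block style is requested explicitly so the
# result does not depend on the PyYAML version's default.
text = ordered_dump(data, Dumper=yaml.SafeDumper, default_flow_style=False)
print(text)
```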


Answer 2:

The yaml module allows you to specify custom 'representers' to convert Python objects to text and 'constructors' to reverse the process.

    import collections
    import yaml

    _mapping_tag = yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG

    def dict_representer(dumper, data):
        # items() works on both Python 2 and 3 (iteritems() is Python 2 only)
        return dumper.represent_dict(data.items())

    def dict_constructor(loader, node):
        return collections.OrderedDict(loader.construct_pairs(node))

    yaml.add_representer(collections.OrderedDict, dict_representer)
    yaml.add_constructor(_mapping_tag, dict_constructor)
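The registrations above go to PyYAML's default loader and dumper classes. As a hedged variation, both module-level functions also accept a Loader= or Dumper= keyword, so the same hooks can be attached to private subclasses instead. The class names OrderedSafeLoader and OrderedSafeDumper are hypothetical, chosen only for this sketch.

```python
import collections

import yaml

_mapping_tag = yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG


class OrderedSafeLoader(yaml.SafeLoader):
    """Hypothetical private loader; the stock SafeLoader stays untouched."""


class OrderedSafeDumper(yaml.SafeDumper):
    """Hypothetical private dumper; the stock SafeDumper stays untouched."""


def dict_representer(dumper, data):
    return dumper.represent_dict(data.items())


def dict_constructor(loader, node):
    return collections.OrderedDict(loader.construct_pairs(node))


# Attach the hooks to the subclasses only.
yaml.add_representer(collections.OrderedDict, dict_representer,
                     Dumper=OrderedSafeDumper)
yaml.add_constructor(_mapping_tag, dict_constructor, Loader=OrderedSafeLoader)

loaded = yaml.load("b: 1\na: 2\n", Loader=OrderedSafeLoader)
dumped = yaml.dump(loaded, Dumper=OrderedSafeDumper, default_flow_style=False)

# The stock safe loader is unaffected and still returns a plain dict.
untouched = yaml.safe_load("b: 1\na: 2\n")
print(type(loaded).__name__, type(untouched).__name__)
```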


Answer 3:

I doubt very much that this is the best way to do it, but this is the way I came up with, and it does work. Also available as a gist.

    import yaml
    import yaml.constructor

    try:
        # included in standard lib from Python 2.7
        from collections import OrderedDict
    except ImportError:
        # try importing the backported drop-in replacement
        # it's available on PyPI
        from ordereddict import OrderedDict

    class OrderedDictYAMLLoader(yaml.Loader):
        """
        A YAML loader that loads mappings into ordered dictionaries.
        """

        def __init__(self, *args, **kwargs):
            yaml.Loader.__init__(self, *args, **kwargs)

            self.add_constructor(u'tag:yaml.org,2002:map', type(self).construct_yaml_map)
            self.add_constructor(u'tag:yaml.org,2002:omap', type(self).construct_yaml_map)

        def construct_yaml_map(self, node):
            data = OrderedDict()
            yield data
            value = self.construct_mapping(node)
            data.update(value)

        def construct_mapping(self, node, deep=False):
            if isinstance(node, yaml.MappingNode):
                self.flatten_mapping(node)
            else:
                raise yaml.constructor.ConstructorError(None, None,
                    'expected a mapping node, but found %s' % node.id, node.start_mark)

            mapping = OrderedDict()
            for key_node, value_node in node.value:
                key = self.construct_object(key_node, deep=deep)
                try:
                    hash(key)
                except TypeError as exc:  # "as" form works on Python 2.6+ and 3
                    raise yaml.constructor.ConstructorError('while constructing a mapping',
                        node.start_mark, 'found unacceptable key (%s)' % exc, key_node.start_mark)
                value = self.construct_object(value_node, deep=deep)
                mapping[key] = value
            return mapping


Answer 4:

    import sys
    import ruamel.yaml as yaml

    yaml_str = """\
    3: abc
    conf:
        10: def
        3: gij     # h is missing
    more:
    - what
    - else
    """

    data = yaml.load(yaml_str, Loader=yaml.RoundTripLoader)
    data['conf'][10] = 'klm'
    data['conf'][3] = 'jig'
    yaml.dump(data, sys.stdout, Dumper=yaml.RoundTripDumper)

will give you:

    3: abc
    conf:
      10: klm
      3: jig       # h is missing
    more:
    - what
    - else

data is of type CommentedMap, which functions like a dict but carries extra information that is kept around until it is dumped (including the preserved comment!).

This was done using ruamel.yaml of which I am the author. It is a fork and superset of PyYAML.



Answer 5:

I've just found a Python library (https://pypi.python.org/pypi/yamlordereddictloader/0.1.1) which was created based on answers to this question and is quite simple to use:

    import yaml
    import yamlordereddictloader

    datas = yaml.load(open('myfile.yml'), Loader=yamlordereddictloader.Loader)


Answer 6:

On my PyYAML installation for Python 2.7 I updated __init__.py, constructor.py, and loader.py. It now supports an object_pairs_hook option for the load commands. A diff of the changes I made is below.

__init__.py

    $ diff __init__.py Original
    64c64
     def load(stream, Loader=Loader):
    69c69
         loader = Loader(stream)
    75c75
     def load_all(stream, Loader=Loader):
    80c80
         loader = Loader(stream)

constructor.py

    $ diff constructor.py Original
    20,21c20
         def __init__(self):
    27,29d25
             self.constructed_objects = {}
    >         self.recursive_objects = {}
    129c125
             mapping = {}
    400c396
             data = {}
    595c591
                 dictitems = {}
    602c598
                 dictitems = value.get('dictitems', {})

loader.py

    $ diff loader.py Original
    13c13
         def __init__(self, stream):
    18c18
             BaseConstructor.__init__(self)
    23c23
         def __init__(self, stream):
    28c28
             SafeConstructor.__init__(self)
    33c33
         def __init__(self, stream):
    38c38
             Constructor.__init__(self)


Answer 7:

2018 option:

oyaml is a drop-in replacement for PyYAML which preserves dict ordering. Both Python 2 and Python 3 are supported. Just pip install oyaml, and import as shown below:

import oyaml as yaml

You'll no longer be annoyed by screwed-up mappings when dumping/loading.

Note: I'm the author of oyaml.



Answer 8:

There is a PyYAML ticket on the subject opened 5 years ago. It contains some relevant links, including the link to this very question :) I personally grabbed gist 317164 and modified it a little bit to use OrderedDict from Python 2.7, not the included implementation (just replaced the class with from collections import OrderedDict).



Answer 9:

Here's a simple solution that also checks for duplicated top-level keys in your map.

    import yaml
    import re
    from collections import OrderedDict

    def yaml_load_od(fname):
        "load a yaml file as an OrderedDict"
        # detects any duped keys (fail on this) and preserves order of top level keys
        with open(fname, 'r') as f:
            # reuse the handle opened above instead of opening the file a second time
            lines = f.read().splitlines()
            top_keys = []
            duped_keys = []
            for line in lines:
                m = re.search(r'^([A-Za-z0-9_]+) *:', line)
                if m:
                    if m.group(1) in top_keys:
                        duped_keys.append(m.group(1))
                    else:
                        top_keys.append(m.group(1))
            if duped_keys:
                raise Exception('ERROR: duplicate keys: {}'.format(duped_keys))
        # 2nd pass to set up the OrderedDict
        with open(fname, 'r') as f:
            # safe_load: recent PyYAML releases require an explicit Loader for yaml.load()
            d_tmp = yaml.safe_load(f)
        return OrderedDict([(key, d_tmp[key]) for key in top_keys])
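The regex pass above only sees top-level keys. As an alternative sketch (not the answer's method), the duplicate check can be moved into a loader subclass so that it applies at every nesting level. This assumes Python 3.7+ (plain dicts keep order) and an installed PyYAML; UniqueKeyLoader and load_unique are hypothetical names introduced for illustration.

```python
import yaml
from yaml.constructor import ConstructorError


class UniqueKeyLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that rejects duplicate mapping keys."""

    def construct_mapping(self, node, deep=False):
        if isinstance(node, yaml.MappingNode):
            seen = set()
            for key_node, _value_node in node.value:
                key = self.construct_object(key_node, deep=deep)
                try:
                    hash(key)
                except TypeError:
                    # Unhashable keys are left for the base class to report.
                    continue
                if key in seen:
                    raise ConstructorError(
                        "while constructing a mapping", node.start_mark,
                        "found duplicate key (%r)" % (key,),
                        key_node.start_mark)
                seen.add(key)
        # Delegate the actual construction to the stock implementation.
        return super().construct_mapping(node, deep=deep)


def load_unique(text):
    """Load YAML text, raising ConstructorError on any duplicate key."""
    return yaml.load(text, Loader=UniqueKeyLoader)


ok = load_unique("b: 1\na:\n  y: 2\n  x: 3\n")
print(list(ok), list(ok["a"]))

try:
    load_unique("a:\n  c: 1\n  c: 2\n")
    duplicate_rejected = False
except ConstructorError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Because the check runs inside the loader, the file is parsed only once and the key order comes from the parsed document rather than from a separate text scan.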

