Deserialize a YAML “Table” of data

Posted by 时光怂恿深爱的人放手 on 2019-12-10 05:31:40

Question


I am using YamlDotNet and C# to deserialize a file created by a third-party software application. Both of the following YAML files are valid output from the application:

#File1
Groups:
  - Name: ATeam
    FirstName, LastName, Age, Height:
      - [Joe, Soap, 21, 184]
      - [Mary, Ryan, 20, 169]
      - [Alex, Dole, 24, 174]

#File2
Groups:
  - Name: ATeam
    FirstName, LastName, Height:
      - [Joe, Soap, 184]
      - [Mary, Ryan, 169]
      - [Alex, Dole, 174]

Notice that File2 doesn't have an Age column, but the deserializer must still recognise that the third value on each line is a height rather than an age. This data is supposed to represent a table of people. In File1, for example, Mary Ryan is age 20 and is 169cm tall. The deserializer needs to understand which columns it has (File2 only has FirstName, LastName and Height) and store the data in the right objects accordingly: Mary Ryan is 169cm tall.

Similarly, the program documentation states that the order of the columns is not important, so File3 below is an equally valid way to represent the data in File2, even though Height is now first:

#File3
Groups:
 - Name: ATeam
   Height, FirstName, LastName:
      - [184, Joe, Soap]
      - [169, Mary, Ryan]
      - [174, Alex, Dole]

I have a number of questions:

  1. Is this standard YAML? - I could not find anything about the use of a number of keys on the same line followed by a colon and lists of values to represent tables of data.
  2. How would I use YamlDotNet to deserialize this? Are there modifications I can make to help it?
  3. If I can't use YamlDotNet, how should I go about it?

Answer 1:


As other answers stated, this is valid YAML. However, the structure of the document is specific to the application, and does not use any special feature of YAML to express tables.

You can easily parse this document using YamlDotNet. However, you will run into two difficulties. The first is that, since the names of the columns are placed inside the key, you will need some custom serialization code to handle them. The second is that you will need to implement some kind of abstraction to access the data in a tabular way.

I have put together a proof of concept that illustrates how to parse and read the data.

First, create a type to hold the information from the YAML document:

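using System.Collections.Generic;

// Root of the YAML document: a list of groups.
public class Document
{
    public List<Group> Groups { get; set; }
}

public class Group
{
    public string Name { get; set; }

    // Column names taken from the comma-separated key.
    public IEnumerable<string> ColumnNames { get; set; }

    // One inner list per row, in the same order as ColumnNames.
    public IList<IList<object>> Rows { get; set; }
}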

Then implement IYamlTypeConverter to parse the Group type:

using System;
using System.Collections.Generic;
using System.Linq;
using YamlDotNet.Core;
using YamlDotNet.Core.Events;
using YamlDotNet.Serialization;

public class GroupYamlConverter : IYamlTypeConverter
{
    private readonly Deserializer deserializer;

    public GroupYamlConverter(Deserializer deserializer)
    {
        this.deserializer = deserializer;
    }

    // This converter only handles the Group type.
    public bool Accepts(Type type)
    {
        return type == typeof(Group);
    }

    public object ReadYaml(IParser parser, Type type)
    {
        var group = new Group();

        var reader = new EventReader(parser);
        do
        {
            // Each entry of the mapping starts with a scalar key.
            var key = reader.Expect<Scalar>();
            if (key.Value == "Name")
            {
                group.Name = reader.Expect<Scalar>().Value;
            }
            else
            {
                // Any other key is treated as the comma-separated column list.
                group.ColumnNames = key.Value
                    .Split(',')
                    .Select(n => n.Trim())
                    .ToArray();

                // The value of that key is the list of rows.
                group.Rows = deserializer.Deserialize<IList<IList<object>>>(reader);
            }
        } while (!reader.Accept<MappingEnd>());
        reader.Expect<MappingEnd>();

        return group;
    }

    public void WriteYaml(IEmitter emitter, object value, Type type)
    {
        throw new NotImplementedException("TODO");
    }
}

Finally, register the converter with the deserializer and deserialize the document:

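// Requires: using System.IO; using YamlDotNet.Serialization;
var deserializer = new Deserializer();
deserializer.RegisterTypeConverter(new GroupYamlConverter(deserializer));

// yaml is a string holding the text of the YAML document.
var document = deserializer.Deserialize<Document>(new StringReader(yaml));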

The original answer links to a fully working example that you can test.

This is only a proof of concept, but it should serve as a guideline for your own implementation. Things that could be improved include:

  • Checking for and handling invalid documents.
  • Improving the Group class. Maybe make it immutable, and also add an indexer.
  • Implementing the WriteYaml method if serialization support is desired.
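
As a rough illustration of the first two points, here is a minimal sketch of an immutable table wrapper with a validating constructor and an indexer keyed by row number and column name. The Table class and its members are hypothetical names, not part of YamlDotNet or of the proof of concept above; the sketch only assumes that the column names and rows have already been read, as the converter does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of YamlDotNet: wraps the column names and
// rows read from one group and exposes the cells by column name.
public sealed class Table
{
    private readonly Dictionary<string, int> columnIndex;
    private readonly IReadOnlyList<IReadOnlyList<object>> rows;

    public Table(IEnumerable<string> columnNames, IEnumerable<IEnumerable<object>> rows)
    {
        // Map each column name to its position; ToDictionary throws an
        // ArgumentException if a name is repeated.
        columnIndex = columnNames
            .Select((name, index) => new { name, index })
            .ToDictionary(c => c.name, c => c.index);

        // Copy the rows so later changes to the source lists are not visible.
        this.rows = rows.Select(r => (IReadOnlyList<object>)r.ToList()).ToList();

        // Reject rows whose width does not match the header.
        foreach (var row in this.rows)
        {
            if (row.Count != columnIndex.Count)
            {
                throw new FormatException(
                    $"Expected {columnIndex.Count} values per row but found {row.Count}.");
            }
        }
    }

    public int RowCount => rows.Count;

    // Look up a single cell, e.g. table[1, "Height"].
    public object this[int row, string column] => rows[row][columnIndex[column]];
}
```

Built from the ColumnNames and Rows of a group, table[1, "Height"] would return the Height cell of the second row (Mary Ryan) whether the file lists Height first, as in File3, or last, as in File2.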



Answer 2:


All of these are valid YAML files. You are, however, mistaken in treating a scalar key that contains commas as a YAML-level description of the "columns" in the sequences that form the value for that key.

In File1, "FirstName, LastName, Age, Height" is a single string scalar. It is a key in the mapping that forms the first element of the sequence stored under the top-level key Groups, exactly as Name is a key in that mapping. In YAML you can put quotes around the whole scalar, but you don't have to.

The association you make between the string "FirstName" and "Joe" does not exist in YAML. You can make that association in the program that interprets the key (by splitting it on ", "), as you seem to be doing, but YAML has no knowledge of it.

So if you want to be smart about this, then you need to split the string "FirstName, LastName, Age, Height" yourself and use some mechanism to then use the "subkeys" to index the sequences that are associated with the key.
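
A minimal sketch of that splitting step, independent of any YAML library. It assumes the compound key and its rows have already been loaded as a string and a list of lists; TableKey.ToRecords is a hypothetical helper name used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TableKey
{
    // Hypothetical helper: splits a compound key such as
    // "FirstName, LastName, Height" into subkeys and pairs each subkey
    // with the value at the same position in every row.
    public static List<Dictionary<string, object>> ToRecords(
        string compoundKey, IEnumerable<IList<object>> rows)
    {
        var subkeys = compoundKey.Split(',').Select(k => k.Trim()).ToArray();

        var records = new List<Dictionary<string, object>>();
        foreach (var row in rows)
        {
            if (row.Count != subkeys.Length)
            {
                throw new FormatException(
                    "Row width does not match the number of subkeys.");
            }

            // One dictionary per row, keyed by subkey.
            var record = new Dictionary<string, object>();
            for (var i = 0; i < subkeys.Length; i++)
            {
                record[subkeys[i]] = row[i];
            }
            records.Add(record);
        }
        return records;
    }
}
```

Because each value is looked up by subkey rather than by position, records[1]["Height"] refers to Mary Ryan's height for both the File2 and the File3 column order.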

If it helps to understand all this, the following is a JSON dump of the first file's contents, which shows clearly what the keys consist of:

{"Groups": [{"FirstName, LastName, Age, Height": [["Joe", "Soap", 21,
   184], ["Mary", "Ryan", 20, 169], ["Alex", "Dole", 24, 174]], 
   "Name": "ATeam"}]}

I used the Python-based ruamel.yaml library for this (of which I am the author), but you could also use an online converter/checker like http://yaml-online-parser.appspot.com/



Source: https://stackoverflow.com/questions/30894438/deserialize-a-yaml-table-of-data
