How to represent a JSON object in CSV

Submitted by 生来就可爱ヽ(ⅴ<●) on 2019-12-24 07:43:09

Question


I'd like to export a JSON object to a CSV file. The object has sub-fields, which themselves have sub-fields that may be populated by arrays of objects, and I don't know how to represent this embedded data in CSV.


Answer 1:


This comes down to mapping semi-structured (tree-like) data to tabular data. This is not trivial at all because of the impedance mismatch between the nested tree model and the flat table model.

Several approaches are commonly used (and taught) in practice and have been studied extensively in academic research. Most were established for XML, but they can in principle be applied to JSON as well. They more or less come down to:

  • Ad-hoc (schema-based) mapping
  • Edge shredding
  • Tree encoding

First, if your data follows regular patterns (like a schema), you can design an ad-hoc mapping that, for example, maps each leaf (value) to a CSV column. You can preserve the structural information by joining the path of keys with dots, assuming dots do not already appear in field names.

For example:

{
  "foo" : {
    "bar" : 10
  },
  "foobar" : "foo"
}

can be mapped to:

| foo.bar | foobar |
|---------|--------|
|  10     |  foo   |
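
The dotted-path mapping can be sketched in Python with the standard `csv` module. This is a minimal sketch, assuming the input is a dict of nested dicts with no arrays; `flatten` and `to_csv` are hypothetical helper names, and the dot assumption from the text is enforced by raising an error.

```python
import csv
import io


def flatten(obj, prefix=""):
    """Map each leaf of a nested dict to a column named by its dotted key path.

    Assumes no arrays in the data and no dots inside key names.
    """
    row = {}
    for key, value in obj.items():
        if "." in key:
            raise ValueError(f"key {key!r} contains a dot; the mapping would be ambiguous")
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            row.update(flatten(value, name))  # recurse into sub-objects
        else:
            row[name] = value  # leaf value -> one CSV column
    return row


def to_csv(rows):
    """Write flat dicts as CSV text; columns appear in first-seen order."""
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    buf = io.StringIO()
    # restval="" leaves a cell empty when a row lacks that column
    writer = csv.DictWriter(buf, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


doc = {"foo": {"bar": 10}, "foobar": "foo"}
print(to_csv([flatten(doc)]), end="")
# foo.bar,foobar
# 10,foo
```

Collecting columns across all rows lets objects with missing fields still share one header, at the cost of empty cells.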

The trickier part is when arrays come into play. If you have a large array of similar objects, you can turn each of them into a row of the output CSV, recording its position in the array:

{
  "objects" : [
    {
      "foo" : {
        "bar" : 10
      },
      "foobar" : "foo"
    },
    {
      "foo" : {
        "bar" : 40
      },
      "foobar" : "bar"
    },
    {
      "foo" : {
        "bar" : 50
      },
      "foobar" : "bar"
    }
  ]
}

could map to:

| objects.pos | objects.foo.bar | objects.foobar |
|-------------|-----------------|----------------|
|       1     |      10         |     foo        |
|       2     |      40         |     bar        |
|       3     |      50         |     bar        |
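
A possible sketch of this array-to-rows mapping, assuming a single top-level array (here `objects`) whose elements are nested objects of the same shape without further arrays. The `pos` column is 1-based to match the table above; `array_to_rows` and `rows_to_csv` are hypothetical helper names.

```python
import csv
import io


def flatten(obj, prefix):
    """Map each leaf of a nested dict to a dotted column name under `prefix`."""
    row = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}"
        if isinstance(value, dict):
            row.update(flatten(value, name))
        else:
            row[name] = value
    return row


def array_to_rows(doc, array_key):
    """Turn doc[array_key] (a list of similar objects) into one row per element.

    Adds a 1-based `<array_key>.pos` column so the original order is preserved.
    """
    rows = []
    for pos, element in enumerate(doc[array_key], start=1):
        row = {f"{array_key}.pos": pos}
        row.update(flatten(element, array_key))
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Write rows as CSV text, taking the columns from the first row.

    Assumes all elements share one shape; DictWriter raises ValueError
    if a later row has a column the first row lacks.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


doc = {
    "objects": [
        {"foo": {"bar": 10}, "foobar": "foo"},
        {"foo": {"bar": 40}, "foobar": "bar"},
        {"foo": {"bar": 50}, "foobar": "bar"},
    ]
}
print(rows_to_csv(array_to_rows(doc, "objects")), end="")
# objects.pos,objects.foo.bar,objects.foobar
# 1,10,foo
# 2,40,bar
# 3,50,bar
```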

This ad-hoc mapping is the easiest approach because the output CSV remains easy to understand, but it has to be redesigned for each use case to fit your data, in particular for different arrangements of arrays.

From a theoretical perspective, this first, ad-hoc approach amounts to normalizing the data, i.e., bringing it to first normal form or higher.

There are other, more generic approaches, such as edge shredding and tree encoding. They may be overkill for your use case because decoding them requires considerable work; they are mainly meant for implementing complex XML queries on top of relational databases.

In short, with edge shredding you create one table (CSV file) per value type (in JSON: number, string, boolean, etc.) to store the leaves, plus one table that stores the edges of the original JSON tree.
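
The following is a simplified sketch of edge shredding, not a faithful implementation of any particular published scheme. It assumes node ids are assigned in traversal order with the root as node 0; each edge row stores the parent id, the object key or array index, and the child id, and each leaf goes into the table for its JSON type. The function names, file names, and column layout are all hypothetical.

```python
import csv
import io


def shred(doc):
    """Edge-shred a JSON value into an edge table plus one leaf table per type.

    Node ids are assigned in traversal order; the root is node 0.
    Edge rows: (source, name, ordinal, target) where `name` is the object key
    (None for array elements) and `ordinal` the array index (None for object members).
    Leaf rows: (node_id, value), filed under "number", "string", "boolean" or "null".
    """
    edges = []
    leaves = {"number": [], "string": [], "boolean": [], "null": []}
    next_id = 0

    def visit(node, node_id):
        nonlocal next_id
        if isinstance(node, dict):
            children = [(key, None, value) for key, value in node.items()]
        elif isinstance(node, list):
            children = [(None, i, value) for i, value in enumerate(node)]
        else:
            if node is None:
                leaves["null"].append((node_id, None))
            elif isinstance(node, bool):  # test bool before number: bool subclasses int
                leaves["boolean"].append((node_id, node))
            elif isinstance(node, (int, float)):
                leaves["number"].append((node_id, node))
            else:
                leaves["string"].append((node_id, node))
            return
        for name, ordinal, child in children:
            next_id += 1
            child_id = next_id
            edges.append((node_id, name, ordinal, child_id))
            visit(child, child_id)

    visit(doc, 0)
    return edges, leaves


def to_csv_files(edges, leaves):
    """Render each table as CSV text, keyed by a hypothetical file name."""
    def render(header, rows):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)  # None is written as an empty cell
        return buf.getvalue()

    files = {"edges.csv": render(["source", "name", "ordinal", "target"], edges)}
    for type_name, rows in leaves.items():
        files[f"{type_name}.csv"] = render(["id", "value"], rows)
    return files


edges, leaves = shred({"foo": {"bar": 10}, "foobar": "foo"})
print(to_csv_files(edges, leaves)["edges.csv"], end="")
# source,name,ordinal,target
# 0,foo,,1
# 1,bar,,2
# 0,foobar,,3
```

In this sketch empty objects and arrays leave no rows at all, so a fuller scheme would also record the kind of each inner node.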

With tree encoding, you use a single table (CSV file) that stores all nodes and leaves of the tree, with extra columns that encode each node's position in the tree. Again, the technique was designed for XML but can probably be adapted.
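
The answer does not name a specific tree encoding. The sketch below assumes one common choice: nodes are numbered in pre-order, and each row also stores the node's subtree size and depth, so all descendants of a node form a contiguous range of rows. The column names and the `encode`/`descendants` helpers are hypothetical.

```python
import csv
import io


def json_kind(node):
    """Name the JSON type of a Python value produced by json.loads."""
    if isinstance(node, dict):
        return "object"
    if isinstance(node, list):
        return "array"
    if node is None:
        return "null"
    if isinstance(node, bool):  # before the number check: bool subclasses int
        return "boolean"
    if isinstance(node, (int, float)):
        return "number"
    return "string"


def encode(doc):
    """Encode a JSON tree as one table of (pre, size, level, kind, key, value) rows.

    pre:   pre-order rank (root = 0), which is also the row index
    size:  number of descendants; d is a descendant of v iff v.pre < d.pre <= v.pre + v.size
    level: depth (root = 0)
    key:   object key, array index as text, or "" for the root
    value: the leaf value, or None for objects and arrays
    """
    rows = []

    def visit(node, key, level):
        pre = len(rows)
        rows.append(None)  # placeholder until the subtree size is known
        kind = json_kind(node)
        if kind == "object":
            children = list(node.items())
        elif kind == "array":
            children = [(str(i), child) for i, child in enumerate(node)]
        else:
            children = []
        for child_key, child in children:
            visit(child, child_key, level + 1)
        size = len(rows) - pre - 1
        value = None if kind in ("object", "array") else node
        rows[pre] = (pre, size, level, kind, key, value)

    visit(doc, "", 0)
    return rows


def descendants(rows, pre):
    """All descendants of node `pre`: a contiguous slice thanks to pre-order."""
    size = rows[pre][1]
    return rows[pre + 1 : pre + 1 + size]


rows = encode({"foo": {"bar": 10}, "foobar": "foo"})
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["pre", "size", "level", "kind", "key", "value"])
writer.writerows(rows)
print(buf.getvalue(), end="")
# pre,size,level,kind,key,value
# 0,3,0,object,,
# 1,1,1,object,foo,
# 2,0,2,number,bar,10
# 3,0,1,string,foobar,foo
```

The `kind` column keeps the JSON type that CSV itself would otherwise lose, and the pre/size pair turns "find everything under this node" into a simple range query.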

JSON is somewhat younger than XML, so I am not sure how much research has been done on mapping it to tables; there may also be general mappings that specifically address JSON rather than XML, even though the general principles are similar.




Answer 2:


CSV is not as expressive as JSON, but there are many ways to fake a JSON structure in CSV. For instance:

https://konklone.io/json/?id=a624ffaa84db538b4a10465c72bf393d

More on that: http://blog.appliedinformaticsinc.com/how-to-parse-and-convert-json-to-csv-using-python/



Source: https://stackoverflow.com/questions/41939648/how-to-represent-json-object-under-csv
