What valid JSON files are not valid YAML 1.1 files?

死守一世寂寞 2021-02-15 13:52

YAML 1.2 is (with one minor caveat regarding duplicate keys) a superset of JSON, so any valid JSON file is also a valid YAML file. However, the YAML 1.1 specification (which ha

2 answers
  • 2021-02-15 14:24

    As you noticed, what the specifications say is one thing; what commonly available parsers (both YAML and JSON) actually process is another. You should therefore take several aspects into account and stick to the least common denominator if you want to be able to load your JSON with a YAML parser.

    On the JSON side there are multiple standards and best practices. Originally a JSON text had to have an object or array at the topmost level. This is still the case according to the fail1.json file available on the json.org site:

    "A JSON payload should be an object or array, not a string."
    

    According to RFC 7159 any value can be at the top level (apart from a string, this leads to rather boring JSON files):

    A JSON text is a serialized value. Note that certain previous specifications of JSON constrained a JSON text to be an object or an array. Implementations that generate only objects or arrays where a JSON text is called for will be interoperable in the sense that all implementations will accept these as conforming JSON texts.

    Because of the problems with JSON hijacking (by redefining the array handling in older browsers) there have been implementations that only accept an object at the top level (i.e. the first character of the file has to be {).
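    The difference between these rules can be seen with Python's standard json module, which accepts any value at the top level. The helper name top_level_is_object is hypothetical; it is only a sketch of the stricter "object only" check, not something taken from any of the specifications.

```python
import json


def top_level_is_object(text):
    """Return True if text is valid JSON whose top-level value is an object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


# Python's json module accepts a bare string as a complete JSON text.
bare_string = json.loads('"just a string"')

print(bare_string)
print(top_level_is_object('"just a string"'))
print(top_level_is_object('[1, 2, 3]'))
print(top_level_is_object('{"key": [1, 2, 3]}'))
```

    Only the last input passes the strict check, which is the conservative choice if the file also has to be read by older or pickier consumers.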

    On the YAML side there are fewer competing standards than with JSON, but things get muddled by the persistent usage of YAML 1.1. This is not helped by the fact that if you google for "yaml current spec", the first hit is yaml.org/spec/current.html, which is actually an old working draft for YAML 1.1.

    Apart from the UTF-32 support the other answer mentioned, which is largely a non-issue in a world using UTF-8 almost exclusively, there are a few things to take into account, especially if you want PyYAML to be able to parse your JSON (PyYAML still implements most of YAML 1.1 only, close to eight years after the YAML 1.2 spec release):

    • numbers in JSON don't need a dot in the mantissa, even if such a number has an exponent, but the Floating-Point Language-Independent Type for YAML™ Version 1.1 does require that dot:

      |[-]?0\.([0-9]*[1-9])?e[-+](0|[1-9][0-9]+) (scientific)
             ^--- no ? or * associated with this dot
      

      In the YAML 1.2 spec this regex has changed to:

      -? [1-9] ( \. [0-9]* [1-9] )? ( e [-+] [1-9] [0-9]* )?
      

      which allows the dot to be omitted even if there is an e (but no E) and an exponent.

      This is the reason your 12345e999 is handled differently by JSON (overflow) and PyYAML (string). In YAML 1.1 this can only be interpreted as a string, and hence it doesn't need quotes and can be a plain scalar.

    • In YAML 1.1 there are escape sequences, but these are not a superset of what JSON supports. The forward slash (/) can be escaped in JSON, but not in YAML 1.1 (it can in YAML 1.2, rule 53).

    • In JSON as well as in YAML 1.1 you can use \uNNNN to indicate a 16-bit Unicode code point. Although the YAML 1.1 spec (and YAML 1.2) mentions surrogate pairs in conjunction with using UTF-16, nothing is mentioned about such pairs as escaped sequences ("\uD834\uDD1E"). This string sequence is explicitly mentioned in RFC 7159 as representing the G clef character (U+1D11E). I don't know of any YAML parser that supports this; PyYAML throws a:

      yaml.reader.ReaderError: unacceptable character #xd834: special characters are not allowed
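    For comparison, the JSON side of the three points above can be checked with Python's standard json module. This sketch only shows what a JSON parser does with these inputs; it does not call a YAML parser.

```python
import json

# Exponent without a dot in the mantissa: valid JSON. Python's json module
# converts it with float(), which overflows to infinity.
big = json.loads("12345e999")

# Escaped forward slash: valid JSON, decodes to a plain "/".
slash = json.loads(r'"a\/b"')

# Escaped surrogate pair: Python's json module combines the two \uNNNN
# escapes into the single character U+1D11E (the G clef).
clef = json.loads(r'"\uD834\uDD1E"')

print(big, slash, len(clef), hex(ord(clef)))
```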

    So as long as you write your JSON

    • as UTF-8
    • with the top-level being an object
    • scientific numbers always with a dot
    • no \/ escape sequence
    • no \uNNNN characters between \uD7FF and \uE000 (exclusive), nor \uFFFE, nor \uFFFF

    you should be fine for both JSON and YAML (1.1) parsers.
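    The checklist above can be sketched as a small checker that works on the raw JSON text. The function name yaml11_concerns and its messages are made up for illustration. It assumes the text has already been decoded (so the UTF-8 item is not checked), and it relies on simple regular expressions rather than a full tokenizer.

```python
import json
import re

# A JSON string literal, including its quotes.
STRING = re.compile(r'"(?:[^"\\]|\\.)*"')
# One escape sequence inside a string literal.
ESCAPE = re.compile(r'\\(u[0-9a-fA-F]{4}|.)')
# A number with an exponent but no dot in the mantissa, e.g. 12345e999.
NUMBER_NO_DOT = re.compile(r'(?<![0-9.])-?(?:0|[1-9][0-9]*)[eE][-+]?[0-9]+')


def is_risky_code_point(escape):
    """True for \\uNNNN escapes in the surrogate range, or FFFE / FFFF."""
    if len(escape) != 5:
        return False
    code_point = int(escape[1:], 16)
    return 0xD800 <= code_point <= 0xDFFF or code_point in (0xFFFE, 0xFFFF)


def yaml11_concerns(text):
    """Return a list of reasons why valid JSON text may trip a YAML 1.1 parser."""
    concerns = []
    # json.loads raises an exception if the text is not valid JSON at all.
    if not isinstance(json.loads(text), dict):
        concerns.append("top level is not an object")
    # Blank out string literals so digits inside strings are not mistaken for numbers.
    if NUMBER_NO_DOT.search(STRING.sub('""', text)):
        concerns.append("exponent number without a dot")
    escapes = [m.group(1) for s in STRING.findall(text) for m in ESCAPE.finditer(s)]
    if "/" in escapes:
        concerns.append("escaped forward slash")
    if any(is_risky_code_point(e) for e in escapes):
        concerns.append("surrogate or FFFE/FFFF escape")
    return concerns


print(yaml11_concerns(r'{"n": 12345e999, "path": "a\/b", "clef": "\uD834\uDD1E"}'))
print(yaml11_concerns('{"n": 1.5e3, "path": "a/b"}'))
```

    An empty list means none of the listed concerns were found; it is not a guarantee that every YAML 1.1 parser will accept the file.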


    ¹ In ruamel.yaml, a YAML 1.2 parser of which I am the author, the \/ escape and scientific numbers without a dot are handled correctly: your 12345e999 loads as type float and prints as inf.

  • 2021-02-15 14:28

    See here (specifically footnote 25). It says:

    The incompatibilities were as follows: JSON allows extended character sets like UTF-32 and had incompatible unicode character escape syntax relative to YAML; YAML required a space after separators like comma, equals, and colon while JSON does not. Some non-standard implementations of JSON extend the grammar to include Javascript's /*...*/ comments. Handling such edge cases may require light pre-processing of the JSON before parsing as in-line YAML
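    On the separator point in that quote: if you generate the JSON yourself, one cautious option is to keep a space after each comma and colon. As a sketch using Python's standard json module, the default separators already include those spaces, while the compact form drops them:

```python
import json

data = {"a": 1, "b": [1, 2]}

# Default separators are ", " and ": ", so every separator is followed by a space.
spaced = json.dumps(data)

# Compact output drops the spaces; this is the form the quote warns about.
compact = json.dumps(data, separators=(",", ":"))

print(spaced)
print(compact)
```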

    See also https://metacpan.org/pod/JSON::XS#JSON-and-YAML

    Related
    What is the difference between YAML and JSON? When to prefer one over the other
