What are good alternative data formats to XML?

淺唱寂寞╮ submitted on 2019-12-21 03:19:17

Question


XML, granted, is very useful, but can be quite verbose. What alternatives are there and are they specialised for any particular purpose? Library support to interrogate the contents easily is a big plus point.


Answer 1:


There seems to be a lot of multi-platform support for JSON.
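
As a rough illustration of that library support, here is a minimal sketch using the json module from Python's standard library. The record contents are invented for the example.

```python
import json

# Hypothetical record, made up for illustration.
person = {"name": "Alex", "age": 34, "email": "mail@example.com"}

# Serialize to a JSON string, then parse it back.
text = json.dumps(person)
restored = json.loads(text)

print(text)
print(restored["age"] + 1)  # numbers come back as numbers, not strings
```

Most mainstream languages ship or offer an equivalent pair of encode/decode calls, which is a large part of JSON's appeal.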




Answer 2:


Jeff's article on The Angle Bracket Tax summarizes a number of alternatives (well, mainly YAML), and led me to the wiki article on lightweight markup languages.

Update: Although YAML is a possible "alternative to XML" for some applications, the two are not, as I first thought, isomorphic.

Indeed, it "ain't markup language."

Furthermore, YAML ain't as "lightweight" as it appears. For documents that can be represented in plain XML (such as Jeff's example), YAML is clearly less verbose. But YAML offers many other specialized structures, enlisting many more characters and sequences than are reserved by XML.

Bottom line, if you're looking for XML-without-angle-brackets, YAML ain't it.




Answer 3:


Don't forget about YAML!

JSON seems to have better support though. For example, the Prototype JS library has excellent built-in JSON functions.




Answer 4:


I wouldn't dismiss plain text, like CSV or tab-delimited.
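
A minimal sketch of reading tab-delimited text with Python's standard csv module; the column names and rows are made up for illustration.

```python
import csv
import io

# Hypothetical tab-delimited data; the column names are made up.
raw = "name\tage\nAlex\t34\nDawg\t27\n"

# DictReader uses the first row as the field names.
rows = list(csv.DictReader(io.StringIO(raw), delimiter="\t"))

for row in rows:
    # Every value is read as a string, so conversion is up to the caller.
    print(row["name"], int(row["age"]))
```

The trade-off is visible here: the format is trivial to read, but type information has to live outside the file.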




Answer 5:


HDF5 is a very compact data format with some characteristics that are similar to XML. The .NET libraries leave a lot to be desired, but the format scales very well both in terms of size and performance.




Answer 6:


My work with XML is almost exclusively with document-centric XML, which must model long sequences of arbitrarily nested structures. I haven't used JSON yet, but my impression is that it is cumbersome to use with document-like data, but well-adapted and even elegant for use with record-like data. Consider the shape of your data when making your decision.
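
To make the document-like versus record-like distinction concrete, here is a small sketch that pushes mixed content (text interleaved with inline elements) into JSON. The list-based encoding in to_json_nodes is a hypothetical choice for this example, not a standard mapping.

```python
import json
import xml.etree.ElementTree as ET

def to_json_nodes(elem):
    """Convert an element to a JSON-friendly list: [tag, child, child, ...]."""
    node = [elem.tag]
    if elem.text:
        node.append(elem.text)
    for child in elem:
        node.append(to_json_nodes(child))
        # Text that follows a child element belongs to the parent's content.
        if child.tail:
            node.append(child.tail)
    return node

doc = ET.fromstring("<p>Dear <b>sir</b>, you won the <i>internet</i>.</p>")
print(json.dumps(to_json_nodes(doc)))
```

The result is an ordered list of alternating strings and nested lists, which works but is noticeably less natural than the same sentence in markup. A flat record such as a name, age, and email has no such problem.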




Answer 7:


You could try Google's protobufs. It's much faster than XML. There are libraries for it in C, C++, C#, Java and Python (there are alpha versions for Ruby and Perl). But it is binary.




Answer 8:


S-Expressions work great if you don't need to apply attributes to elements. Another alternative is YAML.
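
For a feel of how little machinery S-expressions need, here is a toy reader in Python. The function name parse_sexpr and the sample data are invented for this sketch; it handles only parentheses, double-quoted strings without escapes, integers, and bare symbols, and does no error handling.

```python
import re

def parse_sexpr(text):
    """Parse one S-expression into nested Python lists, ints, and strings."""
    # Tokens: parentheses, double-quoted strings, or bare atoms.
    tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing parenthesis
            return items
        if token.startswith('"'):
            return token[1:-1]  # strip the surrounding quotes
        try:
            return int(token)
        except ValueError:
            return token  # a bare symbol

    return read()

tree = parse_sexpr('(person (name "Alex") (age 34))')
print(tree)
```

Each element is just a list headed by its name, so there is no separate slot for attributes, which is the limitation mentioned above.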




Answer 9:


XML is often used for configuration, and in this case there are some other simple storage formats that are often used (less document oriented):

  1. .properties files
  2. INI files

There are various ways of reading and writing both, depending on platform and language.
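
As one example of such a way, Python's standard configparser module reads INI files. This is a minimal sketch; the section and key names are made up.

```python
import configparser

# Hypothetical INI content; section and key names are made up.
ini_text = """
[database]
server = 192.168.1.1
connection_max = 5000
enabled = true
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

db = config["database"]
print(db["server"])
# Values are stored as strings; typed getters convert on request.
print(db.getint("connection_max"))
print(db.getboolean("enabled"))
```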




Answer 10:


What do you want to do with the data? Store it? Pass it around? Display it? These questions should drive your search for an appropriate technology. Simply asking how you should format your data is like asking what language you should program in, without specifying what you want to accomplish.

For most data tasks, well Dr. Codd has the cure: http://en.wikipedia.org/wiki/Edgar_F._Codd. Databases should be able to do just about anything you have in mind.

If you're passing it around, I advocate plain text. When you roll your own binary format, your data goes away when your parser goes away.

With plain text, the deeper question is where to put the metadata. Should it be external to the data file, or internal ("self-describing")?

For example, XML is plain text, but so is source code. With a source file, there is a specification that goes into great detail as to the syntax and semantics, while XML is supposed to be self-describing. The problem is that it isn't. Furthermore, it evolved right out of document presentation and markup, but is now being abused for all sorts of data serialization, transfer, and storage.




Answer 11:


TOML is the new big thing. It has the niceness of YAML without the big spec. It extends a common and familiar configuration file format. It is directly analogous to (and translatable to) JSON. It has support in all the big languages. It was created by GitHub co-founder/president Tom and narcissistically named. It's awesome. Give it a shot!

Sample TOML:

# This is a TOML document. Boom.

title = "TOML Example"

[owner]
name = "Tom Preston-Werner"
organization = "GitHub"
bio = "GitHub Cofounder & CEO\nLikes tater tots and beer."
dob = 1979-05-27T07:32:00Z # First class dates? Why not?

[database]
server = "192.168.1.1"
ports = [ 8001, 8001, 8002 ]
connection_max = 5000
enabled = true

[servers]

  # You can indent as you please. Tabs or spaces. TOML don't care.
  [servers.alpha]
  ip = "10.0.0.1"
  dc = "eqdc10"

  [servers.beta]
  ip = "10.0.0.2"
  dc = "eqdc10"

[clients]
data = [ ["gamma", "delta"], [1, 2] ]

# Line breaks are OK when inside arrays
hosts = [
  "alpha",
  "omega"
]



Answer 12:


If you are looking for a less verbose alternative that is more or less isomorphic to XML, there is AXON. The examples below show equivalent representations in XML and AXON. There is also a Python library, pyaxon, that supports the AXON format.

XML

<person>
   <name>Alex</name>
   <age>34</age>
   <email>mail@example.com</email>
</person>

AXON

person {
  name {"Alex"}
  age {34}
  email {"mail@example.com"}}

XML

<memo date="2008-02-14">
<from>
<name>The Whole World</name><email>us@world.org</email>
</from>
<to>
<name>Dawg</name><email>dawg158@aol.com</email>
</to>
<message>
Dear sir, you won the internet. http://is.gd/fh0
</message>
</memo>

AXON

memo {
  date:2008-02-14
  from {
    name{"The Whole World"} email{"us@world.org"}}
  to {
    name{"Dawg"} email{"dawg158@aol.com"}}
  message {"Dear sir, you won the internet. http://is.gd/fh0"}
}

XML

<club>
  <players>
    <player id="kramnik"
       name="Vladimir Kramnik"
       rating="2700"
       status="GM" />
    <player id="fritz"
       name="Deep Fritz"
       rating="2700"
       status="Computer" />
    <player id="mertz"
      name="David Mertz"
      rating="1400"
      status="Amateur" />
  </players>
  <matches>
    <match>
      <Date>2002-10-04</Date>
      <White refid="fritz" />
      <Black refid="kramnik" />
      <Result>Draw</Result>
    </match>
    <match>
      <Date>2002-10-06</Date>
      <White refid="kramnik" />
      <Black refid="fritz" />
      <Result>White</Result>
    </match>
  </matches>
</club>

AXON

club {
  players {
    player {
      id:"kramnik"
      name:"Vladimir Kramnik"
      rating:2700
      status:"GM"}
    player {
      id:"fritz"
      name:"Deep Fritz"
      rating:2700
      status:"Computer"}
    player {
      id:"mertz"
      name:"David Mertz"
      rating:1400 
      status:"Amateur"}}
  matches {
    match {
      Date{2002-10-04}
      White{refid:"fritz"}
      Black{refid:"kramnik"}
      Result{"Draw"}}
    match {
      Date{2002-10-06}
      White{refid:"kramnik"}
      Black{refid:"fritz"}
      Result{"White"}}}}



Answer 13:


For the sake of completeness, I will mention EDIFACT, for which I wrote an interface a long time ago.




Answer 14:


"I wouldn't dismiss plain text, like CSV or tab-delimited."

I'm really looking for alternatives that have a defined structure and (cross platform, multi language) library support. I'm interested in looking at different designs and their pros and cons. I like the idea of formats that can have a text and "binary" (compact, "compiled", fast I/O, smaller footprint) format. The advantage of having libraries is that they perform the parsing and perhaps extra data manipulation/validation for you.

Having said that, there is definitely a use for simple formats like .ini, .plist, and CSV. You shouldn't always have to use a hammer to crack a nut.




Answer 15:


But at what cost?

I'm all for JSON in many situations, especially where weight or client-side work is a concern, but moving away from XML loses readability (so important in those config files) and the power of tomorrow's problem solutions like XSLT and XPath. Be really sure why and when you move away: it's a de facto standard for a reason.

(aside: my habit is to use XML internally, and transform that to JSON where that's the desired output)
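
A minimal sketch of that habit using only Python's standard library: query the XML with the XPath subset that ElementTree supports, then emit JSON. The data is a fragment of the chess-club example from answer 12; flattening each player's attributes into a record is an illustrative choice, not a general XML-to-JSON mapping.

```python
import json
import xml.etree.ElementTree as ET

xml_text = """
<club>
  <players>
    <player id="kramnik" name="Vladimir Kramnik" rating="2700" status="GM" />
    <player id="fritz" name="Deep Fritz" rating="2700" status="Computer" />
    <player id="mertz" name="David Mertz" rating="1400" status="Amateur" />
  </players>
</club>
"""

root = ET.fromstring(xml_text)

# ElementTree supports a limited XPath subset, including attribute predicates.
grandmasters = root.findall("./players/player[@status='GM']")
print([p.get("name") for p in grandmasters])

# Turn the attribute-only player elements into plain records for JSON output.
players = [dict(p.attrib) for p in root.iter("player")]
print(json.dumps(players, indent=2))
```

Note that attribute values stay strings ("2700", not 2700); deciding which ones become numbers is part of the transformation step.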




Answer 16:


Heresy! XML is king of data. Say no to the usurpers, off with their heads! Long live XML!

But seriously, if you just need data, use JSON for its support and elegance; if you need formatting, XPath-like queries, additional metadata, etc., stick with XML.

Note: I use XML for configs, system building, code generation, and similar tasks, but JSON for RPC, SQL for queries and persistence, and finally YAML here and there for logging and quick tasks. In other words, choose the appropriate format for the need.




Answer 17:


Simple Declarative Language is a nice alternative to XML for common tasks such as serialization and configuration. It provides a C# and Java parser library. I think it excels at specifying all kinds of metadata without the XML verbosity.




Answer 18:


JSON is valid YAML which could be very useful. Two for one!




Answer 19:


If you're asking from the perspective of a DSL, Guile Scheme could help, as already suggested with the S-expressions.

Personally I also use JSON for AJAX transactions.




Answer 20:


XML is OK for text markup, but it is quite a bad option for serializing general structures, where JSON is much better suited.




Answer 21:


Anything you like, as long as it's not ASN.1




Answer 22:


JSON can be used in many ways, but I find it particularly well suited to use with MySQL tables. It also works very well on Android (with the GSON library or the built-in JSON classes). Beyond that, it's effective at transmitting small bits of data individually or as arrays.




Answer 23:


For storing code-like data, LES (Loyc Expression Syntax) is a budding alternative. I've noticed a lot of people use XML for code-like constructs, such as build systems which support conditionals, command invocations, sometimes even loops. These sorts of things look natural in LES:

// LES code has no built-in meaning. This just shows what it looks like.
[DelayedWrite] // an "attribute"
Output(
    if version > 4.0 {
        $ProjectDir/Src/Foo;
    } else {
        $ProjectDir/Foo;
    }
);

It doesn't have great tool support yet, though; currently the only LES library is for C#. Currently only one app is known to use LES: LLLPG.

In theory you could use LES for data or markup, but there are no standards for how to do that:

body {
    '''Click here to use the World's '''
    a href="http://google.com" {
        strong "most popular"; " search engine!"
    };
};

point = (2, -3);
tasteMap = { "lemon" -> sour; "sugar" -> sweet; "grape" -> yummy };



Answer 24:


For the sake of mentioning... have a look at my proposal:

http://igagis.github.io/stob/

It is very simple and is not overloaded with a variety of special symbols, just {} and "" basically.

Supports C++ style comments.

There are C++, C# and Java libraries.

Example:

"String object"
AnotherStringObject
"String with children"{
    "child 1"
    Child2
    "child three"{
        SubChild1
        "Subchild two"

        Property1 {Value1}
        "Property two" {"Value 2"}
        //comment

        /* multi-line
           comment */

        "multi-line
         string"

        "Escape sequences \" \n \r \t \\"
    }
}


Source: https://stackoverflow.com/questions/44207/what-are-good-alternative-data-formats-to-xml
