Compression XML metrics

Submitted by 我怕爱的太早我们不能终老 on 2019-12-10 21:35:27

Question


I have a client-server application that sends XML over TCP/IP from a client to the server, which then broadcasts it out to the other clients. How do I determine the minimum XML size at which compressing the XML gives a performance improvement over sending it uncompressed over the regular stream?

Are there any good metrics on this or examples?


Answer 1:


XML usually compresses very well, as it tends to have a lot of repetition.
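To make the repetition point concrete, here is a minimal sketch that compares how well zlib compresses a repetitive XML payload versus random bytes of the same length. The element names and record count are made up for illustration; real messages will give different ratios.

```python
import os
import zlib

# Hypothetical payload: 200 records that all share the same tag names.
records = "".join(
    f"<item><id>{i}</id><name>user{i}</name><status>active</status></item>"
    for i in range(200)
)
xml_bytes = f"<items>{records}</items>".encode("utf-8")

# Random bytes of the same length have no repetition for the compressor to exploit.
random_bytes = os.urandom(len(xml_bytes))

xml_ratio = len(zlib.compress(xml_bytes)) / len(xml_bytes)
random_ratio = len(zlib.compress(random_bytes)) / len(random_bytes)

print(f"XML:    {len(xml_bytes)} bytes, compressed to {xml_ratio:.0%} of original")
print(f"random: {len(random_bytes)} bytes, compressed to {random_ratio:.0%} of original")
```

The repeated tag names are what the compressor replaces with short back-references, which is why the XML shrinks and the random data does not.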

Another option would be to swap to a binary format; BinaryFormatter or NetDataContractSerializer are simple options, but both are notoriously incompatible with other platforms (for example, Java) compared with XML.

Another option would be a portable binary format such as Google's "protocol buffers". I maintain a .NET/C# version of this called protobuf-net. It is designed to be side-by-side compatible with regular .NET approaches (such as XmlSerializer / DataContractSerializer), but its output is much smaller than XML and requires significantly less processing (CPU etc.) for both serialization and deserialization.

This page shows some numbers for XmlSerializer, DataContractSerializer and protobuf-net; I thought it included stats with/without compression, but they seem to have vanished...

[update] I should have said - there is a TCP/IP example in the QuickStart project.




Answer 2:


A loose metric would be to compress anything larger than a single packet, but that's just nitpicking.

There is no reason to refrain from using a binary format internally in your application. However long compression takes, the network transfer will be several orders of magnitude slower than the compression itself (unless we're talking about very slow devices).

If these two suggestions don't put you at ease, you can always benchmark to find the size at which compression starts to pay off.
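One way such a benchmark could look, as a rough sketch: grow a synthetic message one record at a time and report the first size at which the gzip output is smaller than the raw bytes. The `make_xml` helper and its message shape are hypothetical; substitute payloads that resemble your real traffic.

```python
import gzip

def make_xml(n_records: int) -> bytes:
    """Build a hypothetical message with n_records child elements."""
    body = "".join(
        f'<msg id="{i}"><text>hello</text></msg>' for i in range(n_records)
    )
    return f"<batch>{body}</batch>".encode("utf-8")

def break_even_records(max_records: int = 50):
    """Return (records, raw_size, gzip_size) for the first payload gzip shrinks, or None."""
    for n in range(max_records + 1):
        raw = make_xml(n)
        packed = gzip.compress(raw)
        if len(packed) < len(raw):
            return n, len(raw), len(packed)
    return None

result = break_even_records()
print(result)
```

Very small payloads come out larger after gzip because of the fixed header and trailer, so the break-even point depends on both the container format and how repetitive the data is.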




Answer 3:


By all means, always compress it.

It will save you bandwidth for anything with more than two tags.




Answer 4:


To decide whether compression has any benefit for you, you need to run some tests using the actual or expected volume and kind of data that you expect to flow through your system.
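A test of this kind might be sketched as follows: measure how long compression takes for a sample message and add the transfer time at an assumed bandwidth, then compare against sending the raw bytes. The sample message and the bandwidth figures are assumptions, and the estimate deliberately ignores latency and decompression time on the receiver.

```python
import time
import zlib

def estimate_send_seconds(payload: bytes, bandwidth_bytes_per_s: float) -> dict:
    """Estimate time to send raw vs. compressed; ignores latency and decompression."""
    start = time.perf_counter()
    packed = zlib.compress(payload)
    compress_s = time.perf_counter() - start
    return {
        "raw_s": len(payload) / bandwidth_bytes_per_s,
        "compressed_s": compress_s + len(packed) / bandwidth_bytes_per_s,
        "raw_bytes": len(payload),
        "compressed_bytes": len(packed),
    }

# Hypothetical sample; replace with messages captured from the real system.
sample = ("<order><sku>ABC-123</sku><qty>1</qty></order>" * 500).encode("utf-8")

for label, bandwidth in [("1 Mbit/s", 125_000), ("1 Gbit/s", 125_000_000)]:
    r = estimate_send_seconds(sample, bandwidth)
    print(label, f"raw={r['raw_s']:.6f}s", f"compressed={r['compressed_s']:.6f}s")
```

Running it against captured messages on the actual hardware, rather than synthetic data, is what makes the numbers meaningful.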

Hope this helps.




Answer 5:


In the tests that we did, we found a huge benefit; however, be aware of the CPU implications.

On one project that I worked on, we were sending large amounts of XML data (> 10 MB) to clients running .NET. (I'm not recommending this as a way to do things; it's just the situation we found ourselves in!) We found that once the XML files got sufficiently large, the Microsoft XML libraries were unable to parse them (the machines ran out of memory, even machines with more than 1 GB). Changing the XML parsing libraries eventually helped, but before we did that we enabled GZIP compression on the data we transferred, which helped us parse the large documents.

On our two Linux-based WebSphere servers we were able to generate the XML and then gzip it fairly easily. I think that with 50 users doing this concurrently (each loading about 10 to 20 of these files) we were able to do this OK, at about 50% CPU. The compression of the XML seemed to be handled better (in terms of parsing/CPU time) on the servers than on the .NET GUIs, but this was probably due to the inadequacies of the Microsoft XML libraries mentioned above. As I said, there are better libraries available that are faster and use less memory.

In our case, we got massive improvements in size too: in some cases we were compressing 50 MB XML files down to about 10 MB. This obviously helped network performance too.

Since we were concerned about the impact, and whether this would have other consequences (our users seemed to do things in large waves, so we were concerned we'd run out of CPU), we had a config variable that we could use to turn gzip on or off. I'd recommend that you do this too.
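A minimal sketch of such a switch, assuming a one-byte flag in front of each message so that receivers can decode it whichever way the sender is configured. The flag byte, the `COMPRESS_ENABLED` name, and both helper functions are illustrative assumptions, not what the project described above actually used.

```python
import gzip
from typing import Optional

COMPRESS_ENABLED = True  # hypothetical config switch; load it from your real config source

FLAG_RAW = b"\x00"
FLAG_GZIP = b"\x01"

def encode_message(xml: bytes, compress: Optional[bool] = None) -> bytes:
    """Prefix the payload with one byte saying whether it is gzip-compressed."""
    if compress is None:
        compress = COMPRESS_ENABLED  # read the switch at call time
    if compress:
        return FLAG_GZIP + gzip.compress(xml)
    return FLAG_RAW + xml

def decode_message(frame: bytes) -> bytes:
    """Inverse of encode_message; works regardless of the sender's setting."""
    flag, body = frame[:1], frame[1:]
    if flag == FLAG_GZIP:
        return gzip.decompress(body)
    if flag == FLAG_RAW:
        return body
    raise ValueError(f"unknown compression flag: {flag!r}")
```

Because the flag travels with each message, the switch can be flipped on the sending side without coordinating a change on every client.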

Another thing: we also zipped XML files before persisting them in databases, and this saved about 50% of the space (the XML files ranged from a few KB to a few MB, but most were fairly small). It's probably easier to compress everything than to pick a specific size threshold that decides when to use compression.



Source: https://stackoverflow.com/questions/236496/compression-xml-metrics
