Why is binary serialization considered faster than xml serialization?
I had assumed binary serialization to be faster than xml (based on how verbose xml can be). However, I have observed the opposite! I was investigating a performance issue in one of my applications and found that the time to serialize is similar for xml and binary. However, the difference in time to deserialize is huge: xml deserialization takes less than 10 seconds, but binary deserialization takes over 10 minutes!
So I guess in theory xml serialization/deserialization is slower than binary, but in your application, it depends!
I can't share the actual data, but here are the results (in milliseconds):
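| Serialization, XML | Serialization, Binary | Deserialization, XML | Deserialization, Binary |
|---|---|---|---|
| 7,956 | 9,535 | 9,112 | 668,918 |
| 7,608 | 9,105 | 8,386 | 670,445 |
| 7,583 | 9,398 | 8,372 | 676,190 |
| 7,656 | 9,299 | 9,783 | 679,117 |
| 7,454 | 9,458 | 8,219 | 669,626 |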
Consider serializing a double, for example:
- Binary serialization: write the 8 bytes of the value from memory to the stream.
- Binary deserialization: read the same 8 bytes back.
- XML serialization: write the opening tag, convert the number to text, write the closing tag. That is nearly three times the I/O and vastly more CPU work.
- XML deserialization: read and validate the opening tag, read the string and parse it into a number, then read and validate the closing tag. A little more I/O overhead and noticeably more CPU.
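The four steps above can be sketched in Python as a rough stand-in for what a .NET serializer does internally; this is an illustration under assumptions, not how BinaryFormatter or XmlSerializer are actually implemented. The helper names (`to_binary`, `from_binary`, `to_xml`, `from_xml`) and the element name `d` are hypothetical.

```python
import struct
import xml.etree.ElementTree as ET

def to_binary(value: float) -> bytes:
    # Binary: copy the 8 IEEE 754 bytes as-is (little-endian here).
    return struct.pack("<d", value)

def from_binary(data: bytes) -> float:
    # Binary: reinterpret the same 8 bytes; no text parsing involved.
    return struct.unpack("<d", data)[0]

def to_xml(value: float) -> bytes:
    # XML: convert the number to text, then wrap it in opening/closing tags.
    elem = ET.Element("d")  # hypothetical element name
    elem.text = repr(value)
    return ET.tostring(elem)

def from_xml(data: bytes) -> float:
    # XML: parse the markup (tags are checked), then parse the text into a number.
    elem = ET.fromstring(data)
    return float(elem.text)

x = 3.141592653589793
b = to_binary(x)
t = to_xml(x)
print(len(b), b)
print(len(t), t)  # b'<d>3.141592653589793</d>' is 24 bytes vs. 8
```

Even in this tiny case the XML form is three times larger, and both directions involve number/text conversion that the binary path skips entirely.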
Actually, like all things, it depends on the data and the serializer.
Commonly (although perhaps unwisely), people mean BinaryFormatter when they say "binary", but it has a number of foibles:
Conversely, xml generally has overheads such as:
Of course, xml is easily compressed, which adds CPU cost but hugely reduces bandwidth.
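To illustrate the compression trade-off, here is a minimal Python sketch that gzips a made-up XML payload; the payload shape is purely hypothetical, and real ratios depend entirely on the data. The repeated element names are what make markup compress so well.

```python
import gzip

# Hypothetical payload: 1,000 records with the same element names repeated.
xml = "<items>" + "".join(
    f"<item><id>{i}</id><name>Item {i}</name></item>" for i in range(1000)
) + "</items>"
raw = xml.encode("utf-8")

# Compression costs CPU time on both ends, but repetitive markup shrinks a lot.
compressed = gzip.compress(raw)
restored = gzip.decompress(compressed)

print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes")
```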
But that doesn't mean one is faster; I would refer you to some sample stats from here (with full source included), which I've annotated with each serializer's base format (binary, xml, text, etc). Look in particular at the first two results: it looks like XmlSerializer beat BinaryFormatter on every value (length, serialize and deserialize), while retaining the cross-platform advantages. Of course, protobuf then trumps XmlSerializer ;p
These numbers tie in quite well with ServiceStack's benchmarks, here.
| Serializer | Format | Length | Serialize | Deserialize |
|---|---|---|---|---|
| BinaryFormatter | binary | 1314 | 6746 | 6268 |
| XmlSerializer | xml | 1049 | 3282 | 5132 |
| DataContractSerializer | xml | 911 | 1411 | 4380 |
| NetDataContractSerializer | binary | 1139 | 2014 | 5645 |
| JavaScriptSerializer | text (json) | 528 | 12050 | 30558 |
| (protobuf-net v2) | binary | 112 | 217 | 250 |
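The stats above come from .NET serializers, which can't be reproduced here. As a hedged sketch of how such a comparison can be set up, the Python harness below measures payload length plus best-of-N serialize/deserialize times for three stand-ins: `pickle` (a general-purpose binary serializer, loosely analogous to BinaryFormatter), a hand-rolled `struct` layout, and ElementTree XML. The data set and helper names are hypothetical, and the numbers it prints will not match the table.

```python
import pickle
import struct
import time
import xml.etree.ElementTree as ET

# Hypothetical test data: a list of floats.
DATA = [i * 0.5 for i in range(10_000)]

def xml_dump(values):
    root = ET.Element("values")
    for v in values:
        ET.SubElement(root, "v").text = repr(v)
    return ET.tostring(root)

def xml_load(data):
    return [float(e.text) for e in ET.fromstring(data)]

def struct_dump(values):
    # Length-prefixed array of little-endian doubles.
    return struct.pack(f"<I{len(values)}d", len(values), *values)

def struct_load(data):
    (n,) = struct.unpack_from("<I", data, 0)
    return list(struct.unpack_from(f"<{n}d", data, 4))

SERIALIZERS = {
    "pickle (binary)": (pickle.dumps, pickle.loads),
    "struct (binary)": (struct_dump, struct_load),
    "ElementTree (xml)": (xml_dump, xml_load),
}

def _time(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def measure(dump, load, values, repeats=5):
    """Return (payload length, best serialize time, best deserialize time) in seconds."""
    payload = dump(values)
    ser = min(_time(lambda: dump(values)) for _ in range(repeats))
    de = min(_time(lambda: load(payload)) for _ in range(repeats))
    return len(payload), ser, de

if __name__ == "__main__":
    for name, (dump, load) in SERIALIZERS.items():
        length, ser, de = measure(dump, load, DATA)
        print(f"{name:20} length={length:8} "
              f"serialize={ser * 1000:8.2f} ms deserialize={de * 1000:8.2f} ms")
```

Taking the minimum of several runs reduces noise from warm-up and scheduling; as the thread shows, the ranking can change with the serializer and the data, so it's worth measuring with your own payloads.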
Binary serialization is more efficient because it writes raw data directly, whereas XML needs to format and parse the data to produce a valid XML structure. Additionally, depending on what sort of data your objects hold, the XML may contain a lot of redundant data.
Well, first of all, XML is a bloated format. Every byte you send in binary form would typically take at least 2 or 3 bytes in XML. For example, to send the number 44 in binary, you need just one byte. In XML you need an element tag plus two bytes for the digits: <N>44</N>, which is a lot more data.
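The size difference for the number 44 can be checked with a short Python sketch. It assumes the value fits in one unsigned byte (0 to 255); a real binary format would need a fixed width or a variable-length scheme for larger values.

```python
import xml.etree.ElementTree as ET

value = 44
binary = value.to_bytes(1, "big")  # assumes the value fits in one unsigned byte

elem = ET.Element("N")
elem.text = str(value)
xml = ET.tostring(elem)            # b'<N>44</N>'

print(len(binary), binary)         # 1 byte
print(len(xml), xml)               # 9 bytes
```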
One difference is the encoding/decoding time required to handle the message. Since binary data is so compact, it doesn't eat up many clock cycles. If the binary data has a fixed structure, you could probably load it directly into memory and access every element without the need to parse/unparse the data.
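Here is a minimal Python sketch of reading a fixed-structure binary layout directly, assuming a hypothetical record of an id (uint32), a price (float64) and a quantity (uint16) with no padding. Because every record has the same size, any record or field can be located by arithmetic on its offset instead of by parsing everything before it.

```python
import struct

# Hypothetical fixed-size record: id (uint32), price (float64), quantity (uint16).
RECORD = struct.Struct("<IdH")  # "<" = little-endian, no padding -> 14 bytes

def write_records(records):
    buf = bytearray(RECORD.size * len(records))
    for i, rec in enumerate(records):
        RECORD.pack_into(buf, i * RECORD.size, *rec)
    return bytes(buf)

def read_record(buf, index):
    # Jump straight to the record's offset; nothing before it has to be parsed.
    return RECORD.unpack_from(buf, index * RECORD.size)

def read_price(buf, index):
    # Read a single field: skip the 4-byte id at the start of the record.
    (price,) = struct.unpack_from("<d", buf, index * RECORD.size + 4)
    return price

data = write_records([(1, 9.99, 3), (2, 19.5, 1), (3, 0.25, 40)])
print(RECORD.size, len(data))  # 14 42
print(read_record(data, 2))    # (3, 0.25, 40)
print(read_price(data, 1))     # 19.5
```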
XML is a text-based format, which needs a few more steps to be processed. First, the format is bloated, so it uses more memory. Furthermore, all data is text, and you might need it in binary form, so the XML needs to be parsed. This parsing takes time, no matter how fast your code is. ASN.1, with binary encoding rules such as BER, is sometimes described as a "binary XML" and provides a good alternative to XML, but it still needs to be decoded just like XML. Plus, if most of the data you use is text rather than numeric, binary formats won't make a big difference.
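The two parsing steps described above can be sketched with Python's ElementTree: first the markup is parsed into a tree, then each numeric value still has to be converted from text. The document shape is a made-up example; note that the text field needs no conversion, which is why text-heavy data gains little from a binary format.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two numeric fields and one text field.
xml_text = "<point><x>12.5</x><y>-3</y><label>origin offset</label></point>"

# Step 1: parse the markup into an element tree (tokenize, match tags, build nodes).
root = ET.fromstring(xml_text)

# Step 2: every value is still a string; numbers must be converted separately.
x = float(root.findtext("x"))
y = int(root.findtext("y"))
label = root.findtext("label")  # text fields need no conversion

print(x, y, label)  # 12.5 -3 origin offset
```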
Another speed factor is the total size of your data. When you just load and save a binary file of 1 KB or an XML file of 3 KB, you probably won't notice any speed difference. This is because disks and file systems store data in fixed-size blocks, commonly 4 KB, so both files fit within a single block. Thus, for the disk it doesn't matter whether it needs to read 1 KB or 3 KB, since it reads the whole 4 KB block either way. But when the binary file is 1 megabyte and the XML is 3 megabytes, the disk will need to read (or write) many more blocks for the XML. At that scale, even the difference between 2.99 MB, 3 MB and 3.01 MB can change the number of blocks involved.
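The block argument boils down to ceiling division. This Python sketch assumes a 4096-byte block size (common, but not universal) and shows that the 1 KB and 3 KB files both occupy one block, while the 1 MB and 3 MB files differ by a factor of three.

```python
BLOCK_SIZE = 4096  # assumption: a common file-system block size

def blocks_needed(size_in_bytes, block_size=BLOCK_SIZE):
    # Ceiling division: a partially filled block still occupies a whole block.
    return -(-size_in_bytes // block_size)

for label, size in [("1 KB binary", 1024), ("3 KB XML", 3 * 1024),
                    ("1 MB binary", 1024 ** 2), ("3 MB XML", 3 * 1024 ** 2)]:
    print(f"{label:12} -> {blocks_needed(size)} blocks")
# 1 KB binary  -> 1 blocks
# 3 KB XML     -> 1 blocks
# 1 MB binary  -> 256 blocks
# 3 MB XML     -> 768 blocks
```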
When binary data has to pass through a text-only channel (for example e-mail, or embedded inside XML or JSON), it is usually Base64- or UU-encoded; TCP/IP itself carries raw bytes without any such encoding. Both encodings turn every 3 bytes of data into 4 characters, so the binary data grows by about a third. XML data doesn't need this encoding, so the size difference becomes smaller, and so does the speed difference. Still, the binary data will usually be faster, since the encoding/decoding routines can be very fast.
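The growth from text-safe encoding can be measured with Python's standard `base64` and `binascii` modules. The payload is arbitrary random bytes; `binascii.b2a_uu` encodes at most 45 bytes per line, so only one UU line is shown.

```python
import base64
import binascii
import os

payload = os.urandom(300)            # 300 bytes of arbitrary binary data

encoded = base64.b64encode(payload)  # every 3 input bytes become 4 output characters
print(len(payload), len(encoded))    # 300 400

chunk = payload[:45]                 # b2a_uu accepts at most 45 bytes per line
uu_line = binascii.b2a_uu(chunk)     # length char + 60 data chars + newline
print(len(chunk), len(uu_line))      # 45 62
```

UU-encoding adds a little per-line overhead on top of the 4/3 growth, which is one reason Base64 became the more common choice.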
Basically, size matters. :-)
But with XML you have an additional alternative: you can send and store the XML in a ZIP file. Microsoft Office does this in its newer versions. A Word (.docx) document consists of XML files stored inside a ZIP package. This combines the best of both worlds: Word documents are mostly text, so a binary format would not add much speed. Zipping the XML makes storing and sending the data a lot faster simply by making it smaller (and binary). Even more interesting, a compressed XML file could end up smaller than an uncompressed binary file, so the zipped XML becomes the faster one. (But it's cheating, since the XML is now binary...)
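As a rough Python illustration of the last point, the sketch below stores an XML part inside an in-memory ZIP package (similar in spirit to a .docx, though not a valid Office file) and compares its size with an uncompressed binary encoding of the same values. The data is deliberately and hypothetically extreme (1,000 identical doubles), so it flatters compression; a compressed binary file would shrink too.

```python
import io
import struct
import zipfile

# Hypothetical, highly repetitive data: 1,000 identical doubles.
values = [0.0] * 1000

binary = struct.pack(f"<{len(values)}d", *values)  # 8,000 bytes, uncompressed
xml = ("<values>" + "".join(f"<v>{v!r}</v>" for v in values)
       + "</values>").encode("utf-8")              # 10,017 bytes

# Store the XML as one deflate-compressed part of a ZIP package.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data/values.xml", xml)
zipped = buffer.getvalue()

# Read the part back to confirm nothing was lost.
with zipfile.ZipFile(io.BytesIO(zipped)) as zf:
    restored = zf.read("data/values.xml")

print(len(binary), len(xml), len(zipped))
```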