Determining the serialized size of a .NET type and unmanaged memory efficiency

Question

My question is whether it is possible to determine the serialized size (in bytes) of a reference type.

Here's the situation:

I am using the BinaryFormatter class to serialize basic .NET types, for instance:

[Serializable]
public class Foo
{
    public string Foo1 { get; set; }
    public string Foo2 { get; set; } 
}

I am serializing each item to a byte[], then adding that segment to the end of an existing byte[] and additionally adding a carriage return at the end of each segment to delimit the objects.

In order to deserialize I use Marshal.ReadByte() as follows:

List<byte> buffer = new List<byte>();

for (int i = 0; i < MapSize; i++)
{
    byte b = Marshal.ReadByte(readPtr , i); 

    if (b != delim)  // read until encounter a carriage return 
        buffer.Add(b);
    else
        break;
}

readPtr = readPtr + buffer.Count + 1; // incrementing the pointer for the next object

return buffer.ToArray();

I believe that using Marshal.Copy() would be more efficient, but I need to know the length of the serialized byte segment in advance. Is there a way I can reliably compute this from the type that's being serialized, or an overall more efficient method I can use?

Also, the use of a carriage return won't be reliable, ultimately. So I am wondering if there is a more standard way to delimit the objects, either through customizing my BinaryFormatter or using some other standardized best practice? For instance, is there a specific way that the BinaryFormatter delimits objects if it's serializing, say, a generic List<>?

Answer 1:

There isn't a terribly good way to determine the serialized length beforehand. The specification for the BinaryFormatter protocol is available here: http://msdn.microsoft.com/en-us/library/cc236844(v=prot.10).aspx

I'll save you the trouble of reading it for your purposes:

It's built to be an extensible format. This allows you to add fields later and still maintain some compatibility with earlier implementations. For your purposes, this means that the length of the serialized form is not fixed in time.
It's extremely fragile. The binary format actually encodes the names of the fields in it. If you ever rename a field, the length of the serialized form will change.
The binary format actually encompasses a many-to-one relationship between serialized encodings and object data. The same object could potentially be encoded in a number of different ways, with a number of different byte counts for the output (I won't get into why it's written that way).

If you want an easy way to do things, just create an array that contains all the objects and serialize that single array. This solves most of your problems. All the issues of delimiting the different objects are handled by the BinaryFormatter. You won't have excessive memory copying. The final output will be more compact because the BinaryFormatter only has to specify the field names once per invocation.

Finally, I can tell you that the extra memory copy is not the main source of inefficiency in your current implementation. You're getting far more inefficiency from the BinaryFormatter's use of reflection, and the fact that it encodes the field names in the serialized output.

If efficiency is paramount, then I would suggest writing some custom code that encodes the contents of your structures in "plain old data" format. Then you'll have control over how much gets written and how.
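As a rough illustration of the "plain old data" idea, here is a minimal sketch for the Foo class from the question. FooCodec and its methods are hypothetical names, and the presence flag used for null strings is an assumption of this sketch rather than anything the answer prescribes. BinaryWriter.Write(string) already prefixes each string with its byte length, so no delimiter byte is needed.

```csharp
using System;
using System.IO;
using System.Text;

public class Foo
{
    public string Foo1 { get; set; }
    public string Foo2 { get; set; }
}

// Hypothetical helper, not part of the original question or answers.
public static class FooCodec
{
    // Encodes a Foo as two optional, length-prefixed UTF-8 strings.
    public static byte[] Encode(Foo foo)
    {
        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms, Encoding.UTF8))
        {
            WriteNullable(w, foo.Foo1);
            WriteNullable(w, foo.Foo2);
            w.Flush();
            return ms.ToArray();
        }
    }

    // Reverses Encode; the fields must be read in the order they were written.
    public static Foo Decode(byte[] data)
    {
        using (var ms = new MemoryStream(data))
        using (var r = new BinaryReader(ms, Encoding.UTF8))
        {
            return new Foo { Foo1 = ReadNullable(r), Foo2 = ReadNullable(r) };
        }
    }

    // BinaryWriter.Write(string) rejects null, so a one-byte presence flag
    // is written first (an assumption of this sketch).
    private static void WriteNullable(BinaryWriter w, string s)
    {
        w.Write(s != null);
        if (s != null) w.Write(s);
    }

    private static string ReadNullable(BinaryReader r)
    {
        return r.ReadBoolean() ? r.ReadString() : null;
    }
}
```

Nothing is reflected over and no field names are written, so the output contains only the flags, the length prefixes and the string bytes. The trade-off is that Encode and Decode have to be kept in sync by hand whenever Foo changes.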

Answer 2:

Using a byte as a delimiter for binary serialized data is an awful idea: 13 is a perfectly valid value that can appear inside the serialized data, not just as your "delimiter".

Prefix each block with its size in bytes instead, and read the data one block at a time.
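One way this could look for the unmanaged buffer in the question is sketched below. SegmentFraming, WriteSegment and ReadSegment are hypothetical names, the 4-byte length prefix is an assumption, and the sketch does no bounds checking, so the caller is assumed to know the region is large enough.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper illustrating length-prefixed framing in unmanaged memory.
public static class SegmentFraming
{
    // Writes [4-byte length][payload] at ptr and returns the address just past it.
    // Assumes the caller has verified that the region has enough room.
    public static IntPtr WriteSegment(IntPtr ptr, byte[] payload)
    {
        Marshal.WriteInt32(ptr, payload.Length);
        Marshal.Copy(payload, 0, IntPtr.Add(ptr, sizeof(int)), payload.Length);
        return IntPtr.Add(ptr, sizeof(int) + payload.Length);
    }

    // Reads one segment at readPtr and advances readPtr to the next segment.
    public static byte[] ReadSegment(ref IntPtr readPtr)
    {
        int length = Marshal.ReadInt32(readPtr);
        var buffer = new byte[length];
        // One bulk copy replaces the byte-by-byte Marshal.ReadByte loop.
        Marshal.Copy(IntPtr.Add(readPtr, sizeof(int)), buffer, 0, length);
        readPtr = IntPtr.Add(readPtr, sizeof(int) + length);
        return buffer;
    }
}
```

Because the length is stored explicitly, a payload may contain the byte 13 (or any other value) without ambiguity, and the reader knows how much to pass to Marshal.Copy before touching the payload.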

Answer 3:

You can use Marshal.SizeOf to get a struct's native size. This only works for structs and I advise you to set the StructLayout attribute.

I'll pull up some information from the comments because it is surprising yet important:

The CLR has metadata facilities for making the native layout of a struct or class fixed. In C# this is only possible for structs. But classes can be used that way too.

You can bit-blit a managed type into bytes iff you specify SequentialLayout. http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.structlayoutattribute.aspx This facility is not well known but it exists, is specified and supported. Quote: "The class layout attributes (AutoLayout, SequentialLayout and ExplicitLayout) define how the fields of the class instance are laid out in memory."

Look at the System.Reflection.TypeAttributes enum. It defines other CLR-level attributes as well. C# does not give access to them but ilasm.exe does.
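A small sketch of the fixed-layout approach described in this answer follows. FixedRecord and FixedRecordInterop are hypothetical, and the fields are illustrative only. Note that this applies to types whose fields all have a fixed native size; the string properties of the Foo class in the question do not, so Foo itself could not be handled this way without changes.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical fixed-layout record; field names and types are illustrative only.
// Pack = 1 removes padding, so the native size is 4 + 8 + 8 bytes.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct FixedRecord
{
    public int Id;
    public double Value;
    public long Ticks;
}

public static class FixedRecordInterop
{
    // Native size of one record, known before anything is written.
    public static readonly int Size = Marshal.SizeOf(typeof(FixedRecord));

    // Copies the struct's native representation to unmanaged memory at ptr.
    public static void Write(IntPtr ptr, FixedRecord record)
    {
        Marshal.StructureToPtr(record, ptr, false);
    }

    // Rebuilds a struct from the native representation stored at ptr.
    public static FixedRecord Read(IntPtr ptr)
    {
        return (FixedRecord)Marshal.PtrToStructure(ptr, typeof(FixedRecord));
    }
}
```

With a fixed size per record, the n-th record sits at offset n * FixedRecordInterop.Size, so neither a delimiter nor a length prefix is required.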

Answer 4:

I could find the cause of not serializing at all using this code from https://bytes.com/topic/c-sharp/answers/238927-object-size-memory

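// Serialize into a MemoryStream and read the stream length to get the size in bytes.
var m = new System.IO.MemoryStream();
var b = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
b.Serialize(m, Obj); // Obj is the object whose serialized size is wanted
var size = Convert.ToDouble(m.Length);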

Source: https://stackoverflow.com/questions/10148391/determining-the-serialized-size-of-a-net-type-and-unmanaged-memory-efficiency

Tags: .net, serialization, pointers, unmanaged