How do ValueTypes derive from Object (ReferenceType) and still be ValueTypes?

闹比i 2020-11-22 17:20

C# doesn't allow structs to derive from classes, but all ValueTypes derive from Object. Where is this distinction made?

How does the CLR handle this?

6 answers
  •  星月不相逢
    2020-11-22 17:46

    Rationale

    Of all the answers, @supercat's answer comes closest to the actual answer. Since the other answers don't really address the question, and some make outright incorrect claims (for example, that value types inherit from anything), I decided to answer it myself.

     

    Prologue

    This answer is based on my own reverse engineering and the CLI specification.

    struct and class are C# keywords. As far as the CLI is concerned, all types (classes, interfaces, structs, etc.) are defined by class definitions.

    For example, an object type (known in C# as a class) is defined as follows:

    .class MyClass
    {
    }
    

     

    An interface is defined by a class definition with the interface semantic attribute:

    .class interface MyInterface
    {
    }
    

     

    What about value types?

    The reason that structs can inherit from System.ValueType and still be value types is that they don't.

    Value types are simple data structures. Value types do not inherit from anything and they cannot implement interfaces. Value types are not subtypes of any type, and they do not have any type information. Given a memory address of a value type, it's not possible to identify what the value type represents, unlike a reference type which has type information in a hidden field.

    If we imagine the following C# struct:

    namespace MyNamespace
    {
        struct MyValueType : ICloneable
        {
            public int A;
            public int B;
            public int C;
    
            public object Clone()
            {
                // body omitted
            }
        }
    }
    

    The following is the IL class definition of that struct:

    .class MyNamespace.MyValueType extends [mscorlib]System.ValueType implements [mscorlib]System.ICloneable
    {
        .field public int32 A;
        .field public int32 B;
        .field public int32 C;
    
        .method public final hidebysig newslot virtual instance object Clone() cil managed
        {
            // body omitted
        }
    }
    

    So what's going on here? It clearly extends System.ValueType, which is an object/reference type, and implements System.ICloneable.

    The explanation is that when a class definition extends System.ValueType, it actually defines two things: a value type, and the value type's corresponding boxed type. The members of the class definition define the representation for both the value type and the corresponding boxed type. It is not the value type that extends and implements; it is the corresponding boxed type that does. The extends and implements keywords apply only to the boxed type.

    To clarify, the class definition above does 2 things:

    1. Defines a value type with 3 fields (and one method). It does not inherit from anything, and it does not implement any interfaces (value types can do neither).
    2. Defines an object type (the boxed type) with 3 fields (and one interface method implementation), inheriting from System.ValueType and implementing the System.ICloneable interface.

    Note also that any class definition extending System.ValueType is intrinsically sealed, whether the sealed keyword is specified or not.

    Since value types are just simple structures, don't inherit, don't implement and don't support polymorphism, they can't be used with the rest of the type system. To work around this, on top of the value type, the CLR also defines a corresponding reference type with the same fields, known as the boxed type. So while a value type can't be passed around to methods taking an object, its corresponding boxed type can.
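
    A small reflection check can make this concrete. This is only a sketch: the Clone() body and the MetadataDemo class are assumptions added for illustration. Reflection reports what the class definition declares and does not distinguish the unboxed value type from its boxed type, so the System.ValueType base type it shows is consistent with the explanation above.

```csharp
using System;

namespace MyNamespace
{
    // Same shape as the struct in the answer; the Clone() body is an assumption.
    struct MyValueType : ICloneable
    {
        public int A;
        public int B;
        public int C;

        public object Clone()
        {
            return this; // converting to object boxes a copy of the value
        }
    }

    // Hypothetical demo class, not part of the original answer.
    static class MetadataDemo
    {
        static void Main()
        {
            Type t = typeof(MyValueType);

            Console.WriteLine(t.IsValueType);                          // True
            Console.WriteLine(t.BaseType == typeof(ValueType));        // True
            Console.WriteLine(t.IsSealed);                             // True, although "sealed" was never written
            Console.WriteLine(typeof(ICloneable).IsAssignableFrom(t)); // True
        }
    }
}
```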

     

    Now, if you were to define a method in C# like

    public static void BlaBla(MyNamespace.MyValueType x),

    you know that the method will take the value type MyNamespace.MyValueType.

    Above, we learned that the class definition that results from the struct keyword in C# actually defines both a value type and an object type. We can only refer to the defined value type, though. Even though the CLI specification states that the constraint keyword boxed can be used to refer to a boxed version of a type, this keyword doesn't exist (see ECMA-335, II.13.1 Referencing value types). But let's imagine for a moment that it does.

    When referring to types in IL, a couple of constraints are supported, among which are class and valuetype. If we use valuetype MyNamespace.MyType, we're specifying the value type class definition called MyNamespace.MyType. Likewise, we can use class MyNamespace.MyType to specify the object type class definition called MyNamespace.MyType. This means that in IL you can have a value type (struct) and an object type (class) with the same name and still distinguish them. Now, if the boxed keyword noted by the CLI specification were actually implemented, we'd be able to use boxed MyNamespace.MyType to specify the boxed type of the value type class definition called MyNamespace.MyType.

    So, .method static void Print(valuetype MyNamespace.MyType test) cil managed takes the value type defined by a value type class definition named MyNamespace.MyType,

    while .method static void Print(class MyNamespace.MyType test) cil managed takes the object type defined by the object type class definition named MyNamespace.MyType.

    Likewise, if boxed were a keyword, .method static void Print(boxed MyNamespace.MyType test) cil managed would take the boxed type of the value type defined by a class definition named MyNamespace.MyType.

    You'd then be able to instantiate the boxed type like any other object type and pass it around to any method that takes a System.ValueType, object or boxed MyNamespace.MyValueType as an argument, and it would, for all intents and purposes, work like any other reference type. It is NOT a value type, but the corresponding boxed type of a value type.

     

    Summary

    So, in summary, and to answer the question:

    Value types are not reference types and do not inherit from System.ValueType or any other type, and they cannot implement interfaces. The corresponding boxed types that are also defined do inherit from System.ValueType and can implement interfaces.

    A .class definition defines different things depending on circumstance.

    • If the interface semantic attribute is specified, the class definition defines an interface.
    • If the interface semantic attribute is not specified, and the definition does not extend System.ValueType, the class definition defines an object type (class).
    • If the interface semantic attribute is not specified, and the definition does extend System.ValueType, the class definition defines a value type and its corresponding boxed type (struct).

    Memory layout

    This section assumes a 32-bit process.

    As already mentioned, value types do not have type information, and thus it's not possible to identify what a value type represents from its memory location. A struct describes a simple data type, and contains just the fields it defines:

    public struct MyStruct
    {
        public int A;
        public short B;
        public int C;
    }
    

    If we imagine that an instance of MyStruct was allocated at address 0x1000, then this is the memory layout:

    0x1000: int A;
    0x1004: short B;
    0x1006: 2 byte padding
    0x1008: int C;
    

    Structs default to sequential layout. Fields are aligned on boundaries of their own size. Padding is added to satisfy this.
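
    The offsets above can be checked with the Marshal class. This is a sketch with one caveat: Marshal reports the marshaled (unmanaged) layout, which for a struct made only of primitive fields like this one is assumed here to match the managed layout. The LayoutDemo class is a hypothetical name used for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

public struct MyStruct
{
    public int A;
    public short B;
    public int C;
}

// Hypothetical demo class, not part of the original answer.
static class LayoutDemo
{
    static void Main()
    {
        // Offsets are relative to the start of the struct.
        Console.WriteLine(Marshal.OffsetOf<MyStruct>("A")); // 0
        Console.WriteLine(Marshal.OffsetOf<MyStruct>("B")); // 4
        Console.WriteLine(Marshal.OffsetOf<MyStruct>("C")); // 8 (2 bytes of padding follow B)
        Console.WriteLine(Marshal.SizeOf<MyStruct>());      // 12
    }
}
```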

     

    If we define a class in the exact same way, as:

    public class MyClass
    {
        public int A;
        public short B;
        public int C;
    }
    

    Imagining the same address, the memory layout is as follows:

    0x1000: Pointer to object header
    0x1004: int A;
    0x1008: int C;
    0x100C: short B;
    0x100E: 2 byte padding
    0x1010: 4 bytes extra
    

    Classes default to automatic layout, and the JIT compiler arranges the fields in whatever order it considers optimal. Fields are aligned on boundaries of their own size, and padding is added to satisfy this. I'm not sure why, but every class always has an additional 4 bytes at the end.

    Offset 0 contains the address of the object header, which contains type information, the virtual method table, etc. This allows the runtime to identify what the data at an address represents, unlike value types.

    Thus, value types do not support inheritance, interfaces, or polymorphism.

    Methods

    Value types do not have virtual method tables, and thus do not support polymorphism. However, their corresponding boxed type does.

    When you have an instance of a struct and attempt to call a virtual method like ToString() that is defined on System.Object and not overridden by the struct, the runtime has to box the struct.

    MyStruct myStruct = new MyStruct();
    Console.WriteLine(myStruct.ToString()); // ToString() call causes boxing of MyStruct.
    

    However, if the struct overrides ToString() then the call will be statically bound and the runtime will call MyStruct.ToString() without boxing and without looking in any virtual method tables (structs don't have any). For this reason, it's also able to inline the ToString() call.

    If the struct overrides ToString() and is boxed, then the call will be resolved using the virtual method table.

    System.ValueType myStruct = new MyStruct(); // Creates a new instance of the boxed type of MyStruct.
    Console.WriteLine(myStruct.ToString()); // ToString() is now called through the virtual method table.
    

    However, remember that ToString() is defined in the struct, and thus operates on the struct value, so it expects a value type. The boxed type, like any other class, has an object header. If the ToString() method defined on the struct were called directly with the boxed type in the this pointer, then when trying to access field A in MyStruct, it would access offset 0, which in the boxed type would be the object header pointer. So the boxed type has a hidden method that does the actual overriding of ToString(). This hidden method unboxes the boxed instance (address calculation only, like the unbox IL instruction) and then statically calls the ToString() defined on the struct.

    Likewise, the boxed type has a hidden method for each implemented interface method that does the same unboxing then statically calls the method defined in the struct.
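
    The observable effect of this can be sketched in C#. The ICounter interface, the Counter struct, and the BoxingDemo class are hypothetical names made up for this example. The hidden methods themselves are not visible from C#; what can be seen is that an interface call goes to the boxed copy, not to the original value.

```csharp
using System;

// Hypothetical interface and struct, used only for illustration.
interface ICounter
{
    void Increment();
}

struct Counter : ICounter
{
    public int Value;

    public void Increment() { Value++; }

    public override string ToString() { return "Counter(" + Value + ")"; }
}

static class BoxingDemo
{
    static void Main()
    {
        Counter c = new Counter();
        c.Increment();        // direct call on the value: c.Value is now 1

        ICounter boxed = c;   // creates an instance of the boxed type holding a copy of c
        boxed.Increment();    // interface dispatch on the box: only the boxed copy changes

        Console.WriteLine(c.ToString());     // Counter(1)
        Console.WriteLine(boxed.ToString()); // Counter(2)
    }
}
```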

     

    CLI specification

    Boxing

    I.8.2.4 For every value type, the CTS defines a corresponding reference type called the boxed type. The reverse is not true: In general, reference types do not have a corresponding value type. The representation of a value of a boxed type (a boxed value) is a location where a value of the value type can be stored. A boxed type is an object type and a boxed value is an object.

    Defining value types

    I.8.9.7 Not all types defined by a class definition are object types (see §I.8.2.3); in particular, value types are not object types, but they are defined using a class definition. A class definition for a value type defines both the (unboxed) value type and the associated boxed type (see §I.8.2.4). The members of the class definition define the representation of both.

    II.10.1.3 The type semantic attributes specify whether an interface, class, or value type shall be defined. The interface attribute specifies an interface. If this attribute is not present and the definition extends (directly or indirectly) System.ValueType, and the definition is not for System.Enum, a value type shall be defined (§II.13). Otherwise, a class shall be defined (§II.11).

    Value types do not inherit

    I.8.9.10 In their unboxed form value types do not inherit from any type. Boxed value types shall inherit directly from System.ValueType unless they are enumerations, in which case, they shall inherit from System.Enum. Boxed value types shall be sealed.

    II.13 Unboxed value types are not considered subtypes of another type and it is not valid to use the isinst instruction (see Partition III) on unboxed value types. The isinst instruction can be used for boxed value types, however.

    I.8.9.10 A value type does not inherit; rather the base type specified in the class definition defines the base type of the boxed type.

    Value types do not implement interfaces

    I.8.9.7 Value types do not support interface contracts, but their associated boxed types do.

    II.13 Value types shall implement zero or more interfaces, but this has meaning only in their boxed form (§II.13.3).

    I.8.2.4 Interfaces and inheritance are defined only on reference types. Thus, while a value type definition (§I.8.9.7) can specify both interfaces that shall be implemented by the value type and the class (System.ValueType or System.Enum) from which it inherits, these apply only to boxed values.

    The non-existent boxed keyword

    II.13.1 The unboxed form of a value type shall be referred to by using the valuetype keyword followed by a type reference. The boxed form of a value type shall be referred to by using the boxed keyword followed by a type reference.

    Note: The specification is wrong here, there is no boxed keyword.

    Epilogue

    I think part of the confusion about how value types seem to inherit stems from the fact that C# uses casting syntax to perform boxing and unboxing, which makes it seem like you're performing casts. That is not really the case (although the CLR will throw an InvalidCastException if you attempt to unbox to the wrong type). (object)myStruct in C# creates a new instance of the boxed type of the value type; it does not perform any casts. Likewise, (MyStruct)obj in C# unboxes a boxed type, copying the value part out; it does not perform any casts.
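
    A minimal sketch of that behaviour, using int as the value type (the UnboxDemo class is a hypothetical name): each boxing conversion creates a separate object, and unboxing only succeeds for the exact value type that was boxed.

```csharp
using System;

// Hypothetical demo class, not part of the original answer.
static class UnboxDemo
{
    static void Main()
    {
        int i = 42;
        object a = i; // first box
        object b = i; // second, independent box

        Console.WriteLine(object.ReferenceEquals(a, b)); // False: two distinct objects
        Console.WriteLine(a.Equals(b));                  // True: same boxed value

        int back = (int)a; // unboxing copies the value out of the box
        Console.WriteLine(back); // 42

        try
        {
            long wrong = (long)a; // a holds a boxed int, not a boxed long
            Console.WriteLine(wrong);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException");
        }
    }
}
```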
