How does a struct instance's virtual method get located using its type object in heap?

Question

Below is a code example from a book that shows when a value type is boxed:

internal struct Point 
{
   private readonly Int32 m_x, m_y;
   public Point(Int32 x, Int32 y) {
      m_x = x;
      m_y = y;
   }
   
   //Override ToString method inherited from System.ValueType
   public override string ToString() {
      return String.Format("({0}, {1})", m_x.ToString(), m_y.ToString());
   }
}

class Program
{
    static void Main(string[] args) {
       Point p1 = new Point(10, 10);
       p1.ToString();       
    }
}

The author says:

In the call to ToString, p1 doesn’t have to be boxed. At first, you’d think that p1 would have to be boxed because ToString is a virtual method that is inherited from the base type, System.ValueType. Normally, to call a virtual method, the CLR needs to determine the object’s type in order to locate the type’s method table. Because p1 is an unboxed value type, there’s no type object pointer. However, the just-in-time (JIT) compiler sees that Point overrides the ToString method, and it emits code that calls ToString directly (nonvirtually) without having to do any boxing. The compiler knows that polymorphism can’t come into play here because Point is a value type, and no type can derive from it to provide another implementation of this virtual method.

I more or less get what this means: because Point overrides ToString from System.ValueType, the CLR doesn't need to check the type object to locate the type's method table, and the compiler can emit IL code that calls ToString directly. Fair enough.

But suppose the code also calls GetHashCode, inherited from System.ValueType, on p1:

class Program
{
    static void Main(string[] args) {
       Point p1 = new Point(10, 10);
       p1.ToString();  
       p1.GetHashCode();     
    }
}

Since the Point struct doesn't override GetHashCode() from System.ValueType, the compiler cannot emit IL code that calls it directly this time, and the CLR needs to locate the type's method table to look up the GetHashCode method. But, as the author says, p1 is an unboxed value type with no type object pointer, so how can the CLR look up the GetHashCode method in the Point struct's type object on the heap?

Answer 1:

If we look at the generated MSIL, we see the following:

IL_0000:  ldloca.s    00 // p1
IL_0002:  ldc.i4.s    0A 
IL_0004:  ldc.i4.s    0A 
IL_0006:  call        System.Drawing.Point..ctor
IL_000B:  ldloca.s    00 // p1
IL_000D:  constrained. System.Drawing.Point
IL_0013:  callvirt    System.Object.ToString
IL_0018:  pop         
IL_0019:  ldloca.s    00 // p1
IL_001B:  constrained. System.Drawing.Point
IL_0021:  callvirt    System.Object.GetHashCode
IL_0026:  pop

Let's look up ECMA-335 Part III.2.1 on constrained.:

The constrained. prefix is permitted only on a callvirt instruction. The type of ptr must be a managed pointer (&) to thisType. The constrained prefix is designed to allow callvirt instructions to be made in a uniform way independent of whether thisType is a value type or a reference type.

If thisType is a value type and thisType implements method then
ptr is passed unmodified as the ‘this’ pointer to a call of method implemented by thisType

If thisType is a value type and thisType does not implement method then
ptr is dereferenced, boxed, and passed as the ‘this’ pointer to the callvirt of method

This last case can only occur when method was defined on System.Object, System.ValueType, or System.Enum and not overridden by thisType. In this last case, the boxing causes a copy of the original object to be made, however since all methods on System.Object, System.ValueType, and System.Enum do not modify the state of the object, this fact cannot be detected.

So, yes, this does cause boxing, but only when there is no override, because the methods implemented on System.Object and System.ValueType expect their this argument to be an object reference, not a managed pointer to a value type. If the method is overridden, the this pointer of the override is a managed pointer, the same as for any other value type method.
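The two cases of constrained. described above can be sketched in C#. This is a minimal illustration, not the book's code: the struct names PointWithOverrides and PointWithoutOverrides, the BoxingDemo class, and the hash formula are all hypothetical. The comments restate what ECMA-335 specifies; whether a particular JIT actually allocates a box on the heap for the non-overridden call may vary with runtime version and optimizations.

```csharp
using System;

// Hypothetical struct that overrides both methods. Under constrained. callvirt,
// the managed pointer to the value is passed unmodified as 'this': no boxing.
internal struct PointWithOverrides
{
    private readonly int m_x, m_y;

    public PointWithOverrides(int x, int y) { m_x = x; m_y = y; }

    public override string ToString()
    {
        return String.Format("({0}, {1})", m_x.ToString(), m_y.ToString());
    }

    // Illustrative hash only; any deterministic combination would do here.
    public override int GetHashCode()
    {
        return unchecked((m_x * 397) ^ m_y);
    }
}

// Hypothetical struct that overrides nothing. A call to GetHashCode resolves to
// System.ValueType.GetHashCode, so per ECMA-335 the value is dereferenced and
// boxed, and the boxed copy (which has a type object pointer) becomes 'this'.
internal struct PointWithoutOverrides
{
    private readonly int m_x, m_y;

    public PointWithoutOverrides(int x, int y) { m_x = x; m_y = y; }
}

internal static class BoxingDemo
{
    // Override present: the struct's own GetHashCode is called directly.
    public static int HashWithOverride()
    {
        PointWithOverrides p = new PointWithOverrides(10, 10);
        return p.GetHashCode();
    }

    // No override: the inherited System.ValueType.GetHashCode runs.
    // Equal values must still produce equal hash codes.
    public static bool EqualValuesHashEqually()
    {
        PointWithoutOverrides a = new PointWithoutOverrides(10, 10);
        PointWithoutOverrides b = new PointWithoutOverrides(10, 10);
        return a.GetHashCode() == b.GetHashCode();
    }

    // Explicit boxing makes the copy visible: each conversion to object
    // produces a separate heap object, so the two references differ.
    public static bool EachBoxIsASeparateObject()
    {
        PointWithoutOverrides p = new PointWithoutOverrides(10, 10);
        object first = p;
        object second = p;
        return !ReferenceEquals(first, second);
    }

    private static void Main()
    {
        Console.WriteLine(new PointWithOverrides(10, 10).ToString());
        Console.WriteLine(HashWithOverride());
        Console.WriteLine(EqualValuesHashEqually());
        Console.WriteLine(EachBoxIsASeparateObject());
    }
}
```

Overriding GetHashCode (and ToString) in a struct therefore takes the first branch of the constrained. rules, which is the usual way to keep such calls from boxing.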

Answer 2:

Apologies in advance for my poor English and for any errors or uncertainties in what follows; I learned these things long ago, and I am not very advanced in IL/CLR/CTS. When I talk about the 808x, it is because that is where I come from (in addition to the 6809), and because it simplifies the history of RISC and CISC. I have tried my best to paint a picture of the landscape that lets us make our music, and to open up paths for further research.

Questions about code and data, stack and heap, classes and structs, and so on, are interesting and fundamental, but also complex and broad. They touch one of the roots of modern computing technology, which is based on transistors and silicon integrated circuits in the microprocessors of our computers, servers, smartphones and, more and more, any device with electronic components.

They are about how CPUs work underneath, whatever technology we use on top: high-level built over low-level, OOP invented over non-OOP, structured programming created over unstructured, functional imitated over procedural.

We don't have .NET-ready microprocessors yet. So, in addition to the information given in:

What and where are the stack and heap?

Stack and heap in c sharp

Stack and Heap allocation

Memory allocation: Stack vs Heap?

How does the heap and stack work for instances and members of struct in C#?

Does structs have type objects created in heap?

Why methods return just one kind of parameter in normal conditions?

How would the memory look like for this object?

Fundamentally, the code of the methods of classes and structs is neither located nor allocated on the heap or on the stack. Basically, the native code (the instructions) is loaded into the CODE SEGMENT of the memory reserved for the process (on 32-bit x86; I have not looked at x64), alongside the 2 GB DATA SEGMENT (I hope I am not mistaken, this is old knowledge), when we launch the .exe file and it loads some DLLs.

In other words: the code implementing the methods is loaded from the binary EXE and DLL files when the process starts and is stored in the CODE SEGMENT, while all data, static (literals) and dynamic (instances), lives in the DATA SEGMENT. I suppose this holds even for code translated by the JIT, or something like that, if things have not changed since 32-bit protected mode. Neither the non-virtual nor the virtual method tables are stored in the data segment for each object instance. I don't remember the details, but these tables are for code.

Also, a class that inherits from another class inherits its data members and behaviors conceptually. It is an architecture, a plan, a drawing. When we create an object, we get all its data in one place. We don't have several objects per object, one for each class in the hierarchy: that is just a conceptual design in our minds and in the source code.

Where in memory is vtable stored?

Also, the data of each object instance is a projection of its definition and of its ancestors' definitions, in one place, as one full instance. References are "pointers" to other relevant places in the data segment. Imagine an object as a wagon in a train (the memory): literals and references are seats, and each reference (to a class or a struct, one being a particular case of the other) points to another wagon.

Memory segmentation

x86 memory segmentation

The stack is what I describe here and what is explained here: a cupboard (the stack) in a big room (the heap). This stack is not accessed through standard memory access with slow MOVs but through the fast CPU stack registers, so it is faster, but limited in space.

.NET IL code is translated into machine-dependent code. On our technology, Intel-like or ARM I suppose, for example, all CPUs are the same in a certain way (it is silicon technology), so it is the same as what we learned in C or x86 ASM. .NET is a virtual machine. The CLR that executes IL code translates it into Intel-like code, to put it briefly. That's all. CPU registers are physical CPU registers, and they can't be anything else even if we wanted them to be. The CPU stack is the CPU stack. And so on.

If one day we have a new generation of computers (after the current silicon one, which dates from about the 8086/8088), for example quantum computing or DNA computing, things may change, and .NET will generate different code. Thus, any question about .NET and the CLR ultimately gets a standard, classic 808x answer, as .NET changes neither how CPU registers work, nor bus sizes, nor anything else.

I have seen some helpers who know the .NET specifications in detail answer certain questions with "it is not guaranteed", because the documentation leaves the door open to a new generation and stays silent so that different machine code can be generated on a new generation of computers.

The CLR does nothing more than translate virtual code into real machine code for the targeted architecture. The CLR is not a real machine; it is a virtual machine. This virtual machine can't exist and run as is on a CPU in the real world. There is no such thing as a real .NET machine. It does not exist yet, and I hope that one day it will, which would be great (creating a .NET CPU has been one of Microsoft's plans for many years, as far as I know).

In other words: no .NET CPU exists yet. So all .NET technology is translated into x86/x64 technology. That's all. Thus, questions about stack, heap, methods, classes and structs cannot be answered as if for an imaginary .NET-ready CPU built on new technology different from current silicon.

Our current machines can only execute machine code (that is, assembly code) that works on our CPUs. Nothing else. Everything else is concepts layered on top for humans. All languages and all virtual machines need to be translated into machine code that works on the CPU architecture. Our high-level languages and virtual architectures such as Java and .NET, even interpreted BASIC in the 1980s, do not exist in reality, not for our silicon generation. IL code does not exist for our CPUs: it is only a virtual assembly language.

Therefore .NET "doesn't exist", just like C or any other language, anywhere but in our source code. Even bytecode is ultimately translated into machine code to be executed by the CPU, which is a very basic, simple, mechanical, automatic and not very sophisticated thing, even in modern models. All source code and intermediate code in any language we can invent is translated into machine code that our current generation of Intel, AMD and ARM machines understands. And even with protected mode, x64 mode, multiple cores, hyper-threading and so on, CPUs from 1970 and from 2020 are basically the same thing, like the old Volkswagen Beetle and the latest Porsche.

It is impossible, without imagining things that don't exist, to answer such a question as if .NET could see the machine differently and the CPU could operate differently from its current architecture of processes with code and data segments, stack and heap, and all the rest. .NET can't operate on a CPU in any way other than the way the CPU itself operates. It's impossible. Even a quantum simulation platform ultimately translates things into machine code for a current silicon CPU.

The CLR uses the real machine architecture: the CPU and its registers as well as the RAM, and everything we know about them. The CLR translates IL code into CPU code. That's all: it juggles constantly.

We can see this machine code using the Visual Studio window Debug > Windows > Disassembly to inspect the real code executed in real time on our CPU, and yes, that is x86/x64 ASM, not the MSIL we can see, for example, using ILSpy.

Everything other than machine code exists only in our minds and in our working files, such as source code and bytecode, and it is "not real at all" from the CPU's point of view.

To learn how .NET works internally, you can read books and online resources such as:

List of CIL instructions

.NET OpCodes Class

Expert .NET 2.0 IL Assembler.

And also:

Stack register

The Concept of Stack and Its Usage in Microprocessors

Introduction of Stack based CPU Organization

What is the role of stack in a microprocessor?

And also, in a manner of speaking, everything is in these, among other examples:

Protected mode software architecture

Operating System concepts

To understand this better and to improve your computing skills, you may find it interesting to investigate what assembly language is and how the CPU works. You can start with IL and modern Intel, but it may be simpler, more instructive and complementary to start from the older 8086 through the i386/i486.

Happy reading!

Perhaps you can ask your advanced questions about these subjects on https://superuser.com:

Understanding Windows Process Memory Layout

Source: https://stackoverflow.com/questions/65931791/how-does-a-struct-instances-virtual-method-get-located-using-its-type-object-in

Tags

memory-management

clr

heap-memory

stack-memory