Question

In full emulation, the I/O devices, CPU, and main memory are virtualized. The guest operating system accesses virtual devices, not physical devices. But what exactly is full virtualization? Is it the same as full emulation or something totally different?

Answer 1:

Emulation and virtualization are related but not the same.

Emulation is using software to provide a different execution environment or architecture. For example, you might have an Android emulator run on a Windows box. The Windows box doesn't have the same processor that an Android device does so the emulator actually executes the Android application through software.

Virtualization is more about creating virtual barriers between multiple virtual environments running in the same physical environment. The big difference is that the virtualized environment has the same architecture as the host. A virtualization application may provide virtualized devices that then get translated to physical devices, and the virtualization host controls which virtual machine has access to each device or portion of a device. The guest code itself is most often still executed natively, though, not through software. Therefore virtualization performance is usually much better than emulation.

There's also a separate concept of a Virtual Machine, such as those that run Java, .NET, or Flash code. They can vary from one implementation to the next and may include aspects of either emulation or virtualization or both. For example, the JVM provides a mechanism to execute Java byte codes. However, the JVM spec doesn't dictate that the byte codes must be executed by software or that they must be compiled to native code. Each JVM can do its own thing, and in fact most JVMs do a combination of both, using emulation where appropriate and using a JIT where appropriate (the HotSpot JIT, I think, is what it's called for Sun/Oracle's JVM).

Answer 2:

In full emulation the I/O devices, CPU, main memory are virtualized.

No, they are emulated in software. Emulated means that their behavior is completely replicated in software.
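To make "completely replicated in software" concrete, here is a minimal sketch of an interpreter for a toy guest. The instruction set (`mov`, `add`, `dec`, `jnz`, `halt`) and the register names are invented for this example and do not correspond to any real ISA; the point is only that the guest's registers are ordinary host variables and no guest instruction ever runs on the host CPU directly.

```python
def run(program, max_steps=1000):
    """Interpret a toy guest program; guest registers are plain host variables."""
    regs = {"r0": 0, "r1": 0, "pc": 0}
    steps = 0
    while regs["pc"] < len(program) and steps < max_steps:
        op, *args = program[regs["pc"]]   # fetch and decode
        regs["pc"] += 1
        if op == "mov":                   # mov reg, immediate
            regs[args[0]] = args[1]
        elif op == "add":                 # add dst, src (32-bit wraparound)
            regs[args[0]] = (regs[args[0]] + regs[args[1]]) & 0xFFFFFFFF
        elif op == "dec":                 # dec reg
            regs[args[0]] = (regs[args[0]] - 1) & 0xFFFFFFFF
        elif op == "jnz":                 # jnz reg, target
            if regs[args[0]] != 0:
                regs["pc"] = args[1]
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
        steps += 1
    return regs


# Guest program: sum 5 + 4 + 3 + 2 + 1 into r0.
program = [
    ("mov", "r0", 0),
    ("mov", "r1", 5),
    ("add", "r0", "r1"),
    ("dec", "r1"),
    ("jnz", "r1", 2),
    ("halt",),
]
print(run(program)["r0"])
```

Every guest instruction costs several host operations (a fetch, a dispatch, a dictionary update), which is the overhead that virtualization tries to avoid.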

But what exactly is full virtualization?

With virtualization, you try to run as much code as you can directly on the hardware to speed up the process. This is especially a problem with code that has to run in kernel mode, as that could potentially change the global state of the host (the machine the hypervisor or VMM is running on) and thereby affect other virtual machines.

Answer 3:

This is an attempt to answer my own question.

System Virtualization: Understanding I/O virtualization and the role of the hypervisor

Virtualization

Virtualization as a concept enables multiple/diverse applications to co-exist on the same underlying hardware without being aware of each other.

As an example, full-blown operating systems such as Windows, Linux, Symbian, etc., along with their applications, can coexist on the same platform. All computing resources are virtualized.

What this means is that none of the aforesaid machines has access to physical resources. The only entity with access to physical resources is a program known as the Virtual Machine Monitor (aka hypervisor).

Now this is important. Please read and re-read carefully.

The hypervisor provides a virtualized environment to each of the machines above. Since these machines access NOT the physical hardware BUT virtualized hardware, they are known as Virtual Machines.

As an example, the Windows kernel may want to start a physical timer (a system resource). Assume that the timer is memory-mapped I/O. The Windows kernel issues a series of load/store instructions on the timer addresses. In a non-virtualized environment, these loads and stores would have resulted in programming of the timer hardware.

However, in a virtualized environment, these load/store accesses to physical resources result in a trap/fault. The trap is handled by the hypervisor. The hypervisor knows that Windows tried to program the timer. The hypervisor maintains timer data structures for each of the virtual machines. In this case, the hypervisor updates the timer data structure which it has created for Windows. It then programs the real timer. Any interrupt generated by the timer is handled by the hypervisor first. The data structures of the virtual machines are updated, and the virtual machines' interrupt service routines are called.

To cut a long story short, Windows did everything that it would have done in a Non-Virtualized environment. In this case, its actions resulted in NOT the real system resource being updated, but virtual resources (The data structures above) getting updated.
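The timer example can be sketched as a small simulation. This is only an illustration under stated assumptions: the `TIMER_BASE` address, the `Hypervisor` and `Guest` classes, and the single-register timer are all made up, and the "trap" is modelled as a direct method call because Python cannot fault on a store.

```python
TIMER_BASE = 0x1000  # hypothetical MMIO address of the timer's "load" register


class Hypervisor:
    def __init__(self):
        self.virtual_timers = {}    # per-VM timer state: guest -> ticks remaining
        self.physical_timer = None  # value programmed into the (simulated) real timer

    def on_mmio_store(self, vm, addr, value):
        """Runs when a guest store to a device address traps."""
        if addr != TIMER_BASE:
            raise ValueError(f"no virtual device at {addr:#x}")
        self.virtual_timers[vm] = value  # update the VM's virtual timer
        # Program the real timer with the earliest pending deadline.
        self.physical_timer = min(self.virtual_timers.values())

    def on_timer_interrupt(self):
        """The real timer fired: the hypervisor sees the interrupt first."""
        if self.physical_timer is None:
            return
        elapsed = self.physical_timer
        for vm in list(self.virtual_timers):
            self.virtual_timers[vm] -= elapsed
            if self.virtual_timers[vm] <= 0:
                del self.virtual_timers[vm]
                vm.timer_isr()  # call the guest's interrupt service routine
        self.physical_timer = min(self.virtual_timers.values(), default=None)


class Guest:
    def __init__(self, name, hypervisor):
        self.name = name
        self.hypervisor = hypervisor
        self.ticks = 0

    def store(self, addr, value):
        # On real hardware the CPU raises the trap; here we call the hypervisor directly.
        self.hypervisor.on_mmio_store(self, addr, value)

    def timer_isr(self):
        self.ticks += 1


hv = Hypervisor()
windows, linux = Guest("windows", hv), Guest("linux", hv)
windows.store(TIMER_BASE, 10)  # Windows believes it programmed the hardware
linux.store(TIMER_BASE, 30)
hv.on_timer_interrupt()
print(windows.ticks, linux.ticks, hv.physical_timer)
```

Neither guest ever touches `physical_timer`; each one only ever changes its own entry in `virtual_timers`.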

Thus all virtual machines think they are accessing the underlying hardware; in reality, unknown to them, all accesses to physical hardware are mediated by the hypervisor.

Everything described above is full/classic virtualization. Many CPUs are unfit for classic virtualization (x86 before hardware virtualization extensions is the usual example), because the trap/fault does not apply to all sensitive instructions. On such CPUs the hypervisor is easily bypassed.

Here is where para-virtualization comes into being. The sensitive instructions in the source code of virtual machines are replaced by a call to Hypervisor. The load/store snippet above may be replaced by a call such as

Hypervisor_Service(Timer Start, Windows, 10ms);
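As a rough sketch of what such a call could look like on the hypervisor side, assuming a made-up `Service` enumeration and a `hypercall` entry point (neither is a real hypervisor API), the guest names the operation explicitly instead of relying on a load/store that has to trap:

```python
from enum import Enum


class Service(Enum):
    TIMER_START = 1  # hypothetical service number


class Hypervisor:
    def __init__(self):
        self.timers = {}  # per-VM virtual timer state: vm name -> period in ms

    def hypercall(self, service, vm_name, *args):
        """Explicit entry point called by the modified (paravirtualized) guest kernel."""
        if service is Service.TIMER_START:
            (period_ms,) = args
            self.timers[vm_name] = period_ms
            return 0   # success
        return -1      # unknown service


hv = Hypervisor()
status = hv.hypercall(Service.TIMER_START, "Windows", 10)
print(status, hv.timers)
```

The trade-off is that the guest's source has to be changed, which is why this approach is called para-virtualization rather than full virtualization.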

Emulation

Emulation is a topic related to virtualization. Imagine a scenario where a program originally compiled for ARM is made to run on an ATMEL CPU. The ATMEL CPU runs an emulator program which interprets each ARM instruction and emulates the necessary actions on the ATMEL platform. Thus the emulator provides a virtualized environment.

In this case, virtualization of system resources is NOT performed via the trap-and-emulate model.

Answer 4:

Without either emulation or virtualization, code runs directly on the hardware. Its instructions are executed natively by the CPU, and its I/O accesses directly access the hardware.

Virtualization is when the guest code runs natively at least some of the time, and only traps to host code running outside the virtual-machine (e.g. a hypervisor) for privileged operations or I/O accesses.

To handle these traps (aka VM exits), the VM may actually emulate what the guest was trying to do. E.g. the guest might be running a device driver for a simple network card, but the NIC is implemented purely in software in the VM. If the VM used a pass-through to send the guest's I/O accesses to a real network card on the host, that would be virtualization of that hardware. (Especially if it did it in a way that let multiple guests use it at once; otherwise it's really just giving it to one guest, not virtualizing it.)

Hardware support for virtualization (like Intel's and AMD's separate x86 virtualization extensions) can let the guest do things that would normally affect the whole machine, like modify the memory mappings in a page table. So instead of triggering a VM exit and making the VM figure out what the guest was doing and then modifying things from the outside to achieve the result, the CPU just has an extra translation layer built in. (See the linked wiki article for a much better but longer description of software-based virtualization vs. hardware-assisted virtualization.)

Pure emulation means that guest code never runs natively and never sees the "real" hardware of the host. An emulator doesn't need privileged access to the host. (Some might want privileged access to the host for device pass-through, or for raw network sockets to let a guest look like it's really attached to the same network as the host.)

An ARM emulator running on an x86 host always has to work this way, because the host hardware can't run ARM instructions in the first place.

But you can still emulate an x86 guest on an x86 host, for example. The fact that the guest and host architectures match doesn't mean the emulator has to take advantage of that fact.

For example, BOCHS is an x86 PC emulator written in portable C++. One of its main uses is for debugging bootloaders and OSes.

BOCHS doesn't care if it's running on an x86 host or not. It's just a C++ program that reads binary files (disk images) and draws in a window (contents of guest video memory). As far as the host is concerned, it's not particularly different from a JPG viewer or a game.

Some emulators use binary translation to JIT-compile the guest code into host code, but this is still emulation, not virtualization. See http://wiki.osdev.org/Emulator_Comparison.
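To illustrate the idea of binary translation with a translation cache, here is a toy sketch. It is not how any particular emulator works: real translators emit host machine code, whereas this one "translates" each block of a made-up guest instruction set into a Python callable. What it does show is the key property: each guest block is decoded once, cached by its start address, and re-run afterwards without decoding again.

```python
def translate_block(program, start):
    """Translate guest instructions from `start` through the next branch or halt
    into one host callable that returns the next guest pc (None on halt)."""
    ops = []
    pc = start
    op = "halt"  # treated as halt if we run off the end of the program
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "mov":
            ops.append(lambda regs, d=args[0], v=args[1]: regs.__setitem__(d, v))
        elif op == "add":
            ops.append(lambda regs, d=args[0], s=args[1]: regs.__setitem__(d, regs[d] + regs[s]))
        elif op == "dec":
            ops.append(lambda regs, d=args[0]: regs.__setitem__(d, regs[d] - 1))
        elif op in ("jnz", "halt"):
            break  # a block ends at the first control-flow instruction
        else:
            raise ValueError(f"unknown opcode {op!r}")
    else:
        op = "halt"

    body = tuple(ops)
    reg = taken = fallthrough = None
    if op == "jnz":
        reg, taken, fallthrough = args[0], args[1], pc

    def block(regs):
        for step in body:  # no decoding happens here any more
            step(regs)
        if reg is None:
            return None
        return taken if regs[reg] != 0 else fallthrough

    return block


def run_translated(program):
    regs = {"r0": 0, "r1": 0}
    cache = {}  # translation cache: guest pc -> translated block
    pc = 0
    while pc is not None:
        if pc not in cache:
            cache[pc] = translate_block(program, pc)
        pc = cache[pc](regs)
    return regs, cache


program = [
    ("mov", "r0", 0),
    ("mov", "r1", 5),
    ("add", "r0", "r1"),
    ("dec", "r1"),
    ("jnz", "r1", 2),
    ("halt",),
]
regs, cache = run_translated(program)
print(regs["r0"], sorted(cache))
```

The loop body at guest address 2 executes five times but is translated only once. The guest still never runs natively, which is why this remains emulation.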

BOCHS is relatively slow, since it reads and decodes guest instructions directly, without doing binary translation. But it tries to do this as efficiently as possible. See How Bochs Works Under the Hood for some of the tricks it uses to efficiently keep track of the guest state. Since emulation is the only option for running x86 software on non-x86 hardware, it's useful to have a high-performance emulator. BOCHS has some very smart and experienced emulator developers working on it, notably Darek Mihocka, who has some interesting articles about optimizing emulation on his site.

Answer 5:

A full emulator emulates all registers of the target ISA as variables, and the CPU is completely emulated. This can be because you want to emulate a guest whose ISA is not the same as the host's, or indeed it can be the same, if you run an x86 emulator such as Bochs and happen to be running it on an x86 system; it doesn't matter. As Peter says, the emulator does not need privileged access (a ring 0 driver helper), because all interpretation and emulation is done local to the process, and the process calls regular host I/O functions. This works because none of the code needs to run natively. If you want it to run natively, you have to bring this functionality to ring 0 via a driver.

Full emulation is an emulation of everything: the CPU, the chipset, the BIOS, devices, interrupts, page walk hardware, TLBs. The emulator process runs in ring 3, but this is not visible to the guest, which sees emulated/virtual rings (0 and 3). The interpreter monitors these rings and, on a violation, emulates an interrupt by assigning values to the register variables based on the instruction it is interpreting, mimicking what the CPU would do at each stage but in software. The emulator reads an instruction from an address, analyses it, and every time a register such as EDX comes up, it reads the EDX variable (the emulated EDX). It mimics the operation of the CPU, which is slow because the emulator performs multiple operations for a single operation that is usually handled transparently by the CPU.

If the guest attempts to access a virtual address, the dynamic recompiler takes this guest virtual address and traverses the guest page table (mimicking a TLB-miss page walker) using the vCR3. It then reads directly from each physical address produced by vCR3 plus the relevant part of the guest virtual address, using the emulator process's page table, whose cr3 it has no control over because it is a process. As far as the host OS is concerned, the guest physical address is just a virtual address in the process (a guest physical address maps to a host virtual address by adding an offset, so there is an implicit P2M table). If the dynamic recompiler detects an invalid bit on a guest PTE as it traverses using vCR3, it simulates a page fault to the guest, putting the address in the vCR2.
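The software page walk can be sketched as follows. To keep it short, this assumes the classic 32-bit x86 two-level layout (10/10/12 address split, 4-byte entries, bit 0 as the present bit) rather than 64-bit paging, and it ignores permission bits, large pages and any emulated TLB. The `SoftMMU` class and its names are invented for this example. Guest-physical memory is just a `bytearray` inside the emulator process, which plays the role of the implicit guest-physical to host-virtual offset.

```python
PAGE_SIZE = 4096
PRESENT = 0x1  # bit 0 of a page directory/table entry


class GuestPageFault(Exception):
    """Raised so the emulator can deliver a page fault to the guest."""


class SoftMMU:
    def __init__(self, mem_size):
        self.ram = bytearray(mem_size)  # guest-physical memory in the emulator's own address space
        self.vcr3 = 0                   # emulated CR3: guest-physical address of the page directory
        self.vcr2 = 0                   # emulated CR2: faulting guest-virtual address

    def read_u32(self, gpa):
        return int.from_bytes(self.ram[gpa:gpa + 4], "little")

    def write_u32(self, gpa, value):
        self.ram[gpa:gpa + 4] = value.to_bytes(4, "little")

    def _fault(self, gva):
        self.vcr2 = gva
        raise GuestPageFault(hex(gva))

    def translate(self, gva):
        """Walk the guest page tables in software, like a hardware page walker would."""
        pde = self.read_u32(self.vcr3 + 4 * (gva >> 22))
        if not pde & PRESENT:
            self._fault(gva)
        pte = self.read_u32((pde & ~0xFFF) + 4 * ((gva >> 12) & 0x3FF))
        if not pte & PRESENT:
            self._fault(gva)
        return (pte & ~0xFFF) | (gva & 0xFFF)


# The "guest OS" sets up: page directory at 0x1000, page table at 0x2000, data page at 0x3000.
mmu = SoftMMU(4 * PAGE_SIZE)
mmu.vcr3 = 0x1000
mmu.write_u32(0x1000 + 4 * 1, 0x2000 | PRESENT)  # directory entry 1 -> page table
mmu.write_u32(0x2000 + 4 * 0, 0x3000 | PRESENT)  # table entry 0 -> data page
print(hex(mmu.translate(0x00400123)))
```

An access to an unmapped address such as `0x00800000` raises `GuestPageFault` and leaves the address in `vcr2`, which is the point where a real emulator would vector into the guest's page fault handler.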

Full virtualisation, which is a type 1 hypervisor scheme, can actually be used on type 2 hypervisors. It is a step up in performance from full emulation and can only be used if the guest ISA is the same as the host ISA. This is used on VirtualBox and requires the help of a ring 0 driver; the driver functions as the hypervisor. Surprisingly, there is very little information on how type 2 hypervisors are implemented. I have no idea why the glaring issues and quandaries with the implementation, which should arise given the axioms of the operation of the host OS and underlying hardware, aren't addressed in any article, post, answer, diagram or paper, and this will probably be the first answer ever on Stack Overflow to address them. Nobody seems to know how 64-bit Windows is emulated on a 64-bit Windows host; they just don't address the problems. The following is my best guess on the matter: how I would implement a type 2 hypervisor given the hardware and host OS operation.

On Windows, I'd imagine that when the driver starts, it registers an interrupt which the guest software interrupts to in order to start the virtual machine, and the handler / DPC performs the following process. The driver could inject a handler into the IDT for the general protection fault. It can do this by putting a wrapper around KiInterruptDispatch, replacing KiInterruptTemplate in the IDT with the wrapper.

However, a 64-bit Windows guest on a 64-bit Windows host needs to have its own kernel space, but the problem is that it will be at exactly the same location as the host kernel structures. Therefore, the driver needs to wipe the whole kernel view of the VirtualBox process. This cannot be mapped in or visible to the guest. It does this by removing the entries from the cr3 page of the VirtualBox process. The GDT and IDT used by the VirtualBox process need to be those of the host, and the hypervisor needs to prevent the guest from changing the IDT value and write-protect the virtual address range that maps the IDT in the shadow page table (SPT). Any accesses to this range will be logged by the hypervisor in a guest IDT that it builds.

The issue with this is that when the ISR is handled, it will jump to a hypervisor RIP that is not mapped into the process, because the driver lies in the host kernel. The driver could replace the current handlers in the host IDT with a task gate pointing to a task descriptor it sets up in the host GDT, which will hardware task switch to a TSS it reserves in the shadow page tables, which contains the cr3 of a dummy process that does map the host kernel in order to handle it; the issue with this is that when the handler checks the cr3, it will always be the one it set up in the TSS. Therefore, the solution must be to actually map the RIP of the handler it jumps to into guest space, reserving some virtual memory for each process in the SPT at the RIP in the IDT. The handler will pass the cr3 in a register, change the cr3 to that of a dummy process that maps the host kernel, and then call the main handler. The main handler checks whether the cr3 is a guest's shadow cr3 or a host cr3 and performs the appropriate action.

The driver will also have to inject itself into the clock interrupt in the same way --- if the clock interrupt fires, the guest state or host state (which includes current cr3) is pushed and the hypervisor handler will push the address of the guest IDT clock interrupt onto the kernel stacks of all vCPU threads it manages (emulating what the CPU would do) in a new trap frame if there isn't one already present and then call the original host handler after changing the cr3 to one that maps the host kernel. This would ensure a context switch in the guest every time it is scheduled in on the host and therefore guest clock interval would roughly match up to host clock interval.

Full virtualisation would be referred to as 'trap and emulate', but it is not full emulation because all ring 3 code actually runs on the host CPU (as opposed to full emulation where the code that runs is the interpreter which fetches lines to read). Also, the TLBs and page walk hardware are actually used directly whereas on the emulator, every memory access requires a walk in software if not present in an emulated TLB array in software. Only the privileged instructions and registers, interrupts, devices and BIOS are emulated to the guest --- partial emulation --- emulation still occurs, but when any amount of the code runs natively, it becomes referred to as a virtualisation (full, para or hardware assisted).

When the guest traps into the guest OS, it will use either INT 0x2e or syscall. The hypervisor injects an ISR into 0x2e for INT, and it will insert a handler at the SYSENTER_CS_MSR:SYSENTER_EIP_MSR for sysenter or at the IA32_LSTAR MSR for syscall. The INT 0x2e ISR and the handler in the MSR, which need to be mapped into the SPT and reserved, will check whether the cr3 is the shadow of one of the guest processes. If it isn't, the handler doesn't need to change cr3, as the current one will contain the host kernel, and it jumps to the host handler. If it is the cr3 of a guest process shadow, it changes the cr3 and jumps to a main handler, passing the RIP in the guest IDT that it has built to the recompiler/patcher. The recompiler/patcher walks through the code using guest register state and paravirtualises certain instructions that aren't guaranteed to trap, replacing them with jumps to hypervisor memory (which will cause protection faults, as they're ring 0 in the SPT), until it reaches an IRET or sysexit etc. It then changes the cr3 back to that of the guest and executes an IRET to the RIP in the guest IDT it has built, after putting a ring 1 privilege on the stack. As for the MSR, the guest will jump to it in ring 0, and the code will immediately IRET to the RIP the guest intended but with a ring 1 privilege pushed on the stack.

When a trap occurs due to executing a ring 0 instruction in ring 1, the ISR injected at the general protection fault entry will check whether the cr3 is that of a guest process, and if so it will claim and handle the issue. If it isn't, the cr3 doesn't need to be changed to one that includes the host kernel in order to pass control to the host handler, because it will be in the context of a non-guest process. One instance where this could occur is the guest writing to cr3 for a guest context switch. This needs to be emulated, as the guest must not be able to execute this instruction and modify the cr3, because that would change the cr3 of the host process on the host OS; the hypervisor needs to intercept the write and load a new shadow cr3, not the cr3 the guest wants. When the guest reads cr3, this mechanism prevents the guest from reading the real cr3: the hypervisor inserts the value of the cr3 the guest wrote into the correct register, puts the next instruction address onto the stack and resumes execution with an IRET to ring 1. Ideally, the hypervisor must hook all interrupts to make sure all INTs are filtered and do not go through to the host; it replaces the KiInterruptTemplate or bugcheck code at all IDT entries with its own decorator.
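The cr3 interception can be modelled as a small piece of bookkeeping. This is a sketch of my guess above, not of any real hypervisor: the `Cr3Virtualizer` class, its method names and the shadow addresses are all hypothetical, and the trap itself is represented by plain method calls.

```python
import itertools


class Cr3Virtualizer:
    def __init__(self):
        # Hypothetical addresses handed out for newly allocated shadow page table roots.
        self._next_shadow = itertools.count(0x100000, 0x1000)
        self.shadow_of = {}    # guest-chosen cr3 -> hypervisor-chosen shadow cr3
        self.guest_cr3 = None  # the value the guest believes is in cr3
        self.real_cr3 = None   # the value actually loaded on the CPU

    def on_guest_write_cr3(self, value):
        """Trap handler for a guest 'mov cr3, value'."""
        if value not in self.shadow_of:
            self.shadow_of[value] = next(self._next_shadow)  # allocate a new shadow root
        self.guest_cr3 = value
        self.real_cr3 = self.shadow_of[value]  # load the shadow, never the guest's value

    def on_guest_read_cr3(self):
        """Trap handler for a guest 'mov reg, cr3': hide the real cr3."""
        return self.guest_cr3


v = Cr3Virtualizer()
v.on_guest_write_cr3(0x5000)
print(hex(v.on_guest_read_cr3()), hex(v.real_cr3))
```

A guest context switch back to an address space it has used before reuses the existing shadow root instead of allocating a new one.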

Guest I/O will be targeted at a physical address space that maps onto virtual buffers and registers of emulated devices defined in the hypervisor. These emulated registers will be checked in a host context at regular intervals (in a clock interrupt hook, for instance), and the handler will decide whether 1) due to I/O completion, an interrupt needs to be emulated as mentioned before (pushing an interrupt onto the kernel stack of the thread representing the selected vCPU, based on the MSI vector assigned by the guest in the emulated configuration space), or 2) an I/O operation needs to be constructed using the native Windows API functions on the guest-specified buffer (translating GVA->HPA).

As for paging on a type 2 hypervisor, it is a tricky one. My best guess is that the hypervisor driver creates a shadow cr3 page every time a cr3 write fault shows a new guest-assigned cr3 address being written to cr3. It pairs this guest-chosen address with the address of the shadow cr3 page that the hypervisor allocated from the nonpaged pool, and changes the VirtualBox process cr3 to the shadow cr3 rather than the guest one that the guest attempted to write. The shadow cr3 page is write-protected by the kernel driver (which is done via the write-protect bit in the recursive PML4 entry to itself). You'll see written everywhere that the guest page tables are write-protected, but that just has to be wrong, because it is the shadow page tables that run on the CPU and are therefore the only ones that can cause protection faults; the shadow cr3 is used, not the guest cr3.

Every time the guest goes to write to the guest cr3 page by virtual address, this virtual address will always resolve through the current cr3, which is the shadow cr3, and therefore it will fault. The handler injected at the general protection fault will then see a shadow cr3 of one of its guest's processes and will perform the write that was attempted, but in the SPT, and instead of the guest's value it inserts the host physical address of a page it allocates (any page on the free list this time) and notes the mapping in a P2M table it creates for each guest. It will then load the cr3 the guest wanted, write to that, and then load the shadow cr3 again.

When a page fault occurs, the handler checks that the shadow cr3 is one of its guest's processes, checks the SPT, then loads the guest cr3 and checks the guest page table as well. If the shadow PTE is invalid, it is a shadow page fault. If the guest PTE is invalid as well, it emulates an interrupt using the RIP in the page fault entry of the guest IDT; before it does this, it patches the code in the recompiler as described before. Any other interrupt that occurs, i.e. from a host device, is not meant for the guest; therefore, if the handler sees that the current cr3 belongs to a process of one of its guests, it changes the cr3 to that of a dummy process that contains the host kernel mapping and calls the original KiInterruptTemplate for the host handler. After the host handler has finished, it restores the cr3.
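The page fault decision described above (shadow fault versus genuine guest fault) can be sketched with dictionaries standing in for the page tables. This is a simplification and partly an assumption on my part: it fills the shadow table lazily at fault time rather than on write-protect faults, the `ShadowPaging` class is invented for the example, and page numbers replace full addresses.

```python
class ShadowPaging:
    def __init__(self, p2m):
        self.p2m = p2m       # guest-physical frame -> host-physical frame
        self.guest_pt = {}   # guest-virtual page -> guest-physical frame (written by the guest)
        self.shadow_pt = {}  # guest-virtual page -> host-physical frame (what the real MMU walks)

    def on_page_fault(self, vpage):
        """Decide whether a fault belongs to the hypervisor or to the guest."""
        if vpage in self.shadow_pt:
            return "already_mapped"      # nothing to do in this simplified model
        if vpage not in self.guest_pt:
            return "inject_guest_fault"  # guest PTE invalid too: the guest OS must handle it
        # Shadow page fault: the guest mapping is valid, only the shadow entry is missing.
        self.shadow_pt[vpage] = self.p2m[self.guest_pt[vpage]]
        return "shadow_filled"


sp = ShadowPaging(p2m={7: 1042})
sp.guest_pt[0x400] = 7  # the guest maps virtual page 0x400 to its physical frame 7
print(sp.on_page_fault(0x400), sp.shadow_pt)
print(sp.on_page_fault(0x999))
```

Only the `inject_guest_fault` case is ever visible to the guest; shadow faults are resolved entirely by the hypervisor.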

Hardware-assisted type 2 is a further step up in performance. It makes the situation a lot less convoluted, unifies it into a single interface and automates lots of the makeshift cr3 juggling and administrative tasks that needed to be improvised, making it a lot cleaner. The kernel driver only needs to execute vmxon and wait for guests to software interrupt to register with the driver; all VM exit events are then handled by a unified handler at a RIP and CR3 it inserts into the VMCS host state (meaning the handler stub does not need to be mapped in the guest kernel virtual address space). It is specifically designed for this, unlike ring 1, which means the recompiler (Code Scanning and Analysis Manager (CSAM) and the Patch Manager (PATM)) is not required. It also has things like TSC scaling and TSC offset fields, which can be used by guests which employ the TSC for fairer scheduling. The hypervisor still hooks the clock interrupt to perform I/O updates, and if the currently executing thread is the thread for one of its vCPUs, it will need to vmxoff (which will cause a VM exit) and push the address of some reinitialisation sequence in host kernel memory that will vmresume the VMCS tied to the vCPU with the guest saved state in it (but with an emulated clock interrupt in place ready to execute, whose code will use RDTSC, which will VM exit, and the offsets in the VMCS can be used by the hypervisor to report a value accounting for the time the guest wasn't scheduled in on the host). It doesn't need to change the cr3 because the vmxoff does that automatically, so now it can pass control to the host handler to perform the clock interrupt handling procedure for the host OS.

Answer 6:

A more recent response:

From my research, I can say that this is a better way to understand how the concepts came about:

The first concept of emulation actually dates back to the first computer, the Colossus. It was used by the British government in 1941 to mimic the functions of the Nazi Enigma code machine. Emulation theory was developed in 1962 and was conceived by three IBM engineers working from three different angles.

Emulation means mimicking the behavior of a target, which can be hardware, as with the emu8086 emulator, or software, such as emulating a service on some network port.

You want to imitate the set of functions provided by the target, and you may not be interested in its internal mechanism.

Why would you want that? To control those functions. Why control? For multiple reasons, which is too large a subject to discuss here. But keep in mind that you want to be behind things.

But such a process is costly in performance. For each guest instruction, many other instructions are executed. Maybe you are interested in controlling only some of those instructions, so we would like to permit some instructions to be executed natively.

So what happens when all of these instructions execute natively? Then you have ideal virtualization. You can virtualize any software, but the trend today is to move from virtualization of operating systems to that of applications. I say ideal because this software executes differently on each piece of hardware, so it will also be necessary to emulate some instructions. It is important to understand that most of today's virtualization technologies are not only about virtualization but also about emulation.

Also notice that in our transition from emulation to virtualization, the range of input to the system is reduced, because virtualization accepts only software as input. The controller of this flow of instructions is called the hypervisor.

Answer 7:

Virtualization may happen at different layers of a computer architecture, which are (from higher to lower): 1: Application, 2: Library, 3: Operating System, 4: Hardware Abstraction (HAL), 5: Instruction Set Architecture (ISA). Below the last layer is the hardware. Typically a layer uses services from a lower layer through the instructions the lower layer exposes in its interface.
Note that the use of services is not strictly tied to the layering, in the sense that a layer can skip the layer immediately below it and use instructions from lower layers. As an example, an application may issue certain instructions directly to the HAL layer, skipping the Library and O.S. layers.

To "emulate an instruction" means to intercept and map an instruction intended for a certain layer of a (virtual) computer architecture into a sequence of one or more instructions for the same layer of a different (non-virtual) computer architecture. It is possible to place the virtualization layer at different layers of a computer architecture. This point may introduce confusion. As an example, when virtualizing at the level of the Hardware Abstraction Layer (e.g. VMware, VirtualBox), a virtual layer is placed between the HAL layer and the Operating System layer. The operating system uses instructions of the virtual HAL layer, then certain virtual ISA instructions are mapped by the hypervisor to ISA instructions of the physical system. When ALL the instructions are emulated, we talk about full emulation, which is a special case of virtualization. In virtualization we typically try to make a layer execute instructions of the non-virtual layer directly as much as possible, for performance reasons. In another example, the virtualization layer is placed over the operating system (virtualization at the operating system level): in this case a virtual machine is called a container (e.g. Docker). It includes the levels from Application to the O.S. (included).

To conclude, emulation relates to a single instruction, while "full emulation" happens when we intercept and map ALL the instructions of a certain layer. Typically, the term "full emulation" is used when the virtualization layer is placed at the ISA level (the lowest level possible). In this case a virtual machine includes all the levels from the Application to the ISA, and ALL the ISA instructions are intercepted and mapped. This is typically used to virtualize niche products, such as Cisco routers (e.g. with QEMU) or 90's video game consoles, which have a completely different architecture from commonly available computers. Note however that "full emulation" is also possible at other levels, though it is typically not necessary.

Answer 8:

Virtualization and Emulation are pretty much the same thing. There is one underlying concept that these two words hint at. That is, these two words are aspects of one thing. This is demonstrated in QEMU, a Quick Emulator that performs hardware virtualization.

You can think of that one thing as Simulation. Simulation can also be a confusing word though.

First we can define the common meaning of the words.

Simulation: Making one thing do what another thing does.
Emulation: Making one system replicate another system exactly.
Virtualization: Allowing one system to run within another system.

Now we show that the words all mean pretty much the same thing. For example, in simulation you are creating a replica of one system with another system. That is the common meaning of emulation. In virtualization, you want to have your virtualized system act like the real system. That is, ideally it acts like a replica, even though it may be implemented differently and may not "emulate" the hardware exactly. That is pretty much the same as simulation. In an emulation, you simulate another system, etc.

So we can see that the words are somewhat interchangeable. The underlying concept is simulation.

In virtualization, such as operating system virtualization ("virtual machines"), we are creating a system which acts like the operating system. It might use tricks from the underlying hardware, or hypervisors, or other things, for performance and security. But in the end it is just a simulation of an operating system. Typically when the word "virtual machine" is used, it is not an exact replica of the machine (as in an emulator). It just does enough to allow programs to run as you would expect on the real operating system.

In emulation, it is typically meant that the simulation is "exact". In hardware emulation, you replicate all of the features of the hardware system. This means that you have created a simulation of the hardware. You could say that you created a virtualization of the hardware, but here is where virtualization slightly differs. Virtualization implies creating an isolated environment, which emulation doesn't necessarily imply. So a hardware emulator might provide the same interface to the hardware as the hardware itself, but the implementation of the emulator might rely on global memory, so if you try to run two emulators at the same time, they would interfere with each other. This is what virtualization solves, it isolates the simulations.

Hope that helps.

Source: https://stackoverflow.com/questions/6044978/full-emulation-vs-full-virtualization

Tags

virtualization

emulation

Full emulation vs. full virtualization
