Question
I've searched about this but I don't seem to get a fair answer. Let's say I want to create a VM that has a vCPU, and that vCPU must have 10 cores, but I only have 2 computers with a 5-core physical CPU each.
Is it possible to create one vCPU by relying on these two physical CPUs so that it performs like one regular physical CPU?
Update 1: Let's say I'm using VirtualBox, and the term vCPU refers to "virtual CPU"; it's a well-known term.
Update 2: I'm asking this because I'm doing a little research about dynamic provisioning in HPC clusters, and I want to know if the word "dynamic" really means allocating virtual CPUs dynamically from different hardware, like bare-metal servers. I don't know if I was searching in the wrong place, but no one really answers this question in the docs.
Answer 1:
I'll use the term vCPU for virtual cores and pCPU for physical cores, as defined by the VirtualBox documentation: https://www.virtualbox.org/manual/ch03.html#settings-processor
On the "Processor" tab, you can set how many virtual CPU cores the guest operating systems should see. Starting with version 3.0, VirtualBox supports symmetrical multiprocessing (SMP) and can present up to 32 virtual CPU cores to each virtual machine. You should not, however, configure virtual machines to use more CPU cores than you have available physically (real cores, no hyperthreads).
And I will try to answer your questions:
lets say I want to create a VM that has a vCPU, and that vCPU must have 10 cores, but I only have 2 computers with 5 physical cores each.
If you want to create a virtual machine (with a single OS image, an SMP machine), all its virtual cores must share memory. Two physical machines with 5 cores each have 10 cores in total, but they have no shared memory. So, with classic virtualization software (QEMU, KVM, Xen, VMware, VirtualBox, Virtual PC) you are not able to combine two physical machines into a single virtual machine.
is it possible to create that vCPU by relying on these two physical CPUs to perform like one regular physical CPU?
No.
A regular physical machine has one or more CPU chips (sockets), and each chip has one or more cores. The first PCs had 1 chip with one core; there were servers with two sockets and one core in each. Later, multicore chips were made, and large servers may have 2, 4, 6, or sometimes even 8 sockets, with some number of cores per socket. A physical machine also has RAM, the dynamic memory used to store data. Earlier multisocket systems had a single memory controller; current multisocket systems have several memory controllers (MCs, 1-2 per socket, each controller with 1, 2, or sometimes 3 or 4 memory channels). Both multicore and multisocket systems allow any CPU core to access any memory, even memory controlled by the MC of another socket. And all accesses to system memory are coherent (memory coherence, cache coherence): any core may write to memory, and any other core will see the first core's writes in some defined order (according to the consistency model of the system). This is shared memory.
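The shared-memory property described above is exactly what threads inside one process rely on. A minimal sketch (plain Python `threading`, as an analogy for cores of one machine): two threads update the same object in the same address space, and each sees the other's writes.

```python
import threading

# Two threads share one process address space: a write by one thread
# is visible to the other. This is the "shared memory" property that
# the cores of a single machine have and two separate machines lack.

counter = {"value": 0}
lock = threading.Lock()

def increment(n):
    for _ in range(n):
        with lock:                # keep each read-modify-write atomic
            counter["value"] += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["value"])  # both threads updated the same memory location
```

Without a shared address space (two separate machines), the two workers could only exchange their partial counts over a network, which is the point the answer makes next.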
The "two physical" chips of two different machines (your PC and your laptop) do not have their RAM connected together and do not implement any hardware model of memory sharing and coherency. Two different computers interact using networks (Ethernet, Wi-Fi, ..., which just send packets) or files (store a file on a USB drive, disconnect it from the PC, connect it to the laptop, read the file). Neither networking nor file sharing is coherent, and neither is shared memory.
I'm using VirtualBox
With VirtualBox (and some other virtualization solutions) you may allocate 8 virtual cores for a virtual machine even when your physical machine has 4 cores. But the VMM will just emulate 8 cores, scheduling them one after another on the available physical cores, so at any time only programs from 4 virtual cores actually run on physical cores (https://forums.virtualbox.org/viewtopic.php?f=1&t=30404: "core i7, this is a 4 core .. I can use up to 16 VCPU on virtual Machine .. Yes, it means your host cores will be over-committed. .. The total load of all guest VCPUs will be split among the real CPUs."). In this case you will be able to start a 10-core virtual machine on 5 physical cores, and an application that wants to use 10 cores will get them. But the application's performance will be no better than with 5 real cores, and will in fact be worse, because "virtual CPU switching" and frequent synchronization add extra overhead.
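The same overcommit effect can be seen at the OS level: you can always start more workers than there are cores, and the scheduler will time-slice them. A small sketch (not VirtualBox itself, just the analogous behavior with plain threads):

```python
import os
import threading
import time

def worker(results, i):
    # Some CPU-bound busy work; with more workers than physical cores,
    # the OS time-slices them onto the cores that exist.
    total = 0
    for n in range(100_000):
        total += n
    results[i] = total

def run(num_workers):
    results = [None] * num_workers
    threads = [threading.Thread(target=worker, args=(results, i))
               for i in range(num_workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start

# "Overcommit": ask for twice as many workers as the host has cores.
# All of them finish, but they cannot all execute simultaneously.
cores = os.cpu_count() or 1
results, elapsed = run(2 * cores)
print(f"{2 * cores} workers on {cores} cores finished in {elapsed:.3f}s")
```

Every worker completes correctly, which mirrors the answer's point: a 10-vCPU guest on 5 pCPUs works, it just cannot run faster than 5 real cores allow.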
Update 2: I'm asking this because I'm doing a little research about dynamic provisioning
If you want to research "dynamic provisioning", ask about that, not about "running something unknown on two PCs at the same time".
in HPC clusters,
There is no single type of "HPC" or "HPC cluster". Different variants of HPC require different solutions and implementations. Some HPC tasks need huge amounts of memory (0.25, 0.5, 1, 2 TB) and will run only on shared-memory 4- or 8-socket machines filled with the largest memory DIMM modules. Other HPC tasks use GPGPUs heavily. A third kind combines thread parallelism (OpenMP) and process parallelism (MPI): the application uses shared memory while its threads run on a single machine, and it sends and receives packets over the network to work collectively on one task while running on several (up to thousands of) physical machines. A fourth kind of HPC may want 100 or 1000 TB of shared memory; there are no SMP/NUMA machines with such amounts, so the application can be written in the distributed-shared-memory paradigm/model (distributed global address space, DGAS; partitioned global address space, PGAS) to run on special machines or on huge clusters. Special solutions are used: in PGAS, a global shared memory of hundreds of TB is emulated across many computers connected by a network. The program is written in a special language or just uses special library functions to access memory (a list of variants from Wikipedia's PGAS article: "Unified Parallel C, Coarray Fortran, Split-C, Fortress, Chapel, X10, UPC++, Global Arrays, DASH and SHMEM"). If the requested address is in local memory, it is used directly; if it is in the memory of another machine, a packet is sent to that machine to request the data. Even with the fastest (100 Gbit/s) special networks with RDMA capability (where the network adapter may access the PC's memory without any additional software processing of the incoming packet), the difference between local memory and the memory of a remote computer is speed: access latency is higher and bandwidth is lower when memory is remote (remote memory is slower than local memory).
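The PGAS idea above can be sketched in a few lines. This is a hypothetical toy model, not any real PGAS library: a "global" array is partitioned across nodes; reading a locally owned index is a direct memory access, while reading a remotely owned index would cost a network round trip (here we only count those remote accesses instead of actually sending packets).

```python
# Toy sketch of the PGAS model (hypothetical API, for illustration only).

class ToyPGAS:
    def __init__(self, num_nodes, size, my_rank):
        self.num_nodes = num_nodes
        self.chunk = size // num_nodes      # elements owned per node
        self.my_rank = my_rank
        # Each node really owns only one partition; we keep all of them
        # here purely to simulate the remote side in one process.
        self.partitions = [[0] * self.chunk for _ in range(num_nodes)]
        self.remote_accesses = 0            # stands in for network traffic

    def owner(self, index):
        return index // self.chunk

    def get(self, index):
        node = self.owner(index)
        if node != self.my_rank:
            self.remote_accesses += 1       # would be a network round trip
        return self.partitions[node][index % self.chunk]

    def put(self, index, value):
        node = self.owner(index)
        if node != self.my_rank:
            self.remote_accesses += 1       # would be a network round trip
        self.partitions[node][index % self.chunk] = value

g = ToyPGAS(num_nodes=4, size=16, my_rank=0)
g.put(1, 42)   # index 1 is owned by rank 0: local, fast
g.put(9, 7)    # index 9 is owned by rank 2: "remote", counted
print(g.get(1), g.get(9), g.remote_accesses)
```

Real PGAS languages and libraries (UPC, Coarray Fortran, SHMEM, etc.) hide this owner check behind language syntax or library calls, but the local-vs-remote cost asymmetry is exactly what the paragraph describes.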
If you say "vCPU must have 10 cores", we can read this as "there is an application which wants 10 cores of a shared-memory system". In theory it is possible to emulate shared memory for the application (and it is possible to create a virtualization solution that uses resources from several PCs to create a single virtual PC with more resources), but in practice this is a very complex task and the result will probably have very low performance. There is the commercial ScaleMP (very high cost; Wikipedia: ScaleMP: "The ScaleMP hypervisor combines x86 servers to create a virtual symmetric multiprocessing system. The process is a type of hardware virtualization called virtualization for aggregation."), and there was the commercial Cluster OpenMP from Intel (https://software.intel.com/sites/default/files/1b/1f/6330, https://www.hpcwire.com/2006/05/19/openmp_on_clusters-1/), which converted OpenMP programs (which use threads and shared memory) into MPI-like software with the help of a library and OS-based handlers for accesses to remote memory. Both solutions range from "makes the target application slower" to "makes the target application very, very slow" (do an internet search for scalemp+slow and cluster+openmp+slow), because a computer network is always slower than computer memory: the network covers a greater distance (100 m vs 0.2 m); the network has a narrow bus of 2, 4 or 8 high-speed pairs while memory has 64-72 high-speed pairs per memory channel; the network adapter sits on an external CPU bus while memory is on an internal interface; and most data arriving from the network must be copied into memory before it is available to the CPU.
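A back-of-envelope calculation makes the bandwidth gap concrete. The 100 Gbit/s figure comes from the text; the DDR4-2400 single-channel figure is an assumption added here for comparison (a 64-bit channel transfers 8 bytes per transfer):

```python
# Rough bandwidth comparison: fast network link vs ONE memory channel.
# The DDR4-2400 figure is an illustrative assumption, not from the text.

network_gbps = 100                           # 100 Gbit/s, as in the text
network_GBs = network_gbps / 8               # -> 12.5 GB/s

ddr4_transfers = 2400e6                      # DDR4-2400: 2400 MT/s
channel_bytes = 8                            # 64-bit data bus = 8 bytes/transfer
memory_GBs = ddr4_transfers * channel_bytes / 1e9  # -> 19.2 GB/s

print(f"network: {network_GBs} GB/s, one DDR4-2400 channel: {memory_GBs} GB/s")
```

Even this best case leaves the network behind a single memory channel, and a real socket has two to four (or more) channels, plus far lower latency, which is why emulated remote "shared memory" is so much slower than the local kind.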
and I want to know if the word "dynamic" really means ... no one really answers this question in the docs.
If you want help from other people, show us the context or the docs you have for the task. It would also be useful for you to better understand some basic concepts of computing and cluster computing (Did you take any CS/HPC courses?).
There are some results for an internet search like "dynamic+provisioning+in+HPC+clusters", but we can't say whether they cover the same HPC variant you want or not.
Answer 2:
Unfortunately, I have to start by saying that I completely disagree with the answer from OSGX (and I have to start with that, as the rest of my answer depends on it). There are documented cases where aggregating the CPU power of multiple physical systems into a single system image works great. Even regarding the comment about ScaleMP, "...solutions can be ranged from 'make target application slower' to 'make target application very-very slow'...": all one needs to do to invalidate that claim is to check the top-rated machines on the SPEC CPU benchmark lists and see that machines using ScaleMP are among the top 5 SMPs ever built for performance on this benchmark. Also, from a computer-architecture perspective, all large-scale machines are essentially a collection of smaller machines with a special fabric (Xbar, NUMAlink, etc.) and some logic/chipset to manage cache coherence. Today's standard fabrics (PCIe switching, InfiniBand) are just as fast, if not faster, than those proprietary SMP interconnects. Would OSGX claim those SMPs are also "very-very slow"?
The real question, as with any technology, is what you are trying to achieve. Most technologies are a good fit for one task but not another. If you are trying to build a large machine (say, combine 16 servers, each with 24 cores, into a 384-core SMP) on top of which you will run small VMs, each using a single-digit number of vCPUs, then this kind of SSI solution would probably work very nicely, because as far as the underlying infrastructure is concerned you are merely running a high-throughput computing (HTC) job, just like SPEC CPU is. However, if you are running thread-parallel software that makes heavy use of serializing elements (barriers, locks, etc.) that require intensive communication between all cores, then maybe you won't see any benefit.
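The "serializing elements" mentioned above are constructs like this (a minimal `threading.Barrier` sketch): every thread must rendezvous with all the others before any of them may continue, so each barrier crossing forces communication among all cores, which is cheap inside one cache-coherent machine and expensive over an aggregation fabric.

```python
import threading

N = 8
barrier = threading.Barrier(N)   # all N threads must arrive before any proceeds
order = []
order_lock = threading.Lock()

def phase_worker(i):
    # Phase 1: independent work would happen here.
    barrier.wait()               # serializing point: rendezvous of all N threads
    # Phase 2 starts only after every thread has finished phase 1.
    with order_lock:
        order.append(i)

threads = [threading.Thread(target=phase_worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(order))  # every thread crossed the barrier exactly once
```

On a real SMP the rendezvous costs nanoseconds-to-microseconds in shared cache lines; on a software-aggregated machine the same wait becomes network round trips, which is why barrier-heavy codes scale poorly there.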
As to the original question in the thread, or rather, "Update 2" by the author (...I'm asking this because I'm doing a little research about dynamic provisioning in HPC clusters...): indeed, there is not a lot of technology out there that enables the creation of a single system from CPUs across a cluster. The technology mentioned earlier, from ScaleMP, does this, but only at physical-server granularity. So, if you have a cluster of 100 servers and each cluster node has 24 cores, then you can "dynamically" create virtual machines of 48 cores (2 cluster nodes), 72 cores (3 cluster nodes), and so on, but you could not create a machine with 36 cores (1.5 cluster nodes), nor combine a few vacant CPUs from across different nodes: you either use all the cores of a node in the virtual SMP, or none at all.
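The node-granularity constraint is simple arithmetic; a tiny hypothetical helper (names invented here for illustration) makes it explicit:

```python
# With whole-node aggregation, an aggregated VM can only be a whole
# multiple of the per-node core count (hypothetical helper, matching
# the 100-node, 24-cores-per-node example above).

CORES_PER_NODE = 24

def valid_vm_sizes(num_nodes):
    """All vCPU counts reachable by combining whole cluster nodes."""
    return [CORES_PER_NODE * n for n in range(1, num_nodes + 1)]

sizes = valid_vm_sizes(100)
print(48 in sizes, 72 in sizes, 36 in sizes)  # 36 is not a multiple of 24
```

So a 10-core request like the asker's would be served by rounding up to one whole 24-core node (or whatever the node size is), not by pooling spare cores from several nodes.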
Source: https://stackoverflow.com/questions/42495315/is-it-possible-to-a-vcpu-to-use-different-cpus-from-two-different-hardware-compu