Question
I used nvprof and nvidia-smi to monitor GPU power dissipation, but the two tools report different results, summarized in the table below.
----------------------------------------------------------------
GPU model |        busy             |        idle
          | nvprof [W]   smi [W]    | nvprof [W]   smi [W]
----------------------------------------------------------------
M2090     |   ~151       ~151       |   ~100       ~75
K20       |   ~105       ~102       |    ~63       ~43
----------------------------------------------------------------
note 0: "busy" means my code is running on the monitored GPU
note 1: nvprof reports the power for all devices, so my way of getting the "idle" power of a specific GPU with nvprof is simply to run the code on another GPU
note 2: nvidia-smi reports a couple of different power-related quantities, but I was focusing on "power draw" (a programmatic way to read the same figure is sketched after these notes)
note 3: CUDA version: 5.5
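For reference, the "power draw" figure that nvidia-smi prints can also be read programmatically through NVML, the library nvidia-smi is built on. The following is a minimal sketch, not taken from the question; the device index 0 is an arbitrary choice, and the milliwatt unit of nvmlDeviceGetPowerUsage() is per the NVML documentation:

```c
/* Minimal sketch (not from the question): read the "power draw" value that
 * nvidia-smi reports, but through NVML, the library nvidia-smi is built on.
 * Assumptions: device index 0; nvmlDeviceGetPowerUsage() returns milliwatts.
 * Compile with something like: gcc power_query.c -o power_query -lnvidia-ml
 */
#include <stdio.h>
#include <nvml.h>

int main(void) {
    nvmlDevice_t dev;
    unsigned int power_mw = 0;

    if (nvmlInit() != NVML_SUCCESS)
        return 1;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS &&
        nvmlDeviceGetPowerUsage(dev, &power_mw) == NVML_SUCCESS)
        printf("power draw: %.1f W\n", power_mw / 1000.0);

    nvmlShutdown();
    return 0;
}
```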
So my question is: why is the power reported by nvidia-smi generally lower than that reported by nvprof, why does this discrepancy become larger when the idle power is measured, and ultimately, which utility should I trust more?
Also, just to make sure: the power that the two utilities measure refers to the input electrical power (P = I*U) rather than the dissipated thermal power, right?
Thanks a lot for any advice!
Update: @njuffa's and @talonmies' speculation makes very good sense, so I explored nvidia-smi a little more for power analysis. The results, however, do not make sense to me.
additional notes:
The discontinuity of the red data is because I directly used the timestamp reported by nvidia-smi, which has low resolution (seconds). Also, for illustration purposes, P0 is assigned a numerical value of 20 and P1 a value of 10. So for most of the time the GPU is kept in its full-performance state (which is odd), except in the "busy" case, where the GPU somehow drops to P1 during 15-18 s (also odd).
It is not until ~21.3 s that cudaSetDevice() is invoked for the very first time, so the power rise and p-state change that occur at ~18 s are rather odd.
"Busy power" is measured with my GPU code running in the background while nvidia-smi is called in a loop to query the power and p-state repeatedly until the background process terminates; "idle power" is measured simply by launching nvidia-smi 50 times. Apparently, in the latter case nvidia-smi exhibits larger overhead, which is, again, odd (a finer-grained sampling sketch follows below).
Answer 1:
Ignore the p-states. They are confusing you.
nvprof (alone) uses substantially more of the GPU than does nvidia-smi (alone). So the "idle" power consumed when running nvprof is higher than it is when just doing nvidia-smi. nvprof fires up a number of engines on the GPU, whereas nvidia-smi simply fires up some registers and maybe some I2C circuitry.
The GPU has a number of p-states, and a true idle p-state is P8 or below in performance (i.e. a numerically larger p-state).
Just running nvidia-smi can (and frequently will) briefly raise the p-state of the GPU from a "true idle" p-state to a higher one, such as P0. This does not tell you:
- how long the p-state elevation lasts (the sampling period of nvidia-smi is too coarse)
- how much power is actually being consumed
Yes, the p-state is an indicator, but it does not tell you anything in a calibrated way. A GPU can be more or less "idle" while at P0 (for instance, put your GPUs in persistence mode).
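As a concrete illustration of the persistence-mode suggestion, the sketch below enables persistence mode through NVML (the same setting that `nvidia-smi -pm 1` toggles); the device index 0 is an assumption, and root privileges are required:

```c
/* Illustrative sketch only: enable persistence mode through NVML (the same
 * setting that "nvidia-smi -pm 1" toggles), so the driver stays initialized
 * even when no client is attached; as the answer notes, a GPU in persistence
 * mode can sit at P0 while being essentially idle.
 * Assumptions: device index 0; root privileges are required.
 * Compile with something like: gcc persist.c -o persist -lnvidia-ml
 */
#include <stdio.h>
#include <nvml.h>

int main(void) {
    nvmlDevice_t dev;
    nvmlEnableState_t mode;

    if (nvmlInit() != NVML_SUCCESS)
        return 1;
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) {
        nvmlShutdown();
        return 1;
    }

    if (nvmlDeviceSetPersistenceMode(dev, NVML_FEATURE_ENABLED) != NVML_SUCCESS)
        fprintf(stderr, "could not enable persistence mode (root required)\n");

    if (nvmlDeviceGetPersistenceMode(dev, &mode) == NVML_SUCCESS)
        printf("persistence mode: %s\n",
               mode == NVML_FEATURE_ENABLED ? "enabled" : "disabled");

    nvmlShutdown();
    return 0;
}
```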
The discrepancy between the two measurements has already been explained. The graph and the additional update are not serving any useful purpose; they are just confusing you.
If you want to measure power, use either approach. It's clear that they are quite correlated in the GPU "busy" case, and the fact that they appear to differ in the "idle" case simply means that the assumptions you're making about "idle" in both cases aren't true.
Source: https://stackoverflow.com/questions/20040426/why-do-nvprof-and-nvidia-smi-report-different-results-on-power