How taxing are OpenGL glDrawElements() calls compared to basic logic code?

遥遥无期 2020-12-18 17:35

I'm planning to do some optimization on my OpenGL program (it doesn't need optimizing, but I'm doing it for the sake of it). Out of curiosity, how expensive are OpenGL dr

2 Answers
  • 2020-12-18 17:54

    The actual numbers are highly platform and vendor dependent. Driver architectures on different operating systems vary substantially, and some of them are more efficient than others. On top of that, driver implementations and hardware can cause large performance differences. For example, I've seen 10-20 times higher draw call throughput for one vendor compared to another vendor, on the same platform and with comparable hardware.

    Based on this, any numbers below are just a very rough order of magnitude. You really need to measure this yourself on the configurations you care about.

    With all these disclaimers, I would expect that a draw call could be processed in the range of 100 instructions (CPU cycles). This is for the case where you just make back-to-back draw calls, and there are no other bottlenecks in the pipeline.
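
    To get a number for your own configuration, a small timing helper along these lines may be enough as a starting point. This is only a sketch: `average_ns_per_call` and its parameters are names made up for illustration, and `fn` stands in for whatever you want to time (for example, a lambda that issues one `glDrawElements()` call on a thread with a current context). It reports wall-clock time on the calling thread only.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Average wall-clock nanoseconds per call of `fn`, measured on the calling
// thread. `fn` is a hypothetical stand-in for the work under test, such as a
// lambda that issues a single draw call with your own buffers bound.
template <typename Fn>
double average_ns_per_call(Fn&& fn, std::size_t iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals

    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = clock::now();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

    Because draw calls usually return before the GPU has finished the work, a loop like this measures CPU-side submission cost, not rendering time. Use a large iteration count and repeat the run a few times, since single measurements tend to be noisy.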

    As @NicolBolas already pointed out, the most expensive part of handling draw calls is normally processing deferred state changes. And most of the time, you will have state changes between draw calls. In this case, for relatively cheap state changes (like binding a texture or buffer, or changing some attributes), a few hundred instructions are typical.

    Switching frame buffers is generally quite expensive, and very expensive on some platforms. Other than that, the numbers I measured in the past while optimizing and benchmarking state changes showed an order that is quite different from the list in @NicolBolas' answer. But again, this is highly platform and vendor/hardware dependent.

    There are a couple more aspects that make this somewhat tricky to measure:

    • Most of the CPU time might not be consumed in your thread. Many drivers are multi-threaded, meaning that most of the work needed to process OpenGL calls is offloaded to a secondary thread. If your application does not use all CPU cores, and you're not throttled by power/thermal limits, this means that a lot of the driver work can happen in parallel, without slowing down your application much. But particularly on mobile devices and laptops, performance is often limited by power consumption, so the driver overhead will still slow you down.
    • CPU time consumed by the driver is only part of what can slow your application code down. Another consideration is cache pollution. If cache content used by your application is evicted while the OpenGL implementation processes your draw calls, your own code will get more cache misses, and will run slower. So measuring the time spent inside the OpenGL calls only shows part of the picture.
  • 2020-12-18 18:00

    The cost of glDrawElements (or any other OpenGL rendering command) cannot really be estimated in isolation, because it depends a great deal on what OpenGL state you changed between draw calls. Calling an OpenGL state-changing function (basically, any OpenGL function that isn't a glGet of some form or a glDraw of some form) is itself relatively quick, but it makes the next draw call slower.
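
    One way to picture this is a toy model in which state setters only record a value and set a dirty flag, and the draw call does the real work for everything that is dirty. The sketch below is purely illustrative: `ToyContext` and all of its members are invented here, and real drivers are far more involved and differ between vendors.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only (hypothetical names): it does not describe any real driver.
class ToyContext {
public:
    // State setters just record the new value and mark it dirty: cheap.
    void bind_texture(std::uint32_t id) { texture_ = id; dirty_ |= kTextureDirty; }
    void use_program(std::uint32_t id)  { program_ = id; dirty_ |= kProgramDirty; }

    // The draw call pays once for each piece of state changed since the
    // previous draw, then clears the dirty flags.
    void draw() {
        if (dirty_ & kTextureDirty) { ++validations_; }
        if (dirty_ & kProgramDirty) { ++validations_; }
        dirty_ = 0;
        ++draws_;
    }

    std::uint32_t texture() const { return texture_; }
    std::uint32_t program() const { return program_; }
    int validations() const { return validations_; }
    int draws() const { return draws_; }

private:
    static constexpr std::uint32_t kTextureDirty = 1u << 0;
    static constexpr std::uint32_t kProgramDirty = 1u << 1;

    std::uint32_t texture_ = 0;
    std::uint32_t program_ = 0;
    std::uint32_t dirty_ = 0;
    int validations_ = 0;
    int draws_ = 0;
};
```

    In this model, a draw with no preceding state change does no validation work at all, while binding a texture twice before a draw still costs only one validation at draw time.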

    This video on OpenGL performance shows which state changes are more costly at draw time than others. The really good part starts around 31 minutes in.

    Draw calls are relatively fast if you haven't changed any OpenGL state between draw calls. Different pieces of state have different effects on draw calls. From fastest to slowest (according to NVIDIA's presentation above, so take it with a grain of salt):

    • Non-UBO uniform updates
    • Vertex buffer bindings (without changing formats)
    • UBO binding
    • Vertex format changes
    • Texture bindings
    • Fragment post-processing state changes
    • Shader program changes
    • Render target switches
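
    If that ordering holds on your platform, one common way to exploit it is to sort draw commands so that the most expensive state changes least often. The following is a minimal sketch under that assumption; `DrawCommand`, its fields, and both functions are hypothetical names, and only three of the state categories from the list are modelled.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical description of one queued draw call.
struct DrawCommand {
    std::uint32_t render_target;  // most expensive to switch
    std::uint32_t program;        // shader program
    std::uint32_t texture;        // cheaper than a program change
    std::uint32_t first_index;    // where this draw starts in the index buffer
    std::uint32_t index_count;
};

// Sort so that expensive state is the most significant part of the key.
// stable_sort keeps the submission order of commands with identical state.
inline void sort_by_state(std::vector<DrawCommand>& cmds) {
    std::stable_sort(cmds.begin(), cmds.end(),
        [](const DrawCommand& a, const DrawCommand& b) {
            return std::tie(a.render_target, a.program, a.texture) <
                   std::tie(b.render_target, b.program, b.texture);
        });
}

// Count how many state values differ between consecutive commands.
inline std::size_t count_state_changes(const std::vector<DrawCommand>& cmds) {
    std::size_t changes = 0;
    for (std::size_t i = 1; i < cmds.size(); ++i) {
        changes += cmds[i].render_target != cmds[i - 1].render_target;
        changes += cmds[i].program != cmds[i - 1].program;
        changes += cmds[i].texture != cmds[i - 1].texture;
    }
    return changes;
}
```

    Reordering is only safe when draw order does not affect the result, for example opaque geometry rendered with depth testing. Blended geometry generally has to keep its original order.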

    Now, draw calls will be more expensive than "basic logic". They're not cheap, even without state changes between them. If efficiency is important to your code, then grouping your squares is a good idea.
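
    As a rough sketch of what grouping could look like, the code below packs many squares into one vertex array and one index array, so they can be submitted with a single draw call. It assumes the squares are axis-aligned and share the same state; `Square`, `Batch`, and `build_batch` are names invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical input: lower-left corner and edge length of a square.
struct Square { float x, y, size; };

struct Batch {
    std::vector<float> positions;        // x, y per vertex; 4 vertices per square
    std::vector<std::uint32_t> indices;  // 6 indices per square (two triangles)
};

// Build one shared vertex/index array for all squares.
inline Batch build_batch(const std::vector<Square>& squares) {
    Batch b;
    b.positions.reserve(squares.size() * 8);
    b.indices.reserve(squares.size() * 6);

    for (std::size_t i = 0; i < squares.size(); ++i) {
        const Square& s = squares[i];
        const float x0 = s.x, y0 = s.y;
        const float x1 = s.x + s.size, y1 = s.y + s.size;

        // Corners in counter-clockwise order, starting at the lower left.
        const float corners[8] = {x0, y0, x1, y0, x1, y1, x0, y1};
        b.positions.insert(b.positions.end(), corners, corners + 8);

        // Indices are offset by the number of vertices already emitted.
        const std::uint32_t base = static_cast<std::uint32_t>(i * 4);
        const std::uint32_t quad[6] = {base, base + 1, base + 2,
                                       base, base + 2, base + 3};
        b.indices.insert(b.indices.end(), quad, quad + 6);
    }
    return b;
}
```

    After uploading both arrays to buffer objects once, all squares can be drawn with one call such as `glDrawElements(GL_TRIANGLES, static_cast<GLsizei>(batch.indices.size()), GL_UNSIGNED_INT, nullptr)` while the element array buffer is bound, instead of one call per square.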
