Thoughts about rendering loop strategies

六眼飞鱼酱① submitted on 2019-12-06 12:03:23

The unsynchronized approach will work just fine for you; doubly so if this is stock Intel hardware (see note 1). I would still not use it.

The reason unsynchronized concurrency almost never works reliably is that processors have a free hand in when they perform stores and loads between main RAM and cache. This can subvert almost any unsynchronized protocol. However, as you say, no one is likely to notice if the scenes never change abruptly in your application; all the data will reach RAM and become visible to the other thread sooner or later.

However, you have no guarantee of when that will happen or in which order, which leaves a theoretical possibility of mixing two subsequent frames (before and after an abrupt change of the scene or its lighting) in odd ways.

Depending on your programming language and its memory model (C++ older than C++11, I suppose?), you are likely to find lightweight synchronization primitives whose guaranteed side effects are the appropriate memory barriers and whose impact on performance will be negligible. This is what I would recommend as a starting point. Extreme performance optimizations (beyond what can be proven safe) should be the last stage of optimizing your engine.
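To make the "lightweight primitive with a memory barrier" idea concrete, here is a minimal sketch. It assumes a C++11 compiler (`std::atomic`); with an older compiler you would need a platform-specific equivalent. The names `Frame`, `g_frame`, `g_ready`, `publish_frame`, `consume_frame` and `run_publish_demo` are made up for illustration and are not from the question.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical frame data shared between the two threads.
struct Frame {
    std::vector<int> sprite_x;
    int lighting = 0;
};

Frame g_frame;                      // written by the updater, read by the renderer
std::atomic<bool> g_ready{false};   // publication flag

// Updater: fill in the frame, then publish it with a release store.
// Everything written before the store is visible to a thread that
// observes the flag with an acquire load.
void publish_frame(int sprite_count) {
    assert(sprite_count >= 0);
    g_frame.sprite_x.assign(static_cast<std::size_t>(sprite_count), 7);
    g_frame.lighting = 42;
    g_ready.store(true, std::memory_order_release);
}

// Renderer: spin until the flag is set, then read the frame.
int consume_frame() {
    while (!g_ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return g_frame.lighting + static_cast<int>(g_frame.sprite_x.size());
}

// Runs one publish/consume round on two threads and returns what the
// renderer computed from the published frame.
int run_publish_demo() {
    g_ready.store(false);
    int result = 0;
    std::thread renderer([&] { result = consume_frame(); });
    std::thread updater([] { publish_frame(3); });
    updater.join();
    renderer.join();
    return result;
}
```

This only covers a single hand-over of one frame; a real loop would also need a way for the renderer to hand the buffer back before the updater writes to it again.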


1) x86 never reorders stores. I don't think this is documented anywhere, and I would not like to rely on it. Reads can still be reordered, so it is no help in your scenario anyway.

I decided to spend a little time testing this, and since I've had so many good answers from this site, I thought I'd post this to complete this question. Maybe someone else will find the information useful.

I made 3 different implementations of a simple sprite rendering application where the Updating and the Rendering run in separate threads.

1) No Synchronization

The Renderer runs at max 60 FPS. The Updater runs as fast as possible. The sprites to update and render exist in a list shared by both threads. No synchronization exists, so the threads just read and write the data at will.

2) Synchronization of shared data

The Renderer runs at max 60 FPS. The Updater runs at the same pace as the Renderer. The data to update and render exist in a list shared by both threads. The list is fully synchronized. The Updater updates all sprites in the list. Then the Renderer gains access to the list, and renders all sprites to the screen.
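For readers who want to see the shape of method 2, here is a rough C++11 sketch of a fully synchronized shared list where the two threads take turns. This is not my actual test code; `SharedSpriteList`, `Sprite`, `update_all`, `render_all` and `run_shared_list_demo` are hypothetical names, and "drawing" is replaced by summing positions.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

struct Sprite { int x = 0; };

class SharedSpriteList {
public:
    explicit SharedSpriteList(std::size_t count) : sprites_(count) {}

    // Updater: wait for its turn, move every sprite, then hand over.
    void update_all() {
        std::unique_lock<std::mutex> lock(mutex_);
        turn_changed_.wait(lock, [this] { return !updated_; });
        for (Sprite& s : sprites_) ++s.x;
        updated_ = true;
        turn_changed_.notify_all();
    }

    // Renderer: wait for an updated list, "draw" it, then hand it back.
    // Returns the sum of all x positions as a stand-in for drawing.
    int render_all() {
        std::unique_lock<std::mutex> lock(mutex_);
        turn_changed_.wait(lock, [this] { return updated_; });
        int drawn = 0;
        for (const Sprite& s : sprites_) drawn += s.x;
        updated_ = false;
        turn_changed_.notify_all();
        return drawn;
    }

private:
    std::mutex mutex_;
    std::condition_variable turn_changed_;
    std::vector<Sprite> sprites_;
    bool updated_ = false;   // true while a finished update awaits rendering
};

// Runs `frames` update/render rounds on two threads and returns what
// the last rendered frame "drew".
int run_shared_list_demo(std::size_t sprite_count, int frames) {
    assert(frames >= 0);
    SharedSpriteList list(sprite_count);
    int last = 0;
    std::thread updater([&] { for (int i = 0; i < frames; ++i) list.update_all(); });
    std::thread renderer([&] { for (int i = 0; i < frames; ++i) last = list.render_all(); });
    updater.join();
    renderer.join();
    return last;
}
```

Note that each thread holds the lock for its entire pass over the list, so updating and rendering never overlap. That is the cost this method pays for consistent frames.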

3) Synchronization using double rendering queues

The Renderer runs at max 60 FPS. The Updater runs at the same pace as the Renderer. The Updater updates the list and sends the sprites to the passive queue of the two rendering queues. Meanwhile the Renderer renders the sprites in the active rendering queue. When the Updater has copied the last object to the passive render queue, it attempts to swap the active and passive queues. If the Renderer has not finished rendering the previous queue, the swap will block. This is the only blocking synchronization. As soon as the Renderer finishes the current frame, the swap is made, the Renderer can start rendering the new queue, and the Updater can start updating and sending to the other (now passive) queue.
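The double queue scheme can be sketched roughly like this in C++11. Again, this is an illustration under my own assumptions rather than the code I tested; `DoubleRenderQueue`, `push`, `swap_queues`, `render_frame` and `run_double_queue_demo` are hypothetical names, and "drawing" is a sum over positions.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

struct Sprite { int x = 0; };

class DoubleRenderQueue {
public:
    // Updater side: append to the passive queue. Only the updater thread
    // ever touches passive_, so no lock is needed here.
    void push(const Sprite& s) { passive_.push_back(s); }

    // Updater side: blocks until the renderer has finished the active
    // queue, then swaps. This is the only blocking point for the updater.
    void swap_queues() {
        std::unique_lock<std::mutex> lock(mutex_);
        state_changed_.wait(lock, [this] { return !frame_pending_; });
        std::swap(active_, passive_);
        passive_.clear();            // drop the frame that was already rendered
        frame_pending_ = true;
        state_changed_.notify_all();
    }

    // Renderer side: wait for a fresh frame, "draw" it without holding
    // the lock, then mark it as done so the next swap can proceed.
    int render_frame() {
        std::unique_lock<std::mutex> lock(mutex_);
        state_changed_.wait(lock, [this] { return frame_pending_; });
        lock.unlock();               // active_ is not modified while frame_pending_ is true
        int drawn = 0;
        for (const Sprite& s : active_) drawn += s.x;
        lock.lock();
        frame_pending_ = false;
        state_changed_.notify_all();
        return drawn;
    }

private:
    std::mutex mutex_;
    std::condition_variable state_changed_;
    std::vector<Sprite> active_;
    std::vector<Sprite> passive_;
    bool frame_pending_ = false;     // true while active_ holds an unrendered frame
};

// Produces `frames` frames of `sprite_count` sprites each (x = frame number)
// and returns what the last rendered frame "drew".
int run_double_queue_demo(int sprite_count, int frames) {
    assert(sprite_count >= 0 && frames >= 0);
    DoubleRenderQueue queue;
    int last = 0;
    std::thread updater([&] {
        for (int f = 1; f <= frames; ++f) {
            for (int i = 0; i < sprite_count; ++i) queue.push(Sprite{f});
            queue.swap_queues();
        }
    });
    std::thread renderer([&] { for (int f = 0; f < frames; ++f) last = queue.render_frame(); });
    updater.join();
    renderer.join();
    return last;
}
```

The point of the design is that the updater fills the next frame while the renderer is still drawing the current one; the threads only meet at the swap.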

I ran 3 tests on each method, measuring how many times per second updating and rendering were performed.
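A per-second counter along these lines is enough for this kind of measurement. This is a sketch, not the exact code I used; `RateCounter`, `tick` and `last_rate` are hypothetical names. The timestamp is passed in rather than read inside the class so the counter is easy to check in isolation.

```cpp
#include <cassert>
#include <chrono>

class RateCounter {
public:
    using Clock = std::chrono::steady_clock;

    explicit RateCounter(Clock::time_point start) : window_start_(start) {}

    // Record one iteration at time `now`. Returns true when at least one
    // second has passed since the window started; last_rate() then holds
    // the number of iterations counted in that window.
    bool tick(Clock::time_point now) {
        assert(now >= window_start_);
        ++count_;
        if (now - window_start_ < std::chrono::seconds(1)) return false;
        last_rate_ = count_;
        count_ = 0;
        window_start_ = now;
        return true;
    }

    int last_rate() const { return last_rate_; }

private:
    Clock::time_point window_start_;
    int count_ = 0;
    int last_rate_ = 0;
};
```

In a loop you would give each thread its own counter, call `tick(RateCounter::Clock::now())` once per update or render pass, and log `last_rate()` whenever `tick` returns true.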

Test 1:
The number of sprites is sufficiently low so the Renderer can run at full speed (60 FPS)
The update logic of each sprite is too heavy to allow the Updater to keep pace.

Test 2:
The number of sprites is too high for the Renderer to be running at full speed.
The update logic of each sprite is extremely simple, so the Updater can more than keep up.

Test 3:
The number of sprites is just high enough to keep the Renderer running a little below max speed.
The update logic of each sprite is just heavy enough to keep the Updater running a little below the max speed of the Renderer.

The results

No sync - Test 1:
Renderer runs 60 times per second (max speed).
Updater runs 45 times per second.

No sync - Test 2:
Renderer runs 24 times per second.
Updater runs 1150 times per second.

No sync - Test 3:
Renderer runs 58 times per second.
Updater runs 51 times per second.

Sync shared data - Test 1:
Renderer runs 23 times per second (max speed).
Updater runs 24 times per second.

Sync shared data - Test 2:
Renderer runs 23 times per second.
Updater runs 23 times per second.

Sync shared data - Test 3:
Renderer runs 17 times per second.
Updater runs 17 times per second.

Sync double queue - Test 1:
Renderer runs 43 times per second (max speed).
Updater runs 43 times per second.

Sync double queue - Test 2:
Renderer runs 24 times per second.
Updater runs 24 times per second.

Sync double queue - Test 3:
Renderer runs 54 times per second.
Updater runs 54 times per second.

Conclusion

As you pointed out, Jirka, even if the unsynchronized method seems harmless when there is only one writer, it can have unwanted side effects, and it certainly doesn't keep the rendered frame consistent.

It is no great surprise that rendering with dual queues is faster than rendering with one large shared sprite list. What was surprising, however, is this: given that nothing is gained from rendering multiple frames without updating, or from updating multiple times without rendering, the dual queue method is in effect as fast as the unsynchronized method.

There are probably other things that could be said or tried, but I have seen enough. I will never consider using unsynchronized access for an Update/Render system again.

It is possible to have separate render and update threads without (much) synchronization. Check out

http://blog.slapware.eu/game-engine/programming/multithreaded-renderloop-part1/

and

http://blog.slapware.eu/game-engine/programming/multithreaded-renderloop-part2/

for an explanation and implementation (source + binaries). It isn't easy, but it's exactly what you want.
