After many years of hearing about Vertex Buffer Objects (VBOs), I finally decided to experiment with them (my stuff isn't normally performance critical, obviously...)
14M points/s is not a whole lot; it's suspect. Can we see the complete code doing the drawing, as well as the initialisation? (Compare that 14M/s to the 240M/s (!) that Slava Vishnyakov gets.) It's even more suspicious that it drops to 640K/s for 1K-point draws (compared with his 3.8M/s, which looks capped by the ~3800 SwapBuffers anyway).
I'd bet the test does not measure what you think it measures.
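The throughput figures in this comment are simply points per draw multiplied by draws (frames) per second. A minimal C++ sketch of that arithmetic, using the numbers quoted above; the function names are hypothetical and nothing here touches OpenGL:

```cpp
#include <cstdint>

// Throughput in points per second: points drawn per frame times frames (draws) per second.
constexpr std::uint64_t pointsPerSecond(std::uint64_t pointsPerDraw,
                                        std::uint64_t drawsPerSec) {
    return pointsPerDraw * drawsPerSec;
}

// Draws per second implied by a measured throughput at a given batch size.
constexpr std::uint64_t drawsPerSecond(std::uint64_t measuredPointsPerSec,
                                       std::uint64_t pointsPerDraw) {
    return measuredPointsPerSec / pointsPerDraw;
}

// 10M points per frame at 24 fps: the 240M points/s quoted above.
static_assert(pointsPerSecond(10'000'000, 24) == 240'000'000, "10M x 24 fps");
// 1K points per frame at ~3800 fps: 3.8M points/s, which looks capped by the
// ~3800 SwapBuffers per second rather than by vertex processing.
static_assert(pointsPerSecond(1'000, 3'800) == 3'800'000, "1K x 3800 fps");
// 640K points/s with 1K points per draw means only 640 draws per second.
static_assert(drawsPerSecond(640'000, 1'000) == 640, "640 draws/s");
```

Seen this way, 640K/s at 1K points per draw is only 640 frames per second, far below the ~3800 the same hardware class manages, which is why the measurement itself looks suspect.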
If I remember this right, my OpenGL teacher, who is well known in the OpenGL community, said they are faster on static geometry that is going to be rendered many times; in a typical game this would be tables, chairs and small static entities.
There might be a few things missing:
1. It's a wild guess, but your laptop's card might not support this kind of operation at all (i.e. it may be emulating it).
2. Are you copying the data to the GPU's memory (via glBufferData with the GL_ARRAY_BUFFER target and either the GL_STATIC_DRAW or GL_DYNAMIC_DRAW usage hint), or are you using a pointer to an array in main (non-GPU) memory? The latter requires copying the data every frame, so performance is slow.
3. Are you passing indices as another buffer sent via glBufferData with the GL_ELEMENT_ARRAY_BUFFER target?
If all three of those check out, the performance gain is big. For Python (via PyOpenGL) it's about 1000 times faster on arrays bigger than a couple of hundred elements; for C++ it's up to 5 times faster, but only on arrays of 50k to 10M vertices.
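To see why keeping the arrays in main memory hurts, here is an idealized C++ sketch of how many bytes cross the bus: with client-side arrays the vertices and indices are copied on every frame, whereas GL_STATIC_DRAW buffers are uploaded once. The MeshSize type, the 3-float position and 32-bit index layout, and the function names are assumptions for illustration; real drivers may cache or batch transfers differently.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 4-byte floats, as on mainstream OpenGL platforms.
static_assert(sizeof(float) == 4, "model assumes 32-bit floats");

// Hypothetical mesh description: 3-float positions and 32-bit indices.
struct MeshSize {
    std::size_t vertexCount;
    std::size_t indexCount;
};

// Bytes needed for one copy of the vertex and index data.
constexpr std::size_t meshBytes(MeshSize m) {
    return m.vertexCount * 3 * sizeof(float) +
           m.indexCount * sizeof(std::uint32_t);
}

// Client-side arrays: the mesh has to be copied again on every frame.
constexpr std::size_t clientArrayBytes(MeshSize m, std::size_t frames) {
    return meshBytes(m) * frames;
}

// GL_STATIC_DRAW vertex + element buffers: one glBufferData upload per buffer,
// after which every frame draws from GPU memory (idealized: no re-uploads).
constexpr std::size_t staticBufferBytes(MeshSize m, std::size_t frames) {
    return frames == 0 ? 0 : meshBytes(m);
}

// A 10K-vertex mesh with 30K indices is 240,000 bytes per copy.
static_assert(meshBytes(MeshSize{10'000, 30'000}) == 240'000, "per-copy size");
```

Over 60 frames that same mesh costs 14.4 MB of transfers with client arrays versus a single 240 KB upload with static buffers, and the gap grows linearly with the frame count.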
Here are my test results for C++ (Core2Duo/8600GTS):
points   VBO (fps)   glBegin/glEnd (fps)   ratio
100      3900        3900                  1.00
1K       3800        3200                  1.18
10K      3600        2700                  1.33
100K     1500        400                   3.75
1M       213         49                    4.34
10M      24          5                     4.80
So even with 10M vertices the VBO path ran at a normal framerate, while with glBegin/glEnd it was sluggish.
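The ratio column is the VBO frame rate divided by the glBegin/glEnd frame rate, and multiplying a frame rate by the point count gives throughput (10M points at 24 fps is the 240M points/s mentioned earlier in the thread). A small C++ sketch that recomputes both from the table; the BenchRow type and function names are hypothetical. Note that the listed ratios appear to be truncated rather than rounded to two decimals (213/49 is about 4.347).

```cpp
#include <cstdint>
#include <vector>

// One row of the benchmark table: points per frame and frames per second
// for the VBO path and the glBegin/glEnd (immediate mode) path.
struct BenchRow {
    std::uint64_t points;
    double vboFps;
    double immediateFps;
};

// Speedup of VBOs over glBegin/glEnd, i.e. the "ratio" column.
inline double speedup(const BenchRow& r) { return r.vboFps / r.immediateFps; }

// Points per second achieved by the VBO path.
inline double vboPointsPerSecond(const BenchRow& r) {
    return static_cast<double>(r.points) * r.vboFps;
}

// The measurements from the table above (Core2Duo/8600GTS).
inline std::vector<BenchRow> benchTable() {
    return {
        {100, 3900, 3900},
        {1'000, 3800, 3200},
        {10'000, 3600, 2700},
        {100'000, 1500, 400},
        {1'000'000, 213, 49},
        {10'000'000, 24, 5},
    };
}
```

Iterating over benchTable() and printing speedup() for each row shows the gain staying near 1 up to 10K points and only opening up once the batch is large enough that per-vertex cost dominates.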
There are a lot of factors in optimizing 3D rendering. Usually there are 4 bottlenecks:
Your test is giving skewed results because you have plenty of spare CPU (and bus) capacity while you are maxing out vertex or pixel throughput. VBOs are used to lower CPU load (fewer API calls, and DMA transfers that run in parallel with the CPU). Since you are not CPU bound, they don't give you any gain. This is optimization 101. In a game, for example, CPU time becomes precious, as it is needed for other things like AI and physics, not just for issuing tons of API calls. It is easy to see that writing vertex data (3 floats, for example) directly to a memory pointer is much faster than calling a function that writes 3 floats to memory; at the very least you save the cycles for the call.
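The "fewer API calls" point can be made concrete by counting calls per frame. In immediate mode each vertex costs at least one glVertex3f call between glBegin and glEnd, so the count grows with the mesh, while a VBO draw issues a fixed handful of calls regardless of size. The C++ sketch below only counts calls and does not call OpenGL; the five-call VBO sequence (glBindBuffer, glEnableClientState, glVertexPointer, glDrawArrays, glDisableClientState) is assumed here as one typical fixed-function setup, with the glBufferData upload done once at initialisation and not counted.

```cpp
#include <cstdint>

// Immediate mode, positions only: glBegin, one glVertex3f per vertex, glEnd.
constexpr std::uint64_t immediateModeCalls(std::uint64_t vertices) {
    return vertices + 2;
}

// Assumed VBO draw: glBindBuffer, glEnableClientState, glVertexPointer,
// glDrawArrays, glDisableClientState. The count does not depend on size.
constexpr std::uint64_t vboDrawCalls(std::uint64_t /*vertices*/) {
    return 5;
}

// API calls the CPU has to issue per second at a given frame rate.
constexpr std::uint64_t callsPerSecond(std::uint64_t callsPerFrame,
                                       std::uint64_t fps) {
    return callsPerFrame * fps;
}

static_assert(immediateModeCalls(100'000) == 100'002, "grows with vertices");
static_assert(vboDrawCalls(100'000) == 5, "constant per draw");
```

At 60 fps a 100K-vertex mesh in immediate mode means about 6 million function calls per second on the CPU, versus 300 for the VBO path; if the GPU is the bottleneck anyway, as in the test above, that saving simply doesn't show up in the frame rate.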
From reading the Red Book, I remember a passage stating that VBOs may be faster depending on the hardware. Some hardware optimizes them, while other hardware doesn't. It's possible that yours doesn't.