If I have a game with a 3D world, and the world is big enough that it needs to be split into chunks, is there a major performance advantage, if any, to having 128 byte chunk
It may be faster, it may be slower, it may be the same speed. It would be very hard to give the correct answer just by looking at the code. So the answer is: measure it, change the code, and measure it again. If your code has to run on different computers, measure it on each of them.
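To make "measure it" concrete, here is a minimal sketch of a timing helper in C. The names `best_time_seconds`, `struct buffer` and `sum_buffer` are made up for illustration, and the workload is only a stand-in for whatever chunk code you actually want to compare. It uses the standard `clock()` function, which reports CPU time and may be too coarse for very short runs, so treat it as a starting point rather than a proper benchmark harness.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: runs fn(arg) `repeats` times and returns the
   fastest run, in seconds of CPU time as reported by clock(). */
double best_time_seconds(void (*fn)(void *), void *arg, int repeats)
{
    assert(fn != NULL && repeats > 0);
    double best = -1.0;
    for (int i = 0; i < repeats; i++) {
        clock_t start = clock();
        fn(arg);
        clock_t end = clock();
        double elapsed = (double)(end - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Stand-in workload (an assumption, not the asker's code):
   sums every byte of a buffer and stores the result. */
struct buffer {
    unsigned char *data;
    size_t size;
    unsigned long sum;
};

void sum_buffer(void *arg)
{
    struct buffer *b = arg;
    unsigned long s = 0;
    for (size_t i = 0; i < b->size; i++)
        s += b->data[i];
    b->sum = s; /* storing the result keeps the loop from being optimised away */
}
```

You would call something like `best_time_seconds(sum_buffer, &b, 10)` once for each chunk layout you are considering, on each machine you care about, and compare the numbers. Taking the fastest of several runs is one common way to reduce noise from other processes.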
I'd tend to assume that power-of-two sizes and alignments are often asking for trouble (addresses that are a large power of two apart can end up competing for the same cache sets), and that using more memory than needed isn't going to help with performance. Doing lots of operations on a small part of memory that fits into some cache, then moving on to the next part of memory, will often help. Accessing consecutive memory addresses will often help. Rounding sizes up so that you can use vector operations will often help.
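As a rough illustration of the "consecutive addresses" and "rounding up" points, here is a small sketch in C. The 16x16x16 chunk size, the one-byte block type and all the function names are assumptions picked for the example, not a recommendation. Both sum functions compute the same result; only the order in which they touch memory differs, and which one is faster on your machine is exactly the kind of thing to measure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk dimensions, chosen only for illustration. */
enum { CHUNK_X = 16, CHUNK_Y = 16, CHUNK_Z = 16 };

/* One chunk stored as a single flat array, with x varying fastest. */
struct chunk {
    uint8_t blocks[CHUNK_X * CHUNK_Y * CHUNK_Z];
};

/* Map (x, y, z) to a flat index; neighbours in x are adjacent in memory. */
size_t block_index(size_t x, size_t y, size_t z)
{
    assert(x < CHUNK_X && y < CHUNK_Y && z < CHUNK_Z);
    return x + CHUNK_X * (y + CHUNK_Y * z);
}

/* x in the innermost loop: visits the array in memory order. */
unsigned long sum_in_memory_order(const struct chunk *c)
{
    unsigned long sum = 0;
    for (size_t z = 0; z < CHUNK_Z; z++)
        for (size_t y = 0; y < CHUNK_Y; y++)
            for (size_t x = 0; x < CHUNK_X; x++)
                sum += c->blocks[block_index(x, y, z)];
    return sum;
}

/* z in the innermost loop: successive accesses are
   CHUNK_X * CHUNK_Y bytes apart instead of adjacent. */
unsigned long sum_strided(const struct chunk *c)
{
    unsigned long sum = 0;
    for (size_t x = 0; x < CHUNK_X; x++)
        for (size_t y = 0; y < CHUNK_Y; y++)
            for (size_t z = 0; z < CHUNK_Z; z++)
                sum += c->blocks[block_index(x, y, z)];
    return sum;
}

/* Round n up to the next multiple of `multiple`, for example to pad
   a row length to a whole number of vector registers. */
size_t round_up(size_t n, size_t multiple)
{
    assert(multiple > 0);
    return (n + multiple - 1) / multiple * multiple;
}
```

For example, `round_up(100, 16)` gives 112, so a row of 100 one-byte blocks would be padded to 112 bytes if you wanted to process it 16 bytes at a time. Whether that padding pays for the extra memory is, again, something to measure.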