If I have a game with a 3D world, and the world is quite big so it needs to be split into chunks, is there a major performance advantage, if any, to having 128-byte chunks (i.e. a power-of-two size)?
The other answers are indeed correct that power-of-two-sized data can benefit from using shifts instead of multiplies.
However, there is a dark side to power-of-two-sized data, and it can hit you when you least expect it.
See these two question/answers:
When your datasets are power-of-two sized, they are more likely to be super-aligned in memory (meaning their addresses are likely to share the same residue modulo a large power of two).
While this may seem desirable, it can lead to cache-set conflicts: super-aligned addresses compete for the same cache sets and keep evicting each other.
If you read the two questions linked above, you can see that this kind of alignment can cause a slowdown of more than 3x, which will likely far outweigh any benefit you get from using shifts as opposed to multiplies.
So as with all performance questions, you need to measure, measure, measure... and be prepared for anything to happen.
You mention that you are representing a 3D space - that is exactly the kind of situation that can exhibit power-of-two-strided memory access and lead to slowdowns.
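To make the super-alignment problem concrete, here is a minimal sketch (my own illustration, not code from the linked questions) that walks down the columns of a row-major array, so consecutive accesses are exactly one row apart. With a power-of-two row stride those addresses tend to map to the same cache sets; padding each row by a few elements spreads them out. The sizes and the padding of 16 floats are arbitrary example values, and how big a difference you see, if any, depends on your cache hierarchy.

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Sum the array column by column, i.e. with a stride of row_stride floats.
static double walk_columns(const std::vector<float>& data,
                           std::size_t rows, std::size_t row_stride)
{
    const auto start = std::chrono::steady_clock::now();
    float sum = 0.0f;
    for (std::size_t col = 0; col < row_stride; ++col)
        for (std::size_t row = 0; row < rows; ++row)
            sum += data[row * row_stride + col];   // consecutive accesses are row_stride * 4 bytes apart
    const auto end = std::chrono::steady_clock::now();
    std::printf("stride %zu floats: sum = %f\n", row_stride, sum);   // keep the work observable
    return std::chrono::duration<double>(end - start).count();
}

int main()
{
    const std::size_t rows = 4096;
    const std::size_t pow2_stride = 1024;        // 4096-byte stride: super-aligned accesses
    const std::size_t padded_stride = 1024 + 16; // same data, rows padded to break the alignment

    std::vector<float> a(rows * pow2_stride, 1.0f);
    std::vector<float> b(rows * padded_stride, 1.0f);

    std::printf("power-of-two stride: %.3f s\n", walk_columns(a, rows, pow2_stride));
    std::printf("padded stride:       %.3f s\n", walk_columns(b, rows, padded_stride));
}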
Powers of two are used a lot in software because two is the number base that computers use.
For example, operating systems allocate memory in block sizes that are powers of two, processor cache sizes are powers of two, address sizes are powers of two, and so on.
Operations on power-of-two values can also be optimised - a multiply or divide by a power of two becomes a simple bit shift.
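As a small illustration (the 128 and 150 dimensions here are just example values, not something from the question), indexing code with a power-of-two constant can be lowered by the compiler to shifts, while an arbitrary constant needs real multiplications:

#include <cstddef>
#include <cstdint>

// With a power-of-two dimension, the multiplications by 128 are typically
// lowered to shifts and adds (x * 128 == x << 7).
std::size_t index_pow2(std::size_t x, std::size_t y, std::size_t z)
{
    return (x * 128 + y) * 128 + z;
}

// With an arbitrary dimension such as 150, the compiler has to use real
// multiplications (or a longer multiply-by-constant sequence).
std::size_t index_arbitrary(std::size_t x, std::size_t y, std::size_t z)
{
    return (x * 150 + y) * 150 + z;
}

// Division and modulo show the same pattern for unsigned operands:
std::uint32_t div_by_128(std::uint32_t i) { return i / 128; }  // becomes i >> 7
std::uint32_t mod_by_128(std::uint32_t i) { return i % 128; }  // becomes i & 127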
Basically, ensuring everything uses powers of two might improve the performance of your software, but normally the compiler and/or OS will ensure that your data is used effectively even when you use arbitrary sizes.
It's not exactly "faster"; rather, it uses the available memory better, since the hardware and the operating system manage memory in units whose size is most likely a power of two. Allocating something smaller than such a unit will usually result in wasted memory because of alignment requirements.
If you dig deeper into allocators and OS memory managers, you will see that they manage everything in power-of-two sizes. An OS usually manages the memory of a process in terms of pages, and a page size is usually 4096 bytes nowadays. So if you want to allocate a piece that is 4000 bytes, the OS will still allocate 4096 bytes and the remaining 96 bytes will be wasted.
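A minimal sketch of that rounding, assuming a POSIX system for sysconf() (general-purpose allocators like malloc do sub-allocate within pages, so this page-level rounding mostly matters for large, page-granular allocations); the 4000-byte request is just the figure from the example above:

#include <cstddef>
#include <cstdio>
#include <unistd.h>

int main()
{
    // Page size reported by the OS, typically 4096 bytes.
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t request = 4000;

    // Round the request up to a whole number of pages.
    const std::size_t granted = ((request + page - 1) / page) * page;

    std::printf("requested %zu bytes, granted %zu bytes, %zu bytes wasted\n",
                request, granted, granted - request);
}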
If you access the data in the following way:
int chunks[150][150][150];
chunks[x][y][z] = 123;
Then the processor must do multiplications (something like z + 150 * (y + 150 * x) ...) to compute the address.
If you use power-of-two constants, the compiler can optimise this and use shifts instead of multiplications. Modern CPUs do multiplications quite fast, though, so the effect is insignificant.
Using a big table can cause a lot of cache misses, so a smaller table is probably faster than a bigger one, even if the bigger one has power-of-two dimensions and the smaller one does not.
It may be faster, it may be slower, or it may be the same speed. It would be very hard to give the correct answer just by looking at the code. So the answer is: measure it, change the code, and measure it again. If your code has to run on different computers, measure it on each of them.
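A minimal measuring sketch along those lines; the fill loop is a placeholder workload, the 128 and 150 sizes are just the examples from above, and the result is normalised per element because the two chunks do different amounts of total work:

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Time a simple fill of an N x N x N chunk stored as one flat array.
template <std::size_t N>
double fill_chunk()
{
    std::vector<int> chunk(N * N * N);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t x = 0; x < N; ++x)
        for (std::size_t y = 0; y < N; ++y)
            for (std::size_t z = 0; z < N; ++z)
                chunk[(x * N + y) * N + z] = static_cast<int>(x + y + z);
    const auto end = std::chrono::steady_clock::now();
    std::printf("checksum %d\n", chunk[N * N * N - 1]);   // keep the work from being optimised away
    return std::chrono::duration<double>(end - start).count();
}

int main()
{
    std::printf("128^3: %.3f ns per element\n", 1e9 * fill_chunk<128>() / (128.0 * 128 * 128));
    std::printf("150^3: %.3f ns per element\n", 1e9 * fill_chunk<150>() / (150.0 * 150 * 150));
}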
I'd tend to assume that power-of-two alignment is often asking for severe trouble, and that using more memory than needed isn't going to help with performance. Doing lots of operations on a small part of memory that fits into some cache before switching to the next part will often help. Accessing consecutive memory addresses will often help. Rounding up so that you can use vector operations will often help.
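As a sketch of that blocking idea (the grid and tile sizes below are placeholder values), here is a tiled matrix transpose: the overall access pattern is strided, but each tile is small enough that both arrays stay in cache while it is being processed, and the inner loop reads consecutive addresses.

#include <cstddef>
#include <vector>

constexpr std::size_t kSize = 1024;  // grid is kSize x kSize
constexpr std::size_t kTile = 64;    // tile chosen to fit comfortably in cache; kSize % kTile == 0

// Blocked transpose: writing dst "column-wise" would normally thrash the cache,
// but within each kTile x kTile tile both src and dst stay resident.
void transpose(const std::vector<float>& src, std::vector<float>& dst)
{
    for (std::size_t ty = 0; ty < kSize; ty += kTile)
        for (std::size_t tx = 0; tx < kSize; tx += kTile)
            for (std::size_t y = ty; y < ty + kTile; ++y)
                for (std::size_t x = tx; x < tx + kTile; ++x)
                    dst[x * kSize + y] = src[y * kSize + x];
}

int main()
{
    std::vector<float> src(kSize * kSize, 1.0f), dst(kSize * kSize);
    transpose(src, dst);
}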