They all have the same asymptotic behavior, so the performance depends more on the implementation than which type of tree you are using.
Some combination of tree structures might actually be the fastest approach, where each node of a B-tree fits exactly into a cache-line and some sort of binary tree is used to search within each node. Managing the memory for the nodes yourself might also enable you to achieve even greater cache locality, but at a very high price.
Personally, I just use whatever is in the standard library for the language I am using, since it's a lot of work for a very small performance gain (if any).
On a theoretical note... RB-trees are actually very similar to B-trees, since they simulate the behavior of 2-3-4 trees. AA-trees are a similar structure, which simulates 2-3 trees instead.