This is the well-known selection algorithm; see http://en.wikipedia.org/wiki/Selection_algorithm.
I need it to find the median of a set of 3x3x3 (27) voxel values.
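As a minimal sketch of the selection approach, here is quickselect with a random pivot and Lomuto partitioning (expected O(n) time), applied to a 3x3x3 block. The names `quickselect` and `median_3x3x3` are hypothetical, and the block is assumed to be a nested list indexed `[z][y][x]`. With 27 values the median is the element at sorted index 13.

```python
import random
from typing import List, Sequence


def quickselect(values: Sequence[float], k: int) -> float:
    """Return the k-th smallest element (0-based) using quickselect."""
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    items: List[float] = list(values)  # work on a copy; caller's data untouched
    lo, hi = 0, len(items) - 1
    while True:
        if lo == hi:
            return items[lo]
        # Move a random pivot to the end, then Lomuto-partition items[lo..hi].
        p = random.randint(lo, hi)
        items[p], items[hi] = items[hi], items[p]
        pivot = items[hi]
        store = lo
        for i in range(lo, hi):
            if items[i] < pivot:
                items[i], items[store] = items[store], items[i]
                store += 1
        # Pivot lands at its final sorted position `store`.
        items[store], items[hi] = items[hi], items[store]
        if k == store:
            return items[store]
        elif k < store:
            hi = store - 1  # answer is in the left part
        else:
            lo = store + 1  # answer is in the right part


def median_3x3x3(block) -> float:
    """Median of a 3x3x3 voxel neighbourhood (27 values -> index 13)."""
    flat = [v for plane in block for row in plane for v in row]
    if len(flat) != 27:
        raise ValueError("expected 27 voxel values")
    return quickselect(flat, len(flat) // 2)
```

For example, a block holding the values 0 through 26 (`[[[9*z + 3*y + x for x in range(3)] for y in range(3)] for z in range(3)]`) has median 13.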
I'm betting that you could calculate the medians at essentially zero cost, in a separate thread while loading from disk (or however the voxels are generated).
What I'm really saying is that speed isn't going to come from bit twiddling: 27 values are too few for Big O complexity to be a real factor, so constant factors dominate.
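A minimal sketch of both points, assuming a hypothetical `load_blocks` generator that stands in for the disk reader: a worker thread computes each block's median with a plain sort (ample for 27 values) while the main thread keeps loading. This only shows the structure. In CPython, threads overlap usefully when the loading side releases the GIL, as blocking file reads do. The in-memory `load_blocks` here does not, so it gains no real speedup.

```python
import queue
import threading
from typing import Iterator, List, Sequence


def median_by_sort(values: Sequence[float]) -> float:
    """For 27 values a plain sort is fine; constant factors dominate."""
    s = sorted(values)
    return s[len(s) // 2]


def load_blocks(n: int) -> Iterator[List[float]]:
    """Hypothetical stand-in for reading n voxel blocks from disk."""
    for i in range(n):
        # Each block is a permutation of 0..26 (7 is coprime to 27).
        yield [float((i + j * 7) % 27) for j in range(27)]


def medians_while_loading(n: int) -> List[float]:
    """Load blocks on this thread; compute medians on a worker thread."""
    q: queue.Queue = queue.Queue(maxsize=64)  # bounded: loader can't run away
    results: List[float] = []

    def worker() -> None:
        while True:
            block = q.get()
            if block is None:  # sentinel: loading is finished
                break
            results.append(median_by_sort(block))

    t = threading.Thread(target=worker)
    t.start()
    for block in load_blocks(n):
        q.put(block)
    q.put(None)
    t.join()
    return results  # one worker + FIFO queue keeps results in load order
```

Using a single worker and a FIFO queue keeps the medians in the same order as the blocks were loaded.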