Some CPU and compilers supply prefetch instructions. Eg: __builtin_prefetch in GCC Document. Although there is a comment in GCC\'s document, but it\'s too short to me.
It seems, that the best policy to follow is to never use __builtin_prefetch (and its friend, __builtin_expect) at all. On some platforms those may help (and even help a lot) - however, one must always do some benchmarking to confirm this. The real question, whether the short term performance gains will worth the trouble in the longer run.
First, one may ask the following question: what these statements actually do when fed to a higher end modern CPU? The answer is: nobody really knows (except, may be, few guys on the CPU's core architecture team, but they are not going to tell anybody). Modern CPUs are very complex machines, capable of instruction reordering, speculative execution of instructions across possibly not taken branches, etc., etc. Moreover, the details of this complex behavior may (and will) differ considerably between CPU generations and vendors (Intel Core vs Intel I* vs AMD Opteron; with more fragmented platforms like ARM the situation is even worse).
One neat example (not prefetch related, but still) of CPU functionality which used to speed things up on older Intel CPUs, but sucks badly on the more modern one is outlined here: http://lists-archives.com/git/744742-git-gc-speed-it-up-by-18-via-faster-hash-comparisons.html. In that particular case, it was possible to achieve 18% performance increase by replacing the optimized version of gcc supplied memcmp with an explicit ("naive" so to say) loop.