Just for the record, I ran both programs on my box (x86_64 Linux): the C++ version with std::array, with a plain int[1024] and, for completeness, with long instead of int. Java (OpenJDK 1.6) clocked in at 3.8s, C++ (int) at 3.37s, and C++ (long) at 3.9s. My compiler was g++ 4.5.1. Maybe Intel's compiler just isn't as good as it's thought to be.