Somewhat related question, and a year old: Do any JVM's JIT compilers generate code that uses vectorized floating point instructions?
Preface: I am trying to do this
It looks like a lot of SIMD/SSE optimizations were made to the JIT in Java 8 and 9.
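For reference, here is a minimal sketch of the kind of loop I would expect HotSpot's C2 compiler to be able to auto-vectorize: a simple counted loop over `float[]` with no cross-iteration dependencies. The class name `VectorAddSketch` and the method `add` are made up for illustration, and the code assumes all three arrays have the same length. Whether SSE/AVX instructions are actually emitted depends on the JVM version, flags, and CPU, so treat this as something to test against rather than a guarantee.

```java
public class VectorAddSketch {
    // Hypothetical helper: writes a[i] + b[i] into out[i].
    // Assumes a, b and out all have the same length.
    public static void add(float[] a, float[] b, float[] out) {
        // Simple counted loop, unit stride, no dependency between
        // iterations: the shape an auto-vectorizer looks for.
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        int n = 1024;
        float[] a = new float[n];
        float[] b = new float[n];
        float[] out = new float[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        // Call the method many times so the JIT has a chance to
        // compile the hot loop instead of interpreting it.
        for (int iter = 0; iter < 20_000; iter++) {
            add(a, b, out);
        }
        System.out.println(out[n - 1]);
    }
}
```

To check what was actually generated, one option is to run with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly` (this needs the hsdis disassembler plugin installed) and look for packed instructions such as `addps` or `vaddps` in the compiled loop body.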