I did a comparison between Blitz++, Armadillo, and Boost.MultiArray with the following code (borrowed from an old post):
#include
using namespac
As far as I can tell, you are judging the performance of each matrix library by measuring the speed of multiplying a single matrix by a scalar. Because Armadillo is built on template metaprogramming (expression templates), it will do a very good job at this: with most compilers, each multiply is broken down into simple element-wise loops that the compiler can optimize well, for example by vectorizing them.
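To illustrate the mechanism, here is a stripped-down sketch of the expression-template idea, assuming a dense matrix with contiguous column-major storage. `Mat`, `Scaled`, and the `operator*` overload are hypothetical and far simpler than Armadillo's real types: `2.0 * A` only builds a lightweight node, and the assignment then evaluates it in one flat loop with no temporary matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lazy expression node for "k * operand". Building it computes nothing;
// element i is produced only when it is read.
template <typename Operand>
struct Scaled {
    double k;
    const Operand& operand;
    double operator[](std::size_t i) const { return k * operand[i]; }
};

// Hypothetical dense matrix with contiguous column-major storage.
struct Mat {
    std::size_t rows, cols;
    std::vector<double> data;

    Mat(std::size_t r, std::size_t c, double fill = 0.0)
        : rows(r), cols(c), data(r * c, fill) {}

    // Flat element access; bounds-checked only in debug builds.
    double operator[](std::size_t i) const {
        assert(i < data.size());
        return data[i];
    }

    // Evaluation happens here: one flat loop writing straight into this
    // matrix with no temporary, the kind of simple loop optimizing compilers
    // can often vectorize. Safe when the operand is *this (A = 2.0 * A),
    // since each element is read before it is overwritten.
    Mat& operator=(const Scaled<Mat>& expr) {
        const Mat& src = expr.operand;
        rows = src.rows;
        cols = src.cols;
        data.resize(src.data.size());
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = expr[i];
        return *this;
    }
};

// "k * A" only records its arguments in a Scaled<Mat> node.
Scaled<Mat> operator*(double k, const Mat& m) { return Scaled<Mat>{k, m}; }
```

With `Mat A(3, 2, 1.5); Mat B(1, 1);`, the statement `B = 2.0 * A;` resizes `B` to 3-by-2 and fills it with 3.0 in a single pass. Real libraries chain many such node types so that longer expressions also collapse into one loop.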
However, I suggest you rethink the scope and methodology of your test. For example, you've left out every BLAS implementation. The BLAS function you'd need here is dscal, which scales a vector by a scalar. A vendor-provided implementation tuned for your specific CPU would probably do a good job.
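For reference, dscal computes x := alpha * x over n elements spaced incx apart. The function below is a plain C++ sketch of those semantics, modeled on the Netlib reference BLAS (including its early return when n <= 0 or incx <= 0); the name `ref_dscal` is hypothetical. In real code you would link a vendor BLAS and call `cblas_dscal(n, alpha, x, incx)` from `<cblas.h>` instead, since that is where the CPU-specific tuning lives. Because a dense matrix is one contiguous buffer, the scalar multiply in your test amounts to a single call with n = rows * cols and incx = 1.

```cpp
#include <cassert>

// Hypothetical plain-C++ stand-in for BLAS dscal:
// x[i * incx] *= alpha for i in [0, n).
// Follows the Netlib reference semantics, including the early return when
// n <= 0 or incx <= 0. A vendor BLAS computes the same thing, only faster.
void ref_dscal(int n, double alpha, double* x, int incx) {
    if (n <= 0 || incx <= 0) return;
    assert(x != nullptr);
    for (int i = 0; i < n; ++i) {
        // Widen before multiplying so large n * incx cannot overflow int.
        x[static_cast<long long>(i) * incx] *= alpha;
    }
}
```

For example, scaling a 2-by-3 matrix stored in `double m[6]` by 2.0 is `ref_dscal(6, 2.0, m, 1)`.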
More importantly, there are many other things any reasonable vector library needs to be able to do, such as matrix multiplies, dot products, vector lengths, and transposes, none of which your test addresses. Your test covers exactly two things: element assignment, which in practice is never a bottleneck for vector libraries, and scalar-vector multiplication, which is a BLAS level 1 operation that CPU manufacturers provide optimized implementations of.
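A broader benchmark would time several of these operations and check results, not just one. Below is a minimal harness sketch in plain C++: `time_best_ms` reports the best of several runs, and `dot` and `matmul` are naive reference kernels (column-major, to match Armadillo's storage order) for verifying that every library under test produces the same answer before its timings are compared. All three names are hypothetical and do not come from any of the libraries discussed.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Runs f() `reps` times and returns the fastest run in milliseconds.
// Taking the minimum filters out some scheduler and cache-warmup noise.
template <typename F>
double time_best_ms(F&& f, int reps = 5) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < reps; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        f();
        auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return best;
}

// Naive dot product, used as a correctness reference.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Naive C = A * B for n-by-n column-major matrices: element (i, j) is at i + j * n.
std::vector<double> matmul(const std::vector<double>& A,
                           const std::vector<double>& B, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    std::vector<double> C(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t k = 0; k < n; ++k) {
            const double b = B[k + j * n];  // B(k, j)
            for (std::size_t i = 0; i < n; ++i) C[i + j * n] += A[i + k * n] * b;
        }
    return C;
}
```

In use, you would wrap each library's call in a lambda, e.g. `time_best_ms([&] { C = A * B; })` for Armadillo, and keep the result live afterwards (for instance by summing it) so the optimizer cannot discard the work being timed.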
There is a discussion of BLAS level 1 vs. compiler-emitted code here.
tl;dr: use Armadillo, linked against your platform's native BLAS and LAPACK libraries.