I have a class like this:
//Array of Structures
class Unit
{
public:
float v;
float u;
//And similarly many other variables of float type, upto
Two things you should be aware of that can make a huge difference, depending on your CPU: memory alignment and cache line aliasing.
Since you are using SSE4, allocating with a specialized function that returns an address aligned on a 16-byte boundary, instead of new, may give you a boost, because you or the compiler will then be able to use aligned loads and stores. I have not noticed much difference on newer CPUs, but unaligned loads and stores may be a little slower on older CPUs.
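To make the alignment point concrete, here is a minimal sketch of a 16-byte-aligned allocation for one float array. The helper names (alloc_floats_aligned16, free_floats_aligned16, is_aligned16) are made up for illustration, and it assumes a C++17 standard library that provides std::aligned_alloc (MSVC does not; _mm_malloc/_mm_free would be the usual substitute there).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: allocate `count` floats starting on a 16-byte boundary.
// std::aligned_alloc requires the size to be a multiple of the alignment,
// so the byte count is rounded up to the next multiple of 16.
inline float* alloc_floats_aligned16(std::size_t count) {
    constexpr std::size_t kAlign = 16;
    const std::size_t bytes =
        (count * sizeof(float) + kAlign - 1) / kAlign * kAlign;
    return static_cast<float*>(std::aligned_alloc(kAlign, bytes));
}

// Memory from std::aligned_alloc is released with std::free, not delete[].
inline void free_floats_aligned16(float* p) { std::free(p); }

// True if the pointer sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

With arrays allocated this way, aligned SSE loads and stores such as _mm_load_ps and _mm_store_ps can be used on any index that is a multiple of 4.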
As for cache line aliasing, Intel explicitly mentions it in its reference manuals (search for "Intel® 64 and IA-32 Architectures Optimization Reference Manual"). Intel says it is something you should be aware of, especially when using SoA. So, one thing you can try is to pad your arrays so that their start addresses differ in the bits just above the lower 6 (with 64-byte cache lines, the lower 6 bits are only the byte offset within a line). The idea is to keep the arrays from fighting for the same cache lines.
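One way to sketch the padding idea, under stated assumptions: 64-byte cache lines, 4 KiB pages, and a C++17 standard library with std::aligned_alloc. All field arrays live in one slab, and field i is shifted by i cache lines so that no two arrays start at the same offset within a page. StaggeredArrays and make_staggered are hypothetical names, not something from the question, and whether this helps at all depends on the CPU, so measure before keeping it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // assumed cache line size in bytes
constexpr std::size_t kPage = 4096;     // assumed page size in bytes

// Hypothetical SoA storage: one slab, with each field array starting at a
// different cache-line offset inside a 4 KiB page.
struct StaggeredArrays {
    void* slab = nullptr;        // release with std::free(slab)
    std::vector<float*> field;   // field[i] points at the i-th float array
};

inline StaggeredArrays make_staggered(std::size_t num_fields,
                                      std::size_t count) {
    // Round each array up to whole pages, then reserve one spare page per
    // field so the shift (at most 63 cache lines) never overlaps a neighbor.
    const std::size_t array_bytes =
        (count * sizeof(float) + kPage - 1) / kPage * kPage;
    const std::size_t stride = array_bytes + kPage;

    StaggeredArrays out;
    out.slab = std::aligned_alloc(kPage, num_fields * stride);
    if (out.slab == nullptr) return out;  // allocation failed: no fields

    auto* base = static_cast<unsigned char*>(out.slab);
    const std::size_t lines_per_page = kPage / kCacheLine;  // 64
    for (std::size_t i = 0; i < num_fields; ++i) {
        // Shift field i by (i mod 64) cache lines. The shift is a multiple
        // of 64 bytes, so every array stays 16-byte aligned for SSE.
        const std::size_t shift = (i % lines_per_page) * kCacheLine;
        out.field.push_back(
            reinterpret_cast<float*>(base + i * stride + shift));
    }
    return out;
}
```

The cost is one extra page per field, which is small next to large arrays; with more than 64 fields the offsets start to repeat.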