I know that writing to a volatile variable flushes it from the memory of all the cpus, however I want to know if reads to a volatile variable are as fast as nor
It is architecture dependent. What volatile does is tell the compiler not to optimise that variable away. It forces most operations to treat the variable's state as an unknown. Because it is volatile, it could be changed by another thread or some other hardware operation. So, reads will need to re-read the variable and operations will be of the read-modify-write kind.
This kind of variable is used for device drivers and also for synchronisation with in-memory mutexes/semaphores.