A decade or two ago, it was worthwhile to write numerical code that avoided multiplies and divides in favor of addition and subtraction. A good example is using forward differences to evaluate a polynomial curve instead of computing the polynomial directly at each point.
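To make that concrete, here's a minimal sketch of forward differencing (the cubic, its coefficients, and the step size are all made up for illustration). After a one-time setup of the first three forward differences, each successive sample of the polynomial costs three additions and no multiplies:

```c
#include <stdio.h>

/* Evaluate p(x) = 2x^3 - 3x^2 + 5x + 1 at x = 0, h, 2h, ... using
 * forward differences: after setup, each step is three additions. */
int main(void) {
    const double a = 2.0, b = -3.0, c = 5.0, d = 1.0;
    const double h = 0.25;                 /* step size (arbitrary) */

    double p  = d;                         /* p(0) */
    double d1 = a*h*h*h + b*h*h + c*h;     /* first difference: p(h) - p(0) */
    double d2 = 6*a*h*h*h + 2*b*h*h;       /* second difference at x = 0    */
    double d3 = 6*a*h*h*h;                 /* third difference (constant)   */

    for (int i = 0; i < 8; i++) {
        printf("p(%g) = %g\n", i * h, p);
        p  += d1;   /* advance the value             */
        d1 += d2;   /* advance the first difference  */
        d2 += d3;   /* advance the second difference */
    }
    return 0;
}
```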
In theory, the information is here:
Intel® 64 and IA-32 Architectures Optimization Reference Manual, Appendix C: Instruction Latency and Throughput
For every processor they list, the latency of FMUL is very close to that of FADD. On some of the older processors, FDIV is 2-3 times slower than those, while on newer processors it's the same as FMUL.
Caveats:
The document I linked actually says you can't rely on these numbers in real life, since the processor will do whatever it likes to make things faster as long as the result is correct. If the numbers matter for your code, measure them on your own machine; see the benchmark sketch after this list.
There's a good chance your compiler will use one of the many newer instruction sets that provide a floating-point multiply and divide (for example, SSE2's MULSD and DIVSD) rather than the x87 FMUL and FDIV discussed above.
This is a complicated document meant mainly for compiler writers, and I might have gotten it wrong. For example, I'm not clear on why the FDIV latency number is missing entirely for some of the CPUs.
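Given those caveats, here's a rough sketch of how you might measure it yourself (the chain length, the test value, and the timing approach are my choices, not anything from Intel's manual). Each loop is a serial dependency chain, so the per-iteration time approximates the latency of a single multiply or divide rather than their throughput:

```c
#include <stdio.h>
#include <time.h>

#define N 100000000  /* chain length; long enough to dwarf timer overhead */

/* Each operation depends on the previous result, so the loop's
 * runtime is dominated by the operation's latency. */
static double chain_mul(double x) {
    double acc = 1.0;
    for (int i = 0; i < N; i++)
        acc = acc * x;
    return acc;
}

static double chain_div(double x) {
    double acc = 1.0;
    for (int i = 0; i < N; i++)
        acc = acc / x;
    return acc;
}

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    volatile double x = 1.0000000001;  /* volatile read blocks constant folding */
    double t0, t1, r;

    t0 = seconds(); r = chain_mul(x); t1 = seconds();
    printf("multiply: %.2f ns/op (result %g)\n", (t1 - t0) / N * 1e9, r);

    t0 = seconds(); r = chain_div(x); t1 = seconds();
    printf("divide:   %.2f ns/op (result %g)\n", (t1 - t0) / N * 1e9, r);
    return 0;
}
```

Build with optimization but without -ffast-math (e.g. gcc -O2), so the compiler keeps the dependent chain instead of reassociating it away; printing the results keeps the loops from being treated as dead code.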