I am going through the assembly generated by GCC for an ARM Cortex M4, and noticed that atomic_compare_exchange_weak gets two DMB instructions inse
I'm not aware of whether Cortex M4 can be used in a multi-cpu/multi-core configuration, but in general:
Presence or lack of reordering memory writes at the hardware level is irrelevant.
Of course I would expect the DMB instruction to be essentially free on chips that don't support SMP, so I'm not sure why you'd want to try to hack it out.
Please note that, based on the question's referencing the code the compiler produces for atomic intrinsics, I'm assuming the context is for synchronization of atomics to make them match the high-level specification, not other uses like IO barriers for MMIO, and the above "never" should not be read as applying to this (unrelated) use (though I suspect, for the reasons you already cited, it doesn't apply to Cortex M4).