memory-barriers

Is there an implicit memory barrier with synchronized-with relationship on thread::join?

♀尐吖头ヾ submitted on 2019-12-02 00:53:11
I have code at work that starts multiple threads doing some operations, and if any of them fails it sets a shared variable to false. Then the main thread joins all the worker threads. A simulation of this looks roughly like this (I commented out the possible fix, which I don't know whether it is needed):

    #include <thread>
    #include <atomic>
    #include <vector>
    #include <iostream>
    #include <cassert>
    using namespace std;

    //atomic_bool success = true;
    bool success = true;

    int main() {
        vector<thread> v;
        for (int i = 0; i < 10; ++i) {
            v.emplace_back([=] {
                if (i == 5 || i == 6) {
                    //success.store(false,
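For reference, a minimal self-contained sketch of the pattern described above (the excerpt's code is cut off), using the commented-out atomic variant; the memory orders are my own illustrative choice, not from the question:

    // Sketch only: workers flag failure, main joins, then reads the flag.
    #include <thread>
    #include <atomic>
    #include <vector>
    #include <cassert>

    std::atomic<bool> success{true};

    int main() {
        std::vector<std::thread> v;
        for (int i = 0; i < 10; ++i) {
            v.emplace_back([i] {
                if (i == 5 || i == 6)
                    success.store(false, std::memory_order_relaxed);
            });
        }
        for (auto& t : v)
            t.join();  // a thread's completion synchronizes-with the return of its join()
        assert(!success.load(std::memory_order_relaxed));
    }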

Can DMB instructions be safely omitted in ARM Cortex M4

这一生的挚爱 submitted on 2019-12-01 12:38:24
I am going through the assembly generated by GCC for an ARM Cortex-M4 and noticed that atomic_compare_exchange_weak gets two DMB instructions inserted around the condition (compiled with GCC 4.9 using -std=gnu11 -O2):

    // if (atomic_compare_exchange_weak(&address, &x, y))
    dmb     sy
    ldrex   r0, [r3]
    cmp     r0, r2
    itt     eq
    strexeq lr, r1, [r3]
    cmpeq.w lr, #0
    dmb     sy
    bne.n   ...

Since the programming guide to barrier instructions for the ARM Cortex-M4 states that: "Omitting the DMB or DSB instruction in the examples in Figure 41 and Figure 42 would not cause any error because the Cortex-M processors: do not re
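For context, a sketch of the sort of source that produces such a sequence (my illustration in C++ rather than the question's C11; names are assumed): with the default memory_order_seq_cst, GCC brackets the LDREX/STREX loop with DMB, and whether weaker orderings drop the barriers depends on the compiler version and target.

    // Illustrative compare-exchange in the same spirit as the excerpt's call.
    #include <atomic>

    std::atomic<int> address{0};

    bool try_swap(int expected_value, int desired) {
        int expected = expected_value;
        // Default ordering is memory_order_seq_cst; this is what typically
        // makes the compiler emit barriers around the LDREX/STREX sequence.
        return address.compare_exchange_weak(expected, desired);
    }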

asio implicit strand and data synchronization

半城伤御伤魂 submitted on 2019-12-01 11:05:57
When I read the asio source code, I was curious about how asio keeps data synchronized between threads even when only an implicit strand is in effect. This is the code in asio:

io_service::run

    mutex::scoped_lock lock(mutex_);
    std::size_t n = 0;
    for (; do_run_one(lock, this_thread, ec); lock.lock())
      if (n != (std::numeric_limits<std::size_t>::max)())
        ++n;
    return n;

io_service::do_run_one

    while (!stopped_)
    {
      if (!op_queue_.empty())
      {
        // Prepare to execute first handler from queue.
        operation* o = op_queue_.front();
        op_queue_.pop();
        bool more_handlers = (!op_queue_.empty());
        if (o == &task_operation_)
        {
          task
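Independent of asio's internals, the hand-off mechanism being asked about can be sketched in isolation (assumed names, not asio code): unlocking a mutex in the thread that queues a handler synchronizes-with locking it in the thread that pops the handler, so the handler's data is visible when it runs.

    // Minimal mutex-protected queue hand-off between threads.
    #include <functional>
    #include <mutex>
    #include <optional>
    #include <queue>

    std::mutex m;
    std::queue<std::function<void()>> op_queue;

    void post(std::function<void()> handler) {
        std::lock_guard<std::mutex> lock(m);   // unlock releases: publishes the handler
        op_queue.push(std::move(handler));
    }

    std::optional<std::function<void()>> pop_one() {
        std::lock_guard<std::mutex> lock(m);   // lock acquires: sees the published handler
        if (op_queue.empty())
            return std::nullopt;
        auto h = std::move(op_queue.front());
        op_queue.pop();
        return h;
    }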

Thread safe usage of lock helpers (concerning memory barriers)

允我心安 submitted on 2019-12-01 05:39:57
By lock helpers I am referring to disposable objects with which locking can be implemented via using statements. For example, consider a typical usage of the SyncLock class from Jon Skeet's MiscUtil:

    public class Example
    {
        private readonly SyncLock _padlock;

        public Example()
        {
            _padlock = new SyncLock();
        }

        public void ConcurrentMethod()
        {
            using (_padlock.Lock())
            {
                // Now own the padlock - do concurrent stuff
            }
        }
    }

Now, consider the following usage:

    var example = new Example();
    new Thread(example.ConcurrentMethod).Start();

My question is this - since example is created on one thread and
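The excerpt is C#, but the visibility question it sets up (an object constructed on one thread, its lock used from another) has a close C++ analogue, sketched below with std::mutex and std::thread; this is my analogue, not MiscUtil code, and the answer for C# depends on the .NET memory model rather than on this sketch.

    // C++ analogue: the std::thread constructor synchronizes-with the start of
    // the new thread, so the fully constructed object (including its mutex
    // member) is visible inside ConcurrentMethod.
    #include <mutex>
    #include <thread>

    class Example {
    public:
        void ConcurrentMethod() {
            std::lock_guard<std::mutex> guard(padlock_);  // RAII stand-in for using (_padlock.Lock())
            // Now own the padlock - do concurrent stuff
        }
    private:
        std::mutex padlock_;
    };

    int main() {
        Example example;                                   // constructed on the main thread
        std::thread t(&Example::ConcurrentMethod, &example);
        t.join();
    }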

How is load->store reordering possible with in-order commit?

孤街醉人 submitted on 2019-12-01 05:19:22
ARM allows reordering loads with subsequent stores, so that the following pseudocode:

    // CPU 0          | // CPU 1
    temp0 = x;        | temp1 = y;
    y = 1;            | x = 1;

can result in temp0 == temp1 == 1 (and this is observable in practice as well). I'm having trouble understanding how this occurs; it seems like in-order commit would prevent it (and in-order commit, it was my understanding, is present in pretty much all OOO processors). My reasoning goes: "the load must have its value before it commits, it commits before the store, and the store's value can't become visible to other processors until it commits." I'm guessing
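A C++ rendering of that litmus test (my sketch, reusing the pseudocode's names) uses relaxed atomics, which permit exactly the temp0 == temp1 == 1 outcome being asked about:

    // With relaxed ordering, both threads observing 1 is a permitted outcome,
    // i.e. each load may effectively be reordered after the subsequent store.
    #include <atomic>
    #include <thread>

    std::atomic<int> x{0}, y{0};
    int temp0 = 0, temp1 = 0;

    int main() {
        std::thread cpu0([] {
            temp0 = x.load(std::memory_order_relaxed);
            y.store(1, std::memory_order_relaxed);
        });
        std::thread cpu1([] {
            temp1 = y.load(std::memory_order_relaxed);
            x.store(1, std::memory_order_relaxed);
        });
        cpu0.join();
        cpu1.join();
        // temp0 == 1 && temp1 == 1 is allowed (and observable on ARM).
    }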

Does `xchg` encompass `mfence` assuming no non-temporal instructions?

喜欢而已 submitted on 2019-12-01 04:28:58
I have already seen this answer and this answer, but neither appears to be clear and explicit about the equivalence or non-equivalence of mfence and xchg under the assumption of no non-temporal instructions. The Intel instruction reference for xchg mentions that this instruction is "useful for implementing semaphores or similar data structures for process synchronization", and further references Chapter 8 of Volume 3A. That reference states the following: "For the P6 family processors, locked operations serialize all outstanding load and store operations (that is, wait for them to complete)." This
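For concreteness, a small example (my illustration, not from the question) of where the distinction shows up in codegen: a sequentially consistent store, which mainstream x86-64 compilers lower either to mov followed by mfence or to a single xchg.

    // GCC and Clang commonly compile the seq_cst store below to `xchg` on
    // x86-64 (older GCC used `mov` + `mfence`); the question is whether those
    // lowerings are equivalent when no non-temporal stores are involved.
    #include <atomic>

    std::atomic<int> flag{0};

    void publish() {
        flag.store(1, std::memory_order_seq_cst);  // needs a full barrier on x86
    }

    int consume() {
        return flag.load(std::memory_order_seq_cst);  // a plain mov suffices on x86
    }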
