Why can't the wake-up signal be lost when using an RMW operation, as opposed to a pure load?


Consider this example:

    #include <iostream>
    #include <thread>
    #include <atomic>

    extern void block_wait();
    extern void wake();

    int main(){
        std::atomic<int> counter = 0;
        std::jthread t1([&](){
            if(counter.fetch_sub(1,std::memory_order::relaxed) == 0){ // #0
                block_wait(); // #1
            }
        });
        std::jthread t2([&](){
            if(counter.fetch_add(1,std::memory_order::relaxed) == -1){ // #2
                wake(); // #3
            }
        });
    }

In this example, block_wait and wake do not introduce data races, and their behavior is what their names imply: block_wait blocks the calling thread until it receives a wake signal, and wake wakes the thread that is blocked.

If #0 reads 0, #2 is guaranteed to read -1 and to execute wake(), which wakes the thread blocked at #1. However, suppose we change #2 to a pure load as follows:
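The guarantee for the RMW version can be probed empirically with a small stress test. This is only a sketch: `Observed`, `run_once`, `rmw_outcome_is_consistent`, and `stress` are hypothetical helper names, `block_wait`/`wake` are left out, and `std::thread` with `std::memory_order_relaxed` is used so that it also compiles before C++20. A passing run does not prove anything about the abstract machine; it only fails to find a counterexample.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper type: the values returned by the two RMW operations.
struct Observed {
    int t1_read; // value returned by fetch_sub (#0)
    int t2_read; // value returned by fetch_add (#2)
};

// Runs #0 and #2 concurrently once and records what each RMW read.
Observed run_once() {
    std::atomic<int> counter{0};
    Observed r{};
    std::thread t1([&] {
        r.t1_read = counter.fetch_sub(1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        r.t2_read = counter.fetch_add(1, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return r;
}

// With two RMWs on one atomic, only two modification orders exist:
//   fetch_sub first: #0 reads 0 and #2 reads -1 (t1 blocks, t2 wakes it)
//   fetch_add first: #2 reads 0 and #0 reads 1  (nobody blocks)
bool rmw_outcome_is_consistent(const Observed& r) {
    return (r.t1_read == 0 && r.t2_read == -1) ||
           (r.t2_read == 0 && r.t1_read == 1);
}

// Repeats the experiment and reports whether every run matched
// one of the two outcomes above.
bool stress(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        if (!rmw_outcome_is_consistent(run_once())) {
            return false;
        }
    }
    return true;
}
```

Calling `stress(2000)` is expected to return `true`: the outcome where #0 reads 0 and #2 reads anything other than -1 is never among the results, which is exactly the "no lost wake-up" property.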

    #include <iostream>
    #include <thread>
    #include <atomic>

    extern void block_wait();
    extern void wake();

    int main(){
        std::atomic<int> counter = 0;
        std::jthread t1([&](){
            if(counter.fetch_sub(1,std::memory_order::relaxed) == 0){ // #0
                block_wait(); // #1
            }
        });
        std::jthread t2([&](){
            if(counter.load(std::memory_order::relaxed) == -1){ // #2
                wake(); // #3
            }
        });
    }

Under the same condition, #0 reads 0 and executes #1 to block the thread. In this situation, however, #2 can also read 0 and therefore not execute #3. That is, the pure-load approach cannot guarantee that the blocked thread is woken, so the correctness of the algorithm cannot be guaranteed with a pure load.

From the perspective of the C++ standard (the abstract machine), what is the correct reason why #2 as an RMW operation cannot fail to execute wake(), whereas a pure load can? I offer three candidate explanations below; if an explanation is wrong, please point out why.

1. The load part of an RMW operation is less prone to reading a stale value than a pure load.

For this explanation, as pointed out in other questions, people think that the concept of a stale value is not useful. If the concept really is not useful, please give a reasoned explanation of why this argument is incorrect.

2. The load part of an RMW operation is more prone than a pure load to reading a later modification in the modification order.

Similarly, if this argument is incorrect, please point out why.

3. The C++ standard imposes a stricter restriction on the load part of an RMW operation than on a pure load.

I suppose this could be an acceptable argument. Besides the coherence rules defined in [intro.races] p11-p14, there is an extra rule in [atomics.order] p10 that restricts which value an RMW operation can read; a pure load is not subject to this restriction.
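To make the third explanation concrete, here is a deliberately simplified model, not the standard's formal definition. It assumes the modification order of `counter` is represented as a vector whose index 0 is the initial value, and that the load has no happens-before relation to any write. `pure_load_candidates` and `rmw_read` are hypothetical names introduced only for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Assumption: with no happens-before constraints, the coherence rules
// alone let a relaxed pure load read any value in the modification order.
std::vector<int> pure_load_candidates(const std::vector<int>& mod_order) {
    return mod_order;
}

// Rule quoted from [atomics.order] p10: an RMW reads the last value in
// the modification order before its own write. `write_pos` (>= 1) is
// the position of the RMW's write within `mod_order`.
int rmw_read(const std::vector<int>& mod_order, std::size_t write_pos) {
    return mod_order.at(write_pos - 1);
}
```

In the scenario from the question, #0 reads 0 and writes -1, so the modification order starts as `{0, -1}`. For the pure load, `pure_load_candidates({0, -1})` still contains 0, so reading 0 and skipping wake() is permitted. For the RMW version, the write of #2 must come after -1, giving `{0, -1, 0}`, and `rmw_read({0, -1, 0}, 2)` can only be -1.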
