Consider this example:
    #include <iostream>
    #include <thread>
    #include <atomic>

    extern void block_wait();
    extern void wake();

    int main(){
        std::atomic<int> counter = 0;
        std::jthread t1([&](){
            if(counter.fetch_sub(1,std::memory_order::relaxed) == 0){ // #0
                block_wait(); // #1
            }
        });
        std::jthread t2([&](){
            if(counter.fetch_add(1,std::memory_order::relaxed) == -1){ // #2
                wake(); // #3
            }
        });
    }

In this example, block_wait and wake do not introduce data races, and their behavior is implied by their names: block_wait blocks the calling thread until it receives a wake signal, and wake wakes the thread that is blocked.
If #0 reads 0, #2 is guaranteed to read -1 and execute wake() to wake the thread blocked at #1. However, if we change #2 to a pure load as follows:
    #include <iostream>
    #include <thread>
    #include <atomic>

    extern void block_wait();
    extern void wake();

    int main(){
        std::atomic<int> counter = 0;
        std::jthread t1([&](){
            if(counter.fetch_sub(1,std::memory_order::relaxed) == 0){ // #0
                block_wait(); // #1
            }
        });
        std::jthread t2([&](){
            if(counter.load(std::memory_order::relaxed) == -1){ // #2
                wake(); // #3
            }
        });
    }

Under the same condition, where #0 reads 0 and executes #1 to block the thread, #2 can also read 0 and skip #3. That is, the pure-load approach cannot guarantee that the blocked thread is woken, so the correctness of the algorithm cannot be guaranteed with a pure load.
From the perspective of the C++ standard (the abstract machine), what is the correct reason that #2, as an RMW operation, never fails to execute wake(), whereas a pure load can? I offer three candidate explanations below; if an explanation is wrong, please point out why.
1. The load part of an RMW operation is less prone to reading a stale value than a pure load. As pointed out in other questions, some people consider the concept of a stale value not useful. Even so, if the concept is not useful, please give a reasonable explanation of why this argument is incorrect.
2. The load part of an RMW operation is more prone than a pure load to reading a later modification in the modification order. Similarly, if this argument is incorrect, please point out why.
3. The C++ standard imposes a stricter restriction on the load part of an RMW operation than on a pure load. I suppose this could be an acceptable argument: besides the coherence rules defined in [intro.races] p11-p14, there is an extra rule in [atomics.order] p10 that constrains which value an RMW operation can read, whereas a pure load has no such restriction.
