What hardware conditions can cause an atomic fetch_add (RMW) to significantly stall control flow?

16 hours ago 1


In C++ (std::atomic::fetch_add), we often treat atomic Read-Modify-Write (RMW) operations as fast, "lock-free" primitives. However, I am interested in understanding the extreme latency bounds of these operations at the hardware level.

Beyond the obvious case of high thread contention (cache line bouncing), I want to know: is it possible for a single fetch_add operation to stall the CPU's instruction pipeline for a long time? If so, what hardware or system-level conditions can trigger this, even if the atomic variable is not heavily contended?

For example:

    #include <thread>
    #include <atomic>

    int main() {
        std::atomic<int> v = 0;
        auto t1 = std::thread([&]() {
            v.fetch_add(1, std::memory_order::relaxed); // #1
        });
        auto t2 = std::thread([&]() {
            v.fetch_add(1, std::memory_order::relaxed); // #2
        });
        t1.join();
        t2.join();
    }

Can the complete invocation of either #1 or #2 be observed to take a long time?
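To make "observe" concrete, here is a minimal sketch of how I would try to measure it, assuming std::chrono::steady_clock has fine enough resolution on the target platform. The names worst_fetch_add_ns, run_two_threads and Result are hypothetical helpers made up for this illustration. The measured interval also includes the cost of the two clock reads and any preemption or interrupt that lands between them, so it is only an upper bound on the latency of the RMW itself.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper: performs `iters` relaxed fetch_add calls on `v` and
// returns the longest single-call duration observed, in nanoseconds.
// The interval includes clock-read overhead and any OS preemption.
std::int64_t worst_fetch_add_ns(std::atomic<int>& v, int iters) {
    using clock = std::chrono::steady_clock;
    std::int64_t worst = 0;
    for (int i = 0; i < iters; ++i) {
        const auto start = clock::now();
        v.fetch_add(1, std::memory_order_relaxed);  // pre-C++20 spelling of memory_order::relaxed
        const auto stop = clock::now();
        const std::int64_t ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        if (ns > worst) worst = ns;
    }
    return worst;
}

// Hypothetical result type: final counter value plus the worst case per thread.
struct Result {
    int total;
    std::int64_t worst_ns_1;
    std::int64_t worst_ns_2;
};

// Runs two threads that increment the same atomic, mirroring #1 and #2 above.
Result run_two_threads(int iters) {
    std::atomic<int> v{0};
    std::int64_t w1 = 0;
    std::int64_t w2 = 0;
    std::thread t1([&] { w1 = worst_fetch_add_ns(v, iters); });
    std::thread t2([&] { w2 = worst_fetch_add_ns(v, iters); });
    t1.join();  // join() synchronizes, so reading w1 and w2 afterwards is safe
    t2.join();
    return Result{v.load(), w1, w2};
}
```

Calling run_two_threads(100000) from main and printing the two worst-case values gives a rough picture of the tail. A large outlier there does not by itself distinguish a hardware stall from the thread simply being descheduled.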

