I want to set up a watchdog that checks whether the io_context workers can pick up tasks within a reasonable time and are not stuck running long or blocking operations.
To achieve this, I've implemented a check that verifies the io_context queue is functioning properly by scheduling a task every 30 seconds. This task simply sets a flag to true so we can confirm that the queue is still responsive.
```cpp
using PeriodicTask = BasicScheduledTask<boost::asio::steady_timer, true>;

std::shared_ptr<PeriodicTask> io_context_alive_task_ = std::make_shared<PeriodicTask>(
    io_context_,
    [this](const auto& ec) {
        if (ec) {
            print_error("Could not report io_context as alive: {}", ec.message());
            return;
        }
        print_debug("Marking io_context as alive");
        is_context_alive_ = true;
    },
    30s);
```

The watchdog runs in its own thread, independent of the io_context, and checks every 2 minutes whether the flag has been set to true.
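```cpp
std::unique_ptr<std::thread> context_watchdog_ = std::make_unique<std::thread>([this] {
    // The mutex guarding io_context_alive_cv_ is not shown in the snippet;
    // io_context_alive_mutex_ is an assumed member name.
    std::unique_lock<std::mutex> lock(io_context_alive_mutex_);
    while (!io_context_.stopped()) {
        // Wait up to 2 minutes, waking early only if the io_context stops.
        io_context_alive_cv_.wait_for(lock, 2min, [this] { return io_context_.stopped(); });
        if (!is_context_alive_) {
            print_critical("io_context is not responding");
            std::abort();
        }
        print_debug("io_context is ok, setting back to false");
        // Written from two threads, so is_context_alive_ is assumed to be std::atomic<bool>.
        is_context_alive_ = false;
    }
    print_debug("io_context stopped. stopping thread");
});
```

I've noticed that I get some false alarms when the system wakes from sleep.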
This happens because the 30-second periodic task that marks the context as alive does not run for more than 2 minutes. As a result, the watchdog concludes that the io_context is unresponsive and aborts the service.
I wonder whether `io_context_alive_cv_`, which is a `std::condition_variable`, keeps counting time during sleep while the Boost-based timer of the keep-alive task stays idle over the same period. If so, can you suggest a way to resolve this?
Thanks
