Building watchdog for boost io_context queue


I want to set up a watchdog that checks whether the io_context workers can pick up tasks within a reasonable time and are not stuck running long or blocking operations.

To achieve this, I've implemented a check that verifies the io_context queue is functioning properly by scheduling a task every 30 seconds. This task simply sets a flag to true so we can confirm that the queue is still responsive.

    using PeriodicTask = BasicScheduledTask<boost::asio::steady_timer, true>;

    std::shared_ptr<PeriodicTask> io_context_alive_task_ = std::make_shared<PeriodicTask>(
        io_context_,
        [this](const auto& ec) {
            if (ec) {
                print_error("Could not report io_context as alive: {}", ec.message());
                return;
            }
            print_debug("Marking io_context as alive");
            is_context_alive_ = true;
        },
        30s);

The watchdog runs in its own independent thread outside io_context and checks every 2 minutes whether the flag has been set to true.

    std::unique_ptr<std::thread> context_watchdog_ = std::make_unique<std::thread>([this] {
        // `lock` is the std::unique_lock<std::mutex> used with
        // io_context_alive_cv_ (its declaration is not shown here).
        while (!io_context_.stopped()) {
            // Sleep for up to 2 minutes; wake early only if the io_context has stopped.
            io_context_alive_cv_.wait_for(lock, 2min, [this] { return io_context_.stopped(); });
            if (!is_context_alive_) {
                print_critical("io_context is not responding");
                std::abort();
            }
            print_debug("io_context is ok, setting back to false");
            is_context_alive_ = false;
        }
        print_debug("io_context stopped. stopping thread");
    });
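
The flag handshake between the heartbeat task and the watchdog thread can be sketched in isolation. This is a minimal illustration under assumptions, not the code above: `HeartbeatFlag` is a hypothetical helper, and it assumes `is_context_alive_` is shared between the two threads, in which case a `std::atomic<bool>` with `exchange` reads and clears the flag in a single step.

```cpp
#include <atomic>

// Hypothetical helper (not part of the original code): wraps the "alive"
// flag so that checking it and resetting it happen as one atomic operation.
class HeartbeatFlag {
public:
    // Called from the periodic task running on the io_context.
    void mark_alive() noexcept {
        alive_.store(true, std::memory_order_release);
    }

    // Called from the watchdog thread. Returns whether a heartbeat arrived
    // since the previous call and clears the flag in the same operation.
    bool consume() noexcept {
        return alive_.exchange(false, std::memory_order_acq_rel);
    }

private:
    std::atomic<bool> alive_{false};
};
```

In the watchdog loop, `if (!flag.consume()) { ... }` would then stand in for the separate check and reset.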

I've noticed that I get some false alarms when the system wakes from sleep.

This happens because the 30-second periodic task that marks the context as alive does not run for more than 2 minutes. As a result, the watchdog assumes the io_context is unresponsive and attempts to abort the service.
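
One way to make this mismatch measurable is to have the heartbeat store a timestamp instead of a boolean, so the watchdog can compute how stale the last heartbeat is on a single clock. The following is a rough sketch under assumptions rather than a fix known to work: `HeartbeatStamp`, `beat` and `is_overdue` are hypothetical names, and it assumes the heartbeat timer is driven by `std::chrono::steady_clock`, which is typically the clock behind `boost::asio::steady_timer`.

```cpp
#include <atomic>
#include <chrono>

using SteadyClock = std::chrono::steady_clock;

// Hypothetical helper: records when the last heartbeat was seen.
class HeartbeatStamp {
public:
    // Called from the periodic task running on the io_context.
    void beat(SteadyClock::time_point now = SteadyClock::now()) noexcept {
        last_.store(now.time_since_epoch().count(), std::memory_order_release);
    }

    // Called from the watchdog thread. True when more than `limit` has
    // passed on the steady clock since the last recorded heartbeat.
    bool is_overdue(SteadyClock::duration limit,
                    SteadyClock::time_point now = SteadyClock::now()) const noexcept {
        const SteadyClock::duration last{last_.load(std::memory_order_acquire)};
        return now.time_since_epoch() - last > limit;
    }

private:
    // Stored as a raw tick count so it can live in a std::atomic.
    std::atomic<SteadyClock::rep> last_{SteadyClock::now().time_since_epoch().count()};
};
```

The watchdog would call `is_overdue(2min)` after each wait. Whether this avoids the false alarm depends on whether the steady clock advances during suspend on the target platform, which is platform-dependent and worth verifying.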

I wonder whether io_context_alive_cv_, which is of type std::condition_variable, keeps ticking during sleep mode while the Boost-based timer of the keep-alive task is idle during that time. If so, could you suggest a way to resolve it?
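
Whether each clock advances during suspend depends on the platform and the standard library, so it may help to measure it directly. Below is a small diagnostic sketch, not taken from the original code: `ClockSample` and `wall_minus_steady` are hypothetical names, and the idea is simply to sample `steady_clock` and `system_clock` before and after the watchdog's wait and compare how much time each reports.

```cpp
#include <chrono>

// Hypothetical helper: a snapshot of both clocks taken at the same moment.
struct ClockSample {
    std::chrono::steady_clock::time_point steady;
    std::chrono::system_clock::time_point wall;

    static ClockSample now() {
        return {std::chrono::steady_clock::now(), std::chrono::system_clock::now()};
    }
};

// How much more wall-clock time than steady-clock time passed between two
// samples. A large positive value across a suspend suggests the steady
// clock paused while the wall clock kept running.
inline std::chrono::milliseconds wall_minus_steady(const ClockSample& before,
                                                   const ClockSample& after) {
    using std::chrono::duration_cast;
    using std::chrono::milliseconds;
    return duration_cast<milliseconds>(after.wall - before.wall) -
           duration_cast<milliseconds>(after.steady - before.steady);
}
```

Logging this value in the watchdog just before the abort decision would show whether the two clocks diverged across the sleep. A large value can also come from a manual or NTP adjustment of the system clock, so it is a hint rather than proof of a suspend.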

Thanks
