We are currently developing Cronos Browser, a Chromium fork that integrates a local LLM (UIKI) for offline assistance.
We are noticing that keeping the model weights resident in VRAM degrades the rendering process for heavy WebGL tabs. At the moment we pass data between the inference process and the renderer through a shared memory buffer, roughly as sketched below.
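This is a simplified, illustrative version of that handoff (POSIX shared memory, hypothetical names and ring-buffer layout; in the actual browser the handle would presumably be brokered over IPC/Mojo rather than opened by name):

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>
#include <string>

// Fixed-size header at the start of the region, followed by payload bytes.
struct InferenceBufferHeader {
  uint32_t write_offset;
  uint32_t read_offset;
  uint32_t capacity;
};

// Inference-process side: create the region, map it, and initialize the
// header. The renderer side maps the same region and reads completions.
InferenceBufferHeader* CreateAndMapBuffer(const std::string& name, size_t size) {
  int fd = shm_open(name.c_str(), O_CREAT | O_RDWR, 0600);
  if (fd < 0) return nullptr;
  if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
    close(fd);
    return nullptr;
  }
  void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);  // The mapping keeps the memory alive.
  if (mem == MAP_FAILED) return nullptr;
  auto* header = static_cast<InferenceBufferHeader*>(mem);
  header->write_offset = 0;
  header->read_offset = 0;
  header->capacity = static_cast<uint32_t>(size - sizeof(InferenceBufferHeader));
  return header;
}
```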
Question: Has anyone successfully implemented a "lazy unloading" strategy for WebGPU contexts in the Chromium C++ source to free up VRAM when a browser tab needs priority? We are looking at gpu::SharedImageInterface, but documentation is scarce.
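To make the question concrete, this is the kind of policy we have in mind: release the model's GPU allocations when a heavy tab signals it needs VRAM, and re-upload them lazily on the next inference request. Everything below is a hypothetical sketch in plain C++ (no real Chromium APIs); the part we are unsure about is how to wire the unload/reload hooks into Chromium's GPU memory tracking and gpu::SharedImageInterface.

```cpp
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical holder for the LLM's VRAM-resident weights.
class LazyModelVram {
 public:
  // upload/release are callbacks into whatever backend owns the actual
  // GPU allocations (Vulkan, Dawn, etc.).
  LazyModelVram(std::function<void()> upload, std::function<void()> release)
      : upload_(std::move(upload)), release_(std::move(release)) {}

  // Called on the inference path; re-uploads the weights if they were evicted.
  void EnsureLoaded() {
    std::lock_guard<std::mutex> lock(mu_);
    if (!loaded_) {
      upload_();
      loaded_ = true;
    }
  }

  // Called when a heavy WebGL/WebGPU tab needs priority (e.g. from a
  // memory-pressure or tab-visibility signal). A real implementation would
  // also have to avoid unloading while an inference request is in flight.
  void Unload() {
    std::lock_guard<std::mutex> lock(mu_);
    if (loaded_) {
      release_();
      loaded_ = false;
    }
  }

 private:
  std::mutex mu_;
  bool loaded_ = false;
  std::function<void()> upload_;
  std::function<void()> release_;
};
```

If anyone has done something similar, pointers to the right signal to hook (memory pressure, GPU scheduler priority, or something SharedImage-specific) would be very welcome.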
