Memory management strategies for local LLM inference in a Chromium-based browser fork [closed]

We are currently developing Cronos Browser, a Chromium fork that integrates a local LLM (UIKI) for offline assistance.

We have noticed that keeping the model resident in VRAM degrades rendering in tabs with heavy WebGL content. We currently use a shared memory buffer between the inference process and the renderer.
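
For context, the buffer setup looks roughly like the sketch below, built on Chromium's `base` shared-memory types (class and method names such as `InferenceHost` and `WriteLogits` are illustrative, and the exact `base` signatures may differ between Chromium versions):

```cpp
// Simplified sketch of the cross-process tensor buffer using Chromium's
// base shared-memory primitives. Names are illustrative, not our real code.
#include <cstddef>
#include <cstring>

#include "base/memory/read_only_shared_memory_region.h"

class InferenceHost {
 public:
  // Allocate the region in the inference process; the read-only handle is
  // what gets passed to the renderer over Mojo.
  bool CreateTensorBuffer(size_t size) {
    base::MappedReadOnlyRegion mapped =
        base::ReadOnlySharedMemoryRegion::Create(size);
    if (!mapped.IsValid())
      return false;
    region_ = std::move(mapped.region);
    mapping_ = std::move(mapped.mapping);
    return true;
  }

  // Writable view on the inference side.
  void WriteLogits(const float* data, size_t count) {
    std::memcpy(mapping_.memory(), data, count * sizeof(float));
  }

  // Duplicate the read-only handle for the renderer process.
  base::ReadOnlySharedMemoryRegion DuplicateForRenderer() {
    return region_.Duplicate();
  }

 private:
  base::ReadOnlySharedMemoryRegion region_;
  base::WritableSharedMemoryMapping mapping_;
};
```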

Question: Has anyone successfully implemented a "lazy unloading" strategy for WebGPU contexts in the Chromium C++ source to free VRAM when a foreground tab needs priority? We have been looking into gpu::SharedImageInterface, but documentation is scarce.
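
What we have sketched so far is a pressure-driven unload using base::MemoryPressureListener plus an idle timer, roughly as below. Note this reacts to system memory pressure rather than VRAM occupancy specifically, `UnloadWeights()`/`ReloadWeights()` are placeholders for our UIKI runtime calls, and the tie-in with gpu::SharedImageInterface is exactly the part we have not figured out:

```cpp
// Sketch of a pressure-driven "lazy unload" for the model weights.
// Placeholders stand in for the UIKI runtime; header paths and time
// helpers follow recent Chromium and may differ in older checkouts.
#include "base/functional/bind.h"
#include "base/memory/memory_pressure_listener.h"
#include "base/time/time.h"
#include "base/timer/timer.h"

class ModelResidencyManager {
 public:
  ModelResidencyManager()
      : pressure_listener_(
            FROM_HERE,
            base::BindRepeating(&ModelResidencyManager::OnMemoryPressure,
                                base::Unretained(this))) {}

  // Called on every assistant inference request; keeps the weights hot and
  // restarts the idle countdown.
  void OnInferenceRequested() {
    if (!loaded_) {
      ReloadWeights();
      loaded_ = true;
    }
    idle_timer_.Start(FROM_HERE, base::Minutes(2),
                      base::BindOnce(&ModelResidencyManager::Unload,
                                     base::Unretained(this)));
  }

 private:
  void OnMemoryPressure(
      base::MemoryPressureListener::MemoryPressureLevel level) {
    // Under critical pressure (e.g. a heavy WebGL tab), drop the weights
    // immediately instead of waiting for the idle timer.
    if (level ==
        base::MemoryPressureListener::MEMORY_PRESSURE_LEVEL_CRITICAL) {
      Unload();
    }
  }

  void Unload() {
    if (!loaded_)
      return;
    idle_timer_.Stop();
    UnloadWeights();
    loaded_ = false;
  }

  void ReloadWeights() {}  // Placeholder: maps weights back into VRAM.
  void UnloadWeights() {}  // Placeholder: releases UIKI VRAM allocations.

  bool loaded_ = false;
  base::OneShotTimer idle_timer_;
  base::MemoryPressureListener pressure_listener_;
};
```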
