Memory management strategies for local LLM inference in a Chromium-based browser fork [closed]

We are currently developing Cronos Browser, a Chromium fork that integrates a local LLM (UIKI) for offline assistance.

We have noticed that keeping the model resident in VRAM degrades rendering in tabs with heavy WebGL content. We currently use a shared memory buffer between the inference process and the renderer.
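
For context, the buffer setup looks roughly like the sketch below, built on Chromium's `base` shared-memory types (class and method names such as `InferenceHost` and `WriteLogits` are illustrative, and the exact `base` signatures may differ between Chromium versions):

```cpp
// Simplified sketch of the cross-process tensor buffer using Chromium's
// base shared-memory primitives. Names are illustrative, not our real code.
#include <cstddef>
#include <cstring>

#include "base/memory/read_only_shared_memory_region.h"

class InferenceHost {
 public:
  // Allocate the region in the inference process; the read-only handle is
  // what gets passed to the renderer over Mojo.
  bool CreateTensorBuffer(size_t size) {
    base::MappedReadOnlyRegion mapped =
        base::ReadOnlySharedMemoryRegion::Create(size);
    if (!mapped.IsValid())
      return false;
    region_ = std::move(mapped.region);
    mapping_ = std::move(mapped.mapping);
    return true;
  }

  // Writable view on the inference side.
  void WriteLogits(const float* data, size_t count) {
    std::memcpy(mapping_.memory(), data, count * sizeof(float));
  }

  // Duplicate the read-only handle for the renderer process.
  base::ReadOnlySharedMemoryRegion DuplicateForRenderer() {
    return region_.Duplicate();
  }

 private:
  base::ReadOnlySharedMemoryRegion region_;
  base::WritableSharedMemoryMapping mapping_;
};
```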

Question: Has anyone successfully implemented a "lazy unloading" strategy for WebGPU contexts in the Chromium C++ source to free VRAM when a foreground tab needs priority? We have been looking into gpu::SharedImageInterface, but documentation is scarce.
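
What we have sketched so far is a pressure-driven unload using base::MemoryPressureListener plus an idle timer, roughly as below. Note this reacts to system memory pressure rather than VRAM occupancy specifically, `UnloadWeights()`/`ReloadWeights()` are placeholders for our UIKI runtime calls, and the tie-in with gpu::SharedImageInterface is exactly the part we have not figured out:

```cpp
// Sketch of a pressure-driven "lazy unload" for the model weights.
// Placeholders stand in for the UIKI runtime; header paths and time
// helpers follow recent Chromium and may differ in older checkouts.
#include "base/functional/bind.h"
#include "base/memory/memory_pressure_listener.h"
#include "base/time/time.h"
#include "base/timer/timer.h"

class ModelResidencyManager {
 public:
  ModelResidencyManager()
      : pressure_listener_(
            FROM_HERE,
            base::BindRepeating(&ModelResidencyManager::OnMemoryPressure,
                                base::Unretained(this))) {}

  // Called on every assistant inference request; keeps the weights hot and
  // restarts the idle countdown.
  void OnInferenceRequested() {
    if (!loaded_) {
      ReloadWeights();
      loaded_ = true;
    }
    idle_timer_.Start(FROM_HERE, base::Minutes(2),
                      base::BindOnce(&ModelResidencyManager::Unload,
                                     base::Unretained(this)));
  }

 private:
  void OnMemoryPressure(
      base::MemoryPressureListener::MemoryPressureLevel level) {
    // Under critical pressure (e.g. a heavy WebGL tab), drop the weights
    // immediately instead of waiting for the idle timer.
    if (level ==
        base::MemoryPressureListener::MEMORY_PRESSURE_LEVEL_CRITICAL) {
      Unload();
    }
  }

  void Unload() {
    if (!loaded_)
      return;
    idle_timer_.Stop();
    UnloadWeights();
    loaded_ = false;
  }

  void ReloadWeights() {}  // Placeholder: maps weights back into VRAM.
  void UnloadWeights() {}  // Placeholder: releases UIKI VRAM allocations.

  bool loaded_ = false;
  base::OneShotTimer idle_timer_;
  base::MemoryPressureListener pressure_listener_;
};
```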
