RAG optimization


I'm not very experienced with optimization and high-performance work, but I was recently tasked with speeding up our entire RAG process. I'd like some help figuring out whether I'm on the right track and whether the objectives are realistic.

The input is a question, which is vectorized. We then search for the most relevant area of knowledge using vector distance, select the best experts from the database to answer the question, select their memories most relevant to the question, and finally generate the answer with an LLM.
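To make the flow concrete, here is a minimal in-memory sketch of those stages. It is only an illustration, not our actual code: the names `nearest` and `answer`, the dictionary layout, the use of cosine distance, and the assumption that experts are filtered by area and then ranked by vector distance are all hypothetical. In the real system each `nearest` call is a database query.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def nearest(query_vec, items, k=1):
    """Return the k items whose 'vec' is closest to query_vec."""
    return sorted(items, key=lambda it: cosine_distance(query_vec, it["vec"]))[:k]

def answer(question_vec, areas, experts, memories, llm, n_experts=2, n_memories=2):
    # Stage 1: most relevant area of knowledge.
    area = nearest(question_vec, areas, k=1)[0]
    # Stage 2: best experts (assumed: restricted to that area, ranked by distance).
    in_area = [e for e in experts if e["area"] == area["name"]]
    chosen = {e["name"] for e in nearest(question_vec, in_area, k=n_experts)}
    # Stage 3: the chosen experts' memories most relevant to the question.
    candidates = [m for m in memories if m["expert"] in chosen]
    context = nearest(question_vec, candidates, k=n_memories)
    # Stage 4: generate the final answer (llm is any callable standing in for the model).
    return llm(area["name"], [m["text"] for m in context])
```

Each stage depends on the result of the previous one, so as written the three searches and the LLM call run strictly one after another.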

The whole process was taking about 2 minutes per question, which was not viable. With some query optimizations (indexing) and a switch to a faster model, we managed to get it down to around 15 seconds.

I really don't know what else can be improved in the process without removing the layers of abstraction. Running so many database searches for a single question takes considerable time, and the time the LLM needs to generate the answer adds to it.
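I don't yet have a breakdown of how the 15 seconds split between the searches and the LLM. A minimal sketch of how that could be measured, using only the standard library, is below. `StageTimer` and the stage names are hypothetical, and the `time.sleep` calls are stand-ins for the real work, not measurements.

```python
import time
from contextlib import contextmanager

class StageTimer:
    """Accumulates wall-clock seconds per named pipeline stage."""

    def __init__(self):
        self.seconds = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.seconds[name] = self.seconds.get(name, 0.0) + elapsed

    def report(self):
        """Return (stage, seconds, share of total), slowest stage first."""
        total = sum(self.seconds.values())
        ranked = sorted(self.seconds.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, secs, secs / total if total else 0.0) for name, secs in ranked]

timer = StageTimer()
with timer.stage("embed_question"):
    time.sleep(0.001)  # stand-in for vectorizing the question
with timer.stage("area_search"):
    time.sleep(0.002)  # stand-in for the area-of-knowledge search
with timer.stage("expert_search"):
    time.sleep(0.002)  # stand-in for selecting experts
with timer.stage("memory_search"):
    time.sleep(0.002)  # stand-in for selecting memories
with timer.stage("llm_generate"):
    time.sleep(0.005)  # stand-in for the LLM call

for name, secs, share in timer.report():
    print(f"{name:15s} {secs:.4f}s {share:6.1%}")
```

Wrapping the real calls the same way would show whether the remaining time is dominated by the database searches or by generation.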

If anyone has any suggestions on what to do, what to study, or what's feasible, I would really appreciate it; I'm at a loss for what to do.
