Recommended GenerationConfig for Medical Domain LLMs: Strategies to Minimize Hallucination and Ensure Factuality

I am currently deploying a Large Language Model (e.g., Llama 3 / Mistral) for a medical application, specifically for tasks such as clinical note summarization and extracting information from oncology reports.

In a clinical setting, factual accuracy and consistency are far more critical than linguistic creativity. I am looking for advice on how to optimize the GenerationConfig to ensure the safest possible output.

Specifically, I have the following questions:

Temperature & Top-p: Is it standard practice to set the temperature very low (e.g., 0.1, or 0, which most inference stacks implement as greedy decoding) to maximize determinism, or does this lead to repetitive or degraded output in medical contexts?
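To make the question concrete, here is a minimal sketch of the two decoding styles being compared. The field names follow Hugging Face transformers' GenerationConfig conventions, and the numeric values are illustrative starting points, not validated clinical settings; adapt both to your actual serving stack.

```python
# Near-deterministic decoding options for extraction/summarization.
# Field names follow Hugging Face transformers' GenerationConfig conventions
# (illustrative values, not validated clinical settings).

greedy_config = {
    "do_sample": False,   # pure greedy decoding, equivalent to temperature -> 0
    "num_beams": 1,       # plain greedy; raise for beam search if desired
    "max_new_tokens": 512,
}

low_temp_config = {
    "do_sample": True,
    "temperature": 0.1,   # near-deterministic, but sampling can still escape loops
    "top_p": 0.9,         # nucleus sampling cap on the candidate token set
    "max_new_tokens": 512,
}
```

A common pattern is to use the greedy config for strict extraction (e.g., pulling staging codes from oncology reports) and the low-temperature config for summarization, where pure greedy decoding is more prone to degenerate repetition.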

Penalty Parameters: How should I balance repetition_penalty and presence_penalty to avoid omitting crucial medical symptoms while preventing the model from getting stuck in loops?
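For reference, the two penalties usually live in different APIs: repetition_penalty is the Hugging Face convention (multiplicative, values above 1.0 penalize token reuse), while presence_penalty is the OpenAI-API/vLLM convention (additive, applied once per token already present in the output). A hedged sketch of mild anti-loop settings, with illustrative values only:

```python
# Illustrative anti-loop settings; values are starting points, not recommendations.
# repetition_penalty: Hugging Face convention, multiplicative, >1.0 penalizes reuse.
# presence_penalty: OpenAI/vLLM convention, additive, once per already-seen token.
# Keep both mild so legitimately repeated clinical terms (drug names, recurring
# symptom mentions) are not suppressed.

penalty_config = {
    "repetition_penalty": 1.05,  # mild; values >1.2 risk dropping repeated terms
    "no_repeat_ngram_size": 4,   # HF-style hard block on verbatim 4-gram loops
    "presence_penalty": 0.0,     # often safest left off for extraction tasks
}
```

The trade-off in the question maps directly onto these knobs: a hard n-gram block stops verbatim loops without down-weighting individual medical terms, whereas aggressive token-level penalties can silently suppress a symptom the second time it legitimately appears.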

Any insights or papers regarding parameter tuning for high-stakes domain-specific LLMs would be greatly appreciated.