I am currently deploying a Large Language Model (e.g., Llama 3 or Mistral) for a medical application, specifically clinical note summarization and information extraction from oncology reports.
In a clinical setting, factual accuracy and consistency are far more critical than linguistic creativity. I am looking for advice on how to optimize the GenerationConfig to ensure the safest possible output.
Specifically, I have the following questions:
- Temperature & top-p: Is it standard practice to set temperature to a very low value (e.g., 0.1, or 0 for effectively greedy decoding) to maximize determinism, or does this lead to repetitive or degraded output in medical contexts?
- Penalty parameters: How should I balance repetition_penalty and presence_penalty so that the model neither gets stuck in loops nor suppresses clinically important symptoms that legitimately appear multiple times in a report?
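For reference, here is a rough sketch of the decoding settings I am currently experimenting with. The keys follow Hugging Face GenerationConfig naming, but the specific values are placeholders I picked as a starting point, not settings I have validated clinically:

```python
# Draft decoding settings for clinical summarization/extraction.
# Keys match Hugging Face GenerationConfig / model.generate() kwargs;
# values are placeholder guesses, not validated recommendations.
clinical_generation_kwargs = {
    "do_sample": False,         # greedy decoding for maximal determinism
    "num_beams": 1,             # no beam search; keep latency predictable
    "temperature": 1.0,         # ignored when do_sample=False
    "repetition_penalty": 1.1,  # mild penalty: discourage loops without
                                # suppressing repeated symptom mentions
    "max_new_tokens": 512,      # cap summary length
}
```

These would be passed directly, e.g. `model.generate(**inputs, **clinical_generation_kwargs)`. My main uncertainty is whether the mild repetition_penalty here is already enough to risk dropping repeated clinical terms.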
Any insights or papers regarding parameter tuning for high-stakes domain-specific LLMs would be greatly appreciated.
