I’m passing a fairly large JSON payload (more than 2,500 lines) to an LLM. It’s full of repeated fields (especially ranges, which are repeated in every historic entry), metadata we don’t actually use, and verbose keys. All of this burns a lot of tokens.
Example (trimmed) input:
{
  "pid": "TEST_1010",
  "biomarker": [
    {
      "simpleLabel": "Estim. Avg Glu (eAG)",
      "ranges": {
        "fair": [70, 126],
        "optimal": [70, 97],
        "inRange": [70, 126],
        "good": [70, 108]
      },
      "result": 110.69,
      "rawResult": 110.69,
      "type": "biomarker",
      "unit": "mg/dL",
      "originalUnit": "mg/dL",
      "label": "Estim. Avg Glu (eAG)",
      "shortLabel": "Estimated Average Glucose (eAG)",
      "targetType": "low",
      "infoLink": "",
      "category": { "name": "Metabolic Health", "weight": "n/a" },
      "deviationScore": 36.33,
      "historic": [
        {
          "result": 110.69,
          "rawResult": 110.69,
          "timestamp": "2026-01-28T08:23:07-05:00",
          "ranges": {
            "fair": [70, 126],
            "optimal": [70, 97],
            "inRange": [70, 126],
            "good": [70, 108]
          },
          "reportType": "SPOT_REPORT",
          "kitType": "ORIGINAL",
          "spotID": "SB_SPOT6A4E89C8"
        },
        {
          "result": 76.78,
          "rawResult": 76.78,
          "timestamp": "2026-01-28T08:24:17-05:00",
          "ranges": {
            "fair": [70, 126],
            "optimal": [70, 97],
            "inRange": [70, 126],
            "good": [70, 108]
          },
          "reportType": "SPOT_REPORT",
          "kitType": "ORIGINAL",
          "spotID": "SB_SPOTB543809D"
        }
      ],
      "state": "default",
      "labRange": "inRange",
      "siphoxRange": "fair"
    }
    // ... many more biomarkers with similar structure
  ]
}

What I actually need the LLM to do:
Generate short summaries and flags per biomarker (e.g., whether the result is outside the optimal range).
Optionally mention trend (up/down) using the last 1–2 historic points.
It doesn’t need full metadata like infoLink, reportType, kitType, spotID, category.weight, originalUnit, etc.
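To make that requirement concrete, here is a minimal sketch of the per-biomarker projection I have in mind. The input field names come from the sample above; the short output keys (`n`, `r`, `u`, `opt`, `flag`, `trend`) and the function name `compact_biomarker` are hypothetical, and the flag is computed against the `optimal` range only.

```python
from datetime import datetime
from typing import Any


def compact_biomarker(b: dict[str, Any]) -> dict[str, Any]:
    """Reduce one biomarker to label, result, unit, optimal range, flag, and trend."""
    low, high = b["ranges"]["optimal"]
    result = b["result"]

    # Flag the current result relative to the optimal range.
    if result < low:
        flag = "low"
    elif result > high:
        flag = "high"
    else:
        flag = "ok"

    # Order the history by parsed timestamp and keep only the last two points.
    points = sorted(
        b.get("historic", []),
        key=lambda h: datetime.fromisoformat(h["timestamp"]),
    )[-2:]

    # A trend is only reported when two points are available.
    trend = None
    if len(points) == 2:
        earlier, later = points[0]["result"], points[1]["result"]
        trend = "up" if later > earlier else "down" if later < earlier else "flat"

    out = {"n": b["label"], "r": result, "u": b["unit"], "opt": [low, high], "flag": flag}
    if trend is not None:
        out["trend"] = trend
    return out
```

For the eAG entry above this gives `flag` = `"high"` and `trend` = `"down"`. Precomputing the flag and trend in code also means the LLM only has to phrase them rather than derive them from raw ranges.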
I’ve tried:
Manually removing unused fields.
Shortening keys.
Keeping only 2 historic points.
I’m still seeing lots of duplication (e.g., ranges repeated under each historic entry), and I think I can compress further safely.
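The ranges duplication in particular looks removable. A rough sketch of what I mean, assuming `historic` is already in chronological order: keep `ranges` once on the biomarker and repeat it on a historic point only when it actually differs. The names `hoist_ranges` and `to_prompt_json` and the output keys `t`, `r`, and `hist` are hypothetical.

```python
import json
from typing import Any


def hoist_ranges(biomarker: dict[str, Any]) -> dict[str, Any]:
    """Keep ranges once per biomarker; repeat them per point only if they differ."""
    parent_ranges = biomarker["ranges"]
    history = []
    # Assumption: the list is chronological, so the last two entries are the newest.
    for entry in biomarker.get("historic", [])[-2:]:
        point = {"t": entry["timestamp"], "r": entry["result"]}
        if "ranges" in entry and entry["ranges"] != parent_ranges:
            point["ranges"] = entry["ranges"]
        history.append(point)
    return {
        "label": biomarker["label"],
        "unit": biomarker["unit"],
        "result": biomarker["result"],
        "ranges": parent_ranges,
        "hist": history,
    }


def to_prompt_json(payload: dict[str, Any]) -> str:
    """Serialize the compacted payload without the default whitespace."""
    compact = {
        "pid": payload["pid"],
        "biomarker": [hoist_ranges(b) for b in payload["biomarker"]],
    }
    return json.dumps(compact, separators=(",", ":"))
```

Because a differing range is still emitted on the point that carries it, the transformation should be lossless with respect to ranges, which is the part I care about preserving.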
What are robust patterns or transformation strategies to compact this JSON for LLM prompts while preserving the essential semantics?
