tl;dr
The problem comes from the Hugging Face transformers tokenizer, not from MLX. The apply_chat_template(..., tokenize=False) part already solves it.
Details
In plain terms: the tokenizer repo you're loading has a known 'bad regex' in its configuration, and transformers can apply a built-in fix if you pass fix_mistral_regex=True when constructing the tokenizer.
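For illustration, the flag is just an extra keyword argument: from_pretrained forwards unknown kwargs into the tokenizer's constructor, which is where the built-in fix gets applied. A minimal sketch (the repo ID is a placeholder for whatever repo emits the warning):

from transformers import AutoTokenizer

# Extra kwargs to from_pretrained are forwarded to the tokenizer class,
# so the fix flag has to be supplied at construction time.
tokenizer = AutoTokenizer.from_pretrained(
    "your-org/your-model",   # placeholder repo ID
    fix_mistral_regex=True,
)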
Why your current code can't "just set the flag"
mlx_lm.load(...) is a convenience helper that typically loads both the model and the tokenizer internally. Your code gets a tokenizer object back after it has already been constructed, so there is nowhere to pass that flag.
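To make that concrete, here is a minimal sketch of the current flow, using the same model ID as the fix below; by the time load() returns, the tokenizer already exists:

from mlx_lm import load

# load() constructs the tokenizer internally; its __init__ has already run
# by the time you get the object back, so there is no place at this call
# site to inject fix_mistral_regex=True.
out = load("mlx-community/translategemma-12b-it-4bit")
model, tokenizer = out[0], out[1]  # some versions return an extra item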
So the clean solution, in my opinion, would be: take only the model from mlx_lm.load() (ignoring the tokenizer it returns), and load the tokenizer yourself via transformers.AutoTokenizer.from_pretrained(..., fix_mistral_regex=True). That's exactly the workaround people are using.
What I'd do
Do not ignore the warning if you care about correctness. Tokenization bugs are silent and brutal: you'll get weird generation, broken chat templates, or "mishe mishe, tauf tauf" answers. Keep MLX for the model runtime (usually faster), but use transformers for tokenizer-config edge cases like this one. And set tokenize=False when calling apply_chat_template, because you want a text prompt string to feed into mlx_lm.generate, not token IDs (I assume).
Copy-Paste fix:
from mlx_lm import load, generate
from transformers import AutoTokenizer

MODEL_ID: str = "mlx-community/translategemma-12b-it-4bit"

# Load the model with MLX-LM (ignore its tokenizer output)
out = load(MODEL_ID)
if len(out) == 2:
    model, _tokenizer = out
else:
    model, _tokenizer, _struct = out

# Load the tokenizer explicitly with the fix flag
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_ID,
    fix_mistral_regex=True,  # <-- Fix here
)

prompt: str = "Write a story about Einstein"
messages: list[dict[str, str]] = [{"role": "user", "content": prompt}]

# Build the chat-formatted prompt as TEXT (not tokens)
prompt_text = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,  # <-- False here
)

# Generate
text = generate(model, tokenizer, prompt=prompt_text, verbose=True)
print(text)

Why tokenize=False matters here
mlx_lm.generate(...) expects a str prompt (unless you're using lower-level token APIs). apply_chat_template(..., tokenize=False) ensures you get the rendered chat prompt string exactly as the model expects.
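If you want to see the difference directly, here is a small sketch reusing the repo ID and messages from the fix above; the only thing that changes is the tokenize argument:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "mlx-community/translategemma-12b-it-4bit",
    fix_mistral_regex=True,
)
messages = [{"role": "user", "content": "Write a story about Einstein"}]

as_text = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
as_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True
)

print(type(as_text))  # str -> what mlx_lm.generate(prompt=...) wants
print(type(as_ids))   # list of token IDs -> not what prompt= expects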
Mistral warning:
If/when you switch to a Mistral tokenizer repo (like the one in your warning), do the exact same thing: load the model via MLX and the tokenizer via AutoTokenizer.from_pretrained(..., fix_mistral_regex=True). The fix flag is specifically about that family's tokenizer regex config.
Note:
Sometimes that warning appears even when you're not intentionally using Mistral, because your model repo may bundle tokenizer assets that trigger the check. In that situation, setting the flag is harmless and shuts the warning up, and, more importantly, keeps tokenization deterministic.
If you paste the exact model ID that emits the warning (the one you're loading on MLX right now), I can tell you whether it's truly Mistral-tokenizer-based or just "warning noise," and what the safest config is.
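If you want to check yourself first, here is a minimal sketch (assuming the repo is public on the Hub and ships a tokenizer_config.json; the repo ID is just the one from the example above):

from huggingface_hub import hf_hub_download
import json

# Fetch only the tokenizer config and inspect which tokenizer class it declares.
cfg_path = hf_hub_download(
    "mlx-community/translategemma-12b-it-4bit", "tokenizer_config.json"
)
with open(cfg_path) as f:
    cfg = json.load(f)

print(cfg.get("tokenizer_class"))  # a Mistral/Llama-style class vs. something else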
