Given this pattern:

    power[\s\u00A0]+shell|power|shell

Applied to:

    "PowerShell and power shell scripting"

In "PowerShell" the regex matches "Power" and "Shell" separately, even though the full phrase also exists in the text.
How can I structure the pattern so that:

- The full phrase match is preferred
- Shorter token matches do not overlap parts of the phrase

Here is the builder I'm using:

    function buildSearchRegex(q) {
      q = (q || "").trim().replace(/\s+/g, " ");
      if (!q) return null;

      const tokens = q.split(" ").filter((t) => t.length >= 1);
      if (!tokens.length) return null;

      const escapedTokens = tokens.map(t =>
        t.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
      );

      const phrasePattern =
        escapedTokens.length >= 2
          ? escapedTokens.join("[\\s\\u00A0]+")
          : null;

      const tokenPattern = escapedTokens.join("|");

      const pattern = phrasePattern
        ? `${phrasePattern}|${tokenPattern}`
        : tokenPattern;

      return new RegExp(pattern, "gi");
    }

However, because the phrase and tokens are in the same alternation group:

    phrase|token1|token2

the token matches can sometimes occur before the phrase match in the text flow.
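
To make the behavior concrete, here is a minimal sketch that runs the pattern from the question against the sample text and compares it with one possible variant. At each starting position the alternatives are tried left to right, and a global scan never reuses characters it has already matched, so the phrase alternative already wins wherever it can match. The split happens in "PowerShell" because there is no whitespace for `[\s\u00A0]+` to consume. The helper name `buildSearchRegexOptionalGap` is hypothetical, and treating a phrase with no separator at all as a phrase match (`*` instead of `+`) is an assumption about what is wanted here, not something the original builder does.

```javascript
// Sample text and query taken from the question.
const text = "PowerShell and power shell scripting";

// The pattern from the question: phrase first, then the individual tokens.
const original = /power[\s\u00A0]+shell|power|shell/gi;

// Hypothetical variant of the builder. The only change is the separator
// between phrase tokens: "*" (zero or more) instead of "+" (one or more).
// Assumption: tokens written with no separator should count as the phrase.
function buildSearchRegexOptionalGap(q) {
  q = (q || "").trim().replace(/\s+/g, " ");
  if (!q) return null;

  const escapedTokens = q
    .split(" ")
    .map((t) => t.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));

  const phrasePattern =
    escapedTokens.length >= 2 ? escapedTokens.join("[\\s\\u00A0]*") : null;
  const tokenPattern = escapedTokens.join("|");

  return new RegExp(
    phrasePattern ? `${phrasePattern}|${tokenPattern}` : tokenPattern,
    "gi"
  );
}

const originalMatches = text.match(original);
const variantMatches = text.match(buildSearchRegexOptionalGap("power shell"));

console.log(originalMatches); // ["Power", "Shell", "power shell"]
console.log(variantMatches);  // ["PowerShell", "power shell"]
```

The trade-off with the optional separator is that short tokens will also be joined inside unrelated words (a query like "a b" would match "ab"), so it may only be appropriate when the tokens are long enough for that to be unlikely.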
