How do I prevent token matches from overlapping a longer phrase in JavaScript regex? [closed]

Given this pattern:

power[\s\u00A0]+shell|power|shell

Applied to:

"PowerShell and power shell scripting"

The regex matches "power" and "shell" separately (in "PowerShell"), even though the full phrase "power shell" also appears in the text.
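
For reference, here is a quick check of what the pattern returns on the sample string. It is only a sketch: the `gi` flags are assumed to match the builder in this question, and the variable names are illustrative.

```javascript
// Assumption: flags "gi", the same flags the question's builder passes to RegExp.
const pattern = /power[\s\u00A0]+shell|power|shell/gi;
const input = "PowerShell and power shell scripting";

// Collect every match together with its start index.
const matches = [...input.matchAll(pattern)].map((m) => [m.index, m[0]]);

console.log(matches);
// [[0, "Power"], [5, "Shell"], [15, "power shell"]]
```

The split happens inside "PowerShell", where no whitespace separates the two words, so the phrase alternative cannot match there. The spaced "power shell" comes back as a single match.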

How can I structure the pattern so that:

- The full phrase match is preferred
- Shorter token matches do not overlap parts of the phrase

Here is the builder I'm using:

function buildSearchRegex(q) {
  q = (q || "").trim().replace(/\s+/g, " ");
  if (!q) return null;

  const tokens = q.split(" ").filter((t) => t.length >= 1);
  if (!tokens.length) return null;

  const escapedTokens = tokens.map((t) =>
    t.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
  );

  const phrasePattern =
    escapedTokens.length >= 2 ? escapedTokens.join("[\\s\\u00A0]+") : null;
  const tokenPattern = escapedTokens.join("|");
  const pattern = phrasePattern
    ? `${phrasePattern}|${tokenPattern}`
    : tokenPattern;

  return new RegExp(pattern, "gi");
}

However, because the phrase and tokens are in the same alternation group:

phrase|token1|token2

a token can match at an earlier position in the text than the phrase does, and the regex engine always reports the leftmost match first.
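
A minimal sketch of how alternation order plays out, with one possible adjustment. JavaScript tries the alternatives left to right at each start position, so a phrase listed first wins wherever it can match, and a token only gets in when the phrase fails at that position. The helper `buildLooseRegex` is hypothetical. Making the separator optional (`*` instead of `+`) is an assumption about the desired behavior, namely that "PowerShell" should count as the phrase too.

```javascript
const sample = "PowerShell and power shell scripting";

// Helper: list the matched substrings for a global regex.
const find = (re, s) => [...s.matchAll(re)].map((m) => m[0]);

// Phrase listed first: it wins wherever it can match.
const phraseFirst = /power[\s\u00A0]+shell|power|shell/gi;
// Tokens listed first: "power" succeeds before the phrase is ever tried.
const tokensFirst = /power|shell|power[\s\u00A0]+shell/gi;

// Hypothetical variant of the builder with an optional separator between tokens.
function buildLooseRegex(q) {
  const tokens = (q || "").trim().split(/\s+/).filter(Boolean);
  if (!tokens.length) return null;
  const escaped = tokens.map((t) => t.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  // "*" lets adjacent tokens ("PowerShell") match as one phrase.
  const phrase = escaped.join("[\\s\\u00A0]*");
  // Phrase alternative goes first; a single token needs no phrase alternative.
  const parts = escaped.length >= 2 ? [phrase, ...escaped] : escaped;
  return new RegExp(parts.join("|"), "gi");
}

console.log(find(phraseFirst, sample)); // ["Power", "Shell", "power shell"]
console.log(find(tokensFirst, sample)); // ["Power", "Shell", "power", "shell"]
console.log(find(buildLooseRegex("power shell"), sample)); // ["PowerShell", "power shell"]
```

With the `g` flag, each search resumes where the previous match ended, so matches returned by one pass never overlap each other. If unspaced forms such as "PowerShell" should stay split, keep the `+` separator from the original builder.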
