How does word boundary restriction deal with empty string at the end? [duplicate]

\b is a word boundary, that is, the beginning or end of a word.

\b matches a position between two characters and does not consume any character.

It matches at a transition from \w (word character) to \W (non-word character) or from \W to \w. The beginning and end of the string also count, provided the adjacent character is a word character. So it also matches between ^ and \w or between \w and $.
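To make these positions concrete, here is a small sketch that lists every index where \b matches. The helper name `wordBoundaryIndexes` is made up for this illustration; it only relies on the standard `String.prototype.matchAll`.

```javascript
// Hypothetical helper: list every index at which \b matches in a string.
function wordBoundaryIndexes(text) {
  // /\b/g is zero-width, so each match is an empty string; only the index matters.
  return [...text.matchAll(/\b/g)].map((match) => match.index);
}

console.log(wordBoundaryIndexes("a"));           // [0, 1]
console.log(wordBoundaryIndexes("Zara, tada!")); // [0, 4, 6, 10]
console.log(wordBoundaryIndexes("!"));           // []
```

Note that "!" has no word boundary at all: the start and end of the string only count when a word character sits next to them.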

With the g (global) modifier, the regex engine keeps searching for further matches until it reaches the end of the string. The steps:

The regex engine first matches "a" and consumes this character. If the string had been "Zara", the "a" at the end of the string would be matched in the same way.

The engine then continues searching from this position (the end of the string). There is no character left, but since a* means zero or more "a", it matches the empty string "", and that position is still a word boundary because it directly follows a word character. This is why you get a second match, which is an empty string.
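The two steps above can be traced by hand with `RegExp.prototype.exec` and `lastIndex`. This is only a sketch of what a global search does, not the engine's actual implementation; `traceGlobalMatches` is a hypothetical helper, and it deliberately drops any flags other than g.

```javascript
// Hypothetical helper that mimics a global search, one exec() call at a time.
function traceGlobalMatches(pattern, text) {
  const regex = new RegExp(pattern.source, "g"); // fresh copy, lastIndex starts at 0
  const steps = [];
  let match;
  while ((match = regex.exec(text)) !== null) {
    steps.push({ index: match.index, value: match[0] });
    // An empty match leaves lastIndex where it was, so advance it manually
    // to avoid matching the same position forever.
    if (match[0] === "") {
      regex.lastIndex++;
    }
  }
  return steps;
}

console.log(traceGlobalMatches(/a*\b/, "a"));
// [ { index: 0, value: 'a' }, { index: 1, value: '' } ]
```

The second entry is the empty match at index 1, the end of the string.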

Suggested fix

As I mentioned in the comment below, if you don't want to match this trailing empty string, you have to replace a* (zero or more times) with a+ (one or more times).
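A quick way to see the difference, using the standard `String.prototype.match` with the g flag (the variable names are just for this example):

```javascript
// Zero or more "a": the empty string at a word boundary also qualifies.
const withStar = "a".match(/a*\b/g);
// One or more "a": an empty string can never match.
const withPlus = "a".match(/a+\b/g);

console.log(withStar); // [ 'a', '' ]
console.log(withPlus); // [ 'a' ]

// The same comparison on a word that does not start with "a".
const saraStar = "Sara".match(/a*\b/g);
const saraPlus = "Sara".match(/a+\b/g);

console.log(saraStar); // [ '', 'a', '' ]
console.log(saraPlus); // [ 'a' ]
```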

Demonstration by testing

If your input string contains words that do not start with "a", you will get even more empty-string matches, for example at the start of each such word.

See it in action with these tests:

    const inputs = [
      "a",
      "Sara",
      "Zara, tada!"
    ];

    const patterns = [
      /a*\b/g, // 0 or N times "a", followed by a word boundary.
      /a+\b/g  // 1 or N times "a", followed by a word boundary.
    ];

    for (const input of inputs) {
      for (const pattern of patterns) {
        console.log(`Execute ${pattern.toString()} on "${input}":\n`);
        const matches = [...input.matchAll(pattern)];
        for (const match of matches) {
          console.log(`\tIndex: ${match.index}\tValue: "${match[0]}"`);
        }
      }
    }