ES2018: RegExp lookbehind assertions

[2017-05-16] dev, javascript, esnext, es2018, regexp

The proposal “RegExp Lookbehind Assertions” by Gorkem Yakin, Nozomu Katō, and Daniel Ehrenberg is part of ES2018. This blog post explains it.

A lookaround assertion is a construct inside a regular expression that specifies what the surroundings of the current location must look like, but has no other effect. It is also called a zero-width assertion.

Before ES2018, the only lookaround assertion supported by JavaScript was the lookahead assertion, which matches what follows the current location. The proposal adds the lookbehind assertion, which matches what precedes the current location.

Lookahead assertions  

A lookahead assertion inside a regular expression means: whatever comes next must match the assertion, but nothing else happens. That is, nothing is captured and the assertion doesn’t contribute to the overall matched string.

Take, for example, the following regular expression:

const RE_AS_BS = /aa(?=bb)/;

It matches the string 'aabb', but the overall matched string does not include the b’s:

const match1 = RE_AS_BS.exec('aabb');
console.log(match1[0]); // 'aa'

Furthermore, it does not match a string that doesn’t have two b’s:

const match2 = RE_AS_BS.exec('aab');
console.log(match2); // null

A negative lookahead assertion means that what comes next must not match the assertion. For example:

> const RE_AS_NO_BS = /aa(?!bb)/;
> RE_AS_NO_BS.test('aabb')
false
> RE_AS_NO_BS.test('aab')
true
> RE_AS_NO_BS.test('aac')
true

Lookbehind assertions  

Lookbehind assertions work like lookahead assertions, but in the opposite direction.
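As a minimal sketch of that symmetry, the following mirrors the earlier lookahead example with a lookbehind assertion. The name RE_BS_AFTER_AS is made up for this illustration; it requires an engine that supports lookbehind.

```javascript
// Hypothetical counterpart to RE_AS_BS: 'bb' must be preceded by 'aa'
const RE_BS_AFTER_AS = /(?<=aa)bb/;

const match3 = RE_BS_AFTER_AS.exec('aabb');
console.log(match3[0]); // 'bb'
console.log(match3.index); // 2 (the a's are checked, but not part of the match)

// Only one 'a' precedes the b's, so the assertion fails
const match4 = RE_BS_AFTER_AS.exec('abb');
console.log(match4); // null
```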

Positive lookbehind assertions  

For a positive lookbehind assertion, the text preceding the current location must match the assertion (but nothing else happens).

const RE_DOLLAR_PREFIX = /(?<=\$)foo/g;
'$foo %foo foo'.replace(RE_DOLLAR_PREFIX, 'bar');
    // '$bar %foo foo'

As you can see, 'foo' is only replaced if it is preceded by a dollar sign. You can also see that the dollar sign is not part of the total match, because the latter is completely replaced by 'bar'.

Achieving the same result without a lookbehind assertion is less elegant, because the prefix has to be captured and then inserted again:

const RE_DOLLAR_PREFIX = /(\$)foo/g;
'$foo %foo foo'.replace(RE_DOLLAR_PREFIX, '$1bar');
    // '$bar %foo foo'

The capture-group approach also fails if the prefix must be part of the previous match, because matches cannot overlap. A lookbehind assertion handles that case:

> 'a1ba2ba3b'.match(/(?<=b)a.b/g)
[ 'a2b', 'a3b' ]
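For comparison, here is a small sketch of what happens when the prefix 'b' is made a real part of the pattern. The variable names are only for illustration.

```javascript
const str = 'a1ba2ba3b';

// The prefix 'b' is consumed by each match. After 'ba2b' has matched,
// the search resumes behind its final 'b', so 'a3b' has no 'b' left
// in front of it.
const withoutLookbehind = str.match(/ba.b/g);
console.log(withoutLookbehind); // [ 'ba2b' ]

// With a lookbehind assertion, the 'b' is only inspected, never consumed.
const withLookbehind = str.match(/(?<=b)a.b/g);
console.log(withLookbehind); // [ 'a2b', 'a3b' ]
```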

Negative lookbehind assertions  

A negative lookbehind assertion only matches if the current location is not preceded by text that matches the assertion, but has no other effect. For example:

const RE_NO_DOLLAR_PREFIX = /(?<!\$)foo/g;
'$foo %foo foo'.replace(RE_NO_DOLLAR_PREFIX, 'bar');
    // '$foo %bar bar'

There is no simple (general) way to achieve the same result without a lookbehind assertion.
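One possible workaround, sketched here under the assumption that only replace() is needed, is to capture the prefix optionally and decide inside a callback. The name RE_OPTIONAL_DOLLAR is hypothetical. The trick does not carry over to match() or split(), which is why it is not a general solution.

```javascript
// Optionally capture a leading dollar sign, then decide per match
const RE_OPTIONAL_DOLLAR = /(\$)?foo/g;

const replaced = '$foo %foo foo'.replace(
  RE_OPTIONAL_DOLLAR,
  // dollar is undefined if the optional group did not participate
  (match, dollar) => (dollar === undefined ? 'bar' : match)
);
console.log(replaced); // '$foo %bar bar'
```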

Conclusions  

Lookahead assertions make most sense at the end of regular expressions. Lookbehind assertions make most sense at the beginning of regular expressions.
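A small sketch that combines both positions: a lookbehind assertion at the beginning and a lookahead assertion at the end extract bracketed text without the brackets. The name RE_IN_BRACKETS and the input string are invented for this example.

```javascript
// Lookbehind at the start, lookahead at the end: neither bracket is
// part of the matched text
const RE_IN_BRACKETS = /(?<=\[)[^\]]*(?=\])/g;

const inBrackets = 'a[b]c[de]'.match(RE_IN_BRACKETS);
console.log(inBrackets); // [ 'b', 'de' ]
```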

The use cases for lookaround assertions are:

  • replace()
  • match() (especially if the regular expression has the flag /g)
  • split() (note the space at the beginning of ' b,c'):
    > 'a, b,c'.split(/,(?= )/)
    [ 'a', ' b,c' ]
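The mirror image of the split() example, as a sketch with a lookbehind assertion: here the string is split at a space that is preceded by a comma, so the comma stays at the end of the first piece.

```javascript
// The separator is the space; the comma before it is only asserted
const pieces = 'a, b,c'.split(/(?<=,) /);
console.log(pieces); // [ 'a,', 'b,c' ]
```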
    

Other than those use cases, you can just as well make the assertion a real part of the regular expression.
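For instance, test() only reports whether there is a match, so it makes no difference whether the prefix is asserted or consumed. The following sketch uses a made-up input string; the difference only shows up once the matched text is inspected.

```javascript
const input = 'price: $foo';

// Same boolean result either way
const viaAssertion = /(?<=\$)foo/.test(input);
const viaRealPart = /\$foo/.test(input);
console.log(viaAssertion, viaRealPart); // true true

// The matched text differs, which is what matters for replace(), match() and split()
const textViaAssertion = input.match(/(?<=\$)foo/)[0];
const textViaRealPart = input.match(/\$foo/)[0];
console.log(textViaAssertion); // 'foo'
console.log(textViaRealPart); // '$foo'
```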

Further reading