/\d+(?=%)/.exec("100% of US presidents have been male") // ["100"]
/\d+(?!%)/.exec("that’s all 44 of them") // ["44"]
/(?<=\$)\d+/.exec("Benjamin Franklin is on the $100 bill") // ["100"]
/(?<!\$)\d+/.exec("it’s worth about €90") // ["90"]
Generally, there are two ways to implement lookbehind assertions. Perl, for example, requires lookbehind patterns to have a fixed length, which means that quantifiers such as * or + are not allowed. This way, the regular expression engine can step back by that fixed length and, from the stepped-back position, match the lookbehind exactly as it would match a lookahead.
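The step-back idea can be sketched in plain JavaScript. This is only an illustration of the principle, not how Perl or any real engine is implemented: the helpers `matchAt` and `execWithFixedLookbehind` are hypothetical names, and the sketch assumes the caller supplies the fixed length of the lookbehind pattern.

```javascript
// Hypothetical helper: emulate (?<=behind)main at a single position,
// assuming `behind` always matches exactly `behindLength` characters.
function matchAt(input, pos, behind, behindLength, main) {
  const start = pos - behindLength;
  if (start < 0) return null; // not enough characters to step back

  // Step back by the fixed length and match forward, like a lookahead.
  const behindRe = new RegExp(behind.source, "y"); // sticky: match only at lastIndex
  behindRe.lastIndex = start;
  const before = behindRe.exec(input);
  if (before === null || before[0].length !== behindLength) return null;

  // The lookbehind succeeded; now match the main pattern at `pos`.
  const mainRe = new RegExp(main.source, "y");
  mainRe.lastIndex = pos;
  return mainRe.exec(input);
}

// Hypothetical helper: try every position from left to right.
function execWithFixedLookbehind(input, behind, behindLength, main) {
  for (let pos = 0; pos <= input.length; pos++) {
    const m = matchAt(input, pos, behind, behindLength, main);
    if (m !== null) return m;
  }
  return null;
}

const text = "Benjamin Franklin is on the $100 bill";
const result = execWithFixedLookbehind(text, /\$/, 1, /\d+/);
console.log(result[0]); // "100"
```

The sketch only works because the length of the lookbehind is known in advance, which is exactly why quantifiers are ruled out under this approach.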
The regular expression engine in the .NET framework takes a different approach. Instead of needing to know how many characters the lookbehind pattern will match, it simply matches the lookbehind pattern backwards, while reading characters against the normal read direction. This means that the lookbehind pattern can take advantage of the full regular expression syntax and match patterns of arbitrary length.
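As a small illustration of what backward matching allows, the following sketch uses a quantifier inside a lookbehind. It assumes a JavaScript engine that supports lookbehind assertions as shown in the examples above; the input strings are made up for the example.

```javascript
// The lookbehind contains \d+, so its length is not known in advance.
// It requires a dollar sign, one or more digits, and a dot before the match.
const cents = /(?<=\$\d+\.)\d+/;

const withDollar = cents.exec("cost: $10.99");
console.log(withDollar[0]); // "99"

// Without the dollar sign the lookbehind fails at every position.
const withoutDollar = cents.exec("cost: 10.99");
console.log(withoutDollar); // null
```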
Because lookbehind assertions match backwards, there are some subtle behaviors that would otherwise be considered surprising. For example, a capturing group with a quantifier captures the last match. Usually, that is the right-most match. But inside a lookbehind assertion, we match from right to left, therefore the left-most match is captured:
/h(?=(\w)+)/.exec("hodor") // ["h", "r"]
/(?<=(\w)+)r/.exec("hodor") // ["r", "h"]
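The same right-to-left direction also affects how greedy quantifiers share the input between two groups. A short sketch, again assuming an engine with backward-matching lookbehind, compares the forward and backward cases on the same string:

```javascript
// Forward: the first group is greedy and takes as much as it can,
// leaving a single digit for the second group.
const forward = /^(\d+)(\d+)/.exec("1053");
console.log(forward); // ["1053", "105", "3"]

// Backward: inside the lookbehind the second group is matched first,
// so it is the one that takes as much as it can.
const backward = /(?<=(\d+)(\d+))$/.exec("1053");
console.log(backward); // ["", "1", "053"]
```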
A capturing group can be referenced via a back reference after it has been captured. Usually, the back reference has to be to the right of the capturing group. Otherwise, it would match the empty string, as nothing has been captured yet. However, inside a lookbehind assertion, the match direction is reversed, so the back reference has to be to the left of the group:
/(?<=(o)d\1)r/.exec("hodor") // null
/(?<=\1d(o))r/.exec("hodor") // ["r", "o"]
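A practical consequence is that a back reference written in the usual order inside a lookbehind can silently match too much, because it is evaluated before its group has captured anything. The sketch below uses made-up inputs and tries to match a "!" that is preceded by a doubled character:

```javascript
// Back reference to the left of the group: (\w) is captured first
// (reading right to left), then \1 must match the same character again.
const doubled = /(?<=\1(\w))!/;
const a = doubled.exec("woo!");
console.log(a); // ["!", "o"]
const b = doubled.exec("wow!");
console.log(b); // null

// Back reference to the right of the group: \1 is evaluated before the
// group has captured anything, so it matches the empty string and the
// pattern only checks that a single word character precedes the "!".
const c = /(?<=(\w)\1)!/.exec("wow!");
console.log(c); // ["!", "w"]
```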
Yang Guo, Regular Expression Engineer