Skip to content

Optimize (?!) in regular expressions #106566

@serhiy-storchaka

Description

@serhiy-storchaka

Some regular expression engines support (*FAIL) as a pattern which fails to match anything. (?!) is an idiomatic way to write this in engines which do not support (*FAIL).

It works pretty well, but it can be optimized. Instead of compiling it as ASSERT_NOT opcode

>>> re.compile(r'12(?!)|3', re.DEBUG)
BRANCH
  LITERAL 49
  LITERAL 50
  ASSERT_NOT 1
OR
  LITERAL 51

 0. INFO 9 0b100 1 2 (to 10)
      in
 5.     LITERAL 0x31 ('1')
 7.     LITERAL 0x33 ('3')
 9.     FAILURE
10: BRANCH 11 (to 22)
12.   LITERAL 0x31 ('1')
14.   LITERAL 0x32 ('2')
16.   ASSERT_NOT 3 0 (to 20)
19.     SUCCESS
20:   JUMP 7 (to 28)
22: branch 5 (to 27)
23.   LITERAL 0x33 ('3')
25.   JUMP 2 (to 28)
27: FAILURE
28: SUCCESS
re.compile('12(?!)|3', re.DEBUG)

it can be compiled as FAILURE opcode.

>>> re.compile(r'12(?!)|3', re.DEBUG)
BRANCH
  LITERAL 49
  LITERAL 50
  FAILURE
OR
  LITERAL 51

 0. INFO 9 0b100 1 2 (to 10)
      in
 5.     LITERAL 0x31 ('1')
 7.     LITERAL 0x33 ('3')
 9.     FAILURE
10: BRANCH 8 (to 19)
12.   LITERAL 0x31 ('1')
14.   LITERAL 0x32 ('2')
16.   FAILURE
17.   JUMP 7 (to 25)
19: branch 5 (to 24)
20.   LITERAL 0x33 ('3')
22.   JUMP 2 (to 25)
24: FAILURE
25: SUCCESS
re.compile('12(?!)|3', re.DEBUG)

Unfortunately I do not know good examples of using (*FAIL) in regular expressions (without using (*SKIP)) to include them in the documentation. Perhaps other patterns of using (*FAIL) could be optimized future, but I do not know what to optimize.

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions