Skip to content

XS regular expression matching with empty disjunction alternatives is not always correct #1484

@gibson042

Description

@gibson042

Environment: XS 16.8.1, slot 32 bytes, ID 4 bytes

Description

Regular expression matching fails with some scenarios involving an empty alternative. I haven't exactly pinpointed the failure conditions, but have included some small reproductions.

This issue, while not critical, is more concerning than most because it manifests as non-throwing misbehavior of regular expression patterns that can reasonably be expected.

Also: AFAICT, RegExp.prototype.exec seems to not be affected.

Steps to Reproduce

(() => {
  const re1 = /<[^:>]*(:[^>]*|)>/g; // captures ":…" or "" (fails)
  const re2 = /<[^:>]*(:[^>]*)?>/g; // captures ":…" or `undefined` (works as expected)
  const str = "<foo> <bar> <foo:bar>";
  const matchResult1 = str.match(re1);
  const matchAllResult1 = str.matchAll(re1);
  const matchResult2 = str.match(re2);
  const matchAllResult2 = str.matchAll(re2);

  return {
    matchResult1,
    matchAllResult1: [...(matchAllResult1 ?? [matchAllResult1])],
    matchResult2,
    matchAllResult2: [...(matchAllResult2 ?? [matchAllResult2])],
  };
})();

Actual behavior

{
  "matchResult1": ["<foo:bar>"],
  "matchAllResult1": [
    ["<foo:bar>", ":bar"]
  ],
  "matchResult2": ["<foo>", "<bar>", "<foo:bar>"],
  "matchAllResult2": [
    ["<foo>", undefined],
    ["<bar>", undefined],
    ["<foo:bar>", ":bar"]
  ]
}

Expected behavior

{
  "matchResult1": ["<foo>", "<bar>", "<foo:bar>"],
  "matchAllResult1": [
    ["<foo>", ""],
    ["<bar>", ""],
    ["<foo:bar>", ":bar"]
  ],
  "matchResult2": ["<foo>", "<bar>", "<foo:bar>"],
  "matchAllResult2": [
    ["<foo>", undefined],
    ["<bar>", undefined],
    ["<foo:bar>", ":bar"]
  ]
}
diff
--- actual
+++ expected
 {
-  "matchResult1": ["<foo:bar>"],
+  "matchResult1": ["<foo>", "<bar>", "<foo:bar>"],
   "matchAllResult1": [
+    ["<foo>", ""],
+    ["<bar>", ""],
     ["<foo:bar>", ":bar"]
   ],
   "matchResult2": ["<foo>", "<bar>", "<foo:bar>"],
   "matchAllResult2": [
     ["<foo>", undefined],
     ["<bar>", undefined],
     ["<foo:bar>", ":bar"]
   ]
 }

The first regular expression (with a capture containing an empty alternative) should return matches equivalent to those of the second (with an optional capture).
This is basically driven by CompileSubpattern of Disjunction :: Alternative | Disjunction, which uses MatchTwoAlternatives to match with Alternative if possible and otherwise attempt to match with Disjunction (where Disjunction :: AlternativeAlternative :: [empty] uses EmptyMatcher to always continue).

Second reproduction

(() => {
  const re1 = /[^: ]+(:[^ ]*|)/g; // captures ":…" or "" (fails)
  const re2 = /[^: ]+(:[^ ]*)?/g; // captures ":…" or `undefined` (works as expected)
  const str = "foo bar foo:bar";
  const matchResult1 = str.match(re1);
  const matchAllResult1 = str.matchAll(re1);
  const matchResult2 = str.match(re2);
  const matchAllResult2 = str.matchAll(re2);

  return {
    matchResult1,
    matchAllResult1: [...(matchAllResult1 ?? [matchAllResult1])],
    matchResult2,
    matchAllResult2: [...(matchAllResult2 ?? [matchAllResult2])],
  };
})();

Actual behavior

{
  "matchResult1": ["f", "o", "o", "b", "a", "r", "foo:bar"],
  "matchAllResult1": [
    ["f", ""],
    ["o", ""],
    ["o", ""],
    ["b", ""],
    ["a", ""],
    ["r", ""],
    ["foo:bar", ":bar"]
  ],
  "matchResult2": ["foo", "bar", "foo:bar"],
  "matchAllResult2": [
    ["foo", undefined],
    ["bar", undefined],
    ["foo:bar", ":bar"]
  ]
}

Expected behavior

The first regular expression (with a capture containing an empty alternative) should return matches equivalent to those of the second (with an optional capture).

{
  "matchResult1": ["foo", "bar", "foo:bar"],
  "matchAllResult1": [
    ["foo", ""],
    ["bar", ""],
    ["foo:bar", ":bar"]
  ],
  "matchResult2": ["foo", "bar", "foo:bar"],
  "matchAllResult2": [
    ["foo", undefined],
    ["bar", undefined],
    ["foo:bar", ":bar"]
  ]
}
diff
--- actual
+++ expected
 {
-  "matchResult1": ["f", "o", "o", "b", "a", "r", "foo:bar"],
+  "matchResult1": ["foo", "bar", "foo:bar"],
   "matchAllResult1": [
-    ["f", ""],
-    ["o", ""],
-    ["o", ""],
-    ["b", ""],
-    ["a", ""],
-    ["r", ""],
+    ["foo", ""],
+    ["bar", ""],
     ["foo:bar", ":bar"]
   ],
   "matchResult2": ["foo", "bar", "foo:bar"],
   "matchAllResult2": [
     ["foo", undefined],
     ["bar", undefined],
     ["foo:bar", ":bar"]
   ]
 }

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      pFad - Phonifier reborn

      Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

      Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


      Alternative Proxies:

      Alternative Proxy

      pFad Proxy

      pFad v3 Proxy

      pFad v4 Proxy