-
Notifications
You must be signed in to change notification settings - Fork 244
Description
Environment: XS 16.8.1, slot 32 bytes, ID 4 bytes
Description
Regular expression matching fails with some scenarios involving an empty alternative. I haven't exactly pinpointed the failure conditions, but have included some small reproductions.
This issue, while not critical, is more concerning than most because it manifests as non-throwing misbehavior of regular expression patterns that can reasonably be expected.
Also: AFAICT, RegExp.prototype.exec
seems to not be affected.
Steps to Reproduce
(() => {
const re1 = /<[^:>]*(:[^>]*|)>/g; // captures ":…" or "" (fails)
const re2 = /<[^:>]*(:[^>]*)?>/g; // captures ":…" or `undefined` (works as expected)
const str = "<foo> <bar> <foo:bar>";
const matchResult1 = str.match(re1);
const matchAllResult1 = str.matchAll(re1);
const matchResult2 = str.match(re2);
const matchAllResult2 = str.matchAll(re2);
return {
matchResult1,
matchAllResult1: [...(matchAllResult1 ?? [matchAllResult1])],
matchResult2,
matchAllResult2: [...(matchAllResult2 ?? [matchAllResult2])],
};
})();
Actual behavior
{
"matchResult1": ["<foo:bar>"],
"matchAllResult1": [
["<foo:bar>", ":bar"]
],
"matchResult2": ["<foo>", "<bar>", "<foo:bar>"],
"matchAllResult2": [
["<foo>", undefined],
["<bar>", undefined],
["<foo:bar>", ":bar"]
]
}
Expected behavior
{
"matchResult1": ["<foo>", "<bar>", "<foo:bar>"],
"matchAllResult1": [
["<foo>", ""],
["<bar>", ""],
["<foo:bar>", ":bar"]
],
"matchResult2": ["<foo>", "<bar>", "<foo:bar>"],
"matchAllResult2": [
["<foo>", undefined],
["<bar>", undefined],
["<foo:bar>", ":bar"]
]
}
diff
--- actual
+++ expected
{
- "matchResult1": ["<foo:bar>"],
+ "matchResult1": ["<foo>", "<bar>", "<foo:bar>"],
"matchAllResult1": [
+ ["<foo>", ""],
+ ["<bar>", ""],
["<foo:bar>", ":bar"]
],
"matchResult2": ["<foo>", "<bar>", "<foo:bar>"],
"matchAllResult2": [
["<foo>", undefined],
["<bar>", undefined],
["<foo:bar>", ":bar"]
]
}
The first regular expression (with a capture containing an empty alternative) should return matches equivalent to those of the second (with an optional capture).
This is basically driven by CompileSubpattern of Disjunction :: Alternative |
Disjunction, which uses MatchTwoAlternatives to match with Alternative if possible and otherwise attempt to match with Disjunction (where Disjunction :: Alternative → Alternative :: [empty] uses EmptyMatcher to always continue).
Second reproduction
(() => {
const re1 = /[^: ]+(:[^ ]*|)/g; // captures ":…" or "" (fails)
const re2 = /[^: ]+(:[^ ]*)?/g; // captures ":…" or `undefined` (works as expected)
const str = "foo bar foo:bar";
const matchResult1 = str.match(re1);
const matchAllResult1 = str.matchAll(re1);
const matchResult2 = str.match(re2);
const matchAllResult2 = str.matchAll(re2);
return {
matchResult1,
matchAllResult1: [...(matchAllResult1 ?? [matchAllResult1])],
matchResult2,
matchAllResult2: [...(matchAllResult2 ?? [matchAllResult2])],
};
})();
Actual behavior
{
"matchResult1": ["f", "o", "o", "b", "a", "r", "foo:bar"],
"matchAllResult1": [
["f", ""],
["o", ""],
["o", ""],
["b", ""],
["a", ""],
["r", ""],
["foo:bar", ":bar"]
],
"matchResult2": ["foo", "bar", "foo:bar"],
"matchAllResult2": [
["foo", undefined],
["bar", undefined],
["foo:bar", ":bar"]
]
}
Expected behavior
The first regular expression (with a capture containing an empty alternative) should return matches equivalent to those of the second (with an optional capture).
{
"matchResult1": ["foo", "bar", "foo:bar"],
"matchAllResult1": [
["foo", ""],
["bar", ""],
["foo:bar", ":bar"]
],
"matchResult2": ["foo", "bar", "foo:bar"],
"matchAllResult2": [
["foo", undefined],
["bar", undefined],
["foo:bar", ":bar"]
]
}
diff
--- actual
+++ expected
{
- "matchResult1": ["f", "o", "o", "b", "a", "r", "foo:bar"],
+ "matchResult1": ["foo", "bar", "foo:bar"],
"matchAllResult1": [
- ["f", ""],
- ["o", ""],
- ["o", ""],
- ["b", ""],
- ["a", ""],
- ["r", ""],
+ ["foo", ""],
+ ["bar", ""],
["foo:bar", ":bar"]
],
"matchResult2": ["foo", "bar", "foo:bar"],
"matchAllResult2": [
["foo", undefined],
["bar", undefined],
["foo:bar", ":bar"]
]
}