Regex-VM Coverage - 2023.2 English

Vitis Libraries

Release Date: 2023-12-20
Version: 2023.2 English

Because of limited resources and a strict release timeline, the current release implements only the most frequently used OPs from the full OP list defined by the Oniguruma library, which is enough to support common regular expressions. The following table shows which OPs are supported:

OP list Supported
FINISH NO
END YES
STR_1 YES
STR_2 YES
STR_3 YES
STR_4 YES
STR_5 YES
STR_N YES
STR_MB2N1 NO
STR_MB2N2 NO
STR_MB2N3 NO
STR_MB2N NO
STR_MB3N NO
STR_MBN NO
CCLASS YES
CCLASS_MB NO
CCLASS_MIX NO
CCLASS_NOT YES
CCLASS_MB_NOT NO
CCLASS_MIX_NOT NO
ANYCHAR YES
ANYCHAR_ML NO
ANYCHAR_STAR YES
ANYCHAR_ML_STAR NO
ANYCHAR_STAR_PEEK_NEXT NO
ANYCHAR_ML_STAR_PEEK_NEXT NO
WORD NO
WORD_ASCII NO
NO_WORD NO
NO_WORD_ASCII NO
WORD_BOUNDARY NO
NO_WORD_BOUNDARY NO
WORD_BEGIN NO
WORD_END NO
TEXT_SEGMENT_BOUNDARY NO
BEGIN_BUF YES
END_BUF YES
BEGIN_LINE YES
END_LINE YES
SEMI_END_BUF NO
CHECK_POSITION NO
BACKREF1 NO
BACKREF2 NO
BACKREF_N NO
BACKREF_N_IC NO
BACKREF_MULTI NO
BACKREF_MULTI_IC NO
BACKREF_WITH_LEVEL NO
BACKREF_WITH_LEVEL_IC NO
BACKREF_CHECK NO
BACKREF_CHECK_WITH_LEVEL NO
MEM_START YES
MEM_START_PUSH YES
MEM_END_PUSH NO
MEM_END_PUSH_REC NO
MEM_END YES
MEM_END_REC NO
FAIL YES
JUMP YES
PUSH YES
PUSH_SUPER NO
POP YES
POP_TO_MARK YES
PUSH_OR_JUMP_EXACT1 YES
PUSH_IF_PEEK_NEXT NO
REPEAT YES
REPEAT_NG NO
REPEAT_INC YES
REPEAT_INC_NG NO
EMPTY_CHECK_START NO
EMPTY_CHECK_END NO
EMPTY_CHECK_END_MEMST NO
EMPTY_CHECK_END_MEMST_PUSH NO
MOVE NO
STEP_BACK_START YES
STEP_BACK_NEXT NO
CUT_TO_MARK NO
MARK YES
SAVE_VAL NO
UPDATE_VAR NO
CALL NO
RETURN NO
CALLOUT_CONTENTS NO
CALLOUT_NAME NO
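
The table above can be turned into a simple pre-check. The following is a minimal sketch, not part of the library: it assumes the OP sequence of a compiled pattern is already available as a list of names without Oniguruma's `OP_` prefix (how that list is obtained is not covered here), and the names `SUPPORTED_OPS` and `unsupported_ops` are hypothetical.

```python
# OPs marked YES in the coverage table (names without the OP_ prefix).
SUPPORTED_OPS = frozenset({
    "END", "STR_1", "STR_2", "STR_3", "STR_4", "STR_5", "STR_N",
    "CCLASS", "CCLASS_NOT", "ANYCHAR", "ANYCHAR_STAR",
    "BEGIN_BUF", "END_BUF", "BEGIN_LINE", "END_LINE",
    "MEM_START", "MEM_START_PUSH", "MEM_END",
    "FAIL", "JUMP", "PUSH", "POP", "POP_TO_MARK", "PUSH_OR_JUMP_EXACT1",
    "REPEAT", "REPEAT_INC", "STEP_BACK_START", "MARK",
})


def unsupported_ops(program):
    """Return the OP names in `program` that the table marks as NO.

    `program` is a hypothetical list of OP names for one compiled pattern.
    The result keeps first-seen order and contains no duplicates.
    """
    found = []
    for op in program:
        if op not in SUPPORTED_OPS and op not in found:
            found.append(op)
    return found
```

An empty result from `unsupported_ops(["STR_3", "ANYCHAR_STAR", "END"])` would indicate that every OP in that (made-up) program is covered, while a program containing `BACKREF1` or `WORD_BOUNDARY` would have those names reported.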

Given this OP coverage, the supported atomic regular expressions and their descriptions are:

Regex Description
^ asserts position at start of a line
$ asserts position at the end of a line
\A asserts position at start of the string
\z asserts position at the end of the string
\ca matches the control sequence CTRL+a
\C matches one data unit, even in UTF mode (best avoided)
\c\\ matches the control sequence CTRL+\
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
\d matches a digit (equal to [0-9])
\D matches any character that’s not a digit (equal to [^0-9])
\h matches any horizontal whitespace character (equal to [[:blank:]])
\H matches any character that’s not a horizontal whitespace character
\w matches any word character (equal to [a-zA-Z0-9_])
\W matches any non-word character (equal to [^a-zA-Z0-9_])
\^ matches the character ^ literally
\$ matches the character $ literally
\N matches any non-newline character
\g'0' recurses the 0th subpattern
\o{101} matches the character A (octal 101)
\x61 matches the character a (hex 61) literally
\x{1 2} matches 1 (hex) or 2 (hex)
\17 matches the character with octal code 17 literally
abc matches the string abc literally
. matches any character (except for line terminators)
| alternative
[^a] matches a single character other than a
[a-c] matches a, b, or c
[abc] matches a, b, or c
[:upper:] matches an uppercase letter (equal to [A-Z])
a? matches a zero or one time (greedy)
a* matches a between zero and unlimited times (greedy)
a+ matches a between one and unlimited times (greedy)
a?? matches a between zero and one times (lazy)
a*? matches a between zero and unlimited times (lazy)
a+? matches a between one and unlimited times (lazy)
a{2} matches a exactly 2 times
a{0,} matches a between zero and unlimited times
a{1,2} matches a one or two times
{,} matches {,} literally
(?#blabla) comment blabla
(a) capturing group, matches a literally
(?<name1> a) named capturing group name1, matches a literally
(?:) non-capturing group
(?i) matches the remainder of the pattern case-insensitively (i modifier)
(?<!a)z matches any occurrence of z that is not preceded by a (negative look-behind)
z(?!a) matches any occurrence of z that is not followed by a (negative look-ahead)
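
To make a few of the rows above concrete, here is a small sketch that exercises them on sample strings. It uses Python's standard `re` module purely as a stand-in: it is not Oniguruma and not the Regex-VM, and only constructs whose syntax and meaning are the same in Python are included (rows such as `\h`, `\z`, and `\g'0'` are left out for that reason). The sample subjects and the `first_match` helper are made up for illustration.

```python
import re

# (pattern, subject, expected first match or None)
CASES = [
    (r"^abc", "abcdef", "abc"),       # ^ anchors at the start of a line
    (r"\d+", "id=2023", "2023"),      # \d with a greedy +
    (r"[a-c]{1,2}", "xcab", "ca"),    # character range, one or two times
    (r"a+?", "aaa", "a"),             # lazy +: as few as possible
    (r"a|b", "xb", "b"),              # alternative
    (r"\x61", "cat", "a"),            # hex escape for the character a
    (r"(?<!a)z", "az z", "z"),        # negative look-behind
    (r"z(?!a)", "za", None),          # negative look-ahead rejects "za"
    (r"(?i)abc", "xABC", "ABC"),      # case-insensitive matching
]


def first_match(pattern, subject):
    """Return the text of the first match, or None if nothing matches."""
    # re.ASCII keeps \d, \w, \s limited to ASCII, mirroring the ASCII-only note.
    m = re.search(pattern, subject, re.ASCII)
    return m.group(0) if m else None


def check_all():
    """Return True if every case yields its expected first match."""
    return all(first_match(p, s) == e for p, s, e in CASES)
```

Calling `check_all()` runs every case; a single case can be inspected with, for example, `first_match(r"[a-c]{1,2}", "xcab")`.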

Attention

  1. The only encoding supported in the current release is ASCII (extended ASCII codes are excluded).
  2. Nested repetition is not supported.
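
Both restrictions can be screened for before a pattern is handed to the hardware flow. The sketch below rests on two assumptions that the text above does not spell out: "ASCII" is taken to mean code points 0 to 127, and "nested repetition" is taken to mean a quantified group whose body itself contains a quantifier, as in `(a*)*`. The second check is a rough textual heuristic, not a parser, so it can misjudge unusual patterns (for example, it treats every unescaped `{` as a quantifier). Both function names are hypothetical.

```python
QUANTIFIER_STARTS = "*+?{"


def is_plain_ascii(text):
    """Return True if every character is 7-bit ASCII (code point 0..127)."""
    return all(ord(ch) < 128 for ch in text)


def has_nested_repetition(pattern):
    """Heuristic: True if a quantified group contains another quantifier."""
    stack = []        # one flag per open group: quantifier seen inside?
    in_class = False  # inside [...] quantifier characters are literals
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2    # skip the escaped character
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            stack.append(False)
            if pattern[i + 1:i + 2] == "?":
                i += 1  # the ? of (?: or (?<! is a group prefix, not a quantifier
        elif ch == ")":
            inner = stack.pop() if stack else False
            nxt = pattern[i + 1:i + 2]
            if inner and nxt and nxt in QUANTIFIER_STARTS:
                return True
            if inner and stack:
                stack[-1] = True  # the enclosing group also contains a quantifier
        elif ch in QUANTIFIER_STARTS and stack:
            stack[-1] = True
        i += 1
    return False
```

A caller might reject a pattern when `is_plain_ascii(pattern)` is false or `has_nested_repetition(pattern)` is true, and apply `is_plain_ascii` to the input text as well.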