Regex-VM Usage - 2023.2 English

Vitis Libraries

Release Date: 2023-12-20
Version: 2023.2 English

Before instantiating the hardware VM, users must first pre-compile their regular expression with the software compiler mentioned above to check whether the pattern is supported by the hardware VM. The compiler returns the error code XF_UNSUPPORTED_OPCODE if the pattern is not supported. If the pattern is valid, it returns ONIG_NORMAL along with the configurations (including the instruction list, bit-set map, etc.). The user then passes these configurations, together with the input message and its length in bytes, to the hardware VM to trigger the matching process. The hardware VM determines whether the input message matches and writes the offset addresses of each capturing group into the offset buffer.

Note that the hardware VM holds only the internal stack buffer. Users must allocate memory for the bit-set map, instruction buffer, message buffer, and offset buffer outside the hardware instantiation.

The size of the internal stack is set by the STACK_SIZE template parameter of the hardware VM. Because the stack is stored in URAM, STACK_SIZE should preferably be a multiple of 4096 so that no space in an individual URAM block is wasted. Choose the stack size carefully: if it is too small, the hardware VM will overflow; if it is too large, no URAMs will be left on the board for instantiating additional PUs to improve throughput.

Code Example

The following section gives an example of using the regex-VM in a C++-based HLS design.

To use the regex-VM, you need to:

  1. Build the software regular expression compiler by running the make command in L1/tests/text/regex_vm/re_compile.
  2. Include the xf_re_compile.h header (in L1/include/sw/xf_data_analytics/text) and the oniguruma.h header (in L1/tests/text/regex_vm/re_compile/lib/include):
#include "oniguruma.h"
#include "xf_re_compile.h"
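  3. Compile your regular expression by calling xf_re_compile. The bitset and instr_buff arguments must point to buffers large enough to hold the compiled pattern; instr_num, cclass_num, and cpgp_num receive the number of instructions, character classes, and capturing groups, respectively.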
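// outputs: number of instructions, character classes and capturing groups
unsigned int instr_num = 0, cclass_num = 0, cpgp_num = 0;
int r = xf_re_compile(pattern, bitset, instr_buff, &instr_num, &cclass_num, &cpgp_num, NULL, NULL);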
  4. Check the return value to see whether the pattern is valid and supported by the hardware VM: ONIG_NORMAL is returned if the pattern is valid, and XF_UNSUPPORTED_OPCODE is returned if it is not currently supported.
if (r == ONIG_NORMAL) {
    // calling hardware VM here for acceleration
}
  5. Once the regular expression is verified as a supported pattern, you can call the hardware VM to match any message, as follows:
// for data types used in VM
#include "ap_int.h"
// header for hardware VM implementation
#include "xf_data_analytics/text/regexVM.hpp"
// for uint16_t / uint64_t and strlen
#include <cstdint>
#include <cstring>

// bit-set map produced by xf_re_compile: 8 x 32-bit words per character class
unsigned int bitset[8 * cclass_num];
// instruction buffer produced by xf_re_compile: one 64-bit word per instruction
uint64_t instr_buff[instr_num];
// allocate memory for message; MESSAGE_SIZE is its capacity in 32-bit words
// and must be at least (str_len + 3) / 4
ap_uint<32> msg_buff[MESSAGE_SIZE];
// set up input message buffer according to input string
unsigned str_len = strlen((const char*)in_str);
for (unsigned i = 0; i < (str_len + 3) / 4; i++) {
    for (int k = 0; k < 4; k++) {
        if (i * 4 + k < str_len) {
            msg_buff[i].range((k + 1) * 8 - 1, k * 8) = in_str[i * 4 + k];
        } else {
            // pad white-space at the end
            msg_buff[i].range((k + 1) * 8 - 1, k * 8) = ' ';
        }
    }
}
// allocate memory for offset addresses: start and end of the whole match plus each capturing group
uint16_t offset_buff[2 * (cpgp_num + 1)];
// initialize offset buffer (-1 wraps to 0xFFFF in uint16_t)
for (unsigned i = 0; i < 2 * (cpgp_num + 1); i++) {
    offset_buff[i] = -1;
}
ap_uint<2> match = 0;
// call for hardware acceleration (basic hardware VM implementation)
xf::data_analytics::text::regexVM<STACK_SIZE>((ap_uint<32>*)bitset, (ap_uint<64>*)instr_buff, msg_buff, str_len, match, offset_buff);
// or call for hardware acceleration (performance optimized hardware VM implementation)
xf::data_analytics::text::regexVM_opt<STACK_SIZE>((ap_uint<32>*)bitset, (ap_uint<64>*)instr_buff, msg_buff, str_len, match, offset_buff);

The match flag and the offset addresses of the capturing groups are returned in match and offset_buff, respectively, in the formats shown in the tables below.

Meaning of the 2-bit match flag output by the hardware VM:

Value Description
0 mismatched
1 matched
2 internal stack overflow
3 reserved for future use

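Arrangement of the offset buffer offset_buff (entries 2k and 2k+1 hold the start and end positions of the k-th capturing group, with k = 0 denoting the whole matched string):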

Index Description
0 start position of the whole matched string
1 end position of the whole matched string
2 start position of the 1st capturing group
3 end position of the 1st capturing group
4 start position of the 2nd capturing group
5 end position of the 2nd capturing group