reEngine Usage - 2023.2 English

Vitis Libraries

Release Date
2023-12-20
Version
2023.2 English

Before instantiating the reEngine, users have to pre-compile their regular expression using the software compiler provided in L1 first to check if the pattern is supported by the current implementation of hardware VM. The compiler will give an error code XF_UNSUPPORTED_OPCODE if the pattern is not supported. A pass code ONIG_NORMAL along with the configurations (including instruction list, bit-set map etc.) will be given if the input is a valid pattern. Then, user should pass these configurations, the input messages, along with its corresponding lengths in bytes under the format defined above to the reEngineKernel to trigger the matching process. The reEngineKernel will automatically be responsible for splitting them to individual inputs and feeding them into the buffers required by L1 regex-VM, also collecting the results after the matching process done and placing them into output result buffer correspondingly.

As mentioned in Regular Expression Virtual Machine (regex-VM), the bit-set map, instruction buffer, message buffer, and offset buffer are needed in hardware VM, and are not handled by itself. We think it is not necessary for users to deal with the details related to L1 at L2. Thus, these buffers will be automatically allocated in reEngineKernel according to the template parameters given by users.

Code Example

The following section gives a usage example for using reEngine in C++ based HLS design.

Firstly, let me introduce the format of three buffer as the inputs of reEngineKernel here:

cfg_buff

Configuration Buffer Format

msg_buff

Message Buffer Format

len_buff

Length Buffer Format

To use the regex-VM you need to:

  1. Compile the software regular expression compiler by running make command in path L1/tests/text/regex_vm/re_compile
  2. Include the xf_re_compile.h header in path L1/include/sw/xf_data_analytics/text and the oniguruma.h header in path L1/tests/text/regex_vm/re_compile/lib/include
#include "oniguruma.h"
#include "xf_re_compile.h"
  1. Compile your regular expression by calling xf_re_compile
// Number of instructions tranlated from the pattern
unsigned int instr_num = 0;
// Number of character classes in the pattern
unsigned int cclass_num = 0;
// Number of capturing groups in the pattern
unsigned int cpgp_num = 0;
// Bit set map
unsigned int* bitset = new unsigned int[8 * CCLASS_NM];
// Configuration buffer
uint64_t* cfg_buff = aligned_alloc<uint64_t>(INSTRUC_SIZE);
// Suppose 1k bytes is long enough for names of each capturing group
uint8_t* cpgp_name_val = aligned_alloc<uint8_t>(1024);
// Suppose the number of capturing groups is less than 20
uint32_t* cpgp_name_offt = aligned_alloc<uint32_t>(20);
// Leave 2 64-bit space for configuration headers
int r = xf_re_compile(pattern, bitset, cfg_buff + 2, &instr_num, &cclass_num, &cpgp_num, cpgp_name_val, cpgp_name_offt);

// Print a name table for all of the capturing groups
printf("Name Table\n");
for (int i = 0; i < cpgp_num; i++) {
    printf("Group-%d: ", i);
    for (int j = 0; j < cpgp_name_offt[i + 1] - cpgp_name_offt[i]; j++) {
        printf("%c", cpgp_name_val[j + cpgp_name_offt[i]]);
    }
    printf("\n");
}
  1. Check the return value to see if its a valid pattern and supported by hardware VM. ONIG_NORMAL is returned if the pattern is valid, and XF_UNSUPPORTED_OPCODE is returned if it’s not supported currently.
if (r != XF_UNSUPPORTED_OPCODE && r == ONIG_NORMAL) {
    // Prepare the buffers and call reEngine for acceleration here
}
  1. Once the regular expression is verified as a supported pattern, you may prepare the input buffers and get the results by
// Header for reEngine
#include "re_engine_kernel.hpp"
// Header for reading log file as std::string
#include <iostream>
#include <fstream>
#include <string.h>

// Total number of configuration blocks
// leave 2 blocks for configuration header
unsigned int cfg_nm = 2 + instr_num;
// Message buffer (64-bit width for full utilizing the 2 memory ports of BRAMs)
uint64_t* msg_buff = aligned_alloc<uint64_t>(MAX_MSG_SZ);
// Length buffer
uint16_t* len_buff = aligned_alloc<uint16_t>(MAX_MSG_NM);
// Append bit-set map to the tail of instruction list
for (unsigned int i = 0; i < cclass_num * 4; i++) {
    uint64_t tmp = bitset[i * 2 + 1];
    tmp = tmp << 32;
    tmp += bitset[i * 2];
    cfg_buff[cfg_nm++] = tmp;
}
// Set configuration header accordingly
typedef union {
    struct {
        uint32_t instr_nm;
        uint16_t cc_nm;
        uint16_t gp_nm;
    } head_st;
    uint64_t d;
} cfg_info;
cfg_info cfg_h;
cfg_h.head_st.instr_nm = instr_num;
cfg_h.head_st.cc_nm = cclass_num;
cfg_h.head_st.gp_nm = cpgp_num;
cfg_buff[0] = cfg_nm;
cfg_buff[1] = cfg_h.d;
// String of each line in the log
std::string line;
// We provide a 5k line apache log
std::ifstream log_file(log_data/access_5k.log);
if (log_file.is_open()) {
    // Read the apache log line-by-line
    while (getline(log_file, line)) {
        if (line.size() > 0) {
            if (writeOneLine(msg_buff, len_buff, offt, msg_nm, line) != 0) {
                return -1;
            }
        }
    }
    // Set the header of message buffer (number of message blocks in 64-bit)
    msg_buff[0] = offt;
    // Set the header of length buffer (concatenate the first 2 blocks, it presents the total number of messages in msg_buff)
    len_buff[0] = msg_nm / 65536;
    len_buff[1] = msg_nm % 65536;
} else {
    printf("Opening input log file failed.\n");
    return -1;
}
// Result buffer
uint32_t* out_buff = aligned_alloc<uint32_t>((cpgp_num + 1) * msg_nm);
// Call reEngine
reEngineKernel(reinterpret_cast<ap_uint<64>*>(cfg_buff), reinterpret_cast<ap_uint<64>*>(msg_buff), reinterpret_cast<ap_uint<16>*>(len_buff), reinterpret_cast<ap_uint<32>*>(out_buff));

The match flag and offset addresses for each capturing group are presented in out_buff with the format shown in the figure below:

out_buff

Result Buffer Format