HOG - 2023.2 English

Vitis Libraries

Release Date: 2023-12-20
Version: 2023.2 English

The Histogram of Oriented Gradients (HOG) is a feature descriptor used in computer vision for object detection. The feature descriptors produced by this approach are widely used in pedestrian detection.

The technique counts the occurrences of gradient orientations in localized portions of an image. HOG is computed over a dense grid of uniformly spaced cells and normalized over overlapping blocks for improved accuracy. The concept behind HOG is that object appearance and shape within an image can be described by the distribution of intensity gradients or edge directions.

The function accepts both RGB and gray inputs. In RGB mode, gradients are computed for each plane separately, and the one with the highest magnitude is selected. With the configurations provided, the window dimensions are 64x128 and the block dimensions are 16x16.
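The gradient selection and orientation counting described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the centered-difference kernel, the per-pixel plane selection, the unsigned 0 to 180 degree orientation range, the hard (non-interpolated) bin assignment, and the names Gradient, gradient_at, strongest, and vote are all assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Gradient of one pixel: magnitude and unsigned orientation in [0, 180) degrees.
struct Gradient {
    float magnitude;
    float angle_deg;
};

// Centered-difference gradient of one plane from its four neighbours (assumed kernel).
inline Gradient gradient_at(int left, int right, int up, int down) {
    const float gx = static_cast<float>(right - left);
    const float gy = static_cast<float>(down - up);
    const float pi = 3.14159265358979f;
    float angle = std::atan2(gy, gx) * 180.0f / pi;
    if (angle < 0.0f) angle += 180.0f;     // fold to an unsigned orientation
    if (angle >= 180.0f) angle -= 180.0f;  // keep the range half-open
    return {std::sqrt(gx * gx + gy * gy), angle};
}

// RGB mode: keep the gradient of the plane with the highest magnitude.
inline Gradient strongest(const Gradient& r, const Gradient& g, const Gradient& b) {
    Gradient best = r;
    if (g.magnitude > best.magnitude) best = g;
    if (b.magnitude > best.magnitude) best = b;
    return best;
}

constexpr int kBins = 9;  // NOB is fixed at 9

// Add one pixel's vote to a cell histogram (hard assignment, an assumption).
inline void vote(std::array<float, kBins>& hist, const Gradient& g) {
    assert(g.angle_deg >= 0.0f && g.angle_deg < 180.0f);
    int bin = static_cast<int>(g.angle_deg / (180.0f / kBins));
    if (bin >= kBins) bin = kBins - 1;
    hist[bin] += g.magnitude;
}
```

A cell histogram would be built by calling vote once for each of the 8x8 pixels in the cell, starting from a zero-initialized array.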

API Syntax

template<int WIN_HEIGHT, int WIN_WIDTH, int WIN_STRIDE, int BLOCK_HEIGHT, int BLOCK_WIDTH, int CELL_HEIGHT, int CELL_WIDTH, int NOB, int DESC_SIZE, int IMG_COLOR, int OUTPUT_VARIANT, int SRC_T, int DST_T, int ROWS, int COLS, int NPC = XF_NPPC1, bool USE_URAM=false, int XFCVDEPTH_IN = _XFCVDEPTH_DEFAULT, int XFCVDEPTH_DESC = _XFCVDEPTH_DEFAULT>
void HOGDescriptor(xf::cv::Mat<SRC_T, ROWS, COLS, NPC, XFCVDEPTH_IN> &_in_mat, xf::cv::Mat<DST_T, 1, DESC_SIZE, NPC, XFCVDEPTH_DESC> &_desc_mat);

Parameter Descriptions

The following table describes the template parameters.

Table 584. HOGDescriptor Template Parameter Descriptions
Parameters Description
WIN_HEIGHT The number of pixel rows in the window. This must be a multiple of 8 and must not exceed the number of image rows.
WIN_WIDTH The number of pixel columns in the window. This must be a multiple of 8 and must not exceed the number of image columns.
WIN_STRIDE The pixel stride between two adjacent windows. It is fixed at 8.
BLOCK_HEIGHT Height of the block. It is fixed at 16.
BLOCK_WIDTH Width of the block. It is fixed at 16.
CELL_HEIGHT Number of rows in a cell. It is fixed at 8.
CELL_WIDTH Number of columns in a cell. It is fixed at 8.
NOB Number of histogram bins for a cell. It is fixed at 9.
DESC_SIZE The size of the output descriptor.
IMG_COLOR The type of the image, set as either XF_GRAY or XF_RGB.
OUTPUT_VARIANT Must be either XF_HOG_RB or XF_HOG_NRB.
SRC_T Input pixel type. Must be either XF_8UC1 or XF_8UC4, for gray and color respectively.
DST_T Output descriptor type. Must be XF_32UC1.
ROWS Number of rows in the image being processed.
COLS Number of columns in the image being processed.
NPC Number of pixels to be processed per cycle. This function supports only XF_NPPC1 (1 pixel per cycle).
USE_URAM Enable to map UltraRAM instead of BRAM for some storage structures.
XFCVDEPTH_IN Depth of the input image.
XFCVDEPTH_DESC Depth of the output image.
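As a worked example of how the fixed parameters above combine, the sketch below computes the descriptor length of a single window and the number of window positions in an image. It assumes that blocks advance by one cell (8 pixels) inside a window, as in Dalal's configuration, and that the 64x128 window is 64 pixels wide and 128 pixels high. The function names are hypothetical, and how DESC_SIZE is derived from these counts for each OUTPUT_VARIANT is not stated here.

```cpp
// Number of positions a sliding region of the given size takes along one axis.
constexpr int positions(int extent, int size, int stride) {
    return (extent - size) / stride + 1;
}

// Values in one window descriptor: blocks per window x cells per block x bins.
// Assumes the block stride equals the cell size.
constexpr int window_descriptor_length(int win_h, int win_w, int block_h, int block_w,
                                       int cell_h, int cell_w, int nob) {
    return positions(win_h, block_h, cell_h) * positions(win_w, block_w, cell_w) *
           (block_h / cell_h) * (block_w / cell_w) * nob;
}

// Number of window positions in an image for a given window stride.
constexpr int windows_in_image(int rows, int cols, int win_h, int win_w, int win_stride) {
    return positions(rows, win_h, win_stride) * positions(cols, win_w, win_stride);
}

// 15 x 7 blocks, 4 cells per block, 9 bins per cell.
static_assert(window_descriptor_length(128, 64, 16, 16, 8, 8, 9) == 3780,
              "64x128 window descriptor length");
```

Under the same assumptions, a 1920x1080 image with WIN_STRIDE of 8 gives 120 x 233 = 27960 window positions.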

The following table describes the function parameters.

Table 585. HOGDescriptor Function Parameter Descriptions
Parameters Description
_in_mat Input image, of xf::cv::Mat type
_desc_mat Output descriptors, of xf::cv::Mat type

Where, for OUTPUT_VARIANT:

  • RB is repetitive blocks (descriptor data is written window-wise).
  • NRB is non-repetitive blocks (descriptor data is written block-wise to reduce the number of writes).

Note: In RB mode, the block data is written to memory taking the overlapping windows into consideration. In NRB mode, the block data is written directly to the output stream without considering the window overlap, so the overlap must be handled on the host side.
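One way the host might handle the overlap in NRB mode is to gather the blocks that a window covers from the block-wise output. The sketch below assumes that the block data has already been unpacked into a row-major grid of blocks, each holding a fixed number of values, and that windows advance by one block position. The actual memory layout of the NRB output is not described here, so BlockGrid and gather_window are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Block-wise descriptor data, assumed row-major over the block grid.
struct BlockGrid {
    int block_rows;                   // block positions down the image
    int block_cols;                   // block positions across the image
    int block_len;                    // values per block
    std::vector<std::uint32_t> data;  // block_rows * block_cols * block_len values
};

// Collect the blocks covered by the window whose top-left block is
// (win_row, win_col), scanning the window's blocks in row-major order.
std::vector<std::uint32_t> gather_window(const BlockGrid& grid, int win_row, int win_col,
                                         int win_block_rows, int win_block_cols) {
    assert(win_row >= 0 && win_row + win_block_rows <= grid.block_rows);
    assert(win_col >= 0 && win_col + win_block_cols <= grid.block_cols);
    std::vector<std::uint32_t> out;
    out.reserve(static_cast<std::size_t>(win_block_rows) * win_block_cols * grid.block_len);
    for (int r = 0; r < win_block_rows; ++r) {
        for (int c = 0; c < win_block_cols; ++c) {
            const std::ptrdiff_t start =
                (static_cast<std::ptrdiff_t>(win_row + r) * grid.block_cols + (win_col + c)) *
                grid.block_len;
            out.insert(out.end(), grid.data.begin() + start,
                       grid.data.begin() + start + grid.block_len);
        }
    }
    return out;
}
```

With this approach each block is stored once and copied into every window that overlaps it, which is the trade-off NRB makes: fewer writes from the kernel in exchange for extra work on the host.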

Resource Utilization

The following table shows the resource utilization of the HOGDescriptor function in normal operation (1 pixel per cycle) mode, as generated by the Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 part at 300 MHz, to process an image of 1920x1080 resolution.

Table 586. HOGDescriptor Function Resource Utilization Summary
Resource utilization (at 300 MHz) for 1 pixel per cycle operation

Resource    NRB Gray    NRB RGB    RB Gray    RB RGB
BRAM_18K    43          49         171        177
DSP48E      34          46         36         48
FF          15365       15823      15205      15663
LUT         12868       13267      13443      13848

The following table shows the resource utilization of the HOGDescriptor function in normal operation (1 pixel per cycle) mode, as generated by the Vivado HLS 2019.1 tool for the xczu7ev-ffvc1156-2-e part at 300 MHz, to process an image of 1920x1080 resolution with UltraRAM enabled.

Table 587. HOGDescriptor Function Resource Utilization Summary with UltraRAM Enabled
Resource utilization (at 300 MHz) for 1 pixel per cycle operation

Resource    NRB Gray    NRB RGB    RB Gray    RB RGB
BRAM_18K    10          12         18         20
URAM        15          15         15         17
DSP48E      34          46         36         48
FF          17285       17917      18270      18871
LUT         12409       12861      12793      13961

Performance Estimate

The following table shows the performance estimates of the HOGDescriptor() function for different configurations, as generated by the Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 part, to process an image of 1920x1080 resolution.

Table 588. HOGDescriptor Function Performance Estimate Summary

Operating Mode    Operating Frequency (MHz)    Latency Min (ms)    Latency Max (ms)
NRB-Gray          300                          6.98                8.83
NRB-RGBA          300                          6.98                8.83
RB-Gray           300                          176.81              177
RB-RGBA           300                          176.81              177

Deviations from OpenCV

The deviations from OpenCV are listed below:

  1. Border care

    OpenCV handles borders in the gradient computation with BORDER_REFLECT_101, in which the border padding is the reflection of the neighboring pixels. The Xilinx implementation instead uses BORDER_CONSTANT (zero padding).

  2. Gaussian weighting

    In OpenCV, Gaussian weights are applied to the pixels over the block: a block has 256 pixels, and each position in the block is multiplied by its corresponding Gaussian weight. In the HLS implementation, Gaussian weighting is not performed.

  3. Cell-wise interpolation

    In OpenCV, the magnitude values of the pixels are distributed across different cells in the block, but on the corresponding bins.

    [Figure: image86]

    Pixels in region 1 belong only to their corresponding cells, but the pixels in regions 2 and 3 are interpolated to the adjacent 2 cells and 4 cells respectively. This operation is not performed in the HLS implementation.

  4. Output handling

    OpenCV produces its output in column-major form, whereas the HLS implementation produces it in row-major form. Also, the feature vector is in the fixed-point type Q0.16 in the HLS implementation, while in OpenCV it is in floating point.
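Two of the deviations above, border handling and the Q0.16 output type, can be made concrete with a short sketch. It is illustrative only: the horizontal centered-difference gradient is an assumed kernel, the function names are hypothetical, and how the 16-bit values are packed into the XF_32UC1 output elements is not described here, so the conversion takes an already extracted 16-bit value.

```cpp
#include <cassert>
#include <cstdint>

// BORDER_REFLECT_101 index mapping (gfedcb|abcdefgh|gfedcba).
// Valid for a single reflection, that is -n < i < 2n - 1, with n >= 2.
inline int reflect101(int i, int n) {
    assert(n >= 2);
    if (i < 0) return -i;
    if (i >= n) return 2 * (n - 1) - i;
    return i;
}

// Horizontal gradient row[x + 1] - row[x - 1] with reflected borders (OpenCV style).
inline int grad_x_reflect101(const std::uint8_t* row, int n, int x) {
    return row[reflect101(x + 1, n)] - row[reflect101(x - 1, n)];
}

// BORDER_CONSTANT with zero padding: pixels outside the row read as 0.
inline int pixel_or_zero(const std::uint8_t* row, int n, int i) {
    return (i >= 0 && i < n) ? row[i] : 0;
}

// The same gradient with zero-padded borders.
inline int grad_x_constant(const std::uint8_t* row, int n, int x) {
    return pixel_or_zero(row, n, x + 1) - pixel_or_zero(row, n, x - 1);
}

// Q0.16 has no integer bits and 16 fractional bits: value = raw / 2^16, in [0, 1).
constexpr float q0_16_to_float(std::uint16_t raw) {
    return static_cast<float>(raw) / 65536.0f;
}
```

For the row {10, 20, 30, 40}, the gradient at x = 0 is 0 with reflected borders and 20 with zero padding, so the two border modes differ only at the image edges. A raw Q0.16 value of 0x8000 converts to 0.5, which is the scaling to apply before comparing against OpenCV's floating-point output.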

Limitations

  1. The configurations are limited to Dalal’s implementation.
  2. Image height and image width must be a multiple of cell height and cell width respectively.
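The fixed values from the template parameter table and the limitations above can be checked at compile time before instantiating the function. The helper below is a hypothetical sketch and is not part of the library. It only encodes the constraints stated in this section.

```cpp
// Compile-time check of the constraints stated in this section.
// HogConfigCheck is a hypothetical helper, not a library type.
template <int WIN_HEIGHT, int WIN_WIDTH, int WIN_STRIDE, int BLOCK_HEIGHT, int BLOCK_WIDTH,
          int CELL_HEIGHT, int CELL_WIDTH, int NOB, int ROWS, int COLS>
struct HogConfigCheck {
    static_assert(WIN_STRIDE == 8, "WIN_STRIDE is fixed at 8");
    static_assert(BLOCK_HEIGHT == 16 && BLOCK_WIDTH == 16, "block size is fixed at 16x16");
    static_assert(CELL_HEIGHT == 8 && CELL_WIDTH == 8, "cell size is fixed at 8x8");
    static_assert(NOB == 9, "NOB is fixed at 9");
    static_assert(WIN_HEIGHT % 8 == 0 && WIN_WIDTH % 8 == 0,
                  "window dimensions must be multiples of 8");
    static_assert(WIN_HEIGHT <= ROWS && WIN_WIDTH <= COLS,
                  "window must not exceed the image");
    static_assert(ROWS % CELL_HEIGHT == 0 && COLS % CELL_WIDTH == 0,
                  "image dimensions must be multiples of the cell dimensions");
    static constexpr bool ok = true;
};

// A 64x128 window on a 1920x1080 image satisfies all of the constraints.
static_assert(HogConfigCheck<128, 64, 8, 16, 16, 8, 8, 9, 1080, 1920>::ok,
              "configuration accepted");
```

A configuration that violates any of the constraints fails to compile with the corresponding message, which surfaces the problem before synthesis.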