Overview - 2.6 English

Binary CAM Search v2.6 LogiCORE IP Product Guide (PG317)

Document ID: PG317
Release Date: 2023-11-01
Version: 2.6 English
The Binary CAM Search IP core (BCAM) is a member of the family of CAMs provided by AMD. The family consists of four members:
Binary CAM (BCAM)
Described in this document. Used for exact matching. Entry storage is provided in UltraRAM or block RAM.
Hardware-managed binary CAM (CBCAM)
Described in this document. Entries can be inserted and deleted through a hardware interface, without a software driver. Software-managed inserts and deletes are also supported. The CBCAM software driver keeps no shadow memory of the CAM contents, so it requires less memory and fewer CPU resources, at the cost of additional logic resources.
Semi TCAM (STCAM)
The STCAM is fully flexible in terms of the number, size, and position of wildcard (don't care) fields. Every key bit has a corresponding mask bit. The number of allowed unique masks is, however, limited, which allows for considerable memory and logic optimizations. See the Semi-Ternary CAM Search LogiCORE IP Product Guide (PG319).
Ternary CAM (TCAM)
The primary usage of TCAM is for tables requiring full flexibility in terms of size and position of wildcard (don't care) fields. Every key bit has a corresponding mask bit stored together with the key. All entries can have different masks. TCAMs are used for Access Control List (ACL) type lookups, requiring a large number of different masks. See the Ternary CAM Search LogiCORE IP Product Guide (PG318).
One or multiple instances of each type can be used inside the same FPGA. Different types can also be mixed inside the same FPGA. Each CAM type is optimized for its specific task in terms of hardware resource usage and can be flexibly configured using VitisNetP4 or the IP integrator.

The BCAM stores {key, response} entries in either URAM or block RAM. The BCAM provides efficient use of FPGA resources, in contrast with basic BCAM implementations that store the keys in flip-flops and use logic resources for parallel key comparison.

The Lookup interface of the BCAM receives a lookup key and outputs a result that contains a match flag indicating whether the lookup key matches the key of any entry in the BCAM. If any BCAM entry is matched, the response value of the matching entry is output. The BCAM is pipelined so that it can process a Lookup Request every clock cycle.
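The lookup behavior described above can be illustrated with a small software reference model. This is only a behavioral sketch: it uses a linear search, which is not how the hardware finds a match, and the names model_entry_t, model_result_t, model_lookup, and KEY_BYTES are hypothetical and not part of the delivered API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_BYTES 4 /* hypothetical: a 32-bit key for illustration */

/* One {key, response} entry of the reference model. */
typedef struct {
    bool     valid;
    uint8_t  key[KEY_BYTES];
    uint32_t response;
} model_entry_t;

/* Lookup result: match flag plus the response of the matching entry. */
typedef struct {
    bool     match;
    uint32_t response; /* meaningful only when match is true */
} model_result_t;

/* Exact-match lookup: the key must equal a valid entry's key bit for bit. */
model_result_t model_lookup(const model_entry_t *table, size_t n,
                            const uint8_t key[KEY_BYTES])
{
    model_result_t result = { false, 0 };

    assert(key != NULL);
    assert(table != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (table[i].valid && memcmp(table[i].key, key, KEY_BYTES) == 0) {
            result.match = true;
            result.response = table[i].response;
            break;
        }
    }
    return result;
}
```

A model like this can be useful as a golden reference in a testbench, where each hardware lookup result is compared against the model's match flag and response.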

The entries are read and written using a set of high-level API functions. The API functions are written in C and delivered as part of the IP. The API encapsulates the details of memory management and register access and provides a simple and efficient management interface. The API software, with detailed documentation, is available on the CAM IP product page. You provide the API with the functions for basic hardware reads and writes. This allows for flexible hardware mapping, and the communication link between the API software and the hardware can be designed to your specifications. The communication link could, for instance, be AXI4-Lite or PCIe. The CBCAM software driver also supports inserts, updates, and deletes, but in this case the memory management is performed by hardware, which offloads much of the software activity from the CPU.
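The idea of user-supplied hardware access functions can be sketched as a pair of callbacks. The following is a minimal illustration under stated assumptions, not the delivered API: the names hw_io_t, mock_dev_t, mock_read32, mock_write32, and driver_set_bits are hypothetical, and the 32-bit word access is assumed. See the API documentation for the actual function signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access interface: the driver reaches hardware only through
 * these two callbacks, so the link (AXI4-Lite, PCIe, ...) is the user's choice. */
typedef struct {
    void    *ctx; /* user context, passed back to every callback */
    uint32_t (*read32)(void *ctx, uint32_t addr);
    void     (*write32)(void *ctx, uint32_t addr, uint32_t value);
} hw_io_t;

/* Mock back end: a small in-memory register file standing in for the device. */
#define MOCK_WORDS 16u

typedef struct {
    uint32_t regs[MOCK_WORDS];
} mock_dev_t;

uint32_t mock_read32(void *ctx, uint32_t addr)
{
    mock_dev_t *dev = ctx;
    return dev->regs[(addr / 4u) % MOCK_WORDS]; /* byte address to word index */
}

void mock_write32(void *ctx, uint32_t addr, uint32_t value)
{
    mock_dev_t *dev = ctx;
    dev->regs[(addr / 4u) % MOCK_WORDS] = value;
}

/* Example driver-side helper: read-modify-write through the callbacks only. */
uint32_t driver_set_bits(const hw_io_t *io, uint32_t addr, uint32_t mask)
{
    assert(io != NULL && io->read32 != NULL && io->write32 != NULL);

    uint32_t value = io->read32(io->ctx, addr) | mask;
    io->write32(io->ctx, addr, value);
    return value;
}
```

Replacing the mock callbacks with functions that issue real bus transactions changes the transport without touching the driver-side code, which is the flexibility the paragraph above describes.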

The BCAM design is highly configurable at compile time to make it suitable for a large variety of applications. The following table lists the configuration parameters.

Table 1. Configuration Parameters
Parameter Name Valid Range Description
KEY_WIDTH 10-992 bits The width of the lookup key.
RESPONSE_WIDTH 1-1024 bits The width of the lookup response.
NUM_ENTRIES 1 - 1M The number of usable entries (depth). To generate a BCAM with a certain memory depth, for example 4K, specify 95% of the target: NUM_ENTRIES = 0.95 x 4096 = 3891
MEMORY_PRIMITIVE BLOCK or ULTRA or AUTO The compiler selects the best suited type automatically. This can however be overridden as a user preference.
LOOKUP_RATE 15 - 600 Mlps This is the supported lookup rate of the instance (expressed in million lookups per second). To save resources, do not set the lookup rate higher than required.
LOOKUP_INTERFACE_FREQ 15-600 MHz This is the clock frequency of the Lookup Request and response interfaces. Constraint: LOOKUP_INTERFACE_FREQ >= LOOKUP_RATE.
RAM_FREQ 15-600 MHz This is the clock frequency of the memories and the internal datapath. An optional, high frequency RAM clock enables time division of the hardware resources, leading to significant savings. See the TDM_FACTOR parameter. Constraint: RAM_FREQ >= LOOKUP_INTERFACE_FREQ.
TDM_FACTOR 1, 2, or 4 The TDM_FACTOR is calculated from the ratio RAM_FREQ / LOOKUP_RATE. The ratio is rounded down to the nearest power of two and capped based on NUM_ENTRIES. This is further described in Resource Time Sharing.
CLOCKING_MODE SINGLE-CLOCK or DUAL_CLOCK The use of a separate RAM clock is optional. If RAM_FREQ = LOOKUP_INTERFACE_FREQ, then the single clock mode is enabled. In single clock mode only the lookup interface clock is used for lookup interfaces, RAM and match logic.
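
The arithmetic in the table can be checked with a few helper functions. This is a minimal sketch under stated assumptions: the function names are hypothetical, frequencies and rates are given as whole numbers in MHz and Mlps, the 95% rule is assumed to round down, and the cap of TDM_FACTOR based on NUM_ENTRIES is not modeled because the table does not give its rule (only the upper limit of 4 is applied).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* NUM_ENTRIES to request for a target memory depth: 95% of the depth,
 * rounded down (assumed rounding), e.g. 4096 -> 3891. */
uint32_t num_entries_for_depth(uint32_t depth)
{
    return (uint32_t)(((uint64_t)depth * 95u) / 100u);
}

/* Range and ordering checks from the table:
 * 15..600 for each value, LOOKUP_RATE <= LOOKUP_INTERFACE_FREQ <= RAM_FREQ. */
bool freqs_valid(uint32_t lookup_rate, uint32_t lookup_if_freq, uint32_t ram_freq)
{
    const uint32_t lo = 15u, hi = 600u;

    if (lookup_rate < lo || lookup_rate > hi) return false;
    if (lookup_if_freq < lo || lookup_if_freq > hi) return false;
    if (ram_freq < lo || ram_freq > hi) return false;
    return lookup_if_freq >= lookup_rate && ram_freq >= lookup_if_freq;
}

/* RAM_FREQ / LOOKUP_RATE rounded down to a power of two, limited to 1, 2, or 4.
 * The additional NUM_ENTRIES-based cap is NOT modeled here. */
uint32_t tdm_factor(uint32_t ram_freq, uint32_t lookup_rate)
{
    assert(lookup_rate > 0u);

    uint32_t ratio = ram_freq / lookup_rate; /* integer division rounds down */
    if (ratio >= 4u) return 4u;
    if (ratio >= 2u) return 2u;
    return 1u;
}

/* Single clock mode applies when both frequencies are equal. */
bool is_single_clock(uint32_t lookup_if_freq, uint32_t ram_freq)
{
    return ram_freq == lookup_if_freq;
}
```

For example, with RAM_FREQ = 600 MHz and LOOKUP_RATE = 150 Mlps the ratio is 4, so the sketch yields a TDM_FACTOR of 4 before any NUM_ENTRIES-based cap. With RAM_FREQ = 450 MHz the ratio of 3 rounds down to 2.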

All of these parameters are extracted from the P4 code and the VitisNetP4 tool during compilation. If the BCAM is used without P4, these parameters are set in the IP Generator prior to generating the BCAM hardware and the software BCAM API. VitisNetP4 ensures that the parameters used to generate the hardware BCAM and those used to create the software BCAM are synchronized. The BCAM and CBCAM handle parameters differently. The CBCAM keeps all parameters in a hardware database, which is read by the software CBCAM driver. This way, software and hardware are synchronized. The BCAM keeps a few parameters in a hardware database. These parameters are read by the software BCAM driver and verified against the driver's own parameters. The parameters must match; otherwise, the driver returns an error code. For standalone usage, you must guarantee that the parameters used to generate the hardware BCAM and the parameters used to call the software BCAM are identical. VitisNetP4 generation of CBCAM is not supported in version 2.6.
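
The BCAM parameter check described above amounts to comparing two parameter sets and returning an error on any difference. The sketch below illustrates that idea only. The struct cam_params_t, the choice of fields, the error codes, and verify_params are all hypothetical: the text does not state which parameters are kept in the hardware database or which error codes the driver returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of parameters; the actual set held in the hardware
 * database is not specified here. */
typedef struct {
    uint16_t key_width;      /* bits */
    uint16_t response_width; /* bits */
    uint32_t num_entries;
} cam_params_t;

/* Hypothetical return codes. */
enum {
    PARAMS_OK           = 0,
    PARAMS_ERR_MISMATCH = -1
};

/* Compare the driver's own parameters with those read back from hardware.
 * Any difference is reported as a mismatch. */
int verify_params(const cam_params_t *sw, const cam_params_t *hw)
{
    assert(sw != NULL && hw != NULL);

    if (sw->key_width != hw->key_width ||
        sw->response_width != hw->response_width ||
        sw->num_entries != hw->num_entries) {
        return PARAMS_ERR_MISMATCH;
    }
    return PARAMS_OK;
}
```

In standalone use, a check of this kind at initialization catches a software build that was configured with different values than the generated hardware.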

CBCAM can only be used standalone. For CBCAM, the parameters do not need to be identical because they are read from the hardware.