Know What You Infer

Versal Adaptive SoC Hardware, IP, and Platform Development Methodology Guide (UG1387)

Document ID: UG1387
Release Date: 2023-11-15
Version: 2023.2 English

Ultimately, your code must map onto the resources available on the device. Make an effort to understand the key arithmetic, storage, and logic elements in the architecture you are targeting. Then, as you code the functionality of the design, anticipate the hardware resources to which the code will map. Understanding this mapping gives you early insight into potential problems.

The following examples demonstrate how understanding the hardware resources and their mapping can help you make design decisions:

  • For addition, subtraction, and add-sub wider than 4 bits, a carry chain is generally used, with one LUT per 2-bit addition (that is, an 8-bit by 8-bit adder uses 8 LUTs and the associated carry chain). For ternary addition, or when the result of an adder is added to another value without a register in between, one LUT per 3-bit addition is used (that is, an 8-bit by 8-bit by 8-bit addition also uses 8 LUTs and the associated carry chain).
  • In general, multiplication is targeted to DSP blocks. Signed multiplications of 27x24 bits or less and unsigned multiplications of 26x23 bits or less map into a single DSP block. Multiplications with wider operands might map into more than one DSP block. DSP blocks have pipelining resources inside them.

    Properly pipelining logic inferred into the DSP block can greatly improve maximum clock frequency and reduce power. When you describe a multiplication, three levels of pipelining around it generate the best setup, clock-to-out, and power characteristics. Extremely light pipelining (one level or none) might lead to timing issues and increased power for those blocks, while the pipelining registers within the DSP lie unused.

  • Two SRLs with depths of 16 bits or less can be mapped into a single LUT, and a single SRL with a depth of up to 32 bits can also be mapped into a single LUT.
  • For conditional code resulting in standard MUX components:
    • A 4-to-1 MUX can be implemented into a single LUT, resulting in one logic level.
    • An 8-to-1 MUX can be implemented into three LUTs, resulting in two logic (LUT) levels.
    • A 16-to-1 MUX can be implemented into five LUTs, resulting in effectively two logic (LUT) levels.
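
Several of the rules of thumb in this list can be captured in a small estimation script for early planning. The following Python sketch is illustrative only and is not part of any AMD tool. The names `adder_luts`, `fits_single_dsp`, `srl_luts`, and `MUX_COST` are hypothetical. It assumes that a two-operand adder is wide enough to use the carry chain, that either multiplier operand can be placed on the wider DSP input, and that any two shallow SRLs are eligible to share a LUT (in practice, control signals and placement can prevent this).

```python
from typing import Sequence

# (LUT count, LUT levels) for the standard MUX sizes quoted in the list above.
MUX_COST = {4: (1, 1), 8: (3, 2), 16: (5, 2)}


def adder_luts(width: int) -> int:
    """LUTs for a two-operand carry-chain adder: one LUT per bit position."""
    if width < 1:
        raise ValueError("width must be positive")
    return width


def fits_single_dsp(a_bits: int, b_bits: int, signed: bool = True) -> bool:
    """True if an a_bits x b_bits multiply is within the single-DSP limits.

    Assumption: operands may be swapped so the wider one uses the wider port.
    """
    wide, narrow = (27, 24) if signed else (26, 23)
    big, small = max(a_bits, b_bits), min(a_bits, b_bits)
    return big <= wide and small <= narrow


def srl_luts(depths: Sequence[int]) -> int:
    """LUTs for a set of SRLs, pairing SRLs of depth <= 16 two per LUT.

    Assumption: every pair of shallow SRLs can actually share a LUT.
    Depths above 32 are outside the rules quoted above and are rejected.
    """
    shallow = sum(1 for d in depths if 1 <= d <= 16)
    medium = sum(1 for d in depths if 17 <= d <= 32)
    if shallow + medium != len(depths):
        raise ValueError("only depths 1..32 are covered by this sketch")
    return (shallow + 1) // 2 + medium


if __name__ == "__main__":
    print(adder_luts(8))               # 8-bit by 8-bit adder: 8
    print(fits_single_dsp(24, 27))     # signed 27x24, operands swapped: True
    print(fits_single_dsp(27, 27))     # too wide for one DSP block: False
    print(srl_luts([16, 8, 32]))       # two shallow SRLs share a LUT: 2
    print(MUX_COST[16])                # 16-to-1 MUX: (5, 2)
```

These numbers are planning estimates. Always confirm them against the post-synthesis utilization report for the actual design.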

For general logic, take into account the number of unique inputs for a given register. From that number, you can estimate the number of LUTs and logic levels. A function of 6 or fewer inputs always fits in a single logic level. Theoretically, two levels of logic can manage up to 36 inputs. However, for all practical purposes, you should assume that approximately 20 inputs is the maximum that can be managed with two levels of logic. In general, the larger the number of inputs and the more complex the logic equation, the more LUTs and logic levels are required.
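
The input-count guideline above can be turned into a quick estimate. The following Python sketch is a minimal illustration with hypothetical function names. `theoretical_min_levels` computes the lower bound implied by a tree of 6-input LUTs (6 inputs for one level, 36 for two). `planning_levels` applies the practical limit of approximately 20 inputs for two levels. Budgeting at least three levels beyond 20 inputs is an assumption of this sketch, not a statement from the guide.

```python
LUT_INPUTS = 6            # inputs per LUT, as used in the text
PRACTICAL_TWO_LEVEL = 20  # approximate practical limit for two levels


def theoretical_min_levels(unique_inputs: int) -> int:
    """Fewest LUT levels that could combine this many unique inputs."""
    if unique_inputs < 1:
        raise ValueError("unique_inputs must be positive")
    levels, capacity = 1, LUT_INPUTS
    while capacity < unique_inputs:
        capacity *= LUT_INPUTS  # each extra level multiplies fan-in by 6
        levels += 1
    return levels


def planning_levels(unique_inputs: int) -> int:
    """Logic levels to budget for when planning timing.

    Assumption: above the practical two-level limit, budget at least three.
    """
    if unique_inputs <= LUT_INPUTS:
        return 1
    if unique_inputs <= PRACTICAL_TWO_LEVEL:
        return 2
    return max(3, theoretical_min_levels(unique_inputs))


if __name__ == "__main__":
    for n in (6, 20, 36, 40):
        print(n, theoretical_min_levels(n), planning_levels(n))
```

For example, a register with 36 unique inputs has a theoretical minimum of two levels, but this sketch budgets three because real logic equations rarely decompose into a perfect tree of fully used LUTs.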

Important: Check the availability of hardware resources and how efficiently they are being utilized early in the design cycle to enable easier modifications. This approach yields better results than waiting until late in the design cycle during timing closure.