Use Device Resources More Efficiently

Vivado Design Suite User Guide: Power Analysis and Optimization (UG907)

Document ID: UG907
Release Date: 2021-10-22
Version: 2021.2 English
Block RAM
  • The power a block RAM consumes is directly proportional to the amount of time it is enabled. To save power, drive the block RAM enable Low on clock cycles when the design does not use the block RAM. The block RAM enable rate, along with the clock rate, is an important parameter to consider for power optimization.
  • In true dual-port (TDP) mode, use the NO_CHANGE write mode when the design does not need the read output to update during a write. In this mode the output latches remain unchanged during a write operation, which makes it the most power-efficient mode. NO_CHANGE is not available in simple dual-port (SDP) mode because its behavior there is identical to WRITE_FIRST mode.
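One way to check which write modes a design currently uses is to query the block RAM cells from the Vivado Tcl console. The following is a minimal sketch, not a definitive flow. It assumes that a synthesized or implemented design is open and that the block RAM primitives have reference names starting with RAMB (for example, RAMB18E1 or RAMB36E1 on 7 series devices). Adjust the filter for other device families.

```tcl
# Assumption: a synthesized or implemented design is open in Vivado.
# Assumption: block RAM primitives have REF_NAME values starting with "RAMB".
foreach ram [get_cells -hierarchical -filter {REF_NAME =~ RAMB*}] {
    set mode_a [get_property WRITE_MODE_A $ram]
    set mode_b [get_property WRITE_MODE_B $ram]
    # Print the write mode of each port so candidates for NO_CHANGE can be reviewed.
    puts "$ram: WRITE_MODE_A=$mode_a WRITE_MODE_B=$mode_b"
}
```

The write mode changes the read behavior during writes, so change it in the RTL or IP configuration only after confirming that the design does not rely on the data output during a write.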
I/O
I/O interfaces have to drive long distances with potentially more parasitic effects, hence they typically represent a large portion of the device power requirements.
VCCAUX
Use the lowest VCCAUX possible. This minimizes both the static and dynamic power for this voltage supply.
Inputs
Limit usage of internally referenced input standards.
IODELAY
Set the HIGH_PERFORMANCE_MODE property on the IDELAYE2 to FALSE. When FALSE, this property increases the output jitter, but the delay element consumes less power.
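As an illustration, the property can be applied to all input delay cells from the Tcl console or a constraints file. This sketch assumes a 7 series design in which the delay primitives are IDELAYE2 cells, and that the added jitter is acceptable on every affected interface. Narrow the get_cells filter if only some interfaces can tolerate it.

```tcl
# Assumption: delay elements are IDELAYE2 primitives (7 series).
# Trade lower power for more output jitter on every IDELAYE2 in the design.
set idelays [get_cells -hierarchical -filter {REF_NAME == IDELAYE2}]
if {[llength $idelays] > 0} {
    set_property HIGH_PERFORMANCE_MODE FALSE $idelays
}
```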
IBUF_LOW_PWR
Set the IBUF_LOW_PWR property to TRUE on bidirectional and input I/Os. Make sure the design performance allows for this setting.
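A constraint along the following lines could apply the setting. The port name sensor_in is hypothetical and stands in for the input or bidirectional ports of your design. Rerun timing analysis afterward to confirm that the design still meets its performance targets.

```tcl
# Hypothetical port name: replace sensor_in[*] with the ports in your design.
# Select the low-power input buffer mode on these inputs.
set_property IBUF_LOW_PWR TRUE [get_ports {sensor_in[*]}]
```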
I/O Configuration
Review the I/O standard, drive strength, and on-chip termination settings in the context of your performance needs. Evaluate whether you can lower the drive strength, use tristatable DCI I/O standards (T_DCI), do without termination, or use external termination.
Outputs
  • Use the lowest slew/drive/voltage level supported by the receiving chip(s).
  • No termination or series termination is preferred over parallel termination. Signal integrity simulation tools can help with this determination.
  • Consider whether using on-chip or off-chip termination is the best option given your device thermal budget, system cost, and board real estate requirements.
  • Evaluate using lower voltage swing differential standards.
  • Evaluate whether your application allows you to use transceivers instead of large parallel buses.
  • Evaluate the requirements of I/O features such as IBUF, IODELAY, and others, and disable them when performance allows.
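The first guideline above might translate into constraints such as the following. This is only a sketch: the port name status_out is hypothetical, and the I/O standard, drive strength, and slew rate shown are placeholder values that must be checked against the receiving device and validated with signal integrity simulation.

```tcl
# Hypothetical port name and placeholder values: confirm each against the
# receiving chip's requirements before use.
set outs [get_ports {status_out[*]}]
set_property IOSTANDARD LVCMOS18 $outs   ;# lowest voltage level the receiver supports
set_property DRIVE 4 $outs               ;# low drive strength (mA)
set_property SLEW SLOW $outs             ;# slow slew rate
```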
Transceivers
  • The GTX/GTH/GTP transceivers support a range of power-down modes that can save power where applicable.
  • Two types of adaptive filtering are available to the GTX/GTH receiver, depending on the system-level trade-off between power and performance. For channels with lower loss, the GTX/GTH/GTP receiver provides a power-efficient adaptive mode called low-power mode (LPM).
  • Each GTX/GTH/GTP transceiver supports generating the out-of-band (OOB) sequences described in the Serial ATA (SATA) and Serial Attached SCSI (SAS) specifications, and the beaconing described in the PCI™ Express specification. If the design does not use OOB sequences, leaving this feature unused can save additional power.
  • Pack the maximum number of transceivers into a single tile to minimize duplicating supporting circuits.
XADC
  • The XADC can be powered down by writing to its Configuration register #2 (Address 0x42) through the DRP port during run time. Bits DI4 and DI5 of this register control the power-down for each channel. To statically emulate power-down behavior in Vivado®, set the configuration register by entering this command in the Vivado Tcl console:
    set_property INIT_42 {16'h0430} [get_cells <inst>]

    where <inst> is the XADC instance. The above command powers down both channels of the XADC.
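    To see why the value 16'h0430 sets both power-down bits, the register value can be decoded bit by bit. The short sketch below uses plain Tcl arithmetic and looks only at bits DI4 and DI5. The variable names are illustrative.

```tcl
# Decode the power-down bits (DI4 and DI5) of the INIT_42 value used above.
set init42 0x0430
set di4 [expr {($init42 >> 4) & 1}]
set di5 [expr {($init42 >> 5) & 1}]
puts "DI4=$di4 DI5=$di5"   ;# prints "DI4=1 DI5=1"
```

    Both bits are 1, which matches the statement that the command powers down both channels.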

Logic

You can optimize the design description using these methods:

  • Minimize asynchronous control signals, which prevent logic optimization and use more routing resources.
  • Minimize the number of control sets. A control set is a unique combination of clock, clock enable, set, reset, and, in the case of LUT RAM, write enable signals. Control sets matter because a slice supports only a limited number of them and the elements within a slice must share these signals. The limits vary with the device architecture. Once a limit is reached, related logic can no longer be packed close together, which increases the use of routing resources.
  • Add pipeline levels to minimize the size of combinatorial logic cones. This minimizes the propagation of glitches between registers until signals reach their final state at each clock cycle.
  • Use resource time sharing. These techniques minimize device resource usage by time multiplexing different functions to the same hardware resources. This allows you to use a smaller device or can reduce placement and routing congestion, which will lower both static and dynamic core power.
  • Processes which are slow and similar can be performed on the same resources instead of separate resources. This requires careful thinking for how to buffer, multiplex, initialize, and control the data to be processed. Typical applications for such optimization are similar parallel processes, such as processing multiple input sensors. Instead of having as many processing units as inputs, you could use a single processing unit and make it run faster, so it processes input channels one after the other while ensuring the same response time for each output. A Xilinx® Power Estimator What If? estimation can help you decide whether the power savings are worth the engineering effort.
  • Use the DSP and block RAM optional registers. For example, enabling the multiplier register (MREG) in DSP blocks gives the most power-efficient implementation because it minimizes the propagation of internal glitches between clock cycles.
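Several of the checks above can be started from the Vivado Tcl console. The following sketch assumes that a synthesized or implemented 7 series design is open and that the DSP blocks are DSP48E1 primitives. The report file names are placeholders.

```tcl
# Assumption: a synthesized or implemented design is open in Vivado.
# Report the control sets so the count and their fanout can be reviewed.
report_control_sets -verbose -file control_sets.rpt

# List DSP48E1 cells whose multiplier register (MREG) is not enabled.
foreach dsp [get_cells -hierarchical -filter {REF_NAME == DSP48E1}] {
    if {[get_property MREG $dsp] == 0} {
        puts "MREG disabled: $dsp"
    }
}

# Estimate power so the effect of each change can be compared between runs.
report_power -file power.rpt
```

Comparing the power report before and after each change gives a rough indication of whether a given optimization is worth keeping.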