Conclusion

Using a MATLAB Simulink development environment can greatly simplify the effort required to design, debug, and analyze a DSP based Versal ACAP design. In this application note, a PID controller was simulated and implemented in PL using the VMC HLS Toolbox and a custom C++ based Math sequencer in PL, targeting the Versal DSP58 single precision floating point hard macros. The AI Engine implementation demonstrated how the single instruction multiple data (SIMD) vector processor can be harnessed to perform up to eight PIDs concurrently. The resource comparison for a one PID loop VMC HLS Toolbox, Math Sequencer implementation, and four parallel PID loops running concurrently on one AI Engine is shown in the following table.

Table 1. Resources, Latency, Clock Frequency and Sample Rate Comparison for a Single Precision Floating Point PID Implementation
	DSP	LUTs	FFs	AI Engine	Latency (Clocks)	Clock (MHz)	Sample Rate (MSPS)
VMC native blocks (single channel)	5	565	505	0	69	472	6.8
Math Sequencer (single channel)	4	513	962	0	83	684.3	7.7
AI Engine (4 channel)	0	0	0	1		1 GHz	4

For a single copy of the VMC HLS Toolbox PID, five DSP, 565 LUT, 505 FF would be consumed, which is an almost negligible number of gates in a Versal device. For one or two PIDs with available gates, using dedicated gates is the best approach. To run a more complicated algorithm such as field oriented control (FOC) application, more dedicated hardware with more math operators is needed, which will increase costs and resources.

For eight PID loops using the VMC HLS toolbox approach, we need 8 × (565 LUT + 505 FF + 5 DSP58) = (4520 LUT + 4040 FF + 40 DSP58), which will all independently run at 6.84 MSPS. But 6.84 MSPS is significantly faster than what a brushless DC motor with ~40 KHz loop bandwidth might require. In other words, with a 40 KHz PID loop bandwidth, you process one sample every 40 KHz or 40 KSPS. If the VMC HLS toolbox PID is 6.84 MSPS /40 KSPS = 171 times faster than required for a brushless DC motor controller, and you want to drive down cost, you should resource share the multiplication, adder, and subtract operators over time. The Math Sequencer is explicitly designed to be a low cost way to sequentially process arithmetic operations over time.

The Math Sequencer runs a single PID loop at 7.7 MSPS. The register set of the Math Sequencer can be enhanced and sequentially process eight PIDs using a single math sequencer, but then the achievable sample rate per channel will drop linearly to ~12.5% of a single channel throughput. 7.7 MSPS / 8 = 0.96 MSPS which is still ~24x faster than needed for a single 40 KSPS brushless DC motor control loop, and eight copies of dedicated hardware are not needed. A more complex algorithm like a FOC loop using a Math Sequencer can be processed, and the programming would be complex, but there are advantages to a Math Sequencer. Not only are they inexpensive, they only need to be programmed once, no micro controller licensing is required, and no support tools are required to buy and learn. Any arithmetic algorithm can be customized, and the controller field can be updated without having to change the design, much like a processor.

When considering the AI Engines, which have both a 32-bit RISC processor and a vector processor, the vector processor SIMD performs using the same arithmetic operator across all lanes. There are eight parallel lanes for floating point operations. Each individual lane is used to perform the same: four multiplications, three subtracts, and four adds for eight independent PIDs using one AI Engine. If the average brushless DC motor control loop runs at 40 KSPS, the PID loops are running roughly 100x faster than required. At 40 KSPS, an AI Engine would be idle ~99% of the time because it simply does not have enough work to do. This gives you the ability to consider more complex algorithms like field oriented control (FOC), which has several desirable advantages for up to eight motors running simultaneously using a single AI Engine.

Throughout the process, it is worth noting:

VMC simplifies test bench development, verification, validation, and debug by utilizing the many inherent Simulink capabilities and toolboxes.
Using functional simulations to debug and develop a design is significantly faster than using cycle approximate bit accurate simulations.
Both the test bench and the user design can be created, evaluated, and exported for use with Vitis HLS and Vitis.