The cloud accelerator has multiple independent Compute Units (CUs), each of which can be programmed to run a different AI model, or all of which can work on the same model for maximum throughput.
The cloud runtime introduces a new AI resource manager that simplifies scaling applications across multiple FPGA resources. Applications no longer need to designate a specific FPGA card; instead, they request either a single Compute Unit or a whole FPGA, and the AI resource manager returns a free resource compatible with the request. The AI resource manager works with Docker containers and supports multiple users and multiple tenants on the same host.
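The allocation pattern described above can be sketched as follows. This is a minimal, hypothetical model of the behavior, not the runtime's actual API: the class and method names (`AIResourceManager`, `acquire_cu`, `acquire_fpga`, `release`) are assumptions for illustration. The key point it shows is that the caller asks for a class of resource (one CU or one whole FPGA) rather than naming a specific card.

```python
from dataclasses import dataclass

# Hypothetical sketch of the allocation behavior: the application
# requests either a single Compute Unit or a whole FPGA, and the
# manager hands back any free, compatible resource it finds.

@dataclass
class ComputeUnit:
    fpga_id: int
    cu_id: int
    busy: bool = False

class AIResourceManager:
    def __init__(self, fpgas: int, cus_per_fpga: int):
        # Flat pool of CUs across all cards on the host.
        self.cus = [ComputeUnit(f, c)
                    for f in range(fpgas) for c in range(cus_per_fpga)]

    def acquire_cu(self):
        """Return any free CU, or None if all are busy."""
        for cu in self.cus:
            if not cu.busy:
                cu.busy = True
                return cu
        return None

    def acquire_fpga(self):
        """Return all CUs of a fully idle FPGA, or None if none is free."""
        for f in sorted({cu.fpga_id for cu in self.cus}):
            group = [cu for cu in self.cus if cu.fpga_id == f]
            if all(not cu.busy for cu in group):
                for cu in group:
                    cu.busy = True
                return group
        return None

    def release(self, acquired):
        """Mark a CU (or a list of CUs) free again."""
        for cu in acquired if isinstance(acquired, list) else [acquired]:
            cu.busy = False

mgr = AIResourceManager(fpgas=2, cus_per_fpga=4)
cu = mgr.acquire_cu()       # single-CU request: any free CU will do
board = mgr.acquire_fpga()  # whole-FPGA request: needs a fully idle card
```

In this sketch the single-CU request lands on the first card, so the whole-FPGA request is satisfied by the second card; the caller never specifies which one.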