Reducing Window Buffer Sizes for Very High Memory Density Designs

AI Engine Tools and Flows User Guide (UG1076)

Document ID

UG1076

Release Date

2023-12-04

Version

2023.2 English

A main consideration when choosing the window sizes for a design is balancing the number of cycles required for data loading against the number of compute cycles required by the kernel. When the two are balanced, loading of the ping and pong buffers can be pipelined with the kernel compute. For very high memory density designs, prefer smaller window sizes that still balance the kernel compute, because larger window sizes might lead to mapper failure.

The following table shows the number of cycles required to multiply two matrices of 16-bit data. Example 1 and Example 2 have different matrix sizes, but in both cases compute and data loading are balanced. Note that only the larger of the A and B matrices determines the data loading time, whereas the kernel compute time is determined by both sizes. Example 1 has smaller window sizes than Example 2, yet its compute and data loading are still balanced and can be pipelined.

Table 1. Matrix Multiplication Examples
	Matrix A Size	Matrix B Size	Number of Multiplication Operations (MultOps)	Compute Cycles (32 ops/cycle)	Data Loading Cycles (32 bits/cycle)
Example 1	16x64	64x16	16384	512 (16384/32)	512 (64x16x16/32)
Example 2	16x64	64x32	32768	1024 (32768/32)	1024 (64x32x16/32)
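
The arithmetic behind Table 1 can be reproduced with a short script, which may help when checking whether a candidate window size stays balanced. This is a minimal sketch and not part of the AI Engine tools: the function name `matmul_cycles` and its parameters are hypothetical, and the rates of 32 ops/cycle and 32 bits/cycle are taken from the table headers as fixed assumptions.

```python
def matmul_cycles(a_shape, b_shape, bits_per_element=16,
                  ops_per_cycle=32, load_bits_per_cycle=32):
    """Return (mult_ops, compute_cycles, load_cycles) for A (MxK) times B (KxN).

    Hypothetical helper: the default rates mirror the headers of Table 1.
    """
    m, k = a_shape
    k_b, n = b_shape
    if k != k_b:
        raise ValueError("inner dimensions of A and B must match")

    # Compute time depends on both matrices: one multiply per (m, k, n) triple.
    mult_ops = m * k * n
    compute_cycles = -(-mult_ops // ops_per_cycle)  # ceiling division

    # Loading time is set only by the larger of the two matrices.
    larger_elements = max(m * k, k * n)
    load_bits = larger_elements * bits_per_element
    load_cycles = -(-load_bits // load_bits_per_cycle)  # ceiling division

    return mult_ops, compute_cycles, load_cycles


# The two rows of Table 1.
examples = {
    "Example 1": ((16, 64), (64, 16)),
    "Example 2": ((16, 64), (64, 32)),
}

for name, (a_shape, b_shape) in examples.items():
    ops, compute, load = matmul_cycles(a_shape, b_shape)
    status = "balanced" if compute == load else "unbalanced"
    print(f"{name}: MultOps={ops}, compute={compute}, load={load} ({status})")
```

Under these assumptions, both examples come out balanced, and Example 1 does so with half the Matrix B window size of Example 2.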