Graph Execution Model - 2023.2 English

Vitis Tutorials: AI Engine (XD100)

Document ID: XD100
Release Date: 2024-03-05
Version: 2023.2 English
  1. Use the following command to run the AI Engine simulator:

    make aiesim
    
  2. Open the run results in Vitis Analyzer with the following command:

    vitis_analyzer aiesimulator_output/default.aierun_summary
    

    Note: The aiesimulator --dump-vcd option dumps the VCD file that is used for the trace.

  3. Click Trace view in Vitis Analyzer. Zoom in to view the first few runs of the design:

    [Figure: Trace View]

    This figure lists the following events and their dependencies:

    1: Tile 24_0 DMA s2mm channel 0 BD 0 (s2mm.Ch0.BD0.lock0) starts. It acquires the lock of the ping input buffer (buf0) of aie_dest1 and transfers data from the PL to buf0. Refer to Graph View for the position of the buffer in the graph.

    2: After DMA s2mm channel 0 BD 0 completes, DMA s2mm channel 0 BD 1 (s2mm.CH0.BD1.lock1) starts. It acquires the lock of the pong input buffer (buf0d) and transfers data from the PL to buf0d.

    3a: The aie_dest1 kernel (in tile 25_0) acquires the lock of buf0 (shown as read lock allocated).

    3b: aie_dest1 also acquires the lock of the ping output buffer (buf1).

    4a: After aie_dest1 acquires the locks of both its input buffer (buf0) and its output buffer (buf1), it starts. If either lock cannot be acquired, the kernel enters a lock stall.

    4b: After tile 24_0 DMA s2mm channel 0 BD 1 (s2mm.CH0.BD1.lock1) completes, the DMA switches back to channel 0 BD 0 (s2mm.CH0.BD0.lock0). At first, buf0 is still being read by aie_dest1 (read lock allocated), so the DMA stalls at DMA lock req (shown in red). After the read lock of the buffer is released, the DMA acquires the lock and starts transferring data from the PL.

    5: After aie_dest1 completes, it releases the output buffer (buf1). The kernel aie_dest2 acquires the lock of buf1 (read lock allocated).

    6a: After the lock of buf1 is acquired, aie_dest2 starts.

    6b: aie_dest1 acquires the lock of buf0d (read lock allocated).

    6c: aie_dest1 acquires the lock of buf1d (write lock allocated).

    7: After aie_dest1 acquires the locks of its input buffer (buf0d) and output buffer (buf1d), it starts.

    8: After aie_dest1 completes, it releases the output buffer (buf1d). Kernel aie_dest2 acquires the lock of buf1d (read lock allocated).

    9: After the lock of buf1d is acquired, aie_dest2 starts.
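
The lock handshake in events 1 through 4b can be sketched as a small state machine. The following C++ model is illustrative only: the Lock enum, the PingPongBuffer struct, and the replay_input_buffer function are hypothetical names, not part of the AI Engine API, and the model ignores timing by replaying the events strictly one after another. It tracks only the input buffer, with half 0 standing in for buf0 and half 1 for buf0d.

```cpp
#include <array>
#include <string>
#include <vector>

// Hypothetical four-state lock for one half of a ping-pong buffer.
enum class Lock { Writable, Writing, Readable, Reading };

struct PingPongBuffer {
    // Index 0 models the ping half, index 1 models the pong half.
    std::array<Lock, 2> half{Lock::Writable, Lock::Writable};

    // Each acquire returns false (a lock stall) when the half
    // is not in the state the requester needs.
    bool acquire_write(int i) {
        if (half[i] != Lock::Writable) return false;
        half[i] = Lock::Writing;
        return true;
    }
    void release_write(int i) { half[i] = Lock::Readable; }  // data ready for the reader

    bool acquire_read(int i) {
        if (half[i] != Lock::Readable) return false;
        half[i] = Lock::Reading;
        return true;
    }
    void release_read(int i) { half[i] = Lock::Writable; }   // half may be refilled
};

// Replays events 1 to 4b for the input buffer and records the DMA's
// two attempts to reacquire the ping half.
std::vector<std::string> replay_input_buffer() {
    PingPongBuffer buf;
    std::vector<std::string> log;

    buf.acquire_write(0); buf.release_write(0);  // event 1: DMA BD 0 fills ping
    buf.acquire_write(1); buf.release_write(1);  // event 2: DMA BD 1 fills pong
    buf.acquire_read(0);                         // event 3a: kernel holds ping for reading

    // Event 4b: the DMA wraps around to BD 0 while the kernel still reads ping.
    log.push_back(buf.acquire_write(0) ? "dma: ping acquired"
                                       : "dma: lock stall on ping");

    buf.release_read(0);                         // kernel finishes with ping

    // The DMA retries and now succeeds.
    log.push_back(buf.acquire_write(0) ? "dma: ping acquired"
                                       : "dma: lock stall on ping");
    return log;
}
```

Under these assumptions, replay_input_buffer() returns two entries: a lock stall for the DMA's first attempt to reacquire ping, then a successful acquire once the kernel has released its read lock. This mirrors the red DMA lock req interval described in event 4b.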

    Note: The stream interface does not need to acquire a lock; it has inherent backward and forward pressure for data synchronization. Every lock acquire and release event incurs some cycles of overhead.
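
To contrast with the lock-based buffers, stream pressure can be pictured as a bounded FIFO. The sketch below is a conceptual model, not AI Engine code: the StreamFifo class and its depth parameter are hypothetical, and it interprets backward pressure as a full FIFO stalling the producer and forward pressure as an empty FIFO stalling the consumer.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical bounded FIFO standing in for a stream connection.
class StreamFifo {
public:
    explicit StreamFifo(std::size_t depth) : depth_(depth) {}

    // Returns false when the FIFO is full: the producer must stall
    // until the consumer drains an element (backward pressure).
    bool try_write(int value) {
        if (fifo_.size() >= depth_) return false;
        fifo_.push_back(value);
        return true;
    }

    // Returns false when the FIFO is empty: the consumer must stall
    // until the producer supplies an element (forward pressure).
    bool try_read(int& out) {
        if (fifo_.empty()) return false;
        out = fifo_.front();
        fifo_.pop_front();
        return true;
    }

private:
    std::size_t depth_;      // maximum number of elements in flight
    std::deque<int> fifo_;   // elements written but not yet read
};
```

In this model no explicit lock is acquired or released: with a depth of 2, a third consecutive try_write fails until one try_read succeeds, and a try_read on an empty FIFO fails until data arrives.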