AI Engine Compiler Options - 2023.2 English

AI Engine Tools and Flows User Guide (UG1076)

Document ID: UG1076
Release Date: 2023-12-04
Version: 2023.2 English
Table 1. AI Engine Options
Option Name Description
--constraints=<string> Constraints (location, bounding box, etc.) can be specified using a JSON file. This option lets you specify one or more constraint files.
--heapsize=<int> Heap size (in bytes) used by each AI Engine.

The stack, heap, and sync buffer (32 bytes, which holds the graph run iteration number) are allocated within the 32768 bytes of data memory. The default heap size is 1024 bytes. Before changing the heap size, ensure that the sum of the stack, heap, and sync buffer sizes does not exceed 32768 bytes.

The heap is used for allocating any remaining file-scoped data that is not explicitly connected in the user graph.

--stacksize=<int> Stack size (in bytes) used by each AI Engine.

The stack, heap, and sync buffer (32 bytes) are allocated within the 32768 bytes of data memory. The default stack size is 1024 bytes. Before changing the stack size, ensure that the sum of the stack, heap, and sync buffer sizes does not exceed 32768 bytes.

The stack is used per the standard compiler calling convention, including stack-allocated local variables and register spilling.
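The data-memory budget described above can be sketched as a quick sanity check. This is an illustrative helper, not part of the tool; the 32768-byte budget, the 32-byte sync buffer, and the 1024-byte defaults come from the option descriptions above.

```python
# Sketch: verify that stack + heap + sync buffer fit in the 32768-byte
# AI Engine data-memory budget before passing custom --stacksize/--heapsize
# values to aiecompiler. Constants are taken from the option notes above.
DATA_MEMORY_BUDGET = 32768
SYNC_BUFFER_BYTES = 32  # holds the graph run iteration number

def fits_in_budget(stack_bytes: int = 1024, heap_bytes: int = 1024) -> bool:
    """Return True if the combined allocation stays within the budget."""
    return stack_bytes + heap_bytes + SYNC_BUFFER_BYTES <= DATA_MEMORY_BUDGET

# The defaults leave ample headroom: 1024 + 1024 + 32 = 2080 bytes.
print(fits_in_budget())                                      # True
# An oversized request exceeds the budget: 32800 > 32768.
print(fits_in_budget(stack_bytes=16384, heap_bytes=16384))   # False
```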

--pl-freq=<value> Specifies the interface frequency (in MHz) for all PLIOs. The default frequency is a quarter of the AI Engine frequency and the maximum supported frequency is half of the AI Engine frequency. The PL frequency specific to each interface can be provided in the graph.
--pl-register-threshold=<value> Specifies the frequency (in MHz) threshold for registered AI Engine-PL crossings (the Boundary Logic Interface (BLI) register). BLI flip-flops exist in hardware at the AI Engine-programmable logic (PL) interface and can be used to improve timing. For timing-critical designs, enabling the BLI registers helps achieve the highest clock frequency. Use this option to control the inference of BLI registers across the AI Engine-PL channels. BLI registers are inserted in the AI Engine-PL interfaces if the PL frequency is greater than the PL register threshold value.

The default threshold is 1/8th of the AI Engine frequency, which depends on the specific device speed grade.

Note: PL register threshold values above 1/4th of the AI Engine array frequency are ignored, and the value of 1/4th of the AI Engine array frequency is used instead.
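The frequency relationships for --pl-freq and --pl-register-threshold can be sketched numerically. The function names and the example 1250 MHz array frequency are illustrative; the ratios (1/4 default PLIO frequency, 1/2 maximum, 1/8 default threshold, 1/4 threshold cap) come from the descriptions above.

```python
# Sketch of the PLIO frequency limits and register-threshold cap
# described for --pl-freq and --pl-register-threshold.
def pl_frequency_limits(aie_freq_mhz: float) -> dict:
    return {
        "default_plio_freq": aie_freq_mhz / 4,           # --pl-freq default
        "max_plio_freq": aie_freq_mhz / 2,               # maximum supported
        "default_register_threshold": aie_freq_mhz / 8,  # threshold default
    }

def effective_register_threshold(aie_freq_mhz: float, requested_mhz: float) -> float:
    """Thresholds above 1/4 of the array frequency are ignored; the cap applies."""
    cap = aie_freq_mhz / 4
    return min(requested_mhz, cap)

limits = pl_frequency_limits(1250.0)             # e.g. a 1.25 GHz array
print(limits["default_plio_freq"])               # 312.5
print(limits["max_plio_freq"])                   # 625.0
print(effective_register_threshold(1250.0, 500.0))  # 312.5 (500 exceeds the cap)
```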
Table 2. CDO Options
Option Name Description
--broadcast-enable-core Enables all AI Engines associated with a graph using broadcast. This option reserves one broadcast channel in the array for core-enabling purposes. The default is true.
Table 3. Compiler Debugging Options
Option Name Description
--adf-api-log-level=<value> ADF API log-level. Available values are as follows:
  • 0: errors
  • 1: level-0 + warnings
  • 2: level-1 + info messages
  • 3: level-2 + debug messages

The default is 2.

--kernel-linting Performs consistency checking between graphs and kernels. The default is false.
--quiet Suppresses the output of the AI Engine compiler.
--verbose Emits verbose compiler messages at various stages of compilation. These debug and tracing logs provide useful information about the compilation process.
Table 4. Design Rule Check Options
Option Name Description
--drc.disable=<string> Disables the Design Rule Check for the specified ID. A disabled check is not executed.
--drc.enable=<string> Enables the Design Rule Check for the specified ID.
--drc.severity=<string> Changes the severity of a Design Rule Check: format <ID>:<severity>[:context].
--drc.waive=<string> Waives the Design Rule Check for the specified ID. A waived check is still performed, but marked as waived.
Table 5. Execution Target Options
Option Name Description
--target=<hw|x86sim> The AI Engine compiler supports several build targets. The default is hw.
  • The hw target produces a libadf.a for use on the hardware device of a target platform and in hardware emulation.
  • The x86sim target compiles the code for use in the x86 simulator, as described in x86 Functional Simulator.
Table 6. File Options
Option Name Description
--include=<string> This option can be used to include additional directories in the include path for the compiler front-end processing.

Specify one or more include directories.

--output=<string> Specifies an output.json file produced by the front end for an input data flow graph file. The output file is passed to the back end for mapping and code generation for the AI Engine device. This option is ignored for other types of input.
--output-archive=<string> Specifies the output archive name that will contain the compiled AI Engine artifacts. The default is libadf.a.
--platform=<string>

Specifies the path to a Vitis platform file. This file is the starting point for developing Vitis applications, which can leverage the resources and capabilities offered by the platform. The platform defines both the hardware and software components available to your application. The platform can be specified as a Xilinx Platform File (XPFM) or a Xilinx Shell Archive (XSA).

--part=<string> Specifies the part family or part value. For example:

aiecompiler --include ./aie --part xcvc1902-vsvd1760-2MP-e-S graph.cpp

To perform AI Engine compilation, aiecompiler requires you to specify either the --platform option or the --part option; the two cannot be specified at the same time.

--workdir=<string>

By default, the compiler writes all outputs to a sub-directory of the current directory, called Work. Use this option to specify a different output directory.

Table 7. Generic Options
Option Name Description
--help Lists the available AI Engine compiler options, sorted in the groups listed here.
--help-list Displays an alphabetic list of AI Engine compiler options.
--version Displays the version of the AI Engine compiler.
Table 8. Miscellaneous Options
Option Name Description
--disable-multirate This option disables multirate in ADF graphs. The default is false.
--evaluate-fifo-depth This option, available only for AI Engine, analyzes re-convergent data paths. Data sent along multiple paths can re-converge, which can result in a deadlock. Such deadlocks can be resolved by adding FIFOs to the appropriate data paths.

The steps for evaluating and resolving deadlocks caused by re-convergent data paths are as follows:

  1. Compile the design with this option.
  2. Run aiesimulator on the design.
  3. Open Vitis Unified IDE with the simulation run_summary. Note the "Estimated FIFO" column and apply the recommended number of FIFOs using the fifo_depth constraint on specific nets.
Note: This feature is not available for AI Engine-ML designs.

For more information, see Evaluate FIFO Depth To Break Deadlocks.

--no-init This option disables initialization of window buffers in AI Engine data memory. This option enables faster loading of the binary images into the SystemC-RTL co-simulation framework. The default is false.
Tip: This does not affect the statically initialized lookup tables.
--nodot-graph By default, the AI Engine compiler produces .dot and .png files to visualize the user-specified graph and its partitioning onto the AI Engines. This option can be used to eliminate the dot graph output. The default is false.
--lock-fence-mode=<int> Controls scheduling around the acquire and release of I/O buffers for AIE-ML designs as follows:
  • 0: Uses conservative scheduling around acquire and release of I/O buffers.
  • 1 (default): Uses aggressive scheduling around acquire and release of I/O buffers, which may reduce cycle count.
Table 9. Module Specific Options
Option Name Description
--Xchess=<string> Passes kernel-specific options to the CHESS compiler that is used to compile code for each AI Engine.

The option string is specified as <kernel-function>:<optionid>=<value>. This option string is included during compilation of generated source files on the AI Engine where the specified kernel function is mapped.

--Xelfgen=<string> Can be used to pass additional command-line options to the ELF generation phase of the compiler, which is currently run as a make command to build all AI Engine ELF files.

For example, to limit the number of parallel compilations to four, specify -Xelfgen="-j4".

Note: If you see errors with bad_alloc in the log during compilation, or if the Vitis IDE crashes, this could be due to insufficient memory on your workstation. A possible workaround (other than increasing the available memory on your machine) is to limit the parallelism used by the compiler during the code-generation phase. This can be specified in the GUI as the compiler CodeGen option -j1 or -j2, or on the command line as -Xelfgen=-j1 or -Xelfgen=-j2.
--Xmapper=<string> Can be used to pass additional command-line options to the mapper phase of the compiler. For example:
--Xmapper=DisableFloorplanning

These options are worth trying when the design fails to converge in the mapping or routing phase, or when you are trying to achieve better performance by reducing memory bank conflicts.

See the Mapper and Router Options for a list and description of options.

--Xpreproc=<string> Passes general options to the PREPROCESSOR phase for all source code compilations (AIE/PS/PL/x86sim). For example:
--Xpreproc=-D<var>=<value>
--Xpslinker=<string> Passes general options to the PS LINKER phase. For example:
--Xpslinker=-L<libpath> -l<libname>
--Xrouter=<string> Passes general options to the ROUTER phase. For example:
-Xrouter=dmaFIFOsInFreeBankOnly
--Xx86sim=<string> Passes x86sim-specific options to the compiler. For example:
-Xx86sim=clangStaticAnalyzer

This enables the clang static analyzer for kernel source code.

--fast-floats Enables fast implementations of linear floating-point scalar operations such as add, sub, mul, and compare.
--fast-nonlinearfloats Enables fast implementations of non-linear floating-point scalar operations such as sine/cosine, sqrt, and inv.
--fastmath Enables fast implementations of float2fix, fplt, and fpge.
--float-accuracy=<arg> Selects the floating-point operation accuracy in AI Engine-ML:
  • safe: Accuracy is slightly better than FP32.
  • fast: Improved performance with accuracy similar to FP32.
  • low: Best performance with accuracy better than FP16 and bfloat16.
Note: Only AI Engine kernels that have been modified are recompiled in subsequent compilations of the AI Engine graph; unmodified kernels are not recompiled.
Table 10. Event Trace Options
Option Name Description
--event-trace=<value> Event trace configuration, where <value> is one of the following:
  • functions: Function transition view without stalls.
  • functions_partial_stalls: Function transition view with stream/lock/cascade stalls.
  • functions_all_stalls: Function transition view with all stalls (stream/lock/cascade/memory).
  • runtime: Run-time event tracing configuration of AI Engine memory and interface tiles.
--event-trace-port=<value> Sets the AI Engine event tracing port, where <value> is one of the following:
  • plio: Sets the event tracing port to PLIO.
  • gmio: Sets the event tracing port to GMIO. This is the default, and AMD recommends using gmio. See Event Trace Build Flow for more information.
--num-trace-streams=<int> Number of trace streams. The default is 16.
--trace-plio-width=<int> PLIO width for trace streams. The default is 64. Allowed values are 32 and 64.
--graph-iterator-event Generates the user event0() whenever the graph iterator is incremented. This makes it possible to delay the start of hardware event tracing based on the graph iteration.
Table 11. Optimization Options
Option Name Description
--xlopt=<int> Enables a combination of kernel optimizations based on the optimization level. Allowable values are 0 to 2; the default is 1.
  • xlopt=1
    • Automatic computation of heap size: Uses kernel analysis to automatically compute the heap requirement for each AI Engine, so you do not need to specify the heap size.
    • Guidance: Guidance is provided to highlight unaligned variables, global arrays that can potentially be allocated by the mapper, improper usage of restrict, and potential read-before-write conflicts.
    • Pragma insertion: Automatically infers and inserts pragmas in kernel code.
  • xlopt=2
    • Automatic inlining: Automatically inlines functions where it is practical and possible to do so, even if the functions are not declared as __inline or inline.
    • Loop peeling for unrolled loops: Makes the loop iteration count a multiple of the unrolling factor via peeling. Splits a loop into multiple loops based on its iteration count and profitability heuristics, and adds a flattening pragma on the split loops.
Note: Compiler optimization (xlopt > 0) reduces debug visibility.
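The loop-peeling idea behind xlopt=2 can be sketched arithmetically: a loop of N iterations is split into a peeled portion of N mod U iterations plus a main loop whose count is a multiple of the unroll factor U, so the main loop can be fully unrolled and pipelined. The function below is illustrative only, not a compiler internal.

```python
# Sketch: compute how a loop of `total_iters` iterations would be split
# so that the main loop count is a multiple of the unroll factor.
def peel_split(total_iters: int, unroll_factor: int) -> tuple:
    peeled = total_iters % unroll_factor
    main = total_iters - peeled  # guaranteed multiple of unroll_factor
    return peeled, main

# A 100-iteration loop unrolled 8-wide: 4 peeled iterations, then a
# 96-iteration main loop that unrolls cleanly into 12 groups of 8.
print(peel_split(100, 8))  # (4, 96)
```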
--Xxloptstr=<string> Option string to enable/disable optimizations at xlopt levels 1 and 2.
  • -annotate-pragma=false: Turns off automatic insertion of loop pragmas.
  • -xlinline-threshold=T: Sets the automatic inlining threshold to T (default T = 5000).
  • -annotate-pragma: Automatic insertion of loop unrolling, pipelining, and flattening pragmas (default = true).
--lock-fence-mode=<int>
Note: This option only applies to AIE-ML devices.

Enables more aggressive scheduling around the acquire and release of I/O buffers (default: 0).

  • --lock-fence-mode=0: Disables aggressive scheduling.
  • --lock-fence-mode=1: Uses aggressive scheduling, except for multi-layer designs.
--runtime-opt Enabling this option reduces the overall compile time by identifying cores that are functionally identical and compiling them just once.
  • --runtime-opt=false: No optimization. This is the default.
  • --runtime-opt=true: Enables optimization.
Note: Two reserved words, aie and adf, are not valid namespace identifiers in graph programming.
Note: Function names defined in the AI Engine graph and kernel code should not be identical to function names from the standard C++ library. The aiecompiler issues an error message when such functions are used, because they conflict with predefined function names.