The following figure shows an example with a single out-of-order command queue. The scheduler can dispatch commands from the queue in any order. You must manually define event dependencies and synchronizations as required.
The following is code extracted from host.cpp of the concurrent_kernel_execution_c example that sets up a single out-of-order command queue and enqueues commands as needed:
    OCL_CHECK(err,
              cl::CommandQueue ooo_queue(context, device,
                                         CL_QUEUE_PROFILING_ENABLE | CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE,
                                         &err));
    ...
    printf("[OOO Queue]: Enqueueing scale kernel\n");
    OCL_CHECK(err,
              err = ooo_queue.enqueueTask(kernel_mscale, nullptr, &ooo_events));
    set_callback(ooo_events, "scale");
    ...
    // This is an out of order queue, events can be executed in any order. Since
    // this call depends on the results of the previous call we must pass the
    // event object from the previous call to this kernel's event wait list.
    printf("[OOO Queue]: Enqueueing addition kernel (Depends on scale)\n");
    kernel_wait_events.resize(0);
    kernel_wait_events.push_back(ooo_events);
    OCL_CHECK(err,
              err = ooo_queue.enqueueTask(kernel_madd,
                                          &kernel_wait_events, // Event from previous call
                                          &ooo_events));
    set_callback(ooo_events, "addition");

    // This call does not depend on previous calls so we are passing nullptr
    // into the event wait list. The runtime should schedule this kernel in
    // parallel to the previous calls.
    printf("[OOO Queue]: Enqueueing matrix multiplication kernel\n");
    OCL_CHECK(err,
              err = ooo_queue.enqueueTask(kernel_mmult, nullptr, &ooo_events));
    set_callback(ooo_events, "matrix multiplication");
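The dependency pattern in the excerpt above (addition waits on scale, matrix multiplication waits on nothing) can be tried without an OpenCL device. The following is a minimal sketch using only the C++ standard library as an analogy, not the OpenCL API: std::shared_future stands in for the event passed in the wait list, and the functions host_scale, host_add, host_dot, and run_out_of_order_sketch are hypothetical names introduced here for illustration. They are not part of the concurrent_kernel_execution_c example.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical host-side stand-ins for the three kernels (not OpenCL kernels).
std::vector<int> host_scale(std::vector<int> v, int factor) {
    for (int& x : v) x *= factor;
    return v;
}

std::vector<int> host_add(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
    return out;
}

// A dot product is used here as a small stand-in for the independent
// matrix multiplication command.
int host_dot(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    int sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

struct Results {
    std::vector<int> added;
    int product;
};

Results run_out_of_order_sketch() {
    const std::vector<int> a{1, 2, 3};
    const std::vector<int> b{10, 20, 30};

    // "scale": no dependencies, analogous to passing nullptr as the wait list.
    std::shared_future<std::vector<int>> scaled =
        std::async(std::launch::async, host_scale, a, 2).share();

    // "addition": explicitly waits on the scale result, analogous to passing
    // the scale event in the wait list of the addition command.
    std::future<std::vector<int>> added =
        std::async(std::launch::async, [scaled, b] {
            return host_add(scaled.get(), b);
        });

    // "matrix multiplication": independent, so it may run concurrently
    // with both of the tasks above.
    std::future<int> product = std::async(std::launch::async, host_dot, a, b);

    return Results{added.get(), product.get()};
}
```

Calling run_out_of_order_sketch() from a main function returns the same values regardless of how the three tasks are interleaved, because the only ordering constraint (scale before addition) is stated explicitly. Omitting that dependency in an out-of-order queue would allow the addition to read unscaled data.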
The Timeline Trace view shows that the compute unit mmult_1 runs in parallel with the compute unit madd_1, both when using multiple in-order queues and when using a single out-of-order queue.