This tutorial illustrated three specific areas of host code optimization:
Pipelined Kernel Execution using an Out-of-Order Event Queue
Kernel and Host Code Synchronization
OpenCL API Buffer Size
Consider these areas when building an efficient acceleration implementation. The tutorial showed how to analyze each of these performance bottlenecks and demonstrated one way to improve each of them.
In general, there are many ways to implement host code and improve its performance. This applies both to host-to-accelerator performance and to other areas, such as buffer management. This tutorial did not cover every aspect of host code optimization.