Once the equation above has been evaluated, the initial HW/SW performance ratio can be estimated:
Without any parallelization, the initial speed-up will most likely be less than 1, meaning the hardware implementation starts out slower than the software reference.
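As a rough illustration of this estimate, the sketch below treats the initial speed-up as the ratio of the unparallelized hardware throughput to the measured software throughput. The function name, the parameter names, and the sample numbers are hypothetical and are not taken from the text. It assumes both throughputs are expressed in the same unit (for example, samples per second).

```python
def initial_speedup(hw_throughput: float, sw_throughput: float) -> float:
    """Return the HW/SW performance ratio (assumed form: HW throughput / SW throughput).

    Both arguments must use the same unit, e.g. samples per second.
    """
    if hw_throughput <= 0 or sw_throughput <= 0:
        raise ValueError("throughputs must be positive")
    return hw_throughput / sw_throughput


# Hypothetical numbers, for illustration only:
# an unparallelized kernel at 30 Msamples/s versus software at 100 Msamples/s.
ratio = initial_speedup(hw_throughput=30e6, sw_throughput=100e6)
print(f"initial speed-up: {ratio:.2f}")
```

A result below 1 here is expected at this stage and simply quantifies how much ground the parallelization has to make up.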
Next, calculate how much parallelism is needed to meet the performance goal:
This parallelism can be implemented in several ways: by widening the datapath, by using multiple engines, or by using multiple kernel instances. These approaches can be combined, so the developer should determine the best combination for their needs and the characteristics of their application.
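The sizing step above can be sketched as follows, assuming the required parallelism is the ratio of the target throughput to the unparallelized hardware throughput, and that the three techniques multiply together. All names, the candidate factor lists, and the sample numbers are hypothetical. A real design would also weigh resource usage and memory bandwidth, which this sketch ignores.

```python
import math
from itertools import product


def parallelism_needed(goal_throughput: float, hw_throughput: float) -> float:
    """Assumed form: target throughput divided by unparallelized HW throughput."""
    if goal_throughput <= 0 or hw_throughput <= 0:
        raise ValueError("throughputs must be positive")
    return goal_throughput / hw_throughput


def candidate_configs(required, widths=(1, 2, 4, 8), engines=(1, 2, 4), instances=(1, 2, 4)):
    """List (datapath width factor, engines, kernel instances) combinations
    whose product meets the required parallelism, smallest product first."""
    needed = math.ceil(required)
    configs = [
        (w, e, k)
        for w, e, k in product(widths, engines, instances)
        if w * e * k >= needed
    ]
    # Sort by total parallelism, then by the tuple itself for a stable order.
    return sorted(configs, key=lambda c: (c[0] * c[1] * c[2], c))


# Hypothetical numbers: goal of 500 Msamples/s, unparallelized kernel at 30 Msamples/s.
required = parallelism_needed(500e6, 30e6)
print(f"parallelism needed: {required:.1f}")
for width, n_engines, n_kernels in candidate_configs(required)[:3]:
    print(f"width x{width}, {n_engines} engines, {n_kernels} kernel instances")
```

Enumerating the combinations only narrows the choice. Picking among them remains a design decision that depends on the application.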