Once the equation above has been evaluated, the initial HW/SW performance ratio can be estimated:
Speed-up = THW / TSW = Fmax * Running Time / Vops
Without any parallelization, the initial speed-up will most likely be less than 1.
Next, calculate how much parallelism is needed to meet the performance goal:
Parallelism Needed = TGoal / THW = TGoal * Vops / (Fmax * max(VINPUT, VOUTPUT))
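As a rough illustration, the two estimates above can be sketched in Python. The function names and all numeric values are hypothetical, chosen only to keep the arithmetic easy to follow; they are not measurements from a real application. The unparallelized hardware throughput is assumed to be Fmax * max(VINPUT, VOUTPUT) / Vops, which is the form implied by the parallelism formula.

```python
import math


def hw_throughput(f_max, v_input, v_output, v_ops):
    """Unparallelized hardware throughput: Fmax * max(VINPUT, VOUTPUT) / Vops."""
    return f_max * max(v_input, v_output) / v_ops


def initial_speedup(f_max, running_time, v_ops):
    """Initial HW/SW ratio: THW / TSW = Fmax * Running Time / Vops."""
    return f_max * running_time / v_ops


def parallelism_needed(t_goal, f_max, v_input, v_output, v_ops):
    """Parallelism Needed = TGoal / THW."""
    return t_goal / hw_throughput(f_max, v_input, v_output, v_ops)


# Hypothetical example values (assumptions, not measured data).
F_MAX = 300e6        # kernel clock frequency, in Hz
V_INPUT = 2e6        # input samples consumed per function call
V_OUTPUT = 1e6       # output samples produced per function call
V_OPS = 4e8          # operations performed per function call
RUNNING_TIME = 0.5   # software running time per function call, in seconds
T_GOAL = 60e6        # target throughput, in samples per second

speedup = initial_speedup(F_MAX, RUNNING_TIME, V_OPS)
needed = parallelism_needed(T_GOAL, F_MAX, V_INPUT, V_OUTPUT, V_OPS)

print(f"Initial speed-up:   {speedup:.3f}")      # 0.375, i.e. slower than software
print(f"Parallelism needed: {math.ceil(needed)}")  # 40
```

With these assumed numbers the unparallelized kernel runs at 0.375 times the software throughput, and roughly 40-way parallelism would be required to reach the goal.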
This parallelism can be implemented in several ways: by widening the datapath, by using multiple engines, and by using multiple kernel instances. The developer should then choose the combination that best fits the requirements and characteristics of the application.
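One way to explore the available combinations is to enumerate them. The sketch below assumes, as a simplification, that the three mechanisms multiply, so that total parallelism = datapath width * engines * kernel instances. The function name and the candidate values for each dimension are hypothetical and would in practice be limited by device resources and memory bandwidth.

```python
from itertools import product


def parallelism_options(needed,
                        widths=(1, 2, 4, 8, 16),
                        engines=(1, 2, 4, 8),
                        kernels=(1, 2, 3, 4)):
    """List (total, width, engines, kernels) tuples that meet the target.

    Assumes total parallelism is the product of the three factors.
    Results are sorted so the smallest sufficient totals come first.
    """
    options = []
    for w, e, k in product(widths, engines, kernels):
        total = w * e * k
        if total >= needed:
            options.append((total, w, e, k))
    return sorted(options)


# Example: find configurations that provide at least 40-way parallelism.
for total, w, e, k in parallelism_options(40)[:5]:
    print(f"total={total:3d}  width={w:2d}  engines={e}  kernels={k}")
```

The listing is only a starting point: which of the sufficient configurations is best depends on factors the formula does not capture, such as resource usage and achievable clock frequency.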