The other input matrix, B (or A), must also be double buffered, and its buffered data can be reused. A dedicated buffer object performs this operation in the GEMM kernel.
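
As a rough illustration of the idea, the following minimal sketch shows one way such a buffer object could look. It is not the kernel's actual implementation: the names `DoubleBuffer`, `copy_b_tile`, and `gemm_double_buffered`, the row-major layout, and the tiling along K with tile height `TK` are all assumptions made for this example. The code runs sequentially, so the fill of the back slot does not truly overlap with computation; in a real kernel that fill would typically be an asynchronous copy.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical double buffer: two tile-sized slots. The "front" slot is
// consumed by the compute loop while the "back" slot is filled with the next tile.
template <typename T>
class DoubleBuffer {
public:
    explicit DoubleBuffer(std::size_t tile_elems) {
        slots_[0].resize(tile_elems);
        slots_[1].resize(tile_elems);
    }
    std::vector<T>& front() { return slots_[active_]; }      // tile being consumed
    std::vector<T>& back()  { return slots_[1 - active_]; }  // tile being filled
    void swap() { active_ = 1 - active_; }                   // exchange the two roles

private:
    std::vector<T> slots_[2];
    int active_ = 0;
};

// Copy `rows` rows of row-major B (K x N), starting at row k0, into dst.
inline void copy_b_tile(const std::vector<float>& B, std::size_t N,
                        std::size_t k0, std::size_t rows,
                        std::vector<float>& dst) {
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t j = 0; j < N; ++j)
            dst[r * N + j] = B[(k0 + r) * N + j];
}

// C (M x N) = A (M x K) * B (K x N), all row-major, with B streamed
// through the double buffer in tiles of TK rows (assumed tiling scheme).
inline std::vector<float> gemm_double_buffered(const std::vector<float>& A,
                                               const std::vector<float>& B,
                                               std::size_t M, std::size_t N,
                                               std::size_t K, std::size_t TK) {
    std::vector<float> C(M * N, 0.0f);
    if (M == 0 || N == 0 || K == 0 || TK == 0) return C;

    DoubleBuffer<float> buf(TK * N);
    // Preload the first tile of B into the front slot.
    copy_b_tile(B, N, 0, std::min(TK, K), buf.front());

    for (std::size_t k0 = 0; k0 < K; k0 += TK) {
        const std::size_t rows = std::min(TK, K - k0);
        const std::size_t next = k0 + TK;
        // Fill the back slot with the next tile (asynchronous in a real kernel).
        if (next < K)
            copy_b_tile(B, N, next, std::min(TK, K - next), buf.back());

        // The front tile is reused by every row of A before it is replaced.
        const std::vector<float>& tile = buf.front();
        for (std::size_t i = 0; i < M; ++i)
            for (std::size_t r = 0; r < rows; ++r) {
                const float a = A[i * K + k0 + r];
                for (std::size_t j = 0; j < N; ++j)
                    C[i * N + j] += a * tile[r * N + j];
            }
        buf.swap();  // the freshly filled slot becomes the front
    }
    return C;
}
```

In this sketch, each tile of B is loaded once and then reused across all M rows of A, which is the reuse the paragraph refers to; the two slots let the next load proceed while the current tile is still in use.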