Fix performance regression due to clFinish()

- In ClGemmLowpMatrixMultiplyCore::prepare() we always called clFinish(), even when the workload was already prepared

Resolves COMPMID-4707

Change-Id: Icdcee528590e2c5efb75325a80c2a45ec84993d1
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6082
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
author: Gian Marco Iodice <gianmarco.iodice@arm.com> 2021-08-11 14:06:28 +0100
committer: Giorgio Arena <giorgio.arena@arm.com> 2021-08-11 15:08:28 +0000
commit: d761a3e3c153083cd3843fe686f27e3438c87d1c (patch)
tree: 27046940a8ff7faa33374982f695d216212840c1
parent: 288d3cb4beb7bbfdb2f8ce2811a07bf285a00d21 (diff)
download: ComputeLibrary-d761a3e3c153083cd3843fe686f27e3438c87d1c.tar.gz
1 files changed, 1 insertions, 1 deletions
diff --git a/src/runtime/gpu/cl/operators/ClGemmLowpMatrixMultiplyCore.cpp b/src/runtime/gpu/cl/operators/ClGemmLowpMatrixMultiplyCore.cpp
index 64c8743f13..0c72912642 100644
--- a/src/runtime/gpu/cl/operators/ClGemmLowpMatrixMultiplyCore.cpp
+++ b/src/runtime/gpu/cl/operators/ClGemmLowpMatrixMultiplyCore.cpp
@@ -773,9 +773,9 @@ void ClGemmLowpMatrixMultiplyCore::prepare(ITensorPack &tensors)
                 shifts_tensor->unmap(CLScheduler::get().queue());
             }
         }
+        CLScheduler::get().queue().finish();
         _is_prepared = true;
     }
-    CLScheduler::get().queue().finish();
 }
 
 experimental::MemoryRequirements ClGemmLowpMatrixMultiplyCore::workspace() const
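
The effect of moving the finish() call inside the `_is_prepared` guard can be illustrated with a minimal, self-contained sketch. It does not use the Compute Library or OpenCL: `MockQueue`, `MockOperator` and `count_finishes` are hypothetical names introduced only for illustration, and the mock queue merely counts how often a blocking synchronisation is requested. The sketch assumes, as the commit message describes, that the guarded block runs only on the first call to prepare().

```cpp
#include <cstddef>

// Hypothetical stand-in for a command queue; it only counts blocking syncs.
struct MockQueue
{
    std::size_t finish_calls = 0;
    void finish() { ++finish_calls; }
};

// Hypothetical operator mirroring the post-patch control flow of prepare().
class MockOperator
{
public:
    explicit MockOperator(MockQueue &queue) : _queue(queue) {}

    void prepare()
    {
        if(!_is_prepared)
        {
            // One-off setup work would be enqueued here.
            _queue.finish(); // Block once, so the one-off work has completed.
            _is_prepared = true;
        }
        // Before the patch, finish() sat here and ran on every call.
    }

private:
    MockQueue &_queue;
    bool       _is_prepared = false;
};

// Calls prepare() `iterations` times and returns how often the queue was synchronised.
std::size_t count_finishes(std::size_t iterations)
{
    MockQueue    queue;
    MockOperator op(queue);
    for(std::size_t i = 0; i < iterations; ++i)
    {
        op.prepare();
    }
    return queue.finish_calls;
}
```

With this layout, `count_finishes(n)` reports a single synchronisation for any `n >= 1`, whereas the pre-patch placement would have blocked once per call to prepare().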