Fix performance regression in FP16 Deconvolution

The previous heuristic for selecting the Deconvolution method with FP32 input data introduced a performance regression for FP16. A simple fix ensures the previous heuristic applies to FP32 types only.

Resolves: COMPMID-6027
Change-Id: I77ca6c9c72534057a3967db58924a972b0efb09f
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9616
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
author: Jakub Sujak <jakub.sujak@arm.com> 2023-04-27 09:24:05 +0100
committer: Jakub Sujak <jakub.sujak@arm.com> 2023-05-12 09:00:33 +0000
commit: 56fabbae2309856f74151c0bc909d15d84951a2c (patch)
tree: 47da25ab4c124e146d3fb83fa923921cba74b06a /src/runtime/CL
parent: 7997603de02e3d9d901b80988c044d1184b2c069 (diff)
download: ComputeLibrary-56fabbae2309856f74151c0bc909d15d84951a2c.tar.gz
1 files changed, 4 insertions, 1 deletions
diff --git a/src/runtime/CL/functions/CLDeconvolutionLayer.cpp b/src/runtime/CL/functions/CLDeconvolutionLayer.cpp
index 5c25cbafaf..4421a18f2a 100644
--- a/src/runtime/CL/functions/CLDeconvolutionLayer.cpp
+++ b/src/runtime/CL/functions/CLDeconvolutionLayer.cpp
@@ -23,6 +23,7 @@
  */
 #include "arm_compute/runtime/CL/functions/CLDeconvolutionLayer.h"
 
+#include "arm_compute/core/Types.h"
 #include "arm_compute/core/Utils.h"
 #include "arm_compute/core/Validate.h"
 #include "arm_compute/core/utils/misc/ShapeCalculator.h"
@@ -155,7 +156,9 @@ DeconvolutionMethod CLDeconvolutionLayer::get_deconvolution_method(const ITensor
 
     if(weights->dimension(idx_w) != deconv_info.stride().first || weights->dimension(idx_h) != deconv_info.stride().second)
     {
-        if(input->data_layout() == DataLayout::NHWC && ofm <= 16)
+        // We observe better performance for FP32 types only when ofm <= 16.
+        // A better heuristic is required for selecting the method for FP16 data types.
+        if(input->data_layout() == DataLayout::NHWC && !((input->data_type() == DataType::F32) && (ofm > 16)))
         {
             return DeconvolutionMethod::DIRECT;
         }
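
The effect of the changed condition can be checked in isolation. The following is a minimal sketch, not code from the library: the enums are hypothetical stand-ins for arm_compute::DataLayout and arm_compute::DataType, and the function names selects_direct_old and selects_direct_new are invented for illustration. Only the two boolean expressions are taken from the diff above.

```cpp
#include <cstddef>

// Hypothetical stand-ins for arm_compute::DataLayout and arm_compute::DataType.
enum class DataLayout { NCHW, NHWC };
enum class DataType { F16, F32 };

// Condition before this patch: the ofm <= 16 limit applies to every data type.
constexpr bool selects_direct_old(DataLayout layout, DataType /*type*/, std::size_t ofm)
{
    return layout == DataLayout::NHWC && ofm <= 16;
}

// Condition after this patch: the ofm <= 16 limit applies to F32 only.
constexpr bool selects_direct_new(DataLayout layout, DataType type, std::size_t ofm)
{
    return layout == DataLayout::NHWC && !((type == DataType::F32) && (ofm > 16));
}

// F32 behaviour is unchanged by the patch.
static_assert(selects_direct_new(DataLayout::NHWC, DataType::F32, 16), "F32, ofm <= 16: DIRECT");
static_assert(!selects_direct_new(DataLayout::NHWC, DataType::F32, 17), "F32, ofm > 16: not DIRECT");

// F16 with ofm > 16 was excluded before the patch and now selects DIRECT.
static_assert(!selects_direct_old(DataLayout::NHWC, DataType::F16, 32), "old: F16, ofm > 16 excluded");
static_assert(selects_direct_new(DataLayout::NHWC, DataType::F16, 32), "new: F16, ofm > 16 selects DIRECT");
```

By De Morgan's law the new expression reads "NHWC and (not F32 or ofm <= 16)", so in this branch any non-F32 NHWC input selects DIRECT regardless of ofm. The sketch models only F16 and F32, and which method is returned when the condition is false is outside the lines shown in this hunk.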