author | Gunes Bayir <gunes.bayir@arm.com> | 2024-03-04 14:55:24 +0000 |
---|---|---|
committer | Michael Kozlov <michael.kozlov@arm.com> | 2024-03-05 15:18:31 +0000 |
commit | 5f667d6902cc7cbfb09b71fe130ccf83a5f7d060 (patch) | |
tree | 9e8cb6b8c4613b1e96c86cd0af1799d1bb7575d4 /src/core/NEON/kernels/arm_conv/depthwise/kernels/a64_s8q_nhwc_3x3_s2_output2x2_mla_depthfirst.hpp | |
parent | 0c49024339843b5ead43a52c037efe8bb7e21e59 (diff) | |
Fix performance regression in fixed-format kernels
Fix the performance regression in CpuGemmConv2d caused by importing memory on every run for fixed-format kernels. The fix adds a bypass_import parameter to the auxiliary tensor handler class (CpuAuxTensorHandler) and uses it in CpuGemmConv2d so that the memory import happens if and only if the associated tensor is used in the gemm pack.
Also, improve the documentation of CpuAuxTensorHandler.
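The idea described above can be illustrated with a minimal, self-contained sketch. It is not the actual CpuAuxTensorHandler code: SketchTensor, SketchAuxTensorHandler and simulate_runs are hypothetical names invented for this illustration, and the real class in ComputeLibrary has a different constructor signature. Only the bypass_import flag and its intent (skip the import when the tensor is not used in the gemm pack) come from the commit message.

```cpp
#include <cassert> // only needed if you add assertions around simulate_runs()
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tensor whose backing memory can be imported.
struct SketchTensor
{
    void       *memory{nullptr};
    std::size_t import_count{0}; // number of times import_memory() was called

    void import_memory(void *ptr)
    {
        memory = ptr;
        ++import_count;
    }

    void release()
    {
        memory = nullptr;
    }
};

// Hypothetical RAII handler, loosely modelled on the commit description.
// When bypass_import is true the constructor does not touch the tensor,
// so the per-run import cost is avoided.
class SketchAuxTensorHandler
{
public:
    SketchAuxTensorHandler(SketchTensor &tensor, void *workspace, bool bypass_import = false)
        : _tensor(tensor), _imported(!bypass_import)
    {
        if (_imported)
        {
            _tensor.import_memory(workspace);
        }
    }

    ~SketchAuxTensorHandler()
    {
        // Only release memory that this handler imported itself.
        if (_imported)
        {
            _tensor.release();
        }
    }

    SketchAuxTensorHandler(const SketchAuxTensorHandler &)            = delete;
    SketchAuxTensorHandler &operator=(const SketchAuxTensorHandler &) = delete;

private:
    SketchTensor &_tensor;
    bool          _imported;
};

// Simulates `runs` invocations of an operator and returns how many
// memory imports took place in total.
std::size_t simulate_runs(std::size_t runs, bool tensor_used_in_pack)
{
    std::vector<unsigned char> workspace(64);
    SketchTensor               tensor;

    for (std::size_t i = 0; i < runs; ++i)
    {
        // Import only when the tensor is actually part of the pack.
        SketchAuxTensorHandler handler(tensor, workspace.data(), /* bypass_import = */ !tensor_used_in_pack);
        // ... the gemm would run here with the pack ...
    }
    return tensor.import_count;
}
```

In this sketch, simulate_runs(5, true) performs five imports, while simulate_runs(5, false) performs none, which is the saving the commit targets for fixed-format kernels.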
Resolves: ARMCL-1126
Co-authored by: SiCong Li <sicong.li@arm.com>
Change-Id: Idb26bdb2d19419074a6e7f2497a1741ae200603f
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11240
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Diffstat (limited to 'src/core/NEON/kernels/arm_conv/depthwise/kernels/a64_s8q_nhwc_3x3_s2_output2x2_mla_depthfirst.hpp')
0 files changed, 0 insertions, 0 deletions