ComputeLibrary.git

author	Gunes Bayir <gunes.bayir@arm.com>	2024-03-04 14:55:24 +0000
committer	Michael Kozlov <michael.kozlov@arm.com>	2024-03-05 15:18:31 +0000
commit	5f667d6902cc7cbfb09b71fe130ccf83a5f7d060 (patch)
tree	9e8cb6b8c4613b1e96c86cd0af1799d1bb7575d4 /src/core/NEON/kernels/arm_conv/depthwise/kernels/a64_s8q_nhwc_5x5_s1_output2x2_mla_depthfirst.hpp
parent	0c49024339843b5ead43a52c037efe8bb7e21e59 (diff)
download	ComputeLibrary-5f667d6902cc7cbfb09b71fe130ccf83a5f7d060.tar.gz

Fix performance regression in fixed-format kernels

Fix the performance regression in CpuGemmConv2d caused by importing memory at every run for fixed-format kernels. This is done by adding a bypass_import parameter to the auxiliary tensor handler class (CpuAuxTensorHandler) and using it in CpuGemmConv2d so that the memory import happens if and only if the associated tensor is used in the gemm pack.

Also, improve the documentation of CpuAuxTensorHandler.

Resolves: ARMCL-1126
Co-authored by: SiCong Li <sicong.li@arm.com>
Change-Id: Idb26bdb2d19419074a6e7f2497a1741ae200603f
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11240
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
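
The idea described in the commit message can be illustrated with a toy sketch. This is not the Compute Library code: ToyTensor, ToyAuxHandler and run_many are hypothetical names invented for illustration, and only the bypass_import flag name is taken from the message. The sketch assumes the handler is a scoped helper that imports a workspace buffer on construction, and shows how a bypass flag avoids that work on runs where the tensor is not used.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a tensor that can import externally owned memory.
struct ToyTensor
{
    void *buffer       = nullptr;
    int   import_count = 0; // number of import_memory() calls, for illustration

    void import_memory(void *ptr)
    {
        buffer = ptr;
        ++import_count;
    }
};

// Hypothetical scoped helper (an assumption, not the real CpuAuxTensorHandler):
// imports the workspace on construction unless bypass_import is true.
class ToyAuxHandler
{
public:
    ToyAuxHandler(ToyTensor &tensor, void *workspace, bool bypass_import)
        : _tensor(tensor), _imported(!bypass_import)
    {
        if (_imported)
        {
            _tensor.import_memory(workspace);
        }
    }

    ~ToyAuxHandler()
    {
        // Release the borrowed buffer only if this handler imported it.
        if (_imported)
        {
            _tensor.buffer = nullptr;
        }
    }

    ToyAuxHandler(const ToyAuxHandler &)            = delete;
    ToyAuxHandler &operator=(const ToyAuxHandler &) = delete;

private:
    ToyTensor &_tensor;
    bool       _imported;
};

// Simulates repeated runs and returns how many imports were performed.
// tensor_used_in_pack is a hypothetical condition standing in for
// "the associated tensor is used in the gemm pack".
int run_many(int iterations, bool tensor_used_in_pack)
{
    ToyTensor                  tensor;
    std::vector<unsigned char> workspace(64);

    for (int i = 0; i < iterations; ++i)
    {
        ToyAuxHandler handler(tensor, workspace.data(),
                              /* bypass_import = */ !tensor_used_in_pack);
        // The per-run computation would happen here.
        assert(tensor_used_in_pack == (tensor.buffer != nullptr));
    }
    return tensor.import_count;
}
```

In this sketch, the import cost is paid on every run only when the tensor is actually needed; otherwise the handler is constructed without touching the workspace.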

Diffstat (limited to 'src/core/NEON/kernels/arm_conv/depthwise/kernels/a64_s8q_nhwc_5x5_s1_output2x2_mla_depthfirst.hpp')

0 files changed, 0 insertions, 0 deletions
