From ef637398a8c2060e15de438020c53331da8bd6dd Mon Sep 17 00:00:00 2001
From: Gunes Bayir
Date: Mon, 12 Feb 2024 21:32:51 +0000
Subject: Integrate new pretranspose_b_array with extra fused transpose of B

This patch fuses the transposition taking place in Acl with the
transformations done in arm_gemm (called pretranspose_b_array) if the
underlying kernel and transform support it. This should improve start-up
time (as it's for constant Rhs matrices) and memory footprint.

The transformations in arm_gemm are kernel specific. The Rhs matrix is
transformed into certain layouts to improve the performance.

Resolves: COMPMID-6595
Change-Id: Id2932dd966e59f903c279417bebcea83d9a42464
Signed-off-by: Gunes Bayir
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11144
Tested-by: Arm Jenkins
Reviewed-by: Viet-Hoa Do
Comments-Addressed: Arm Jenkins
Benchmark: Arm Jenkins
---
 docs/user_guide/release_version_and_change_log.dox | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/user_guide/release_version_and_change_log.dox b/docs/user_guide/release_version_and_change_log.dox
index 676f1ca032..b788957dda 100644
--- a/docs/user_guide/release_version_and_change_log.dox
+++ b/docs/user_guide/release_version_and_change_log.dox
@@ -41,6 +41,9 @@ If there is more than one release in a month then an extra sequential number is
 
 @section S2_2_changelog Changelog
 
+v24.04 Public major release
+ - Optimize start-up time of @ref NEConvolutionLayer for some input configurations where GeMM is selected as the convolution algorithm
+
 v24.02 Public major release
  - Replace template writer with compute kernel writer in dynamic fusion.
  - Performance optimizations:
-- 
cgit v1.2.1