From cfca87b91def4f455630f2094447dc0500b6256c Mon Sep 17 00:00:00 2001
From: Gunes Bayir
Date: Tue, 9 Apr 2024 23:13:04 +0100
Subject: Add SME2 implementation of softmax for FP16

In addition to the softmax kernel, this patch fixes minor issues in the
fp32 implementation.

Resolves: COMPMID-6920
Change-Id: Ibbd9f0af5f2a93fba0e92d72ba437279c34149d3
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11402
Benchmark: Arm Jenkins
Tested-by: Arm Jenkins
Reviewed-by: Viet-Hoa Do
Comments-Addressed: Arm Jenkins
---
 docs/user_guide/release_version_and_change_log.dox | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'docs')

diff --git a/docs/user_guide/release_version_and_change_log.dox b/docs/user_guide/release_version_and_change_log.dox
index b8910c9237..9da4956c43 100644
--- a/docs/user_guide/release_version_and_change_log.dox
+++ b/docs/user_guide/release_version_and_change_log.dox
@@ -45,7 +45,7 @@ v24.04 Public major release
  - Add Bfloat16 data type support for @ref NEMatMul.
  - Optimize start-up time of @ref NEConvolutionLayer for some input configurations where GeMM is selected as the convolution algorithm
  - Optimize @ref NEConvolutionLayer for input tensor size > 1e7 bytes and weight tensor height > 7
- - Add support for SoftMax in SME2 for FP32.
+ - Add support for SoftMax in SME2 for FP32 and FP16.
  - Performance optimizations:
    - Optimize @ref NESoftmaxLayer for axis != 0 by natively supporting higher axes up to axis 3.
  - Add support for in place accumulation to CPU GEMM kernels.
-- 
cgit v1.2.1
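
For context, a minimal sketch (not part of the patch) of how the affected operator is driven through the public runtime API. The tensor shape, beta and axis values below are illustrative assumptions; kernel selection, including the new FP16 SME2 path on CPUs that support it, happens inside the library when the function is configured.

```cpp
// Illustrative sketch: running NESoftmaxLayer on an FP16 tensor.
// Shape, beta and axis are assumptions chosen for the example.
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/functions/NESoftmaxLayer.h"
#include "arm_compute/runtime/Tensor.h"

int main()
{
    using namespace arm_compute;

    // Source and destination tensors: 128 values per row, 32 rows, half precision.
    Tensor src{}, dst{};
    src.allocator()->init(TensorInfo(TensorShape(128U, 32U), 1, DataType::F16));
    dst.allocator()->init(TensorInfo(TensorShape(128U, 32U), 1, DataType::F16));

    // Configure softmax along axis 0 with beta = 1.0f; the runtime selects the
    // best available kernel for the data type and hardware at configure time.
    NESoftmaxLayer softmax{};
    softmax.configure(&src, &dst, 1.0f /* beta */, 0 /* axis */);

    // Allocate backing memory, fill src with logits, then execute.
    src.allocator()->allocate();
    dst.allocator()->allocate();
    softmax.run();

    return 0;
}
```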