author | Gunes Bayir <gunes.bayir@arm.com> | 2023-11-07 05:43:07 +0000 |
---|---|---|
committer | Gunes Bayir <gunes.bayir@arm.com> | 2023-12-05 13:52:17 +0000 |
commit | fadc9b1e0bba90d6a91beb65466b2a0895b3a5e4 | |
tree | 7d095fefe3634b4ca86dc9088bb2990d64d3a7c8 /docs/user_guide | |
parent | 23158b0a69b85c9c6e5a7f2457bfe10be04d6132 | |
Optimize CpuSoftmaxKernel for axis=0
Implement a single kernel instead of two consecutive ones. In the previous setup, one kernel calculated the maximum value along the axis, and this maximum was then subtracted from each element while calculating the softmax, i.e.
softmax(x_i) = exp(x_i - max) / sum_j( exp(x_j - max) )
This patch integrates these two stages into a single kernel for Neon™ for all data types. This will save some memory because we don't need to hold the max values in a separate auxiliary tensor.
It also introduces some other optimizations that will ease memory pressure when the data type is float/half, by using the dst tensor as temporary storage for already exponentiated inputs.
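The idea described above can be illustrated with a minimal scalar sketch. This is not the library's Neon™ kernel: the function name `softmax_row_fused` and its signature are hypothetical, and the code assumes a single contiguous row of `float` values. It only shows how the maximum search, the exponentiation, and the normalization can live in one routine while `dst` doubles as temporary storage for the exponentiated inputs.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Hypothetical scalar reference, NOT the actual CpuSoftmaxKernel code.
// Computes softmax over one contiguous row without any auxiliary buffer:
//   1. find the row maximum,
//   2. write exp(x_i - max) into dst and accumulate the sum,
//   3. scale dst in place by 1 / sum.
void softmax_row_fused(const float *src, float *dst, std::size_t len)
{
    if (len == 0)
    {
        return; // nothing to do for an empty row
    }

    // Stage 1: maximum along the axis (previously a separate kernel).
    const float max_val = *std::max_element(src, src + len);

    // Stage 2: exponentiate once, keep the result in dst, and sum it up.
    float sum = 0.0f;
    for (std::size_t i = 0; i < len; ++i)
    {
        dst[i] = std::exp(src[i] - max_val);
        sum += dst[i];
    }

    // Stage 3: normalize the values already stored in dst.
    const float inv_sum = 1.0f / sum;
    for (std::size_t i = 0; i < len; ++i)
    {
        dst[i] *= inv_sum;
    }
}
```

Calling `softmax_row_fused(src, dst, len)` on each row of the softmax axis yields values that sum to 1 per row. Subtracting the maximum keeps every exponent at or below zero, so large inputs do not overflow.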
It removes the references to the SVE and SVE2 implementations and most of the associated files, but leaves the implementations themselves in place because they may be used in the future.
Resolves: COMPMID-6500
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: Icff9976d1214c4c6cbe15a62ca60b8a77d3784cc
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10688
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Diffstat (limited to 'docs/user_guide')
-rw-r--r-- | docs/user_guide/release_version_and_change_log.dox | 6 |
1 files changed, 4 insertions, 2 deletions
diff --git a/docs/user_guide/release_version_and_change_log.dox b/docs/user_guide/release_version_and_change_log.dox
index 13f4e9ea2a..ac4f0610ea 100644
--- a/docs/user_guide/release_version_and_change_log.dox
+++ b/docs/user_guide/release_version_and_change_log.dox
@@ -46,6 +46,8 @@ v24.01 Public major release
 You should link only to the main `libarm_compute` library for core functionality.
 - New features
 - Add support for FP16 in all multi_isa builds.
+ - Performance optimizations:
+ - Optimize @ref NESoftmaxLayer
 
 v23.11 Public major release
 - New features
@@ -438,8 +440,8 @@ v21.02 Public major release
 - @ref NEActivationLayer
 - @ref NEArithmeticAddition
 - @ref NEBatchNormalizationLayerKernel
- - @ref cpu::kernels::CpuLogits1DSoftmaxKernel
- - @ref cpu::kernels::CpuLogits1DMaxKernel
+ - cpu::kernels::CpuLogits1DSoftmaxKernel
+ - cpu::kernels::CpuLogits1DMaxKernel
 - @ref cpu::kernels::CpuElementwiseUnaryKernel
 - Remove padding from OpenCL kernels:
 - CLDirectConvolutionLayerKernel