aboutsummaryrefslogtreecommitdiff
path: root/src/core/NEON/NEMath.inl
AgeCommit message (Collapse)Author
2023-09-28Reimplement erf functionViet-Hoa Do
* The current implementation has signfinicant inaccuracy and the issue cascades to GELU. * Use the implementation from ArmĀ® Optimized Routines. The maximum error is 1.93 ULP. Resolves: COMPMID-6554 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: If80131e164b7a078e34dd8e05b1506698f31d17a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10395 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-28Apply clang-format on repositoryFelix Thomasmathibalan
Code is formatted as per a revised clang format configuration file(not part of this delivery). Version 14.0.6 is used. Exclusion List: - files with .cl extension - files that are not strictly C/C++ (e.g. Android.bp, Sconscript ...) And the following directories - compute_kernel_writer/validation/ - tests/ - include/ - src/core/NEON/kernels/convolution/ - src/core/NEON/kernels/arm_gemm/ - src/core/NEON/kernels/arm_conv/ - data/ There will be a follow up for formatting of .cl files and the files under tests/ and compute_kernel_writer/validation/. Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Change-Id: Ib7eb1fcf4e7537b9feaefcfc15098a804a3fde0a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10391 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2023-06-21Enable vmfa in arm7va/aarch32 when presentPablo Marquez Tello
* vfma is an extension on armv7a and it can be enabled with -mfpu=neon-vfpv4 * Resolves MLCE-1079 Change-Id: Id455c39ee4feb8d3cdc4515c8307eb8a5d6e093b Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9795 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-04-25Fix rounding to nearest even for armv7aRamy Elgammal
- If input value to round has the decimal point beyond the fraction part (23 bit) then simply return the rounded value = input value. Resolves: COMPMID-6025 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I1994e49a9bca7daeaeec7681aec099c63a97b53f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9473 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-03-29Add quantized support for unary elementwise in CPUViet-Hoa Do
* Add quantized unary elementwise in CPU using LUT. * Widen the input data range of the test suite. - Fix CPU exponential function overflow/underflow range. - Fix saturation issue of CL round operator. Resolves: COMPMID-5763 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I41445de2b4a33ec6b01e0ab701516c240c852d0b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9367 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-09Optimize CPU base-e exponential function on FP32Viet-Hoa Do
Resolves: COMPMID-5664 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I4182752e213aade19005ee984a488c2490453f8f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8747 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-10-24Add FP16 tanh based on rational approximationJonathan Deakin
Use rational approximation with optimised coefficients to calculate tanh_f16. Method is ~2.5x faster than previous and has lower relative and absolute error. This will fix https://github.com/ARM-software/ComputeLibrary/issues/1002 Credit to George Steed for suggesting use of vcageq instead of min+max Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Change-Id: Id70da3aab666c68b0d798266a837b59c00937bf7 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8480 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-28Fix overflow in NEActivationLayer for FP16 typePablo Marquez Tello
* Resolves MLCE-924 Change-Id: I3cc3d30893c2ee0865eacafdc1d9ba3d5b876d32 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8326 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-14Adding GELU activationMurray Kornelsen
OpenCL implementation uses built in erf. NEON implementation requires new vectorized erf. Uses the following approximation: erf(x) = 1 - 1 / (1 + a1x + a2x^2 + a3x^3 + a4x^4)^4 a1 = 0.278393, a2 = 0.230389, a3 = 0.000972, a4 = 0.078108 From https://en.wikipedia.org/wiki/Error_function#Numerical_approximations Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca> Change-Id: I2d3964b2c26a4334166b17135f9104bc6324fad2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7921 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2021-10-18DirectConv3d support refineSheri Zhang
- Decouple data support of CpuDirectConv3dKernel - Update documentation for Conv3d Signed-off-by: Sheri Zhang <sheri.zhang@arm.com> Change-Id: I1d94aa28f821f45a1a3d39cc3335c8faeee89f0d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6453 Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-05-05Fix for tanh at small argument valuesAleksandr Nikolaev
x - x^3/3 is more accurate approximation for |x| < 0.005 than (exp2x - 1)/(exp2x + 1). Resolves: COMPMID-4098 Signed-off-by: Aleksandr Nikolaev <aleksandr.nikolaev@arm.com> Change-Id: If6f9d7ce4d8d00d36d2dada7ab8f8d9f5b58f5c0 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/321354 Tested-by: bsgcomp <bsgcomp@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5563 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-04-29Remove stale/solved TODOsMichele Di Giorgio
Change-Id: I5c440f4c6ca4186adcfa926e6b7d924086671f29 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5520 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Manuel Bottini <manuel.bottini@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-03-05Move utility functions to NE/SVEMathSang-Hoon Park
To avoid unused function warnings when only partial data types are selected, the definition of functions are moved. Partially Resolves: COMPMID-4282 Change-Id: Ic30ddd3f2c88cac5978d27e5f4ada3639b5a04e5 Signed-off-by: Sang-Hoon Park <sang-hoon.park@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5215 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2020-12-23Fix baremetal arm_compute_validation build errorsSiCongLi
* Add -C flag to instruct preprocessor not to strip comments. This is to prevent marker comments like '// fall through' that suppresses certain warnings from being removed. * Fix unused variable warnings. * Add M_PI definition that's missing from certain toolchain standard libraries. Resolves COMPMID-4054 Change-Id: I1d641db668685d4b678f3d0efed84bfe9e630b4b Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4692 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2020-10-07COMPMID-3637: Move wrapper to srcGeorgios Pinitas
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I524b0c4b49c7a7035b7d078b9585d77b0d438e10 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4083 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>