aboutsummaryrefslogtreecommitdiff
path: root/src/cpu/kernels/CpuActivationKernel.cpp
AgeCommit message (Collapse)Author
2023-05-17Move lut kernel to sve2 categoryv23.05branches/arm_compute_23_05SiCong Li
This specific Lut kernel uses sve2 instructions Resolves: COMPMID-6268 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I44fa3812e96fa79b3d1e1e3a31d587581f59f0e1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9675 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-15Fix regression caused by mws in ActivationLayerMohammed Suhail Munshi
- Regression is caused by the small default mws in ActivationLayer - Syncronization of threads takes longer than the workload on small sized tensors. - Size 1536 is chosen arbitrarily based on the size of tensors in benchmarked networks Resolves: [COMPMID-5655] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I02e865b578399e75484f471e67806dd4cf7502c0 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/468454 Comments-Addressed: bsgcomp <bsgcomp@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8615 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-08SVE Hard-Swish via Lookup table for quantized inputPablo Marquez Tello
* Datatype supported qasymm8 and qasymm8_signed. * Resolves COMPMID-5211 Change-Id: Ib161af3af5ad11ec6b46de0bf4fbac172d2525fb Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7729 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-10-19Fix FFTConvolutionLayer testViet-Hoa Do
* Do not change the tensor info after configure stage. * By fixing this, the 1D optimization for activation layer can be applied to all data types and tensor layout. Resolves: COMPMID-5644 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I557f9bb84e5e456c28d6b423584887d7a3648ad4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8470 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Mohmun02 <MohammedSuhail.Munshi@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-12Optimize Neon™ Logistic ActivationMohammed Suhail Munshi
- Use a 1d execution window to improve memory access pattern. Resolves: [COMPMID-5465] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ida30669ffa06eb002ca43a6edf15e25a6eaad2f6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8344 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-14Adding GELU activationMurray Kornelsen
OpenCL implementation uses built in erf. NEON implementation requires new vectorized erf. Uses the following approximation: erf(x) = 1 - 1 / (1 + a1x + a2x^2 + a3x^3 + a4x^4)^4 a1 = 0.278393, a2 = 0.230389, a3 = 0.000972, a4 = 0.078108 From https://en.wikipedia.org/wiki/Error_function#Numerical_approximations Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca> Change-Id: I2d3964b2c26a4334166b17135f9104bc6324fad2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7921 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-17Add LUT for quantized sigmoid functionViet-Hoa Do
* Move LUT implementation to a seperate file. It will be used for both QASYMM8 and QASYMM8_SIGNED. * Fix wrong constant value related to QASYMM8_SIGNED leaky ReLU in 32-bit build. Resolves: COMPMID-5464 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I2b24d52409a38f1b66fd532f431eff8a9e4547b6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8066 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-06-29Add LUT-based leaky relu for QASYMM8 on CPUViet-Hoa Do
* Add LUT generation function for Leaky ReLU. * Some additional changes in the existing LUT implementation: + Bring back the NEON implementation of hard swish for 32-bit build. Library size of 64-bit build is not affected. + Add some extra #ifdef to remove unnecessary code in 32-bit build. Resolves: COMPMID-5386 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I1ea49611cc922765ee741e31138c888401d33e9b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7845 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-06-23Select neon LUT Hard-Swish kernel on all devicesPablo Marquez Tello
* Resolves COMPMID-5211 Change-Id: I560ab2992c6089774c7ebee3538847905521607d Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7840 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2022-06-09Fix crash in CpuActivationKernelPablo Marquez Tello
* There was a problem when dst was nullptr * Resolves IVGCVSW-7010 Change-Id: I7e591283906b9a1deaa879fa00387f0ddbf8cc48 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7704 Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-06-07Compute Hard-Swish with a Lookup table for qasymm8.Pablo Marquez Tello
* Resolves COMPMID-5211 Change-Id: I5914f971d733174dae67e6b4c589f45c75733cf7 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7654 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-04-25Add LU_BOUNDED_RELU support for QSYMM16Pablo Marquez Tello
Partially resolves MLCE-604 Change-Id: Id585ab19fe5cd8f61c07a0aae6faac6ba5545d6d Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7379 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-03-01Multi ISA Technical DebtDana Zlotnik
* Update json struct meet multi-ISA updates * Add impl.cpp in kernels where we only have impl.h Resolves COMPMID-5173 Change-Id: I5da3c4b016a5d0115c4ba46cbfefde7bce518ac1 Signed-off-by: Dana Zlotnik <dana.zlotnik@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7191 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-01-21A73 Devices Regression 300% fixMohammed Suhail Munshi
- Currently regresses on A73 devices (tested on android hikey, inceptionv3), this patch solves this - Changed mws for all cores to use default values - Existing mws value for A73 tuned for hikey-linux, caused regression on hikey-android Resolves [COMPMID-5044] Change-Id: Ifd6faaa34a0b405d0c390015566f2c75436dfb07 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6973 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com>
2022-01-12Enable kernel selection testing (Phase #1)Giorgio Arena
Change-Id: I1d65fb9d3a7583cf8d4163ca7c0fbee27dc52633 Signed-off-by: Yair Schwarzbaum <yair.schwarzbaum@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6767 Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-12-20Decouple CpuActivationKernelDana Zlotnik
1- Data types were already decoupled. This commit arrange the folder struct of the activation kernel. 2- Refactor NEON CpuActivationKernel for floating-point cases. Resolves COMPMID-4636 Change-Id: Ia4527244c84260dce1dd1d4bd4a9e3cfe2486d85 Signed-off-by: Dana Zlotnik <dana.zlotnik@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6739 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2021-12-10Fix 300% Regression CPU - Change default mws value in Kernel filesMohammed Suhail Munshi
Resolves: COMPMID-5001 Change-Id: I13fbe859d2557be0459ba76da0136d0efb15f311 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6809 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
2021-11-16Implement 1D Adaptive Workload Splitting in CPPSchedulerDana Zlotnik
Resolves COMPMID-4649 Change-Id: I941d2f8a40737ff05c49f6695a42884731ef2dc9 Signed-off-by: Dana Zlotnik <dana.zlotnik@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6656 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-10-18Implement Minimum Workload Size (MWS) in all CPPKernels used by small networksDana Zlotnik
* create get_mws method in ICPPKernel class that retuns default value for all kernels * overwrite the default value for all the kernels used by small networks (according to banchmark case) Resolves COMPMID-4648 Change-Id: I46d7cae61217213279d2ee740edc73f600b6d576 Signed-off-by: Dana Zlotnik <dana.zlotnik@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6412 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-08-25Move CPU/GPU files from Core/Runtime to the respective backend foldersGeorgios Pinitas
Legacy structure contained two libraries core/runtime with two backends in each. We reduce the core/runtime libraries to a single library thus merging the backend files Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I69545765fe7a730368105cdbd067d3135ec7a174 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6155 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>