aboutsummaryrefslogtreecommitdiff
path: root/src/cpu/kernels
AgeCommit message (Collapse)Author
2024-02-21Integrate new pretranspose_b_array with extra fused transpose of BGunes Bayir
This patch fuses the transposition taking place in Acl with the transformations done in arm_gemm (called pretranspose_b_array) if the underlying kernel and transform supports it. This should improve start-up time (as it's for constant Rhs matrices) and memory footprint. The transformations in arm_gemm are kernel specific. The Rhs matrix is transformed into certain layouts to improve the performance. Resolves: COMPMID-6595 Change-Id: Id2932dd966e59f903c279417bebcea83d9a42464 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11144 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-02-20Requantization cases for offset changes onlyMohammed Suhail Munshi
Resolves: [COMPMID-6681] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I325b9d478dd1d04a45533bb7708cf76e98ee0cee Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11058 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-02-05Fix leftover cols in CpuGemmLowpMatrixBReductionKernelJonathan Deakin
CpuGemmLowpMatrixBReductionKernel::run_internal randomly segfaults because it reads out of bounds with vloadq. This doesn't trigger with the unit tests because the read isn't out of bounds for the process, but it can be seen clearly by running the following in debug mode ./examples/neon_gemm_qasymm8 1 1 1 The vloadq at src/cpu/kernels/CpuGemmLowpMatrixReductionKernel.cpp:353 accesses a quadword even though the input is a single byte. relates to: ONCPUML-1444 MLINFSW-439 COMPMID-6844 Change-Id: I2ae5260c9f38d6d8149a6bcd5dc146b911209784 Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10966 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-01-23Fix for Logically dead code detected in Coverity checksAnitha Raj
Resolves: COMPMID-6746 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Change-Id: I96c158820469af3e54dca0c5909c888106eb1940 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11005 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-01-10Use look up table for fp16 activationMohammed Suhail Munshi
- Enables FP16 lut for logistic activation - Adds LUTManager to re-use lut where appropriate. Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I94667b63b452a8e58a1eb59cb0b5866178954523 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10864 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-01-04Prevent RELU from being processed thru LUT in INT8Sangwon Ha
- For quantized RELU activation, de-quantization and re-quantization is not required since comparison against the quantization bias is only required. Resolves: COMPMID-6340 Change-Id: I574bd220f3d0d893b7f7c4819a883e2a131f61f4 Signed-off-by: Sangwon Ha <sangwon.ha@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10916 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-07Optimize CPU depth-to-spaceViet-Hoa Do
Resolves: COMPMID-6622 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ibac276618bdda125dcbb9c851c547f12739b15b4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10749 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-05Optimize CpuSoftmaxKernel for axis=0Gunes Bayir
Implement a single kernel instead of having two consecutive ones. In the previous setup, one kernel was calculating the maximum value in the axis, and this maximum was being subtracted from each data while calculating the softmax, i.e. softmax(x_i) = exp(x_i - max) / sum_i( exp(x_i - max) ) This patch integrates these two stages into a single kernel for Neon™ for all data types. This will save some memory because we don't need to hold the max values in a separate auxiliary tensor. It also introduces some other optimizations that will ease memory pressure when the data type is float/half, by using the dst tensor as temporary storage for already exponentiated inputs. It removes the references to SVE and SVE2 implementations, and most of the associated files; but, it leaves the implementations as these may be used in the future. Resolves: COMPMID-6500 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: Icff9976d1214c4c6cbe15a62ca60b8a77d3784cc Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10688 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-27BatchNorm changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Moved NCHW kernels fp16 and fp32 to their corresponding files src/cpu/kernels/fuse_batch_normalization/nchw/neon/fp16.cpp and src/cpu/kernels/fuse_batch_normalization/nchw/neon/fp32.cpp * Changes in filelist.json to include the new fp16 and fp32 files * Moved the template batch_normalization_nchw to impl.h as we need to instantiate it from fp16.cpp and fp32.cpp * Pooling layer: removed the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC that prevented the FP16 kernel execution. * Partially resolves MLCE-1102 Change-Id: Ia8c85e9ffb76c9e387f9ae2685e5df5e52c8dc27 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10777 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-27CpuMul changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Moved fp16 and fp32 to their corresponding files src/cpu/kernels/mul/generic/neon/fp16.cpp and src/cpu/kernels/mul/generic/neon/fp32.cpp * Changes in filelist.json: added a new fp16.cpp file for the float16_t kernels * Partially resolves MLCE-1102 Change-Id: I88f24cf034c11b55ff84644b182ba76c7cb94296 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10778 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2023-11-16NormalizationLayer changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Moved the template arm_compute::normalize_float to impl.h because we need to instantiate it from both NENormalizationLayerKernel.cpp and src/cpu/kernels/norm_layer/generic/neon/fp16.cpp * Changes in filelist.json: added a new fp16.cpp file for the float16_t kernels * Replaced the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC in NENormalizationLayerKernel by ARM_COMPUTE_ENABLE_FP16 so that the fp16 kernels can be compiled in for multi_isa builds * Moved fp32 kernels to the corresponding file src/cpu/kernels/norm_layer/generic/neon/fp32.cpp * Partially resolves MLCE-1102 Change-Id: I3f2eb2ed0b6c7f68092b17872b85082fbb5f39e2 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10739 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-09Pooling changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Changes in filelist.json moving fp16 file from common to fp16 attribute * Changes in kernel CpuPool2dAssemblyWrapperKernel, replaced __ARM_FEATURE_FP16_VECTOR_ARITHMETIC by ENABLE_FP16_KERNELS to make sure the fp16 kernels are compiled in for multi_isa=1 * Partially resolves MLCE-1102 Change-Id: I327154ec5b1ddfb9f54d9096f00c35b3e05c678a Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10662 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-09DepthwiseConvolution changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Changes in filelist.json moving fp16 file from common to fp16 attribute * Removed the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC in CpuDepthwiseConv2dAssemblyWrapperKernel to always create the assembly kernel * Partially resolves MLCE-1102 Change-Id: I2f88d5e54a94042cfb3cb4ea0386338a7c444866 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10626 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-30DirectConv and Im2Col changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* FP16 kernels must be instantiated in fp16.cpp. * Partially resolves MLCE-1102 Change-Id: Iab9c29dbfd89358f2f663862ff5010c88aeccf8c Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10496 Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-20FuseBatchNorm changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* FP16 kernels must be instantiated in fp16.cpp. * Partially resolves MLCE-1102 Change-Id: Ie652203876a0ac12b025e96d20990b6efb21e772 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10477 Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-13Fix build error in CpuScalePablo Marquez Tello
* Build error when using data_layout_support=nhwc * Some kernels need to be guarded by ENABLE_NCHW_KERNELS Change-Id: I0414084b458360c7e8d2842f4734c39aad80852e Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10476 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-12Scale changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Partially resolves MLCE-1102 Change-Id: If050608e56d75649b8d07757604ae10d6fc4269b Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10461 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-10Fix build errorPablo Marquez Tello
* Build error when using data_layout_support=nhwc * Some kernels need to be guarded by ENABLE_NCHW_KERNELS Change-Id: I9fb6cf0e204531f81b0dff3572a1740ba94cde0e Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10460 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-10-10CpuSubKernel changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* FP16 kernels must be instantiated in fp16.cpp. * Partially resolves MLCE-1102 Change-Id: I497fe0ba6e84493a5072c3e80bbba7ecd5de8095 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10448 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-09Pool2d changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* FP16 kernels must be moved from src/cpu/kernels/pool2d/neon/nchw/all.cpp to src/cpu/kernels/pool2d/neon/fp16.cpp. * In src/cpu/kernels/pool2d/neon/list.h when we declare the kernels we need to remove defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC) so that in std::vector<CpuPool2dKernel::PoolingKernel> available_kernels * Partially resolves MLCE-1102 Change-Id: I000380f8eccca17e6219c4f3453980d67a2c9dd8 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10444 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-05Optimize CLTranspose operatorJakub Sujak
* Transpose higher dimensional tensors (>2D) by collapsing higher dimensions into the third dimension thus avoiding multiple dispatches of the CL kernel * Maximize tile size without register spilling Resolves: COMPMID-6448 Change-Id: Iac094b8c428bdf319d9c28a8334cb55d58e2d14b Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10443 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-28Apply clang-format on repositoryFelix Thomasmathibalan
Code is formatted as per a revised clang format configuration file(not part of this delivery). Version 14.0.6 is used. Exclusion List: - files with .cl extension - files that are not strictly C/C++ (e.g. Android.bp, Sconscript ...) And the following directories - compute_kernel_writer/validation/ - tests/ - include/ - src/core/NEON/kernels/convolution/ - src/core/NEON/kernels/arm_gemm/ - src/core/NEON/kernels/arm_conv/ - data/ There will be a follow up for formatting of .cl files and the files under tests/ and compute_kernel_writer/validation/. Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Change-Id: Ib7eb1fcf4e7537b9feaefcfc15098a804a3fde0a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10391 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2023-09-26Re-arrange header inclusion orderFelix Thomasmathibalan
Inclusion order of header is changed as preparatory step for applying clang-format Change-Id: I0c529f896ba802dfc6f30a573cdc9d9a24f3081c Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10379 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-26Select changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template select_op() which had to be moved from impl.cpp to fp16.cpp * Partially resolves MLCE-1102 Change-Id: Ic9e73e121482fcc5e4fcbe8ae1ecd23649cbd3d1 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10359 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
2023-09-26Maxunpooling changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template max_unpooling() which had to be moved from impl.cpp to impl.h * Partially resolves MLCE-1102 Change-Id: Iabf9a9ba9d2441032f931f33aad97acc3e332575 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10362 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
2023-09-21L2Norm changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template l2_normalize_x() and l2_normalize_yz which had to be moved from impl.cpp to impl.h * Removed impl.cpp * Partially resolves MLCE-1102 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Change-Id: Id00a823730108293fc712295a178dad80588af30 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10344 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-21Gemm changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the templates vector_matrix_multiply_f16() and matrix_matrix_multiply_f16 which had to be moved from impl.cpp to fp16.cpp * Partially resolves MLCE-1102 Change-Id: Ic87440797d6f1653c815ab6565972206f5afd0ad Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10345 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-15Remove deprecated support for BF16 in CpuCastAdnan AlSinan
Resolves : [COMPMID-6212] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I29bbd9a3d96af462faf7f0ee13b9849f75e05356 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10319 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2023-09-15GenerateProposals changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template compute_all_anchors() that had to be moved from impl.cpp to impl.h * Partially resolves MLCE-1102 Change-Id: Iaff6da32d0b9789ef87ba3f95bef99343612bd01 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10309 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-15Fuse batch normalization changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template fused_batch_normalization_dwc_nhwc() that had to be moved from impl.cpp to impl.h * Removed impl.cpp * Partially resolves MLCE-1102 Change-Id: Idaaa113c71729e32e565acf5fb5694c76c36d76d Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10308 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-15Fix include dependencies for mass reformatting patchGunes Bayir
This patch fixes some include dependencies in certain files that caused build failures in https://review.mlplatform.org/c/ml/ComputeLibrary/+/10287. It also circumvents some clang-format glitches. Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8e9d3307edd2d1afd17c685c9bc9429624130e5a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10313 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-13Softmax changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use various templates that had to be moved from impl.cpp to impl.h * Partially resolves MLCE-1102 Change-Id: I2e5e68fbcf5279de1ffc1be4def4f96ed05593e9 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10224 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-13Changes to InstanceNrom to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * Partially resolves MLCE-1102 Change-Id: If53ff1927948b3ad7c9e3c9347bc2af38764e342 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10243 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-13Changes in NECropResize to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template in_bounds_crop_window so it had to be moved from impl.cpp to impl.h * Removed the file src/cpu/kernels/crop/generic/neon/impl.cpp * Partially resolves MLCE-1102 Change-Id: I1953849153e672ff7938f54c877c7498117dcca4 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10282 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-08Meanstddevnorm changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * Partially resolves MLCE-1102 Change-Id: I7e6d998e427982d4a037dbce6d17ca378665e07f Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10241 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-06Changes to BoundingBoxTransform to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * Partially resolves MLCE-1102 Change-Id: I04822b043d9f87bc666750a8d95a8be8a6cc194d Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10239 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-06Changes to ElementwiseOp to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * Partially resolves MLCE-1102 Change-Id: I5ecfc8f6c0d84f92d80bec2cde6e7338794b9788 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10240 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-04DWC changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template run_depthwise_float() so it had to be moved from impl.cpp to impl.h * Partially resolves MLCE-1102 Change-Id: I428a79c4ab3a990331f20f5bd6b9fea88b0836b9 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10218 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-01Pool3d changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use various templates that had to be moved from impl.cpp to impl.h * Removed src/cpu/kernels/pool3d/neon/impl.cpp * Partially resolves MLCE-1102 Change-Id: I71e6a54a27fd8f04ae2a67231709aad723b09fa3 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10220 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-30Changes in roi_align to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template roi_align() so it had to be moved from impl.cpp to impl.h * Removed the file src/cpu/kernels/roialign/generic/neon/impl.cpp * Partially resolves MLCE-1102 Change-Id: If78371479042725723cea6f6c65aac76d68a1c1d Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10213 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-08-29NEFuseBatchNormalizationKernel reworkPablo Marquez Tello
* Enable fp16 in armv8a multi_isa builds * Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template add_same_neon() so it had to be moved from impl.cpp to impl.h * Partially resolves MLCE-1102 Change-Id: Ia51007f5e663b708071958bb94bfab4535e4b2f8 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10191 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-23CpuAdd rework to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template add_same_neon() so it had to be moved from impl.cpp to impl.h Change-Id: I9e64a3101958fcb9c3d5c8e9b148b498b2bee05f Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10154 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-22Optimize CpuReshapeKernelAnitha Raj
Resolves COMPMID-5279 Change-Id: Id9b007eed62c200702bbfcc83b94dab7b5de1714 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9962 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-17Fix depthwise convolution not using assembly kernelViet-Hoa Do
* Take dilation into account when checking padding. Resolves: COMPMID-6348 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I897a13ba7f37382733c35c1701d1ec310ed55331 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10147 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-08Add support for S64 output in NEArgMinMaxLayerPablo Marquez Tello
* NEArgMinMaxLayer uses NEReductionOperation to compute its result in S32 * We need to call NECast to convert from S32 to S64 * Resolves MLCE-1089 Change-Id: I6fded869b6076d7af1b9b3e70eb384f4ee82fd8a Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10054 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-28Retain back-compatibility for arm_compute/core/Types.hSiCong Li
* Some symbols have been moved from core/Types.h. This patch retains back compatibility so that the user can still include this header for those symbols * A new header core/CoreTypes.h is created to avoid circular dependency. This header includes essential small types that are used across functions * Move all function info types into function_info folder for easier tracking Resolves COMPMID-6330 Related to https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I4739175c2d4d184a9bc8e28b881b497fab03ca60 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9979 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-07-19Add support for input S64/U64 in CpuCastKernelPablo Marquez Tello
* The kernel now supports the following conversions: S64 -> F32 U64 -> F32 * Resolves MLCE-1089 Change-Id: I277cf58b78d919fde25947520d2056e1412c7f82 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9935 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-18Break up core/Utils.h to reduce unused code being included everywhereMatthew Bentham
Makes a small difference to compile times and opens up other opportunities to simplify code. Change-Id: I232876910bbe4fa9719f4a0ce4a54c090faeb5ef Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/532429 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9856 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-04Depthwise channel pre-multiplicationMichael Tyler
Resolves: COMPMID-6337 Change-Id: Ie9097b3f56e8071426c621386a5988bd7f7e8ef2 Signed-off-by: Michael Tyler <michael.tyler@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9852 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-26Remove dependency on fp16 definitions from some core include filesMatthew Bentham
This significantly improves the compilation times for parts of the core library that just need a definition of float16_t rather than access to all of the fp16 intrinsics. Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Change-Id: I5da1c6b0df0dd87d1d17948cd2e9b7375874f455 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/529385 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9781 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>