aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-12-07Optimize CPU depth-to-spaceViet-Hoa Do
Resolves: COMPMID-6622 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ibac276618bdda125dcbb9c851c547f12739b15b4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10749 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-06Revert "thread_local _custom_scheduler"Pablo Marquez Tello
This reverts commit ded5b182675e3166e947a8eb637b5b1e925816ab. Resolves COMPMID-6735 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Change-Id: I9b69ca1ec80a671171d3f52081c4b8c61a676617 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10838 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-05Optimize CpuSoftmaxKernel for axis=0Gunes Bayir
Implement a single kernel instead of having two consecutive ones. In the previous setup, one kernel was calculating the maximum value in the axis, and this maximum was being subtracted from each data while calculating the softmax, i.e. softmax(x_i) = exp(x_i - max) / sum_i( exp(x_i - max) ) This patch integrates these two stages into a single kernel for Neon™ for all data types. This will save some memory because we don't need to hold the max values in a separate auxiliary tensor. It also introduces some other optimizations that will ease memory pressure when the data type is float/half, by using the dst tensor as temporary storage for already exponentiated inputs. It removes the references to SVE and SVE2 implementations, and most of the associated files; but, it leaves the implementations as these may be used in the future. Resolves: COMPMID-6500 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: Icff9976d1214c4c6cbe15a62ca60b8a77d3784cc Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10688 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-04Fix bare metal buildPablo Marquez Tello
* Resolves COMPMID-6733 Change-Id: I7f0428719b5c0aa79a0b356c50bb801db16558e8 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10792 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-30Fix driver build errorPablo Marquez Tello
Resolves COMPMID-6734 Change-Id: I0f0a7a312504b7cca4ed36263843ccd1a190c09e Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10803 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Nikhil Raj Arm <nikhil.raj@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-28Changes to enable FP16 in armv8a multi_isaPablo Marquez Tello
* This is the initial patch to start working on enabling fp16 in all multi_isa builds. More changes are required in the way we register the kernels using the macro REGISTER_FP16_NEON. * In this patch we add the capability to build the fp16 files in listed in filelist.json with the correct arch option to enable FP16 * This patch is required towards building an universal multi_isa binary where fp16 is enable. * Enable REGISTER_FP16_NEON macro for all builds by removing __ARM_FEATURE_FP16_VECTOR_ARITHMETIC guard from the macro definition. The macro has to be used across all types of builds. Change-Id: I99f4c273f6ee04cad3c097e5e374200f48568fa9 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10682 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-27BatchNorm changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Moved NCHW kernels fp16 and fp32 to their corresponding files src/cpu/kernels/fuse_batch_normalization/nchw/neon/fp16.cpp and src/cpu/kernels/fuse_batch_normalization/nchw/neon/fp32.cpp * Changes in filelist.json to include the new fp16 and fp32 files * Moved the template batch_normalization_nchw to impl.h as we need to instantiate it from fp16.cpp and fp32.cpp * Pooling layer: removed the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC that prevented the FP16 kernel execution. * Partially resolves MLCE-1102 Change-Id: Ia8c85e9ffb76c9e387f9ae2685e5df5e52c8dc27 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10777 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-27Check copyright for all filesJakub Sujak
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Change-Id: I654f53e5b4e53abc69ce385f6c706293bf8f7198 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10784 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-27CpuMul changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Moved fp16 and fp32 to their corresponding files src/cpu/kernels/mul/generic/neon/fp16.cpp and src/cpu/kernels/mul/generic/neon/fp32.cpp * Changes in filelist.json: added a new fp16.cpp file for the float16_t kernels * Partially resolves MLCE-1102 Change-Id: I88f24cf034c11b55ff84644b182ba76c7cb94296 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10778 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2023-11-24thread_local _custom_schedulerDavid Svantesson
Resolves ONCPUML-1331 This patch adds an option to make _custom_scheduler thread_local to support usage of multiple schedulers handled outside of ACL. It also adds num_threads() function to Scheduler which reverts to querying CPUInfo if no scheduler has been set. Change-Id: Iff706165d8d091895331a5bb3a76f6cabe048912 Signed-off-by: David Svantesson-Yeung <david.svantesson-yeung@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10748 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-23Remove the legacy core libraryJakub Sujak
Stop building and linking to the legacy libarm_compute_core artifact. This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose. Users should link only to the main libarm_compute library for core functionality. Resolves: COMPMID-6329 Change-Id: Ife9d2c25d275e7c676deb09632ae461f697efde9 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10728 Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Sang Won Ha <sangwon.ha@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-16NormalizationLayer changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Moved the template arm_compute::normalize_float to impl.h because we need to instantiate it from both NENormalizationLayerKernel.cpp and src/cpu/kernels/norm_layer/generic/neon/fp16.cpp * Changes in filelist.json: added a new fp16.cpp file for the float16_t kernels * Replaced the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC in NENormalizationLayerKernel by ARM_COMPUTE_ENABLE_FP16 so that the fp16 kernels can be compiled in for multi_isa builds * Moved fp32 kernels to the corresponding file src/cpu/kernels/norm_layer/generic/neon/fp32.cpp * Partially resolves MLCE-1102 Change-Id: I3f2eb2ed0b6c7f68092b17872b85082fbb5f39e2 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10739 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-15Fix various coverity issuesSiCong Li
Resolves COMPMID-6677 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I99bf2385f6edc0836faacb31f5c66ed4fb051e40 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10729 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-15Fix device issue with CL softmaxViet-Hoa Do
* Performing the second pass in reverse order doesn't seem to work reliably in some specific devices. This patch introduces another approach to workaround the device issue. Resolves: COMPMID-6669 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I591f05ff06f8439ebe4d32093441ae871a292f4c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10730 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-14Update comments to suppress doxygen warnings.Anitha Raj
Resolved COMPMID-6367 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Change-Id: I96f244811a81a4e278f0c5e47d5014229cad3a25 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10727 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-14Update Release notes for 23.11Anitha Raj
Resolves COMPMID-6369 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Change-Id: I997a48c4e8efb67fbe53efd6fc498df4e6b41eea Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10726 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-13Update SONAME_VERSION in SConscript and VERSION in CMakeListsAnitha Raj
Resolves COMPMID-6369 Change-Id: I67dd589cdc02070dafe6f000988e6abafd6c5d79 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10722 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-13Update README for 23.11 releaseAnitha Raj
Resolves COMPMID-6369` Signed-off-by: Anitha Raj <anitha.raj@arm.com> Change-Id: I2a3dc604895321f71198ece1dfed3af3a25c0228 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10724 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-10Fix CpuGemmConv2d int8 segfaultSiCong Li
Bypass importation of memory of the original weights into the reinterpreted_weights auxiliary tensor if other weight transformation path is selected (which would've freed the original weights and its tensor info) Resolves COMPMID-6635 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ib8a345c3ac542bc3745d6a67db822b55df37e827 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10698 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-10Update list of supported operators in documentationJakub Sujak
Resolves: COMPMID-6633 Change-Id: I1e78df468876ec3569fa46597734e7de328b06f4 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10663 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-09Remove duplicate definitions of BF16 fixed format kernels.David Mansell
Change-Id: Ie68b0a19040cc6b5bf47fca406989f39aa8d7b81 Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10687 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-09Pooling changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Changes in filelist.json moving fp16 file from common to fp16 attribute * Changes in kernel CpuPool2dAssemblyWrapperKernel, replaced __ARM_FEATURE_FP16_VECTOR_ARITHMETIC by ENABLE_FP16_KERNELS to make sure the fp16 kernels are compiled in for multi_isa=1 * Partially resolves MLCE-1102 Change-Id: I327154ec5b1ddfb9f54d9096f00c35b3e05c678a Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10662 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-09DepthwiseConvolution changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Changes in filelist.json moving fp16 file from common to fp16 attribute * Removed the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC in CpuDepthwiseConv2dAssemblyWrapperKernel to always create the assembly kernel * Partially resolves MLCE-1102 Change-Id: I2f88d5e54a94042cfb3cb4ea0386338a7c444866 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10626 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-08Document how to build ACL with LLVM+Clang toolchainGunes Bayir
Resolves: COMPMID-6471 Change-Id: I5add2af4292ff2eeafcde85f3bff8e98b2069b13 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10660 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-08Optimize CpuGemmConv2d start-up timeSiCong Li
When weight has no holes, we can replace CpuWeightsReshapeKernel with: - Collapse by reinterpreting weight's 3 spatial dimensions - Perform CpuTranspose For more details see the documentation in src/cpu/operators/CpuGemmConv2d.cpp This is one optimization since the CpuTranspose is better performing than CpuWeightsReshapeKernel A second optimization is to fuse this transpose with other weight transformations (e.g. pretranspose_B_array in CpuGemmAssemblyDispatch) However this second optimization depends on how the underlying gemm methods (the fall back path: CpuGemmMatrixMultiplyKernel or the assembly path: CpuGemmAssemblyDispatch) chooses to fuse the transpose. Therefore, this patch moves the transpose down from CpuGemmConv2d, to the individual gemm operators where the fusion decision needs to be made, by passing an extra "transpose_b" flag to CpuGemm New transpose_b flag in different scopes (they are all the same, but with different names because pretranspose_b has a different meaning in GemmAssemblyDispatch): GEMMInfo::pretranspose_B -> AsmGemmInfo::transpose_b New auxilliary tensors holding the transposed b result: - CpuGemm optimized path: CpuGemmAssemblyDispatch::PrePretransposedB - CpuGemm fallback path: CpuGemm::PreTransposedRHS Note that this patch does not yet have the second optimization (COMPMID-6595), but it prepares for it. Relates to COMPMID-6595 Resolves COMPMID-6499 Change-Id: I999a2da9da4b2b15369a3cc06d7872c86e0190ea Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10526 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-07Update heuristic for MatMul Native U8Gian Marco Iodice
Resolves COMPMID-6479 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: I13aa0ef944a75ba8b5e4df183d52df57b9aba90f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10659 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-06Fix Elementwise Division Dynamic Shape testsAnitha Raj
- Enable use_dynamic_shape ArithmeticDivisionDynamicShapeValidationFixture Signed-off-by: Anitha Raj <anitha.raj@arm.com> Change-Id: I42ddf5b604d728eda91fa45b239abf8caf2cda0f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10586 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-03Add Dynamic Quantization tests to Fully Connected LayerMohammed Suhail Munshi
This patch calculates the output quantization info based on the inputs' quantization information. The previous approach was using the same quantization information for input, weights and output. This implementation does not cover the cases where we have fused activation function. Resolves: [COMPMID-6484] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ib58143165191e82ae8547e661ac7c8d077bda200 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10539 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-01Increase tolerance for MatMul in FP16Sangwon Ha
Resolves: COMPMID-6585 Change-Id: Ibfc5412ab23f10026c872eab99a056eb682d77a1 Signed-off-by: Sangwon Ha <sangwon.ha@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10577 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-01Add support for Arm® Cortex®-A520 and Arm® Cortex®-R82Viet-Hoa Do
Resolves: COMPMID-6599 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Id91185871f0dc30c08c7c38379acd5a3c1056473 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10575 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-01Fix compilation error with clang and multi-isaViet-Hoa Do
* Fix SVE header being included in non-SVE file. Resolves: COMPMID-6613 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ic7f662a239b761b83e67e11b6cc03f7d5f5cd051 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10573 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-10-31[GPU] Update Reverse layer to allow negative axis and reversed axis orderAdnan AlSinan
- Adds option to use negative axis and inverted axis. - Adds validation tests for the above. Resolves COMPMID-6459 Change-Id: I88afd845d078f92c82ec8529ce7241fccd4c417e Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10523 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-31Fix clang-tidy errorsJakub Sujak
Resolve the following clang-tidy errors: * use of undeclared identifier 'ARM_COMPUTE_ASSERT' * no template named 'AbsoluteTolerance' * no template named 'RelativeTolerance' These errors are a result of missing include headers in test fixtures. Resolves: COMPMID-6604 Change-Id: I8058c5848bb52a44925b2f99c9e8edf84dc79acc Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10561 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-31Use dynamic quantization in Convolution and Dilated Convolution testsGunes Bayir
This patch calculates the output quantization info based on the inputs' quantization information. The previous approach was using the same quantization information for input, weights and output. This implementation does not cover the cases where we have fused activation function. Resolves: COMPMID-6482 Change-Id: I4a9d87cfef8ad18ef241d457d23f44c8519a1389 Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10541 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-31Extend CKW MatMul with nt_tAdnan AlSinan
- Add the kernel variant: (nt_t) to GpuCKWMatMul. - Extend CKW MatMul validation test with nt_t. - Fixes a bug in CKW where z-dim = 1. Resolves: COMPMID-6435 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I4c5e8791e55f21ffff3c11eca7802c51a4259977 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10525 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-31Fix SVE kernel using SVE2 instructionViet-Hoa Do
Resolves: COMPMID-6493 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I038d91ba266e1e8bf124336bcd272ec77e92038c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10490 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-31Optimize CL softmaxViet-Hoa Do
* The new softmax implementation consists of only a single kernel. - There are 2 versions of softmax, one for the x dimension and one for any other dimensions. - Softmax kernel handles both native and quantized data type. Resolves: COMPMID-6447 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I4a9ae5bc63f78aebeaa85ee48a0d102c9c245eda Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10489 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-30DirectConv and Im2Col changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* FP16 kernels must be instantiated in fp16.cpp. * Partially resolves MLCE-1102 Change-Id: Iab9c29dbfd89358f2f663862ff5010c88aeccf8c Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10496 Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-30Use dynamic quantization in OpenCL™ Direct Convolution testsGunes Bayir
This patch calculates the output quantization info based on the inputs' quantization information. The previous approach was using the same quantization information for input, weights and output. This implementation does not cover the cases where we have fused activation function. Note that no Neon™ tests are changed since there were not any quantized Neon Direct Convolution tests. Resolves: COMPMID-6485 Change-Id: Id32241320acae0b58552546d6d6f665cd5c63044 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10470 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-26Add check to disable dynamic bias with quantized datatypes in Conv2D layerMohammed Suhail Munshi
Resolves: COMPMID-6397 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Id4404f75dae03fd529db1adac5ab9ca48d08ec46 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10498 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-20FuseBatchNorm changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* FP16 kernels must be instantiated in fp16.cpp. * Partially resolves MLCE-1102 Change-Id: Ie652203876a0ac12b025e96d20990b6efb21e772 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10477 Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-17arm_gemm: Add SME2 FP16 GEMV using FP16->FP32 dot product.David Mansell
Signed-off-by: David Mansell <David.Mansell@arm.com> Change-Id: If02f7809f9b6e84979121698c5e7a62cbb41e2c3 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10487 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-17Fix memory Error in Reverse Fixture.Adnan AlSinan
Resolves: COMPMID-6581 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I0a634e064377e54b9190241c01fc75c212522ba7 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10481 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-17Revert "arm_gemm: Add SME2 FP16 GEMV."David Mansell
This reverts commit aeced744b854758768243833bcdf999c0c3c1a5b. Reason for revert: Incorrect SME architecture checks. Change-Id: I23fe78178041a544a8791a4655bf6fe4aa375e38 Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10501 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-10-13Connect MatMul MMUL kernels to ClMatMul operatorGunes Bayir
Resolves: COMPMID-6478 Change-Id: I5bc220c3bd00a316776fe14454438cc0dc9049b3 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10469 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-13Fix build error in CpuScalePablo Marquez Tello
* Build error when using data_layout_support=nhwc * Some kernels need to be guarded by ENABLE_NCHW_KERNELS Change-Id: I0414084b458360c7e8d2842f4734c39aad80852e Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10476 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-12Scale changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Partially resolves MLCE-1102 Change-Id: If050608e56d75649b8d07757604ae10d6fc4269b Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10461 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-12arm_gemm: Add SME2 FP16 GEMV.David Mansell
Change-Id: I1f73819c25c66e4d13198e9c79755808d92b343d Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10466 Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-12Remove padding from CL comparison operatorViet-Hoa Do
* Add support for processing left-over vector to comparison kernel. * Combine native and quantized versions of CL comparison. Resolves: COMPMID-6424 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I31d43bdf0eab999cee6fa8144b5d8e921a1093e8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10467 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-11Optimize CL reduction operationViet-Hoa Do
* Batch dimension is added to reduction operation. - All the dimensions higher than the batch dimension are collapsed so that the input and output tensors are always 3-4D. - CL kernel is called once instead of being repeatedly called to process each sliding window. Resolves: COMPMID-6443 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Icd99939d52d3bb648f08537e5f52ef27e894061b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10456 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>