aboutsummaryrefslogtreecommitdiff
path: root/src/core/CL
AgeCommit message (Collapse)Author
2024-04-30Add fp16 and integer data type support for ScatterNd in GpuGunes Bayir
Resolves: COMPMID-6899 Change-Id: I3743f2c9e5c21e1ec9f4c81d08c148666afad33a Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11505 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Sang Won Ha <sangwon.ha@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-04-25Add update/index/output (m+1)/2d/(m+n) support for CLScatterGunes Bayir
Resolves: COMPMID-6894, COMPMID-6896 Change-Id: I9d29fd3701a7e0f28d83f81a6c42a7234c2587c3 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11477 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com> Dynamic-Fusion: Ramy Elgammal <ramy.elgammal@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-04-22Scatter GPU Kernel Implementation for 1D tensors.Mohammed Suhail Munshi
Resolves: [COMPMID-6891, COMPMID-6892] Change-Id: I5b094fff1bff4c4c59cc44f7d6beab0e40133d8e Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11394 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-03-20Make Cpu/Gpu/Ref scalar/vectoral S32 division consistentGunes Bayir
- Neon(TM) implementation converts integers to float and performs the division because there is no vector integer division instructions. However, leftover loop still uses integer division, which makes results inconsistent depending on where we are in the tensor. - SVE path does it in integer domain. - OpenCL(TM) does it similar to Neon(TM) vector path. - Reference implementation does it in integer domain. These differences cause intermittent mismatches. This patch ensures all follow the same logic. On the other hand, the provided Neon(TM) implementation is faster than the Fp32 converted version. Resolves: COMPMID-6925 Change-Id: Ia12606d57f40a7d331b9b698f87fd4321496b275 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11316 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-01-23Fix for unchecked return value detected in Coverity checks.Anitha Raj
Resolves: COMPMID-6753 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Change-Id: I80df0479eb4c7cc2c5380df708844cc9ffdd2aed Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11001 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-01-18Fix divide-by-zero compilation errorViet-Hoa Do
* CONVERT_TO_TENSOR4D_STRUCT_NO_STEP is implemented and used in some CL kernels in the way that causes divide-by-zero issue. - Since the steps are all zeros, the issue might have been ignored by the compiler. Resolves: COMPMID-6795 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I0fb38fc62d63671b8abefa39b3d9b3ca6f49c7fe Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10967 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-22Fix nightly issue caused by gemm_reshaped_only_rhs_mmul kernelGunes Bayir
The issue appears when this kernel is used by convolution operators because the stride calculations consider only simple matrix multiplication. In conv2d triggered runs, Rhs does not have the same dimension as Lhs and Dst. Also, cases where Lhs and Dst are interpreted as 3d, where their X and Y dimensions (in convolution sense) are collapsed into one. Resolves: COMPMID-6764 Change-Id: If443e6eb8f7a5cca1acc58b37c598122a013e69b Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10913 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-12-14Fix validation error in CL generate proposals kernelGunes Bayir
This fix modifies some of the conversions done in the generate proposals kernel that causes DDK issues while compiling the kernel. The issues are mostly related to conversion from i64 to fp16, and it doesn't affect fp32. Firstly, type identifier size_t is converted into unsigned int. But, this alone was compiling but causing mismatches, even in older devices, where it was passing before. Therefore, the fp16 conversion delayed until vector construction where the integers are now converted to fp32, and then fp16. This, although may not be ideal, seems like the best solution. Resolves: COMPMID-6756 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: Iee61216c908fe51431985b80c3653fc32add4741 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10879 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-12-08Fix unit tests failing in CL/UNIT/TensorAllocatorGunes Bayir
The function pointer for clImportMemoryARM should be loaded in a portable way as recommended by Khronos® as outlined here: https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#getting-opencl-api-extension-function-pointers using clGetExtensionFunctionAddressForPlatform() call. All extensions should ideally be loaded using the above mentioned function. Resolves: COMPMID-6732 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I482b6bde721267d5e8c08301e5780d28a9c5ba85 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10852 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-15Fix various coverity issuesSiCong Li
Resolves COMPMID-6677 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I99bf2385f6edc0836faacb31f5c66ed4fb051e40 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10729 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-15Fix device issue with CL softmaxViet-Hoa Do
* Performing the second pass in reverse order doesn't seem to work reliably in some specific devices. This patch introduces another approach to workaround the device issue. Resolves: COMPMID-6669 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I591f05ff06f8439ebe4d32093441ae871a292f4c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10730 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-31[GPU] Update Reverse layer to allow negative axis and reversed axis orderAdnan AlSinan
- Adds option to use negative axis and inverted axis. - Adds validation tests for the above. Resolves COMPMID-6459 Change-Id: I88afd845d078f92c82ec8529ce7241fccd4c417e Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10523 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-31Optimize CL softmaxViet-Hoa Do
* The new softmax implementation consists of only a single kernel. - There are 2 versions of softmax, one for the x dimension and one for any other dimensions. - Softmax kernel handles both native and quantized data type. Resolves: COMPMID-6447 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I4a9ae5bc63f78aebeaa85ee48a0d102c9c245eda Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10489 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-12Remove padding from CL comparison operatorViet-Hoa Do
* Add support for processing left-over vector to comparison kernel. * Combine native and quantized versions of CL comparison. Resolves: COMPMID-6424 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I31d43bdf0eab999cee6fa8144b5d8e921a1093e8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10467 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-11Optimize CL reduction operationViet-Hoa Do
* Batch dimension is added to reduction operation. - All the dimensions higher than the batch dimension are collapsed so that the input and output tensors are always 3-4D. - CL kernel is called once instead of being repeatedly called to process each sliding window. Resolves: COMPMID-6443 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Icd99939d52d3bb648f08537e5f52ef27e894061b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10456 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-05Optimize CLTranspose operatorJakub Sujak
* Transpose higher dimensional tensors (>2D) by collapsing higher dimensions into the third dimension thus avoiding multiple dispatches of the CL kernel * Maximize tile size without register spilling Resolves: COMPMID-6448 Change-Id: Iac094b8c428bdf319d9c28a8334cb55d58e2d14b Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10443 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-29Implement Quantized Matmul T/T and T/Nt kernels using MMUL extensionGunes Bayir
Resolves: COMPMID-6476, COMPMID-6477 Change-Id: Ied37c269d5a108ff72f70e3ad932cf372bda5562 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10346 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-28Revise clang-format configurationJakub Sujak
Clang-format options now match those in clang-format version 14. Remove Astyle checks as the same code style checks are provided by clang-format. Resolves: COMPMID-6576 Change-Id: Iefa9bb719826242a3276e9ca058d0c84624f7302 Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10399 Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-09-28Implement Quantized Matmul Nt/T kernel using MMUL extensionGunes Bayir
Resolves: COMPMID-6474 Change-Id: Iaff5b512cf77975f2df02dcdf848711b13bf97a6 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10341 Reviewed-by: Mohmun02 <MohammedSuhail.Munshi@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-09-28Apply clang-format on repositoryFelix Thomasmathibalan
Code is formatted as per a revised clang format configuration file(not part of this delivery). Version 14.0.6 is used. Exclusion List: - files with .cl extension - files that are not strictly C/C++ (e.g. Android.bp, Sconscript ...) And the following directories - compute_kernel_writer/validation/ - tests/ - include/ - src/core/NEON/kernels/convolution/ - src/core/NEON/kernels/arm_gemm/ - src/core/NEON/kernels/arm_conv/ - data/ There will be a follow up for formatting of .cl files and the files under tests/ and compute_kernel_writer/validation/. Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Change-Id: Ib7eb1fcf4e7537b9feaefcfc15098a804a3fde0a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10391 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2023-09-21Optimize the main loop in mat_mul_native_quantized_mmul_nt_ntGunes Bayir
Switch the loop unrolling order and reuse the pre-computed vectors Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I636c0530d6b21dae4dbb371c57d18b1f7c7246a8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10355 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-18Add CL command buffer classViet-Hoa Do
* Two implementations of the command buffer are added: - CLMutableCommandBuffer uses mutable dispatch command buffer extension. - CLCompatCommandBuffer is the compatibility class for platform without the CL extension. Resolves: COMPMID-6454 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I15b370a50168ca940bd8fb2b5fae26230da3f472 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10298 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-09-18Implement Quantized MatMul kernel using MMUL extensionGunes Bayir
Resolves: COMPMID-6475 Change-Id: Ic867cdfff5d4391cb749a04bf7cc35cda63d3b71 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10311 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-14Add skeleton of ClMatMulLowpNativeMMULKernelGunes Bayir
The skeleton code consists of modifications - to build the library with the quantized matmul kernel - refactoring of some common utilities - empty OpenCL Kernels for four configurations ([Lhs, Rhs] X [Nt, t]) - some validation tests and skeleton for functional tests Resolves: COMPMID-6473 Change-Id: Id8401f789d34277dceb1f91afd68c9c88275618a Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10273 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-04Remove legacy PostOps codeJakub Sujak
PostOps was the experimental interface for Dynamic Fusion. It is now replaced by the new Dynamic Fusion interface with code generation using the Compute Kernel Writer. Resolves: COMPMID-6190 Change-Id: I813b48facef2fd6f3aee332588886b4f9b3d33d8 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10219 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-17Fix various static check issuesViet-Hoa Do
Resolves: COMPMID-6495 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I916829222a6211fa096a833a2afc5fab5eb34ea4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10143 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-15Check CL command buffer extensionViet-Hoa Do
* Add helper functions to check whether command buffer extensions exist in CL device. Resolves: COMPMID-6453 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ibc287e4526e54be4702241ab8ca0cea0b8661b3a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10130 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-14Optimize CLReduce for Min/Max Axis=0Gunes Bayir
Resolves: COMPMID-6400 Change-Id: Id9935f9727f77a824afc75c35f044e3f5c173e0d Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10120 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-03Fix CL Tile operatorViet-Hoa Do
Resolves: COMPMID-6404 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I75aebe620567ed50817747589bbe8cfb63715a7b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10036 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-28Retain back-compatibility for arm_compute/core/Types.hSiCong Li
* Some symbols have been moved from core/Types.h. This patch retains back compatibility so that the user can still include this header for those symbols * A new header core/CoreTypes.h is created to avoid circular dependency. This header includes essential small types that are used across functions * Move all function info types into function_info folder for easier tracking Resolves COMPMID-6330 Related to https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I4739175c2d4d184a9bc8e28b881b497fab03ca60 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9979 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-07-21Enable S64 output in CLArgMinMaxPablo Marquez Tello
Resolves MLCE-1089 Change-Id: I8b385ef8a00ec5de60299bc7a359766ba5417e68 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9918 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-07-18Break up core/Utils.h to reduce unused code being included everywhereMatthew Bentham
Makes a small difference to compile times and opens up other opportunities to simplify code. Change-Id: I232876910bbe4fa9719f4a0ce4a54c090faeb5ef Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/532429 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9856 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-11Add Bias to MatMul Kernels and add support for use in Fully Connected LayerMohammed Suhail Munshi
Resolves: [COMPMID-6316] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I08e6bac9e6b46b76978da0dc6a48ccfe3dde5086 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9833 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-10Disable kernel size 3 in argminmax for axis 0Pablo Marquez Tello
* Only kernel sizes supported are 2, 4, 8 and 16. * Resolves COMPMID-6349 Change-Id: I30c85dcb3505d47fe56a2d2a08e9221ff426ee93 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9890 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-06Fix nightly failures in MatMulLowpNativeKernel when using bounded activation ↵Mohammed Suhail Munshi
functions - Added checks for supported activation functions in MatMulLowpKernel validate - Replaced incorrect float activation macro with quantized implementation in mat_mul_quantized Resolves: [COMPMID-6339] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I15661f14877f1d3305644e6473feb5482a67e773 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/532858 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9855 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-05Rewrote CLArgMinMax for axis 0Pablo Marquez Tello
* Simpler implementation without stages for axis 0 * Removed considerable amount of code. Resolves COMPMID-6271 Change-Id: Ie8bcb2f0b55f87472f44b38872a23a922619a211 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9849 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-04Add Kernel Writer driver code to dynamic fusionSiCong Li
* Partially port ElementwiseBinary component to ckw (broadcast not supported yet) * Port Store component to ckw * Move KernelArgumentsHelpers to ckw_driver/ as it's only used by the driver ckw_driver is a middle layer between dynamic fusion and Compute Kernel Writer (CKW). It consumes the fused kernel component stream produced by Dynamic Fusion and uses CKW to write the kernel code complete with all meta info needed by the runtime to enqueue the kernel. It consists of two parts: * Kernel writing: This resides in dynamic_fusion/sketch * Runtime utilities: This resides in dynamic_fusion/runtime The integration (separation between DF and CKW) occurs in two places: * Inside GpuCKWDriver global driver that coordinates how the final fused kernel code is assembled together alongwith other meta info needed by runtime. * Inside each instantiated IGpuCKWComponentDriver component driver that drives CKW to write component-specific code or do component-specific configurations Partially resolves: COMPMID-5792 COMPMID-6282 COMPMID-6260 COMPMID-6266 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ib57a080a65fe8cfee1a8df1529fe572005a6d2f2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9847 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-29Implement FP32/16 MatMul Lhs T Rhs T/NT kernel using MMUL extensionGunes Bayir
Resolves: COMPMID-6196, COMPMID-6197 Change-Id: I22a1c32686eb70e7676c8b4d64a76dbaeb638cb3 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9798 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-26Add helpers to set CKW tensor components as OpenCL kernel argumentsJakub Sujak
* Define ckw::TensorStorage. The tensor storage represents the type of tensor memory object. * Add helper functions for setting the CKW TensorComponent and TensorStorage as OpenCL kernel arguments. * Refactor CL Image2D method for simpler image object creation. Resolves: COMPMID-5784 Change-Id: I2d37d06783c1dc55f3b5692b44eb49b151f2401c Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9807 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-26Use MatMul in fully connected layer with dynamic weights when supportedMohammed Suhail Munshi
- Use MatMul kernels in FC layer when using dynamic weights without broadcasting or bias. - Fix minor typo in IClMatMulNativeKernelConfig.h Partially Resolves : [COMPMID-6193] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Id494062b5b4f4e75ff9714c202dde941955afa52 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9797 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-23Implement FP32/FP16 MatMul NT/T kernel using the MMUL extensionRamy Elgammal
Resolves COMPMID-6195 Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com> Change-Id: I8e85fe73308ed84ebb142d6d6d1562b62dddfaa5 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9819 Reviewed-by: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-06-23Fix doxygen warningsramy.elgammal@arm.com
Resolves: COMPMID-6312 Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com> Change-Id: I9f68ccd2edb8c4d03fec19e6b9c29609d4833342 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9806 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-19Implement FP32/FP16 MatMul NT/NT kernel using the MMUL extensionSiCong Li
Resolves COMPMID-6194 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ie45e2aa9533948b2e5235563cef1d3834494eccf Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9739 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-16Add Fused Activation to OpenCL MatMulMohammed Suhail Munshi
- Added fused activation to MatMul function interface - Added fused activation to CL backend - Includes tests for supported Activation Functions in MatMul Resolves: [COMPMID-6192] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ie103212b600b60699eaf6a6394d609e6e1f5aba6 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/522465 Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9714 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-15Break up Utils.h a bit to reduce unused code being included everywhereMatthew Bentham
Move some maths-related things from Utils.h to new Math.h header in utils/math. Move some routines used for Tensor shape validation to Validate.h Change-Id: I8ce89fe03ec3ae1b61d1a80c282b8b91eea0cfb3 Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/524783 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9743 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-06-15Break up arm_compute/core/Types.h a bitMatthew Bentham
Split some of the larger types with inlined code into their own header files, so that the implementation of them needn't be included everywhere. Change-Id: Id3ec2d42efbd33cedb55705a5a24e1b90c8b7a01 Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/524782 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-17Revert "Check for nullptr when failing to load OpenCL libraries"Omar Al Khatib
This reverts commit 9d254610ef3072263ac5c20eed906157e8869b3d. Reason for revert: Causing High Impact failure in Coverity Change-Id: I7a3b963cc13c0eea856a15451ffed8c4d4a62e75 Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9643 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: SiCong Li <sicong.li@arm.com>
2023-05-16Check for nullptr when failing to load OpenCL librariesJakub Sujak
Resolves: COMPMID-6272 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Change-Id: I4b4ff4eeb909ac95c63d1f2d2081f2be9ade2295 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9662 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-05-11Fix invalid vector length in CLViet-Hoa Do
Resolves: COMPMID-6252 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I97ddf8a6c83bc2621abc712094db6bc0fe3d97b1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9620 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-05-03Support multi-dimensional indices in the CL Gather Layer up to ↵Omar Al Khatib
four-dimensional output tensors Resolves [COMPMID-5775] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I6f6c12ac08f0b0ad070ca5d715c531c2c3762c30 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9498 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>