aboutsummaryrefslogtreecommitdiff
path: root/Android.bp
AgeCommit message (Collapse)Author
13 daysScatter GPU Kernel Implementation for 1D tensors.Mohammed Suhail Munshi
Resolves: [COMPMID-6891, COMPMID-6892] Change-Id: I5b094fff1bff4c4c59cc44f7d6beab0e40133d8e Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11394 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-04-15Add s8f32 kernels and dynamic QuantizationInfoJonathan Deakin
- Add support for QASYMM_SIGNED*QASYMM8_SIGNED->F32 in CpuGemmLowpMatrixMultiplyCore - Add s8f32 kernel using existing s8->s32 kernels with a new DequantizeFloat OutputStage, the structure is similar to Requantize32 but the opposite way around. - Add SME s8f32 kernels with integrated support for DequantizeFloat. - Add scale to CpuGemmLowpOffsetContributionKernel. - Add virtual dequantize scale to gemm_common, only implemented for gemm_interleaved. - Update year to 2024 in generate_build_files. - Add dynamic flag to QuantizationInfo which signals to operators that it can change after configuration - Add support for dynamic quantization in NEGEMMLowpMatrixMultiplyCore - Add dynamic quantization fixture by extending GEMMLowpGenericMatrixMultiplyCoreValidationFixture - Add GEMMLowpDequantizedMatrixMultiplyValidationFixture - Store k (number of cols of A) rather than k_offset in the offset contribution kernels so that we can recompute it when the other offsets change relates to: ONCPUML-1444 MLINFSW-439 Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com> Co-authored-by: David Mansell <David.Mansell@arm.com> Change-Id: I58a3acf2c09289a303e52eea6b336a696a5bc8da Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11022 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-03-27Added new NEON fixed format fast math mode hybrid kernel with maximum height ↵Milos Puzovic
of 6 for accumulation and updated heuristics Change-Id: Ib52ea6825e164f4a8b8422eab7991b50af0b0d7c Signed-off-by: Milos Puzovic <milos.puzovic@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11354 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-03-21Add skeleton for CLScatter op, reference and testsMohammed Suhail Munshi
- Adds dataset for tests - Adds skeleton for function, operator, reference and tests Resolves: [COMPMID-6889] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I7e57e8b4577fef6aa7421e672894c249cad6c5fa Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11234 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-03-21[ONCPUML-1451] Add matmul kernel to enable bf16 to bf16 operations via ↵Renato Arantes
PyTorch® autocast() function The full range of tests must be added with [MLINFSW-482] epic due to the lack of reordering kernels implemented in Acl. Co-Authored-By: David Mansell <David.Mansell@arm.com> Change-Id: I820d316295a1ec94fdc89c37e4144a268f914c36 Signed-off-by: Renato Arantes <renato.arantes@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11169 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-02-21Integrate new pretranspose_b_array with extra fused transpose of BGunes Bayir
This patch fuses the transposition taking place in Acl with the transformations done in arm_gemm (called pretranspose_b_array) if the underlying kernel and transform supports it. This should improve start-up time (as it's for constant Rhs matrices) and memory footprint. The transformations in arm_gemm are kernel specific. The Rhs matrix is transformed into certain layouts to improve the performance. Resolves: COMPMID-6595 Change-Id: Id2932dd966e59f903c279417bebcea83d9a42464 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11144 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-02-09Remove CKW prototype and Template WriterGunes Bayir
Gpu code in dynamic fusion is now written by stable CKW. We do not need CKW protoype and the older writer implementation, i.e. TemplateWriter. It also removes the need for the flag -DACL_INTERNAL_TEST_CKW_IN_DF to compile and test dynamic fusion operator. Resolves: COMPMID-6715 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I9f9453311e79d9be612bd4754240d832f98503e8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11116 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-01-22Don't build CKW as part of Android.bpJakub Sujak
CKW support is still experimental. Change-Id: Iad98652b4b24acd918772edabc1b6bae4eeb259f Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10996 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-01-10Use look up table for fp16 activationMohammed Suhail Munshi
- Enables FP16 lut for logistic activation - Adds LUTManager to re-use lut where appropriate. Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I94667b63b452a8e58a1eb59cb0b5866178954523 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10864 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-22Add Mali™-G720 and Mali™-G620 as GpuTargetsGunes Bayir
This patch adds adds the latest Gpus as Gpu Target and sets up kernel selection heuristics for MatMul to address some nightly issues. Resolves: COMPMID-6766 Change-Id: I29dbb08c5ecfb3fcd63230b0b1675ab557074aca Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10902 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-07Optimize CPU depth-to-spaceViet-Hoa Do
Resolves: COMPMID-6622 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ibac276618bdda125dcbb9c851c547f12739b15b4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10749 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-27BatchNorm changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Moved NCHW kernels fp16 and fp32 to their corresponding files src/cpu/kernels/fuse_batch_normalization/nchw/neon/fp16.cpp and src/cpu/kernels/fuse_batch_normalization/nchw/neon/fp32.cpp * Changes in filelist.json to include the new fp16 and fp32 files * Moved the template batch_normalization_nchw to impl.h as we need to instantiate it from fp16.cpp and fp32.cpp * Pooling layer: removed the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC that prevented the FP16 kernel execution. * Partially resolves MLCE-1102 Change-Id: Ia8c85e9ffb76c9e387f9ae2685e5df5e52c8dc27 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10777 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-27CpuMul changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Moved fp16 and fp32 to their corresponding files src/cpu/kernels/mul/generic/neon/fp16.cpp and src/cpu/kernels/mul/generic/neon/fp32.cpp * Changes in filelist.json: added a new fp16.cpp file for the float16_t kernels * Partially resolves MLCE-1102 Change-Id: I88f24cf034c11b55ff84644b182ba76c7cb94296 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10778 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2023-11-16NormalizationLayer changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Moved the template arm_compute::normalize_float to impl.h because we need to instantiate it from both NENormalizationLayerKernel.cpp and src/cpu/kernels/norm_layer/generic/neon/fp16.cpp * Changes in filelist.json: added a new fp16.cpp file for the float16_t kernels * Replaced the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC in NENormalizationLayerKernel by ARM_COMPUTE_ENABLE_FP16 so that the fp16 kernels can be compiled in for multi_isa builds * Moved fp32 kernels to the corresponding file src/cpu/kernels/norm_layer/generic/neon/fp32.cpp * Partially resolves MLCE-1102 Change-Id: I3f2eb2ed0b6c7f68092b17872b85082fbb5f39e2 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10739 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-31Optimize CL softmaxViet-Hoa Do
* The new softmax implementation consists of only a single kernel. - There are 2 versions of softmax, one for the x dimension and one for any other dimensions. - Softmax kernel handles both native and quantized data type. Resolves: COMPMID-6447 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I4a9ae5bc63f78aebeaa85ee48a0d102c9c245eda Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10489 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-30DirectConv and Im2Col changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* FP16 kernels must be instantiated in fp16.cpp. * Partially resolves MLCE-1102 Change-Id: Iab9c29dbfd89358f2f663862ff5010c88aeccf8c Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10496 Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-17arm_gemm: Add SME2 FP16 GEMV using FP16->FP32 dot product.David Mansell
Signed-off-by: David Mansell <David.Mansell@arm.com> Change-Id: If02f7809f9b6e84979121698c5e7a62cbb41e2c3 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10487 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-17Revert "arm_gemm: Add SME2 FP16 GEMV."David Mansell
This reverts commit aeced744b854758768243833bcdf999c0c3c1a5b. Reason for revert: Incorrect SME architecture checks. Change-Id: I23fe78178041a544a8791a4655bf6fe4aa375e38 Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10501 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-10-12arm_gemm: Add SME2 FP16 GEMV.David Mansell
Change-Id: I1f73819c25c66e4d13198e9c79755808d92b343d Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10466 Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-10Port MatMul to Dynamic Fusion + CKW boilerplate codeAdnan AlSinan
- Port Matmaul to to Dynamic Fusion. - Prepare a CKW boilerplate code. - Implement the following classes: - MatMulAttributes - GPUMatMulSettings - GpuMatMul - ClComponentMatMul - GpuCkwMatMul Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I5a7c183b293973e8a4233b554b2affe0bb28f44d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10453 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-10CpuSubKernel changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* FP16 kernels must be instantiated in fp16.cpp. * Partially resolves MLCE-1102 Change-Id: I497fe0ba6e84493a5072c3e80bbba7ecd5de8095 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10448 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-04Port DepthwiseConv2d operator to Ckwramy.elgammal@arm.com
- Only support 1x1 blocks, i.e. n0=1, m0=1. - Dilation not supported yet. Resolves: COMPMID-6258 Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com> Change-Id: I1dcfd7640fb40e112736dedc81847f7b1b50dba2 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10411 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-09-28Reimplement erf functionViet-Hoa Do
* The current implementation has signfinicant inaccuracy and the issue cascades to GELU. * Use the implementation from Arm® Optimized Routines. The maximum error is 1.93 ULP. Resolves: COMPMID-6554 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: If80131e164b7a078e34dd8e05b1506698f31d17a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10395 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-26Select changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template select_op() which had to be moved from impl.cpp to fp16.cpp * Partially resolves MLCE-1102 Change-Id: Ic9e73e121482fcc5e4fcbe8ae1ecd23649cbd3d1 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10359 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
2023-09-26Maxunpooling changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template max_unpooling() which had to be moved from impl.cpp to impl.h * Partially resolves MLCE-1102 Change-Id: Iabf9a9ba9d2441032f931f33aad97acc3e332575 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10362 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
2023-09-21L2Norm changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template l2_normalize_x() and l2_normalize_yz which had to be moved from impl.cpp to impl.h * Removed impl.cpp * Partially resolves MLCE-1102 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Change-Id: Id00a823730108293fc712295a178dad80588af30 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10344 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-18Add CL command buffer classViet-Hoa Do
* Two implementations of the command buffer are added: - CLMutableCommandBuffer uses mutable dispatch command buffer extension. - CLCompatCommandBuffer is the compatibility class for platform without the CL extension. Resolves: COMPMID-6454 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I15b370a50168ca940bd8fb2b5fae26230da3f472 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10298 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-09-15Remove deprecated support for BF16 in CpuCastAdnan AlSinan
Resolves : [COMPMID-6212] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I29bbd9a3d96af462faf7f0ee13b9849f75e05356 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10319 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2023-09-15Fuse batch normalization changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template fused_batch_normalization_dwc_nhwc() that had to be moved from impl.cpp to impl.h * Removed impl.cpp * Partially resolves MLCE-1102 Change-Id: Idaaa113c71729e32e565acf5fb5694c76c36d76d Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10308 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-14Add skeleton of ClMatMulLowpNativeMMULKernelGunes Bayir
The skeleton code consists of modifications - to build the library with the quantized matmul kernel - refactoring of some common utilities - empty OpenCL Kernels for four configurations ([Lhs, Rhs] X [Nt, t]) - some validation tests and skeleton for functional tests Resolves: COMPMID-6473 Change-Id: Id8401f789d34277dceb1f91afd68c9c88275618a Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10273 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-13Changes in NECropResize to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template in_bounds_crop_window so it had to be moved from impl.cpp to impl.h * Removed the file src/cpu/kernels/crop/generic/neon/impl.cpp * Partially resolves MLCE-1102 Change-Id: I1953849153e672ff7938f54c877c7498117dcca4 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10282 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-06Changes to ElementwiseOp to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * Partially resolves MLCE-1102 Change-Id: I5ecfc8f6c0d84f92d80bec2cde6e7338794b9788 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10240 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-04Extend Neon ReshapeLayer validation testsAnitha Raj
- Add a test case with src and dst having same row size - Remove inline from has_holes() util function Related to COMPMID-6504 Change-Id: Iead1f17692dc57b66c5d9f01eed30169efaee0a5 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10190 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-04Remove legacy PostOps codeJakub Sujak
PostOps was the experimental interface for Dynamic Fusion. It is now replaced by the new Dynamic Fusion interface with code generation using the Compute Kernel Writer. Resolves: COMPMID-6190 Change-Id: I813b48facef2fd6f3aee332588886b4f9b3d33d8 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10219 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-01Pool3d changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use various templates that had to be moved from impl.cpp to impl.h * Removed src/cpu/kernels/pool3d/neon/impl.cpp * Partially resolves MLCE-1102 Change-Id: I71e6a54a27fd8f04ae2a67231709aad723b09fa3 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10220 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-31Port ClTemplatePool2d to ckwAdnan AlSinan
- Fixes a bug when using FP16 constant in some cases. - Adds op_write_raw_code to handle some special cases. - Ports MxN pooling 2d layer into ckw. - Adds unary function 'negate' to ckw. - Updates pool2d validation tests to include store op. Resovles COMPMID-6263 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: If8c683761fead79bd519aef28cc65de78d3ec629 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10172 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-30Port Resize operator to CKWGunes Bayir
Use Compute Kernel Writer (CKW) to generate code for Resize operator in the Dynamic Fusion interface. Supports Nearest Neighbor and Bilinear interpolation methods. Resolves: COMPMID-6265 Change-Id: Ib0a5158bd4208123c84f6a1dc54f29d82fd55dcd Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10174 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-30Changes in roi_align to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template roi_align() so it had to be moved from impl.cpp to impl.h * Removed the file src/cpu/kernels/roialign/generic/neon/impl.cpp * Partially resolves MLCE-1102 Change-Id: If78371479042725723cea6f6c65aac76d68a1c1d Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10213 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-08-29NEFuseBatchNormalizationKernel reworkPablo Marquez Tello
* Enable fp16 in armv8a multi_isa builds * Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template add_same_neon() so it had to be moved from impl.cpp to impl.h * Partially resolves MLCE-1102 Change-Id: Ia51007f5e663b708071958bb94bfab4535e4b2f8 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10191 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-07Port DirectConv2d to CKW backendJakub Sujak
Ports the direct convolution 2D kernel from the experimental Dynamic Fusion interface to use the new Compute Kernel Writer backend for OpenCL code generation. Support is for FP16/FP32 only. Resolves: COMPMID-6259 Change-Id: Ia8d7b9cb789737b22b1d877cd798a73eda0ce4ab Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10059 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-19Add support for input S64/U64 in CpuCastKernelPablo Marquez Tello
* The kernel now supports the following conversions: S64 -> F32 U64 -> F32 * Resolves MLCE-1089 Change-Id: I277cf58b78d919fde25947520d2056e1412c7f82 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9935 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-18Break up core/Utils.h to reduce unused code being included everywhereMatthew Bentham
Makes a small difference to compile times and opens up other opportunities to simplify code. Change-Id: I232876910bbe4fa9719f4a0ce4a54c090faeb5ef Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/532429 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9856 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-14Port ClTemplateCast into CkwAdnan AlSinan
Resolves COMPMID-6257 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I3e56ff1f1109924da02d0abd0354a3f1fa095ee7 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9914 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Nikolaj Jensen <nikolaj.jensen@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-14Port ClTemplateActivation into CkwAdnan AlSinan
Resolves COMPMID-6256 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I48f6a9dfadefced20802bec1ab4ab843a9deba6e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9912 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-12Add compute kernel writer arguments exportViet-Hoa Do
* The information is extracted from the prototype argument registry. Partially resolves: COMPMID-6283 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ia6d69b7c2a2e411597e76a7e03b7c92199a16990 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9848 Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-06Move CKW prototype to separate directoryViet-Hoa Do
Partially resolves: COMPMID-6283 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I7596e3dc357d6f0b9cbe66534523943a73c26d81 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9864 Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-04Depthwise channel pre-multiplicationMichael Tyler
Resolves: COMPMID-6337 Change-Id: Ie9097b3f56e8071426c621386a5988bd7f7e8ef2 Signed-off-by: Michael Tyler <michael.tyler@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9852 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-04Add Kernel Writer driver code to dynamic fusionSiCong Li
* Partially port ElementwiseBinary component to ckw (broadcast not supported yet) * Port Store component to ckw * Move KernelArgumentsHelpers to ckw_driver/ as it's only used by the driver ckw_driver is a middle layer between dynamic fusion and Compute Kernel Writer (CKW). It consumes the fused kernel component stream produced by Dynamic Fusion and uses CKW to write the kernel code complete with all meta info needed by the runtime to enqueue the kernel. It consists of two parts: * Kernel writing: This resides in dynamic_fusion/sketch * Runtime utilities: This resides in dynamic_fusion/runtime The integration (separation between DF and CKW) occurs in two places: * Inside GpuCKWDriver global driver that coordinates how the final fused kernel code is assembled together alongwith other meta info needed by runtime. * Inside each instantiated IGpuCKWComponentDriver component driver that drives CKW to write component-specific code or do component-specific configurations Partially resolves: COMPMID-5792 COMPMID-6282 COMPMID-6260 COMPMID-6266 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ib57a080a65fe8cfee1a8df1529fe572005a6d2f2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9847 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-27Compute kernel writer API and prototypeViet-Hoa Do
* Add the public API for compute kernel writer. * Use the prototype as the implementation of the public API. Resolves: COMPMID-5790 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I9d80e15325e1d953feb87c1f2eb61a587bb9ab5e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9814 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-26Add helpers to set CKW tensor components as OpenCL kernel argumentsJakub Sujak
* Define ckw::TensorStorage. The tensor storage represents the type of tensor memory object. * Add helper functions for setting the CKW TensorComponent and TensorStorage as OpenCL kernel arguments. * Refactor CL Image2D method for simpler image object creation. Resolves: COMPMID-5784 Change-Id: I2d37d06783c1dc55f3b5692b44eb49b151f2401c Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9807 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>