aboutsummaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2023-09-27Enable job-chaining with incremental job_chaining_size.Anitha Raj
Resolves COMPMID-6458 Change-Id: I1068da3dee6b6f58e4179f5a92521a6d6457e6c4 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10380 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-09-26Re-arrange header inclusion orderFelix Thomasmathibalan
Inclusion order of header is changed as preparatory step for applying clang-format Change-Id: I0c529f896ba802dfc6f30a573cdc9d9a24f3081c Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10379 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-26Select changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template select_op() which had to be moved from impl.cpp to fp16.cpp * Partially resolves MLCE-1102 Change-Id: Ic9e73e121482fcc5e4fcbe8ae1ecd23649cbd3d1 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10359 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
2023-09-26Maxunpooling changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template max_unpooling() which had to be moved from impl.cpp to impl.h * Partially resolves MLCE-1102 Change-Id: Iabf9a9ba9d2441032f931f33aad97acc3e332575 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10362 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
2023-09-25Remove CommonGraphOptions from Utils target and warningsPaolo Tricerri
Signed-off-by: Paolo Tricerri <paolo.tricerri@arm.com> Change-Id: If4e2944e25e48c8b7a1a6713e57838d449a987ea Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10366 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-21Optimize the main loop in mat_mul_native_quantized_mmul_nt_ntGunes Bayir
Switch the loop unrolling order and reuse the pre-computed vectors Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I636c0530d6b21dae4dbb371c57d18b1f7c7246a8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10355 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-21L2Norm changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template l2_normalize_x() and l2_normalize_yz which had to be moved from impl.cpp to impl.h * Removed impl.cpp * Partially resolves MLCE-1102 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Change-Id: Id00a823730108293fc712295a178dad80588af30 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10344 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-21Gemm changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the templates vector_matrix_multiply_f16() and matrix_matrix_multiply_f16 which had to be moved from impl.cpp to fp16.cpp * Partially resolves MLCE-1102 Change-Id: Ic87440797d6f1653c815ab6565972206f5afd0ad Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10345 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-20Fix the validation issue in AddMulAdd fused kernelGunes Bayir
Resolves: COMPMID-6558 Change-Id: I015d504aaa9b8a1a232b01e49ab373d415ea1de9 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10340 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-09-18Add CL command buffer classViet-Hoa Do
* Two implementations of the command buffer are added: - CLMutableCommandBuffer uses mutable dispatch command buffer extension. - CLCompatCommandBuffer is the compatibility class for platform without the CL extension. Resolves: COMPMID-6454 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I15b370a50168ca940bd8fb2b5fae26230da3f472 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10298 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-09-18Implement Quantized MatMul kernel using MMUL extensionGunes Bayir
Resolves: COMPMID-6475 Change-Id: Ic867cdfff5d4391cb749a04bf7cc35cda63d3b71 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10311 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-15Remove deprecated support for BF16 in CpuCastAdnan AlSinan
Resolves : [COMPMID-6212] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I29bbd9a3d96af462faf7f0ee13b9849f75e05356 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10319 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2023-09-15GenerateProposals changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template compute_all_anchors() that had to be moved from impl.cpp to impl.h * Partially resolves MLCE-1102 Change-Id: Iaff6da32d0b9789ef87ba3f95bef99343612bd01 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10309 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-15Fuse batch normalization changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template fused_batch_normalization_dwc_nhwc() that had to be moved from impl.cpp to impl.h * Removed impl.cpp * Partially resolves MLCE-1102 Change-Id: Idaaa113c71729e32e565acf5fb5694c76c36d76d Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10308 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-15Fix include dependencies for mass reformatting patchGunes Bayir
This patch fixes some include dependencies in certain files that caused build failures in https://review.mlplatform.org/c/ml/ComputeLibrary/+/10287. It also circumvents some clang-format glitches. Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8e9d3307edd2d1afd17c685c9bc9429624130e5a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10313 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-14Add skeleton of ClMatMulLowpNativeMMULKernelGunes Bayir
The skeleton code consists of modifications - to build the library with the quantized matmul kernel - refactoring of some common utilities - empty OpenCL Kernels for four configurations ([Lhs, Rhs] X [Nt, t]) - some validation tests and skeleton for functional tests Resolves: COMPMID-6473 Change-Id: Id8401f789d34277dceb1f91afd68c9c88275618a Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10273 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-13Softmax changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use various templates that had to be moved from impl.cpp to impl.h * Partially resolves MLCE-1102 Change-Id: I2e5e68fbcf5279de1ffc1be4def4f96ed05593e9 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10224 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-13Changes to InstanceNrom to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * Partially resolves MLCE-1102 Change-Id: If53ff1927948b3ad7c9e3c9347bc2af38764e342 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10243 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-13Changes in NECropResize to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template in_bounds_crop_window so it had to be moved from impl.cpp to impl.h * Removed the file src/cpu/kernels/crop/generic/neon/impl.cpp * Partially resolves MLCE-1102 Change-Id: I1953849153e672ff7938f54c877c7498117dcca4 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10282 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-08Meanstddevnorm changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * Partially resolves MLCE-1102 Change-Id: I7e6d998e427982d4a037dbce6d17ca378665e07f Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10241 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-06Changes to BoundingBoxTransform to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * Partially resolves MLCE-1102 Change-Id: I04822b043d9f87bc666750a8d95a8be8a6cc194d Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10239 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-06Changes to ElementwiseOp to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * Partially resolves MLCE-1102 Change-Id: I5ecfc8f6c0d84f92d80bec2cde6e7338794b9788 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10240 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-04Extend Neon ReshapeLayer validation testsAnitha Raj
- Add a test case with src and dst having same row size - Remove inline from has_holes() util function Related to COMPMID-6504 Change-Id: Iead1f17692dc57b66c5d9f01eed30169efaee0a5 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10190 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-04Remove legacy PostOps codeJakub Sujak
PostOps was the experimental interface for Dynamic Fusion. It is now replaced by the new Dynamic Fusion interface with code generation using the Compute Kernel Writer. Resolves: COMPMID-6190 Change-Id: I813b48facef2fd6f3aee332588886b4f9b3d33d8 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10219 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-04DWC changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template run_depthwise_float() so it had to be moved from impl.cpp to impl.h * Partially resolves MLCE-1102 Change-Id: I428a79c4ab3a990331f20f5bd6b9fea88b0836b9 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10218 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-01Pool3d changes to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use various templates that had to be moved from impl.cpp to impl.h * Removed src/cpu/kernels/pool3d/neon/impl.cpp * Partially resolves MLCE-1102 Change-Id: I71e6a54a27fd8f04ae2a67231709aad723b09fa3 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10220 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-31Port ClTemplatePool2d to ckwAdnan AlSinan
- Fixes a bug when using FP16 constant in some cases. - Adds op_write_raw_code to handle some special cases. - Ports MxN pooling 2d layer into ckw. - Adds unary function 'negate' to ckw. - Updates pool2d validation tests to include store op. Resovles COMPMID-6263 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: If8c683761fead79bd519aef28cc65de78d3ec629 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10172 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-30Port Resize operator to CKWGunes Bayir
Use Compute Kernel Writer (CKW) to generate code for Resize operator in the Dynamic Fusion interface. Supports Nearest Neighbor and Bilinear interpolation methods. Resolves: COMPMID-6265 Change-Id: Ib0a5158bd4208123c84f6a1dc54f29d82fd55dcd Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10174 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-30Changes in roi_align to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template roi_align() so it had to be moved from impl.cpp to impl.h * Removed the file src/cpu/kernels/roialign/generic/neon/impl.cpp * Partially resolves MLCE-1102 Change-Id: If78371479042725723cea6f6c65aac76d68a1c1d Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10213 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-08-29GEMM: AArch32: Split assembler block in a32_merge_float_8x6.hppDavid Mansell
Inline assembler blocks attempting to bind 8 integer registers don't compile in certain configurations (notably GCC 13.2 debug builds with -O0 -g). Fix this by splitting the offending block into two separate parts (straightforward as there is no flow control in the block). Fixes: COMPMID-6532 Signed-off-by: David Mansell <David.Mansell@arm.com> Change-Id: I80e9a10e6a91574176d50e63c45fab055aefa659 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10197 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Emanuele Rocca <ema@linux.it> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-29NEFuseBatchNormalizationKernel reworkPablo Marquez Tello
* Enable fp16 in armv8a multi_isa builds * Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template add_same_neon() so it had to be moved from impl.cpp to impl.h * Partially resolves MLCE-1102 Change-Id: Ia51007f5e663b708071958bb94bfab4535e4b2f8 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10191 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-23CpuAdd rework to enable fp16 in armv8a multi_isa buildsPablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16 * fp16.cpp needs to use the template add_same_neon() so it had to be moved from impl.cpp to impl.h Change-Id: I9e64a3101958fcb9c3d5c8e9b148b498b2bee05f Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10154 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-23Update CpuGemmConv2d and CpuFlatten to use CpuReshape operatorAnitha Raj
- Following CpuReshapeKernel Optimizations, update the CpuGemmConv2D and CpuFlatten to use CpuReshape operator instead of CpuReshapeKernel - Minor changes to comment in NEReorgLayerKernel.h Resolves COMPMID-6504 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Change-Id: Ib6ee1fdc313d91249f9fe41c81e73324031c1ff4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10186 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-22CPU: Depthwise: Generate correct size for input indirection array.David Mansell
Signed-off-by: David Mansell <David.Mansell@arm.com> Change-Id: I359ed0703f4036e017b34b622f76b630cefac973 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10183 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-22Optimize CpuReshapeKernelAnitha Raj
Resolves COMPMID-5279 Change-Id: Id9b007eed62c200702bbfcc83b94dab7b5de1714 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9962 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-17Fix depthwise convolution not using assembly kernelViet-Hoa Do
* Take dilation into account when checking padding. Resolves: COMPMID-6348 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I897a13ba7f37382733c35c1701d1ec310ed55331 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10147 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-17Fix various static check issuesViet-Hoa Do
Resolves: COMPMID-6495 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I916829222a6211fa096a833a2afc5fab5eb34ea4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10143 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-15Check CL command buffer extensionViet-Hoa Do
* Add helper functions to check whether command buffer extensions exist in CL device. Resolves: COMPMID-6453 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ibc287e4526e54be4702241ab8ca0cea0b8661b3a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10130 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-15Fix out-of-scope CLBufferMemoryRegion's buffer still in queue issueSiCong Li
When a CLBufferMemoryRegion is freed, it also frees its cl::Buffer object. At this point we need to flush the queue to ensure all prior commands that may use this buffer are completed before the buffer's deallocation. Previously a CommandQueue object is owned as a member inside CLBufferMemoryRegion. Whenever CLBufferMemoryRegion is freed it causes the queue to be released, which implicitly flushes the queue. Now we need to explicitly flush the queue, without the excessive releasing of the queue Resolves COMPMID-6492 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I799507bcff8526d1381cde53d7c6298684c6d3ee Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10126 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-14Optimize CLReduce for Min/Max Axis=0Gunes Bayir
Resolves: COMPMID-6400 Change-Id: Id9935f9727f77a824afc75c35f044e3f5c173e0d Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10120 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-08Add support for S64 output in NEArgMinMaxLayerPablo Marquez Tello
* NEArgMinMaxLayer uses NEReductionOperation to compute its result in S32 * We need to call NECast to convert from S32 to S64 * Resolves MLCE-1089 Change-Id: I6fded869b6076d7af1b9b3e70eb384f4ee82fd8a Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10054 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-08Setup pre-commit and include code formatting scriptsGunes Bayir
This patch - includes our code formatting scripts used in our precommit pipeline - sets up pre-commit framework to help contributors validate their patches This has several benefits: - our repository becomes more inclusive to external contributions - we can use several hooks available online efficiently, w/o implementing our own - it speeds up our development flow and, it is completely optional. The pre-commit configuration includes running the following: - our code formatting scripts - CMake and Bazel build file generation scripts - hooks that check trailing whitespace, end of files, committed large files etc. The number of checks can easily be extended using pre-commit framework. Change-Id: I06bf1259715579d321f89820726a8301504c36d9 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10064 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-08Avoid using CLMatMul in CLFullyConnected when GPUTarget is Midgardramy.elgammal@arm.com
Resolves: COMPMID-6428 Change-Id: I255a46e894bafe1d3a26f0aebb828911bbfd750b Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10070 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-07Port DirectConv2d to CKW backendJakub Sujak
Ports the direct convolution 2D kernel from the experimental Dynamic Fusion interface to use the new Compute Kernel Writer backend for OpenCL code generation. Support is for FP16/FP32 only. Resolves: COMPMID-6259 Change-Id: Ia8d7b9cb789737b22b1d877cd798a73eda0ce4ab Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10059 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-07Document the Conv2D heuristicGian Marco Iodice
- Add a new section in the documentation to describe how the conv2D heuristic works on Arm® Cortex®-based CPUs and Arm® Mali™-based GPUs - Add CKW_UNUSED in compute_kernel_writer/src/cl/CLTile.cpp to avoid the compilation error due to an unused variable - Remove FFT from the list of algorithms to be selected by the CPU Conv2d heuristic. Resolves COMPMID-6163 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: I51384d7749451b2562642683e8b2429a355166bb Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10065 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-03Fix ReduceMean validate issueViet-Hoa Do
Resolves: COMPMID-6406 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ic638616f4cb228673928815b759caee3d094780d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10043 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-03Fix CL Tile operatorViet-Hoa Do
Resolves: COMPMID-6404 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I75aebe620567ed50817747589bbe8cfb63715a7b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10036 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-28Port ElementwiseBinary to CKW part 2SiCong Li
* Add fp16 support * Implement broadcasting to elementwise binary * Implement kernel name and kernel config id * Always use explicit cast in ckw unary, binary and ternary elementwise functions. This is to address the accidental use of double literals, with other benefits. * Refactor TypeConverter for smaller includes Resolves COMPMID-6260 Change-Id: I26b726746f8c0dd7b5942ad379d56f4d7642d15f Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9999 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-28Retain back-compatibility for arm_compute/core/Types.hSiCong Li
* Some symbols have been moved from core/Types.h. This patch retains back compatibility so that the user can still include this header for those symbols * A new header core/CoreTypes.h is created to avoid circular dependency. This header includes essential small types that are used across functions * Move all function info types into function_info folder for easier tracking Resolves COMPMID-6330 Related to https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I4739175c2d4d184a9bc8e28b881b497fab03ca60 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9979 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-07-25Add GpuKernelArgumentBinding for runtime argument settingSiCong Li
* Add flexible runtime argument setting that accept argument bindings exported from ckw. * Introduce internal build flag ACL_INTERNAL_TEST_CKW_IN_DF. If set to true, ckw will be tested in dynamic fusion validation tests. Otherwise it will not be tested and the dynamic fusion will keep using ClTemplateWriter instead. * Fix CKW sampler for elementwise binary to deal with tile sizes > 1 in both dimensions Resolves: COMPMID-6282 Partially resolves: COMPMID-6260 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I0ab225a4484eb2119643d900a4e72806558626ee Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9917 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>