path: root/src
Age | Commit message | Author
2023-09-14 | Add skeleton of ClMatMulLowpNativeMMULKernel | Gunes Bayir
The skeleton code consists of:
- modifications to build the library with the quantized matmul kernel
- refactoring of some common utilities
- empty OpenCL kernels for four configurations ([Lhs, Rhs] X [Nt, t])
- some validation tests and a skeleton for functional tests

Resolves: COMPMID-6473
Change-Id: Id8401f789d34277dceb1f91afd68c9c88275618a
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10273
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>

2023-09-13 | Softmax changes to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16
* fp16.cpp needs to use various templates that had to be moved from impl.cpp to impl.h
* Partially resolves MLCE-1102

Change-Id: I2e5e68fbcf5279de1ffc1be4def4f96ed05593e9
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10224
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>

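The fp16 commits in this log share one layout: a generic template moves from impl.cpp into impl.h so that a separate fp16.cpp can instantiate it, and only fp16.cpp is compiled with -march=armv8.2-a+fp16. The single-file sketch below shows that split as three labelled sections; the function names (`add_same_impl`, `add_fp32`, `add_fp16`) are hypothetical stand-ins, not ComputeLibrary symbols.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- impl.h (sketch): the generic template lives in a header so that each
// per-ISA translation unit (fp32.cpp, fp16.cpp, ...) can instantiate it.
template <typename T>
void add_same_impl(const std::vector<T> &a, const std::vector<T> &b, std::vector<T> &out)
{
    assert(a.size() == b.size());
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        out[i] = a[i] + b[i];
    }
}

// ---- fp32.cpp (sketch): built with the baseline -march flags.
void add_fp32(const std::vector<float> &a, const std::vector<float> &b, std::vector<float> &out)
{
    add_same_impl<float>(a, b, out);
}

// ---- fp16.cpp (sketch): in the real build only this file gets
// -march=armv8.2-a+fp16, so the compiler defines the feature macro only here.
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
#include <arm_neon.h> // provides float16_t on AArch64 toolchains
void add_fp16(const std::vector<float16_t> &a, const std::vector<float16_t> &b, std::vector<float16_t> &out)
{
    add_same_impl<float16_t>(a, b, out);
}
#endif
```

Because the template body is visible from the header, the fp16 instantiation can be compiled with the extra ISA flags while the rest of the library keeps the baseline flags.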
2023-09-13 | Changes to InstanceNorm to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16
* Partially resolves MLCE-1102

Change-Id: If53ff1927948b3ad7c9e3c9347bc2af38764e342
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10243
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-09-13 | Changes in NECropResize to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16
* fp16.cpp needs to use the template in_bounds_crop_window so it had to be moved from impl.cpp to impl.h
* Removed the file src/cpu/kernels/crop/generic/neon/impl.cpp
* Partially resolves MLCE-1102

Change-Id: I1953849153e672ff7938f54c877c7498117dcca4
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10282
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>

2023-09-08 | Meanstddevnorm changes to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16
* Partially resolves MLCE-1102

Change-Id: I7e6d998e427982d4a037dbce6d17ca378665e07f
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10241
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>

2023-09-06 | Changes to BoundingBoxTransform to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16
* Partially resolves MLCE-1102

Change-Id: I04822b043d9f87bc666750a8d95a8be8a6cc194d
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10239
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>

2023-09-06 | Changes to ElementwiseOp to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16
* Partially resolves MLCE-1102

Change-Id: I5ecfc8f6c0d84f92d80bec2cde6e7338794b9788
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10240
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-09-04 | Extend Neon ReshapeLayer validation tests | Anitha Raj
- Add a test case with src and dst having the same row size
- Remove inline from the has_holes() util function

Related to COMPMID-6504
Change-Id: Iead1f17692dc57b66c5d9f01eed30169efaee0a5
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10190
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

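A reshape copies elements in linear order, so whether a tensor "has holes" (padding between rows) decides whether one bulk copy is enough. The sketch below illustrates that idea under stated assumptions: `Tensor2D`, `has_holes` and `reshape_copy` are hypothetical names for a 2D float tensor with a row stride measured in elements, not ComputeLibrary's `ITensorInfo` or its actual `has_holes()` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical 2D tensor view: row_stride >= cols; any extra elements in a
// row are padding ("holes") and are not part of the data.
struct Tensor2D
{
    std::vector<float> data;
    std::size_t rows;
    std::size_t cols;
    std::size_t row_stride;
};

// Rows are contiguous only if there is no per-row padding.
bool has_holes(const Tensor2D &t)
{
    return t.row_stride != t.cols;
}

// Copies elements in row-major order; shapes may differ but the element
// counts must match.
void reshape_copy(const Tensor2D &src, Tensor2D &dst)
{
    const std::size_t total = src.rows * src.cols;
    assert(total == dst.rows * dst.cols);
    if (!has_holes(src) && !has_holes(dst))
    {
        // Fast path: both buffers are dense, so one memcpy suffices.
        std::memcpy(dst.data.data(), src.data.data(), total * sizeof(float));
        return;
    }
    // Slow path: map each linear index to (row, col) on both sides.
    for (std::size_t i = 0; i < total; ++i)
    {
        const float v = src.data[(i / src.cols) * src.row_stride + i % src.cols];
        dst.data[(i / dst.cols) * dst.row_stride + i % dst.cols] = v;
    }
}
```

The slow path is always correct; the fast path is only an optimization that the hole check makes safe.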
2023-09-04 | Remove legacy PostOps code | Jakub Sujak
PostOps was the experimental interface for Dynamic Fusion. It has been replaced by the new Dynamic Fusion interface, which generates code using the Compute Kernel Writer.

Resolves: COMPMID-6190
Change-Id: I813b48facef2fd6f3aee332588886b4f9b3d33d8
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10219
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>

2023-09-04 | DWC changes to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16
* fp16.cpp needs to use the template run_depthwise_float() so it had to be moved from impl.cpp to impl.h
* Partially resolves MLCE-1102

Change-Id: I428a79c4ab3a990331f20f5bd6b9fea88b0836b9
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10218
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-09-01 | Pool3d changes to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16
* fp16.cpp needs to use various templates that had to be moved from impl.cpp to impl.h
* Removed src/cpu/kernels/pool3d/neon/impl.cpp
* Partially resolves MLCE-1102

Change-Id: I71e6a54a27fd8f04ae2a67231709aad723b09fa3
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10220
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>

2023-08-31 | Port ClTemplatePool2d to ckw | Adnan AlSinan
- Fixes a bug when using FP16 constants in some cases.
- Adds op_write_raw_code to handle some special cases.
- Ports MxN pooling 2d layer into ckw.
- Adds unary function 'negate' to ckw.
- Updates pool2d validation tests to include store op.

Resolves COMPMID-6263
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: If8c683761fead79bd519aef28cc65de78d3ec629
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10172
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-08-30 | Port Resize operator to CKW | Gunes Bayir
Use Compute Kernel Writer (CKW) to generate code for the Resize operator in the Dynamic Fusion interface. Supports Nearest Neighbor and Bilinear interpolation methods.

Resolves: COMPMID-6265
Change-Id: Ib0a5158bd4208123c84f6a1dc54f29d82fd55dcd
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10174
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-08-30 | Changes in roi_align to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16
* fp16.cpp needs to use the template roi_align() so it had to be moved from impl.cpp to impl.h
* Removed the file src/cpu/kernels/roialign/generic/neon/impl.cpp
* Partially resolves MLCE-1102

Change-Id: If78371479042725723cea6f6c65aac76d68a1c1d
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10213
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>

2023-08-29 | GEMM: AArch32: Split assembler block in a32_merge_float_8x6.hpp | David Mansell
Inline assembler blocks attempting to bind 8 integer registers don't compile in certain configurations (notably GCC 13.2 debug builds with -O0 -g). Fix this by splitting the offending block into two separate parts (straightforward as there is no flow control in the block).

Fixes: COMPMID-6532
Signed-off-by: David Mansell <David.Mansell@arm.com>
Change-Id: I80e9a10e6a91574176d50e63c45fab055aefa659
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10197
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Emanuele Rocca <ema@linux.it>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>

2023-08-29 | NEFuseBatchNormalizationKernel rework | Pablo Marquez Tello
* Enable fp16 in armv8a multi_isa builds
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16
* fp16.cpp needs to use the template add_same_neon() so it had to be moved from impl.cpp to impl.h
* Partially resolves MLCE-1102

Change-Id: Ia51007f5e663b708071958bb94bfab4535e4b2f8
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10191
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>

2023-08-23 | CpuAdd rework to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs to be moved to an fp16.cpp file to allow compilation with -march=armv8.2-a+fp16
* fp16.cpp needs to use the template add_same_neon() so it had to be moved from impl.cpp to impl.h

Change-Id: I9e64a3101958fcb9c3d5c8e9b148b498b2bee05f
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10154
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-08-23 | Update CpuGemmConv2d and CpuFlatten to use CpuReshape operator | Anitha Raj
- Following the CpuReshapeKernel optimizations, update CpuGemmConv2d and CpuFlatten to use the CpuReshape operator instead of CpuReshapeKernel
- Minor changes to a comment in NEReorgLayerKernel.h

Resolves COMPMID-6504
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Change-Id: Ib6ee1fdc313d91249f9fe41c81e73324031c1ff4
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10186
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-08-22 | CPU: Depthwise: Generate correct size for input indirection array. | David Mansell
Signed-off-by: David Mansell <David.Mansell@arm.com>
Change-Id: I359ed0703f4036e017b34b622f76b630cefac973
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10183
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-08-22 | Optimize CpuReshapeKernel | Anitha Raj
Resolves COMPMID-5279
Change-Id: Id9b007eed62c200702bbfcc83b94dab7b5de1714
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9962
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-08-17 | Fix depthwise convolution not using assembly kernel | Viet-Hoa Do
* Take dilation into account when checking padding.

Resolves: COMPMID-6348
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I897a13ba7f37382733c35c1701d1ec310ed55331
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10147
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

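Why dilation matters in a padding check: a kernel of size k with dilation d covers (k - 1) * d + 1 input elements along that dimension, so a check that uses k alone underestimates the kernel's reach and can reject configurations that are actually valid. The sketch below shows that arithmetic; `padding_fits` encodes an illustrative rule (padding strictly smaller than the covered extent) that is an assumption, since the log does not state the exact condition used to select the assembly kernel.

```cpp
#include <cassert>
#include <cstddef>

// Number of input elements covered by a dilated kernel along one dimension.
constexpr std::size_t effective_kernel_extent(std::size_t kernel, std::size_t dilation)
{
    return (kernel - 1) * dilation + 1;
}

// Illustrative (assumed) rule: padding must be smaller than the extent the
// kernel actually covers.
constexpr bool padding_fits(std::size_t pad, std::size_t kernel, std::size_t dilation)
{
    return pad < effective_kernel_extent(kernel, dilation);
}

// For comparison: the same rule evaluated with dilation ignored.
constexpr bool padding_fits_ignoring_dilation(std::size_t pad, std::size_t kernel)
{
    return pad < kernel;
}
```

With k = 3 and d = 2 the kernel spans 5 elements, so a padding of 3 passes the dilation-aware rule but fails the rule that ignores dilation.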
2023-08-17 | Fix various static check issues | Viet-Hoa Do
Resolves: COMPMID-6495
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I916829222a6211fa096a833a2afc5fab5eb34ea4
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10143
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>

2023-08-15 | Check CL command buffer extension | Viet-Hoa Do
* Add helper functions to check whether command buffer extensions exist in the CL device.

Resolves: COMPMID-6453
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ibc287e4526e54be4702241ab8ca0cea0b8661b3a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10130
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>

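OpenCL reports device extensions as a space-separated string (the `CL_DEVICE_EXTENSIONS` query), and the Khronos command-buffer extension is named `cl_khr_command_buffer`. A minimal sketch of such a helper, assuming the string has already been queried; the function names are hypothetical, not ComputeLibrary's. Matching whole tokens matters because `cl_khr_command_buffer` is also a prefix of `cl_khr_command_buffer_mutable_dispatch`, so a plain substring search would give false positives.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole token in a space-separated
// extension list such as the result of a CL_DEVICE_EXTENSIONS query.
bool device_has_extension(const std::string &extensions, const std::string &name)
{
    std::istringstream stream(extensions);
    std::string token;
    while (stream >> token)
    {
        if (token == name)
        {
            return true;
        }
    }
    return false;
}

bool command_buffer_supported(const std::string &extensions)
{
    return device_has_extension(extensions, "cl_khr_command_buffer");
}
```

With the OpenCL C++ bindings, the input string could come from `device.getInfo<CL_DEVICE_EXTENSIONS>()`.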
2023-08-15 | Fix out-of-scope CLBufferMemoryRegion's buffer still in queue issue | SiCong Li
When a CLBufferMemoryRegion is freed, it also frees its cl::Buffer object. At this point we need to flush the queue to ensure all prior commands that may use this buffer are completed before the buffer's deallocation.

Previously, a CommandQueue object was owned as a member of CLBufferMemoryRegion. Whenever a CLBufferMemoryRegion was freed, the queue was released with it, which implicitly flushed the queue. Now we need to flush the queue explicitly, without excessively releasing it.

Resolves COMPMID-6492
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I799507bcff8526d1381cde53d7c6298684c6d3ee
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10126
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>

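The ownership change described above can be sketched with stand-in types instead of real OpenCL objects: the memory region only borrows a shared, longer-lived queue (so destroying a region never releases the queue) and its destructor flushes that queue before the buffer it owns is destroyed. `FakeQueue`, `FakeBuffer` and `BufferMemoryRegion` are hypothetical; in real code the flush would be `cl::CommandQueue::flush()` (i.e. `clFlush`). The log records events so the ordering can be checked.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a command queue shared by the runtime; records events.
struct FakeQueue
{
    std::vector<std::string> log;
    void flush() { log.push_back("flush"); }
};

// Stand-in for a device buffer whose release is recorded on the queue log.
struct FakeBuffer
{
    FakeQueue *queue;
    ~FakeBuffer() { queue->log.push_back("release buffer"); }
};

// Borrows the queue (no ownership) and owns the buffer.
class BufferMemoryRegion
{
public:
    explicit BufferMemoryRegion(FakeQueue &queue) : _queue(queue), _buffer{&queue} {}
    ~BufferMemoryRegion()
    {
        // Submit pending work that may still use _buffer before it is freed.
        _queue.flush();
        // Members are destroyed after this body runs, so _buffer is released
        // only after the flush.
    }

private:
    FakeQueue &_queue;
    FakeBuffer _buffer;
};
```

The design relies on C++ destruction order: the destructor body runs first, then members are destroyed, which guarantees flush-before-release without extra bookkeeping.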
2023-08-14 | Optimize CLReduce for Min/Max Axis=0 | Gunes Bayir
Resolves: COMPMID-6400
Change-Id: Id9935f9727f77a824afc75c35f044e3f5c173e0d
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10120
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>

2023-08-08 | Add support for S64 output in NEArgMinMaxLayer | Pablo Marquez Tello
* NEArgMinMaxLayer uses NEReductionOperation to compute its result in S32
* We need to call NECast to convert from S32 to S64
* Resolves MLCE-1089

Change-Id: I6fded869b6076d7af1b9b3e70eb384f4ee82fd8a
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10054
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

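The two-stage approach described above (a reduction that yields S32 indices, followed by a separate widening cast to S64) can be sketched on plain `std::vector` data. `argmax_rows_s32` and `cast_s32_to_s64` are hypothetical names; the real path uses NEReductionOperation and NECast on tensors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stage 1: index of the largest element in each row of a rows x cols
// row-major matrix, produced as S32. Ties keep the first index.
std::vector<int32_t> argmax_rows_s32(const std::vector<float> &m, std::size_t rows, std::size_t cols)
{
    std::vector<int32_t> out(rows, 0);
    for (std::size_t r = 0; r < rows; ++r)
    {
        for (std::size_t c = 1; c < cols; ++c)
        {
            if (m[r * cols + c] > m[r * cols + static_cast<std::size_t>(out[r])])
            {
                out[r] = static_cast<int32_t>(c);
            }
        }
    }
    return out;
}

// Stage 2: widen S32 -> S64. Every int32_t value is representable in
// int64_t, so this step is lossless.
std::vector<int64_t> cast_s32_to_s64(const std::vector<int32_t> &in)
{
    return std::vector<int64_t>(in.begin(), in.end());
}
```

Keeping the reduction in S32 and adding a cast avoids writing a second S64 reduction kernel, at the cost of one extra pass over the (small) index output.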
2023-08-08 | Set up pre-commit and include code formatting scripts | Gunes Bayir
This patch:
- includes the code formatting scripts used in our pre-commit pipeline
- sets up the pre-commit framework to help contributors validate their patches

This has several benefits:
- our repository becomes more open to external contributions
- we can efficiently use hooks available online, without implementing our own
- it speeds up our development flow

Using it is completely optional.

The pre-commit configuration runs the following:
- our code formatting scripts
- CMake and Bazel build file generation scripts
- hooks that check trailing whitespace, end of files, committed large files, etc.

The set of checks can easily be extended using the pre-commit framework.

Change-Id: I06bf1259715579d321f89820726a8301504c36d9
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10064
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-08-08 | Avoid using CLMatMul in CLFullyConnected when GPUTarget is Midgard | ramy.elgammal@arm.com
Resolves: COMPMID-6428
Change-Id: I255a46e894bafe1d3a26f0aebb828911bbfd750b
Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10070
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-08-07 | Port DirectConv2d to CKW backend | Jakub Sujak
Ports the direct convolution 2D kernel from the experimental Dynamic Fusion interface to use the new Compute Kernel Writer backend for OpenCL code generation. Only FP16 and FP32 are supported.

Resolves: COMPMID-6259
Change-Id: Ia8d7b9cb789737b22b1d877cd798a73eda0ce4ab
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10059
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-08-07 | Document the Conv2D heuristic | Gian Marco Iodice
- Add a new section in the documentation to describe how the Conv2D heuristic works on Arm® Cortex®-based CPUs and Arm® Mali™-based GPUs
- Add CKW_UNUSED in compute_kernel_writer/src/cl/CLTile.cpp to avoid a compilation error due to an unused variable
- Remove FFT from the list of algorithms that can be selected by the CPU Conv2d heuristic.

Resolves COMPMID-6163
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Change-Id: I51384d7749451b2562642683e8b2429a355166bb
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10065
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>

2023-08-03 | Fix ReduceMean validate issue | Viet-Hoa Do
Resolves: COMPMID-6406
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ic638616f4cb228673928815b759caee3d094780d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10043
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-08-03 | Fix CL Tile operator | Viet-Hoa Do
Resolves: COMPMID-6404
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I75aebe620567ed50817747589bbe8cfb63715a7b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10036
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-07-28 | Port ElementwiseBinary to CKW part 2 | SiCong Li
* Add fp16 support
* Implement broadcasting for elementwise binary
* Implement kernel name and kernel config id
* Always use explicit casts in ckw unary, binary and ternary elementwise functions. This addresses the accidental use of double literals, among other benefits.
* Refactor TypeConverter for smaller includes

Resolves COMPMID-6260
Change-Id: I26b726746f8c0dd7b5942ad379d56f4d7642d15f
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9999
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

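Broadcasting in an elementwise binary operator means that a dimension of size 1 in one operand is reused across the matching dimension of the other operand. The 2D CPU sketch below shows that indexing rule; it is not the CKW-generated OpenCL code, and `Shape2D`, `broadcast_dim` and `add_broadcast` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Shape2D
{
    std::size_t rows;
    std::size_t cols;
};

// Output size along one dimension: sizes must match, or one of them must be 1.
std::size_t broadcast_dim(std::size_t a, std::size_t b)
{
    if (a == b || b == 1) return a;
    if (a == 1) return b;
    throw std::invalid_argument("shapes are not broadcast-compatible");
}

// Row-major elementwise add; an index into a size-1 dimension is pinned to 0.
std::vector<float> add_broadcast(const std::vector<float> &lhs, Shape2D ls,
                                 const std::vector<float> &rhs, Shape2D rs, Shape2D &out_shape)
{
    out_shape = {broadcast_dim(ls.rows, rs.rows), broadcast_dim(ls.cols, rs.cols)};
    std::vector<float> out(out_shape.rows * out_shape.cols);
    for (std::size_t r = 0; r < out_shape.rows; ++r)
    {
        for (std::size_t c = 0; c < out_shape.cols; ++c)
        {
            const float a = lhs[(ls.rows == 1 ? 0 : r) * ls.cols + (ls.cols == 1 ? 0 : c)];
            const float b = rhs[(rs.rows == 1 ? 0 : r) * rs.cols + (rs.cols == 1 ? 0 : c)];
            out[r * out_shape.cols + c] = a + b;
        }
    }
    return out;
}
```

For example, adding a 1x3 row vector to a 2x3 matrix adds the same three values to each row.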
2023-07-28 | Retain back-compatibility for arm_compute/core/Types.h | SiCong Li
* Some symbols have been moved from core/Types.h. This patch retains back-compatibility so that the user can still include this header for those symbols
* A new header core/CoreTypes.h is created to avoid circular dependencies. This header includes essential small types that are used across functions
* Move all function info types into the function_info folder for easier tracking

Resolves COMPMID-6330
Related to https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I4739175c2d4d184a9bc8e28b881b497fab03ca60
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9979
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>

2023-07-25 | Add GpuKernelArgumentBinding for runtime argument setting | SiCong Li
* Add flexible runtime argument setting that accepts argument bindings exported from ckw.
* Introduce internal build flag ACL_INTERNAL_TEST_CKW_IN_DF. If set to true, ckw will be tested in dynamic fusion validation tests. Otherwise it will not be tested and dynamic fusion will keep using ClTemplateWriter instead.
* Fix CKW sampler for elementwise binary to deal with tile sizes > 1 in both dimensions

Resolves: COMPMID-6282
Partially resolves: COMPMID-6260
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I0ab225a4484eb2119643d900a4e72806558626ee
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9917
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-07-25 | Fix problem with exception handling in CPPScheduler | Matthew Bentham
If an exception was thrown in the main thread, the child threads were not being synchronised, leading to undefined behaviour (and probably the program crashing instead of correctly propagating the exception).

Add support to the build system for enabling ThreadSanitizer. This needs a build system option rather than simply passing extra_cxx/link_flags because the sanitizer options are incompatible with -Wl,-undefined,error.

Add a unit test for throwing an exception within a CPPScheduler workload.

Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com>
Change-Id: I7638272a5d43a24a861f3e6d63f3ee7b099483b5
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/538048
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9957
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

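The core of this fix is that an exception from the main thread's share of the work must not skip the point where the worker threads are synchronised. A minimal sketch of that pattern with `std::thread` and `std::exception_ptr`, assuming a simplified scheduler that starts one thread per extra workload and runs the first workload on the calling thread; `run_workloads` is a hypothetical name, and the real CPPScheduler manages its worker threads differently.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <thread>
#include <vector>

// Runs workloads[0] on the calling thread and the rest on worker threads.
// All workers are joined before any captured exception is rethrown.
// (Failure to create a thread is not handled in this sketch.)
void run_workloads(const std::vector<std::function<void()>> &workloads)
{
    if (workloads.empty()) return;
    std::vector<std::exception_ptr> errors(workloads.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 1; i < workloads.size(); ++i)
    {
        workers.emplace_back([&workloads, &errors, i]() {
            try { workloads[i](); }
            catch (...) { errors[i] = std::current_exception(); }
        });
    }
    // The main thread's share: capture the exception instead of letting it
    // unwind past the join() calls below.
    try { workloads[0](); }
    catch (...) { errors[0] = std::current_exception(); }

    for (auto &t : workers) t.join();

    // Propagate the first captured exception only after synchronisation.
    for (const auto &e : errors)
    {
        if (e) std::rethrow_exception(e);
    }
}
```

Each thread writes only its own `errors[i]` slot, so no extra locking is needed, and the joins make those writes visible to the calling thread.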
2023-07-21 | Enable S64 output in CLArgMinMax | Pablo Marquez Tello
Resolves MLCE-1089
Change-Id: I8b385ef8a00ec5de60299bc7a359766ba5417e68
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9918
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>

2023-07-20 | Fix failing CTS tests by disabling matmul when weights conversion is required. | Mohammed Suhail Munshi
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Ibba6564f111f493e4d7bac692eb2637830d4aff9
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9943
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>

2023-07-19 | Add support for input S64/U64 in CpuCastKernel | Pablo Marquez Tello
* The kernel now supports the following conversions:
  S64 -> F32
  U64 -> F32
* Resolves MLCE-1089

Change-Id: I277cf58b78d919fde25947520d2056e1412c7f82
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9935
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

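An S64 or U64 to F32 conversion is an element-wise cast, but it is lossy: F32 has a 24-bit significand, so integers larger than 2^24 in magnitude are rounded to a nearby representable float. A scalar sketch that makes this visible (not the Neon kernel itself; `cast_to_f32` is a hypothetical name, and the kernel's exact rounding behaviour is not stated in this log).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Element-wise cast from a 64-bit integer type (int64_t or uint64_t) to float.
template <typename T>
std::vector<float> cast_to_f32(const std::vector<T> &in)
{
    std::vector<float> out;
    out.reserve(in.size());
    for (T v : in)
    {
        // Rounds to nearest under the default IEEE rounding mode.
        out.push_back(static_cast<float>(v));
    }
    return out;
}
```

For instance, 2^24 + 1 = 16777217 cannot be represented exactly and converts to 16777216.0f, while powers of two such as 2^40 convert exactly.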
2023-07-18 | Break up core/Utils.h to reduce unused code being included everywhere | Matthew Bentham
Makes a small difference to compile times and opens up other opportunities to simplify code.

Change-Id: I232876910bbe4fa9719f4a0ce4a54c090faeb5ef
Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/532429
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9856
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-07-14 | Port ClTemplateCast into Ckw | Adnan AlSinan
Resolves COMPMID-6257
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I3e56ff1f1109924da02d0abd0354a3f1fa095ee7
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9914
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Nikolaj Jensen <nikolaj.jensen@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-07-14 | Port ClTemplateActivation into Ckw | Adnan AlSinan
Resolves COMPMID-6256
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I48f6a9dfadefced20802bec1ab4ab843a9deba6e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9912
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-07-13 | Added S64/U64 support for the input in CLCast | Pablo Marquez Tello
* Partially resolves MLCE-1089

Change-Id: Ie3d2fc2f755ae99cdb17b57cc90bb3f99a1843e0
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9909
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-07-13 | Fix excessive calls to clReleaseCommandQueue | SiCong Li
Use the queue object from the singleton Scheduler to avoid repeatedly freeing cl::CommandQueue objects on the stack.

Resolves COMPMID-6021
Change-Id: I0baf5891a7974cf4c7efad1b13cc5f28e49a2745
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9896
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>

2023-07-13 | Enable premultiplication for depthwise convolution | Michael Tyler
with fp16 and quantized types

Resolves: COMPMID-6337
Change-Id: I81542e51c9c0329f202ac8452f173b138e51a0f6
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9883
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

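In depthwise convolution with a channel multiplier m > 1, each input channel feeds m output channels. The log does not define "premultiplication"; one plausible reading, assumed here, is replicating every input channel m times up front so that input and output channels line up one-to-one and a multiplier-1 kernel can be reused. A minimal sketch of that replication for a channels-innermost layout; `premultiply_channels` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Input laid out as [pixels][channels] (channels innermost). The output has
// channels * multiplier channels; output channel c * multiplier + j is a copy
// of input channel c.
template <typename T>
std::vector<T> premultiply_channels(const std::vector<T> &in, std::size_t channels, std::size_t multiplier)
{
    assert(channels > 0 && in.size() % channels == 0);
    const std::size_t pixels = in.size() / channels;
    std::vector<T> out(pixels * channels * multiplier);
    for (std::size_t p = 0; p < pixels; ++p)
    {
        for (std::size_t c = 0; c < channels; ++c)
        {
            for (std::size_t j = 0; j < multiplier; ++j)
            {
                out[(p * channels + c) * multiplier + j] = in[p * channels + c];
            }
        }
    }
    return out;
}
```

The template works for any element type, which is consistent with the commit extending the feature to fp16 and quantized types; the trade-off is extra memory traffic for the expanded input.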
2023-07-12 | Add compute kernel writer arguments export | Viet-Hoa Do
* The information is extracted from the prototype argument registry.

Partially resolves: COMPMID-6283
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ia6d69b7c2a2e411597e76a7e03b7c92199a16990
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9848
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-07-11 | Add Bias to MatMul Kernels and add support for use in Fully Connected Layer | Mohammed Suhail Munshi
Resolves: [COMPMID-6316]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I08e6bac9e6b46b76978da0dc6a48ccfe3dde5086
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9833
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

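A MatMul with bias, as used by a fully connected layer, computes dst = lhs x rhs + bias, where the bias has one entry per output column and is broadcast across all rows. A naive row-major reference sketch with no transposition options; `matmul_bias` is a hypothetical name and this is not the OpenCL kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// dst[M x N] = lhs[M x K] * rhs[K x N] + bias[N], with bias added to every row.
std::vector<float> matmul_bias(const std::vector<float> &lhs, const std::vector<float> &rhs,
                               const std::vector<float> &bias, std::size_t M, std::size_t K, std::size_t N)
{
    assert(lhs.size() == M * K && rhs.size() == K * N && bias.size() == N);
    std::vector<float> dst(M * N);
    for (std::size_t m = 0; m < M; ++m)
    {
        for (std::size_t n = 0; n < N; ++n)
        {
            float acc = bias[n]; // initialise the accumulator with the bias
            for (std::size_t k = 0; k < K; ++k)
            {
                acc += lhs[m * K + k] * rhs[k * N + n];
            }
            dst[m * N + n] = acc;
        }
    }
    return dst;
}
```

Folding the bias into the accumulator avoids a separate elementwise-add pass over the output, which is the usual motivation for fusing bias into the matmul.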
2023-07-10 | Port operations to CKW prototype | Nikolaj Jensen
Resolves: COMPMID-6334
Signed-off-by: Nikolaj Jensen <nikolaj.jensen@arm.com>
Change-Id: I500d30f09daec4087eb3e7aecd1de77dc8fd53b4
Signed-off-by: Nikolaj Jensen <nikolaj.jensen@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9828
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-07-10 | Disable kernel size 3 in argminmax for axis 0 | Pablo Marquez Tello
* The only supported kernel sizes are 2, 4, 8 and 16.
* Resolves COMPMID-6349

Change-Id: I30c85dcb3505d47fe56a2d2a08e9221ff426ee93
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9890
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>

2023-07-10 | Do not include headers necessary for logging when logging is disabled | Matthew Bentham
Speeds up compilation by 30% for some files when logging is disabled.

Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com>
Change-Id: Ia479bd50a80616a34e33ead13db8558f8dbaa1aa
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/534480
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9880
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>