aboutsummaryrefslogtreecommitdiff
path: root/arm_compute
AgeCommit message (Collapse)Author
2023-08-17Fix various static check issuesViet-Hoa Do
Resolves: COMPMID-6495 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I916829222a6211fa096a833a2afc5fab5eb34ea4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10143 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-15Fix out-of-scope CLBufferMemoryRegion's buffer still in queue issueSiCong Li
When a CLBufferMemoryRegion is freed, it also frees its cl::Buffer object. At this point we need to flush the queue to ensure all prior commands that may use this buffer are completed before the buffer's deallocation. Previously a CommandQueue object is owned as a member inside CLBufferMemoryRegion. Whenever CLBufferMemoryRegion is freed it causes the queue to be released, which implicitly flushes the queue. Now we need to explicitly flush the queue, without the excessive releasing of the queue Resolves COMPMID-6492 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I799507bcff8526d1381cde53d7c6298684c6d3ee Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10126 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-08Fix failure in MeanReduce layerViet-Hoa Do
Resolves: COMPMID-6423 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I9cec051a7d1a2956218f8a6d8263bd5424f6d389 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10072 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-08-03Fix ReduceMean validate issueViet-Hoa Do
Resolves: COMPMID-6406 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ic638616f4cb228673928815b759caee3d094780d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10043 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-28Port ElementwiseBinary to CKW part 2SiCong Li
* Add fp16 support * Implement broadcasting to elementwise binary * Implement kernel name and kernel config id * Always use explicit cast in ckw unary, binary and ternary elementwise functions. This is to address the accidental use of double literals, with other benefits. * Refactor TypeConverter for smaller includes Resolves COMPMID-6260 Change-Id: I26b726746f8c0dd7b5942ad379d56f4d7642d15f Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9999 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-28Retain back-compatibility for arm_compute/core/Types.hSiCong Li
* Some symbols have been moved from core/Types.h. This patch retains back compatibility so that the user can still include this header for those symbols * A new header core/CoreTypes.h is created to avoid circular dependency. This header includes essential small types that are used across functions * Move all function info types into function_info folder for easier tracking Resolves COMPMID-6330 Related to https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I4739175c2d4d184a9bc8e28b881b497fab03ca60 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9979 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-07-19Add support for input S64/U64 in CpuCastKernelPablo Marquez Tello
* The kernel now supports the following conversions: S64 -> F32 U64 -> F32 * Resolves MLCE-1089 Change-Id: I277cf58b78d919fde25947520d2056e1412c7f82 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9935 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-18Break up core/Utils.h to reduce unused code being included everywhereMatthew Bentham
Makes a small difference to compile times and opens up other opportunities to simplify code. Change-Id: I232876910bbe4fa9719f4a0ce4a54c090faeb5ef Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/532429 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9856 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-13Added S64/U64 support for the input in CLCastPablo Marquez Tello
* Partially resolves MLCE-1089 Change-Id: Ie3d2fc2f755ae99cdb17b57cc90bb3f99a1843e0 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9909 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-13Fix excessive calls to clReleaseCommandQueueSiCong Li
Use the queue object from the singleton Scheduler to avoid repeatedly freeing cl::CommandQueue object on the stack. Resolves COMPMID-6021 Change-Id: I0baf5891a7974cf4c7efad1b13cc5f28e49a2745 Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9896 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-07-05Rewrote CLArgMinMax for axis 0Pablo Marquez Tello
* Simpler implementation without stages for axis 0 * Removed considerable amount of code. Resolves COMPMID-6271 Change-Id: Ie8bcb2f0b55f87472f44b38872a23a922619a211 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9849 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-26Remove dependency on fp16 definitions from some core include filesMatthew Bentham
This significantly improves the compilation times for parts of the core library that just need a definition of float16_t rather than access to all of the fp16 intrinsics. Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Change-Id: I5da1c6b0df0dd87d1d17948cd2e9b7375874f455 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/529385 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9781 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-23Fix doxygen warningsramy.elgammal@arm.com
Resolves: COMPMID-6312 Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com> Change-Id: I9f68ccd2edb8c4d03fec19e6b9c29609d4833342 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9806 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-16Add Fused Activation to OpenCL MatMulMohammed Suhail Munshi
- Added fused activation to MatMul function interface - Added fused activation to CL backend - Includes tests for supported Activation Functions in MatMul Resolves: [COMPMID-6192] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ie103212b600b60699eaf6a6394d609e6e1f5aba6 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/522465 Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9714 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-15Break up Utils.h a bit to reduce unused code being included everywhereMatthew Bentham
Move some maths-related things from Utils.h to new Math.h header in utils/math. Move some routines used for Tensor shape validation to Validate.h Change-Id: I8ce89fe03ec3ae1b61d1a80c282b8b91eea0cfb3 Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/524783 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9743 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-06-15Break up arm_compute/core/Types.h a bitMatthew Bentham
Split some of the larger types with inlined code into their own header files, so that the implementation of them needn't be included everywhere. Change-Id: Id3ec2d42efbd33cedb55705a5a24e1b90c8b7a01 Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/524782 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-12Add multi-sketch support for dynamic fusionViet-Hoa Do
* Tensors are owned by workload context instead of workload sketch so that they can be used by multiple sketches. * Add an integration test for multi-sketch case. Resolves: COMPMID-6148 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I37d0de5ac103fb2a85020aa1c26e49eb304f47b7 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9706 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-12Refactor activation LUT computationPablo Marquez Tello
* Moving the code out of Types.h will help with the compilation time. * Added LUT support for all other activation functions. * Resolves COMPMID-6292 Change-Id: I1b5f0b21f03237447163276b8796b2aeb3fdd45c Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9749 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-09Reorder destructor in srcDavid Svantesson
Signed-off-by: David Svantesson <david.svantesson@arm.com> Change-Id: Iaed0933d665bd98829be49b9df11653d4d74081c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9746 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-06-09Added int8 support in LeakyRelu/LUT kernel.Pablo Marquez Tello
* Resolves COMPMID-6292 Change-Id: I15a0ad1c298ff53dd111fda76ef70872aaadac3b Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9740 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-31Fix number of dimensions changed after transposeViet-Hoa Do
Partially resolves: IVGCVSW-7759 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I08a3e34db2c3980709718c5a8fc8783dd467d0fe Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9705 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-05-10Remove inclusion of NEReorderKernel header from NEReorderLayerRamy Elgammal
Resolves: COMPMID-6235 Change-Id: I7a094a23244286090415ee2788632cfa7bd6c037 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9608 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-05-05Connect CLMatMul function to quantized kernels and resolve NE BatchMatMul ↵Jakub Sujak
int_8 failures * Adapt the CLMatMul function and ClMatMul operator to use quantized kernels. * Add function-level tests. Resolves: COMPMID-5929 and COMPMID-5811 Change-Id: I5348cdcf07b8074c138e04dfef0a73399377accd Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9575 Reviewed-by: Mohmun02 <MohammedSuhail.Munshi@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-05-05Make NECast::validate take args by const pointerMatthew Bentham
Change-Id: I5d343e959942cb2ce48442d95d7c62aecd6a34d0 Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9573 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-05-04Implement OpenCL MatMul heuristic for Arm® Mali™-G710Gian Marco Iodice
- Add heuristic for f32/f16 and int8 quantized data types - Include MatMul configuration selection in the CLMatMul operator Resolves COMPMID-5950, COMPMID-5957, COMPMID-5959, COMPMID-5925, COMPMID-5926, COMPMID-5927, COMPMID-5928 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: Ic222148da0337b88d4d8c960e3b6ac31003d8bcb Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9564 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-03Fix im2col for fast-maths mode with padding.Renato Arantes
Following the investigation proposed by ONCPUML-1193, padding is implemented in im2col when the input channel is not a multiple of blocks requested by the weight format. Partially resolves: ONCPUML-1193 Signed-off-by: Renato Arantes <renato.arantes@arm.com> Change-Id: I350c7a1b2dcae63f8d94f5b6f1f86e948eab1f09 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9508 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-03Guards to make NEReorder aarch64 onlyDavid Svantesson
Resolves COMPMID-6151 Signed-off-by: David Svantesson <david.svantesson@arm.com> Change-Id: I0e8c957f3460633c32ef57be0cdc44a53b8c3e88 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9553 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-02Fix fully connected and matmul mismatchesViet-Hoa Do
* There is an issue with quantized fully connected and matmul when the lower bound of bounded ReLU is negative. * Use int32_t for the calculation of min/max quantized value rather than PixelValue to avoid this issue. Partially resolves: COMPMID-5996 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I7b22e9d56a2441fc6a4c5c4e627f57d6e00d6ff1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9502 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-28Reorder addedDavid Svantesson
Adds Reorder kernel exposing blocking reorders from arm_gemm Resolves ONCPUML-1232 Change-Id: I42bf4166311fe1771565134d3ed7039fc8e30230 Signed-off-by: David Svantesson <david.svantesson@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9500 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-28Fix the gather layer indices checkViet-Hoa Do
* If the index is out-of-bound, both CPU and GPU implementations of the gather layer will output 0. Resolves: COMPMID-5964 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ib029b3acfb31452f2097c8c75448fb2697cfa332 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9487 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-27Avoid printing error message for each not found OpenCl libraryRamy Elgammal
- Only print can't load any opencl library if none of "libOpenCL.so", "libGLES_mali.so", "libmali.so" are found. - If any of them is found, then no need to print any error message. Resolves: COMPMID-5973 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I9d62bd33545bbbf54d69836a4dca58a6294bc479 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/511804 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9483 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-26Integrate multi-threaded pretranspose_B_arraySiCong Li
This is required for the case where rhs (B) is dynamic and needs to be pretransposed in every run. In a multi-threaded setting, this means the previously single-threaded pretranspose_B_array would become the bottleneck Resolves COMPMID-5896 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Id508c46992188a0f76a505152931d4955d04c16d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9455 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-19Add quantized support for CPU MatMulViet-Hoa Do
Resolves: COMPMID-5899 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I89d96e292c3492ba9b1900a3e5683f9dcd11dfc6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9440 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-13Implement MatMul Function and Operator with Floating Point support for CPUMohammed Suhail Munshi
- Implements MatMul function and operator for floating point datatype FP16/FP32 - Includes support for transposing dynamic tensors prior to matrix multiplication. - Adds tests for 2D/3D/4D+ tensors in MatMul with F32/F16 datatype (with all combinations of transposed/not-transposed tensors) - Updates fixture to allow for testing fused activation in MatMul - Adds tests for matmul with and without fused activation Resolved: [COMPMID-5898] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Iefa84b26dd723c9a51e6c3f91023152c6c31ace2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9411 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-04Support dynamic weights for Fully Connected layers on GPUJakub Sujak
The fully connected function and operator running on GPU have been adapted to support dynamic weights. Dynamic weights require the reshape and data layout conversion of weight tensors at runtime in the prepare stage of the operator. The implementation for GPU is identical to the CPU implementation. This patch also deprecates the `are_weights_reshaped` option in Fully Connected. Resolves: COMPMID-5870 Change-Id: I28f967695879d82cc91a928d95308a4e0e52a597 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9403 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-03Implement MatMul FunctionRamy Elgammal
Resolves: COMPMID-5949 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Idd8cfe6ea94a14f0b23178f6781251b5f0955563 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9390 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-03Add Cropping to CLBatchToSpaceOmar Al Khatib
- Deprecate dynamic block shape interface - Iterate over output window instead of input window for simpler implementation and better performance. - Add cropping support and cropping tests Resolves [COMPMID-5865] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: Ic67d44a6a39299ecdafc507f12e3dc5d517dfb62 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9385 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-30Add cropping support to NEBatchToSpaceSiCong Li
- Deprecate dynamic block shape interface - Iterate over output window instead of input window for simpler implementation and better performance - Add cropping support and cropping tests Resolves COMPMID-5918 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ifea0f5f7760ffd0f4d5d4f3a5ae8d14d0b98b790 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9378 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-29Fix GCC13 compiler errorsPablo Marquez Tello
* Removed namespace arm_compute::utils::requires to fix the build error ‘requires’ is a keyword in C++20 [-Wc++20-compat] * Added missing includes for cstdint.h * Resolves MLCE-1040 Change-Id: I08842a273a4422f8e9b10daded680f521efe26e0 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9388 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-27Fix BatchToSpaceFixtureSiCong Li
* Use a vector to represent the (static) block shape instead of an N-D Tensor. The previous use of ND Tensor as block shape was wrong, not adhering to the specification, and non-functional (only first dim was used anyway). * The fixture now accepts a static block shape, because the dynamic case is not properly implemented and will be deprecated for now. * Fix an assertion error in reference implementation. Partially resolves COMPMID-5918 Change-Id: I5221e52ccc05e7c1249dec3a42426f954a73729a Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9357 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Omar Al Khatib <omar.alkhatib@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-20Implement OpenCL MatMul for Lhs T Rhs T/NT FP32/16Gunes Bayir
- Implement opencl kernel for LHS transposed and RHS non-transposed - Implement opencl kernel for LHS transposed and RHS transposed - Add validation tests Resolves: COMPMID-5953, COMPMID-5955 Change-Id: I55589acbffe86c44e29807574975978a1ec09bad Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9345 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-03-17Implement OpenCL MatMul for Lhs NT Rhs T/NT FP32/16Ramy Elgammal
- Implement ClNativeMatMulKernel class - Implement opencl kernel for LHS non-transposed and RHS non-transposed - Implement opencl kernel for LHS non-transposed and RHS transposed - Add test fixture and dataset for matmul - Implement transpose_tensor() for reference implementation to transpose high dimensional tensors Resolves: COMPMID-5944, COMPMID-5951 Co-authored-by: Gunes Bayir <gunes.bayir@arm.com> Co-authored-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I1d5b8978f41be27baddb3153ade880472141573f Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9333 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-14Add CropInfo to BatchToSpace reference and fixtureSiCong Li
Partially resolves COMPMID-5918, COMPMID-5865 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ib3b01e7dc1c944184a4c038045bf0469fbb9ff45 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9321 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-03-08Add support for arbitrary parameters for CPU GatherViet-Hoa Do
* The shape of input and indices tensors, and the gather axis can be any number, as long as these are valid and the output tensor doesn't have more dimensions than the library supports. * Update the reference code to be more generic and straightforward. * Add necessary test cases. Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Resolves: COMPMID-5919 Change-Id: Ic7e2032777aa97ecc147f61d5388528697508ab1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9199 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-07Add sigmoid and tanh for dynamic fusionViet-Hoa Do
* Add sigmoid and tanh activation functions for dynamic fusion. * Add corresponding tests, but both activation functions share the same fixture implementation. Resolves: COMPMID-5939 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I0aae0eaa18b746ce89680d2773c66e09b0f854ce Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9257 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-03Add weights_info as optional input for NEDeconvolutionLayerAnnop Wongwathanarat
This is so that we can leverage fixed format kernel when using gemm convolution method. Partially resolves: [ONCPUML-1129] Change-Id: I61ffa74f5cd9d75579dbc1f9aa187371f855e932 Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9248 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-01Add support for kernel indices in MaxpoolAdnan AlSinan
- Add a max pooling implementation that returns kernel indices. - Add a parameter in pooling info object to pick kernel indices impl. - Add validation tests. Resolves: [ONCPUML-1187] Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I485ef1604f676ee14d5f7f62d33699e49c38e4d3 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9192 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-28Add an option to use lowest for max-poolingAdnan AlSinan
- Add a parameter in PoolingLayerInfo class to pick which value to use as min for max-pooling. Resolves: [ONCPUML-1166] Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I34e1cccc15176bbf31523c61e99f3188ddca23e1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8989 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-28Add MatMulInfo Information class to ComputeLibraryMohammed Suhail Munshi
Partially resolves : [COMPMID-5930] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Icb6c8a9d1b5a2c5c57e37cb7c877414ed500d0cc Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/494758 Tested-by: bsgcomp <bsgcomp@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-by: Sicong Li <sicong.li@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9181 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-01Add new operator AddMulAdd for Neon™ backend for Float/Quantized typesGunes Bayir
This is a fused operator that merges Add + Mul + Add [+ Relu-based-Activation] layers and have an intermediate output after the first Add. It's supported for FP16/32/QASYMM8/QASYMM8_SIGNED data types. The subsequent Add and Mul are intended for scaling and the coefficients only have one dimension (per channel). The inputs are - input1 : nD tensor [X, Y, Z, W, ..] - input2 : nD tensor [X, Y, Z, W, ..] - add_coef : 1D tensor [X] - mul_coef : 1D tensor [X] The outputs are - out1 : nD tensor (intermediate output) [X, Y, Z, W, ..] - out2 : nD tensor (final output) [X, Y, Z, W, ..] The operation can be summarized as follows: out1 <- input1 + input2 out2 <- Act(out1 * mul_coef + add_coef) The activation function can be Identity, Relu, Bounded Relu or Lower/Upper Bounded Relu. The intermediate output can be skipped by providing a nullptr. The reason of providing this operator is to be able to fuse in case of Residual network patterns and save computations by reducing memory back and forward. Resolves: COMPMID-5463 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8ef577aa623b036e9a9f655cc088493fd19a6109 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9055 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>