path: root/tests/validation/CL
Age | Commit message | Author
2023-05-15 | Raise abs_tolerance number for CL/DirectConvolution3D fp16 tests | Ramy Elgammal
Resolves: COMPMID-6255 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I602bcbf58f78e7384789cd077cfc3b549d53463e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9637 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-05-05 | Connect CLMatMul function to quantized kernels and resolve NE BatchMatMul int_8 failures | Jakub Sujak
* Adapt the CLMatMul function and ClMatMul operator to use quantized kernels. * Add function-level tests. Resolves: COMPMID-5929 and COMPMID-5811 Change-Id: I5348cdcf07b8074c138e04dfef0a73399377accd Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9575 Reviewed-by: Mohmun02 <MohammedSuhail.Munshi@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-05-03 | Support multi-dimensional indices in the CL Gather Layer up to four-dimensional output tensors | Omar Al Khatib
Resolves [COMPMID-5775] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I6f6c12ac08f0b0ad070ca5d715c531c2c3762c30 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9498 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
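As an illustration of what multi-dimensional indices mean for a gather, here is a minimal reference sketch in plain Python. It assumes gathering along axis 0 only and uses nested lists in place of tensors; the function name is hypothetical and this is not the library's kernel. The output shape is the shape of `indices` followed by the remaining dimensions of `data`.

```python
def gather_axis0(data, indices):
    """Gather entries of `data` along axis 0 for arbitrarily nested `indices`."""
    if isinstance(indices, int):
        # A scalar index selects one slice of `data`.
        return data[indices]
    # A nested index list contributes one output dimension per nesting level.
    return [gather_axis0(data, sub) for sub in indices]


data = [[10, 11], [20, 21], [30, 31]]   # shape 3 x 2
indices = [[0, 2], [1, 1]]              # shape 2 x 2
result = gather_axis0(data, indices)    # shape 2 x 2 x 2
```

With a 2-D index tensor and a 2-D input, the result is 3-D, which is why the output rank bounds the supported index rank.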
2023-04-27 | Add quantized CL MatMul kernel for LHS NT, RHS T | Jakub Sujak
Implement a native kernel for batched Matrix Multiplication for the quantized data types QASYMM8 and QASYMM8_SIGNED and with the MatMul attributes `adj_x = false, adj_y = true`. Resolves: COMPMID-5923 Change-Id: I477b2dd886edfe83beaba9efc7d6b05ed19f5da4 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9467 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-20 | Implement CL kernel for a native batched matmul Quantized - LHS transposed, RHS transposed | Omar Al Khatib
Resolves: [COMPMID-5924] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I9ba657737eb1e3a096c8341ad4ad311571f8edeb Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9454 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-04-17 | Add quantized CL MatMul kernels for Lhs NT/T, Rhs NT | Gunes Bayir
Implement OpenCL kernels for batched Matrix Multiplication for the quantized data types QASYMM8 and QASYMM8_SIGNED. Quantized MatMul is supported with the following MatMul attributes: * adj_x = false, adj_y = false * adj_x = true, adj_y = false We consider native format kernels only. In other words, no reshaping of the operand matrices is done. Resolves: COMPMID-5921, COMPMID-5922 Change-Id: I99e0f68054a2bd635c60ec2641acc2e7ff398473 Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9435 Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
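The `adj_x` / `adj_y` attributes and the asymmetric quantization can be sketched with a small reference in plain Python. This is an assumption-laden illustration, not the OpenCL kernel: it handles one 2-D matrix pair, represents quantization info as hypothetical `(scale, offset)` tuples, rounds half up, and clamps to the QASYMM8 range by default.

```python
import math


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def quantized_matmul(lhs, rhs, lhs_q, rhs_q, dst_q,
                     adj_x=False, adj_y=False, qmin=0, qmax=255):
    """Reference quantized matmul; each *_q is a (scale, offset) tuple."""
    a = transpose(lhs) if adj_x else lhs   # adj_x: use the LHS transposed
    b = transpose(rhs) if adj_y else rhs   # adj_y: use the RHS transposed
    (a_scale, a_off), (b_scale, b_off), (d_scale, d_off) = lhs_q, rhs_q, dst_q
    multiplier = a_scale * b_scale / d_scale
    out = []
    for row in a:
        out_row = []
        for col in zip(*b):
            # Accumulate in integers after removing the zero points.
            acc = sum((x - a_off) * (y - b_off) for x, y in zip(row, col))
            q = int(math.floor(acc * multiplier + 0.5)) + d_off
            out_row.append(max(qmin, min(qmax, q)))
        out.append(out_row)
    return out


unit = (1.0, 0)
plain = quantized_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], unit, unit, unit)
```

Passing an operand already transposed together with the matching `adj_*` flag gives the same result as the plain call, which is a convenient way to cross-check the four attribute combinations.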
2023-04-14 | Align naming convention of ClMatMul | Jakub Sujak
Ensure naming of MatMul on GPU conforms to the naming convention <backend><operator><config> i.e. ClMatMul operator with the backend ClMatMulNativeKernel. Resolves: COMPMID-6015 Change-Id: I021d235b023ad17fe97bd6913e6a50d0ba4b194e Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9443 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-04-13 | Implement MatMul Function and Operator with Floating Point support for CPU | Mohammed Suhail Munshi
- Implements MatMul function and operator for floating point datatype FP16/FP32 - Includes support for transposing dynamic tensors prior to matrix multiplication. - Adds tests for 2D/3D/4D+ tensors in MatMul with F32/F16 datatype (with all combinations of transposed/not-transposed tensors) - Updates fixture to allow for testing fused activation in MatMul - Adds tests for matmul with and without fused activation Resolved: [COMPMID-5898] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Iefa84b26dd723c9a51e6c3f91023152c6c31ace2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9411 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-04 | Fix unused variable warning reported in nightly build | Ramy Elgammal
Resolves: COMPMID-6002 Change-Id: Ifc2b7c889679b21d7e58f533be9c865854e132ef Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9408 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-04-04 | Support dynamic weights for Fully Connected layers on GPU | Jakub Sujak
The fully connected function and operator running on GPU have been adapted to support dynamic weights. Dynamic weights require the reshape and data layout conversion of weight tensors at runtime in the prepare stage of the operator. The implementation for GPU is identical to the CPU implementation. This patch also deprecates the `are_weights_reshaped` option in Fully Connected. Resolves: COMPMID-5870 Change-Id: I28f967695879d82cc91a928d95308a4e0e52a597 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9403 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-03 | Implement MatMul Function | Ramy Elgammal
Resolves: COMPMID-5949 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Idd8cfe6ea94a14f0b23178f6781251b5f0955563 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9390 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-03 | Add Cropping to CLBatchToSpace | Omar Al Khatib
- Deprecate dynamic block shape interface - Iterate over output window instead of input window for simpler implementation and better performance. - Add cropping support and cropping tests Resolves [COMPMID-5865] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: Ic67d44a6a39299ecdafc507f12e3dc5d517dfb62 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9385 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
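To show why iterating over the output makes cropping simple, here is a one-dimensional sketch in plain Python. It is a simplification under stated assumptions: a single spatial dimension, nested lists instead of tensors, and the common batch ordering in which the block offset is the outer part of the input batch index. The function and parameter names are hypothetical.

```python
def batch_to_space_1d(x, block, crop_left=0, crop_right=0):
    """x has shape [batch * block][width]; returns [batch][width * block - crops]."""
    in_batch, width = len(x), len(x[0])
    assert in_batch % block == 0
    out_batch = in_batch // block
    out_width = width * block - crop_left - crop_right
    out = [[0] * out_width for _ in range(out_batch)]
    for b in range(out_batch):
        for ox in range(out_width):
            # Cropping is just an offset applied to the output coordinate.
            full_x = ox + crop_left
            src_batch = (full_x % block) * out_batch + b
            src_x = full_x // block
            out[b][ox] = x[src_batch][src_x]
    return out


full = batch_to_space_1d([[1, 3], [2, 4]], block=2)
cropped = batch_to_space_1d([[1, 3], [2, 4]], block=2, crop_left=1)
```

Each output element is written exactly once and cropped elements are never visited, whereas a loop over the input would need a bounds check to discard them.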
2023-03-24 | Add Texture Pipe Support for Matmul Lhs T/NT Rhs T kernels | Ramy Elgammal
Resolves: COMPMID-5952, COMPMID-5956 Change-Id: Idbd14538e7660792254072fa9631a6f03966f89b Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9371 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-24 | Add Texture Pipe Support for Matmul Lhs T/NT Rhs NT kernels | Gunes Bayir
Resolves: COMPMID-5945, COMPMID-5954 Change-Id: I7b27021d21f8e08c4896f6b1f595a75125064f9e Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9356 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-03-21 | Add dynamic weights for CPU fully connected layer | Viet-Hoa Do
Resolves: COMPMID-5917 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I073067b490f2a1b96b81a037ea431c9a2e5c7503 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9322 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-20 | Implement OpenCL MatMul for Lhs T Rhs T/NT FP32/16 | Gunes Bayir
- Implement opencl kernel for LHS transposed and RHS non-transposed - Implement opencl kernel for LHS transposed and RHS transposed - Add validation tests Resolves: COMPMID-5953, COMPMID-5955 Change-Id: I55589acbffe86c44e29807574975978a1ec09bad Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9345 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-03-17 | Implementation of RSQRT for quantized int8 | Ramy Elgammal
Resolves: COMPMID-5863 Change-Id: I9ff67face62826c1d335a6b941e8516be39bdac8 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/488768 Tested-by: bsgcomp <bsgcomp@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9225 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-17 | Implement OpenCL MatMul for Lhs NT Rhs T/NT FP32/16 | Ramy Elgammal
- Implement ClNativeMatMulKernel class - Implement opencl kernel for LHS non-transposed and RHS non-transposed - Implement opencl kernel for LHS non-transposed and RHS transposed - Add test fixture and dataset for matmul - Implement transpose_tensor() for reference implementation to transpose high dimensional tensors Resolves: COMPMID-5944, COMPMID-5951 Co-authored-by: Gunes Bayir <gunes.bayir@arm.com> Co-authored-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I1d5b8978f41be27baddb3153ade880472141573f Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9333 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-01 | Add support for kernel indices in Maxpool | Adnan AlSinan
- Add a max pooling implementation that returns kernel indices. - Add a parameter in pooling info object to pick kernel indices impl. - Add validation tests. Resolves: [ONCPUML-1187] Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I485ef1604f676ee14d5f7f62d33699e49c38e4d3 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9192 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
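The difference between kernel indices and source indices can be sketched as follows. This is a plain-Python illustration, not the library code: it assumes a square pooling window, no padding, row-major flattening of both index kinds, and a hypothetical `use_kernel_indices` flag chosen here only to mirror the parameter the commit describes.

```python
def max_pool_2d_with_indices(src, pool, stride, use_kernel_indices=True):
    """Return (values, indices) for max pooling over a 2-D nested list."""
    in_h, in_w = len(src), len(src[0])
    out_h = (in_h - pool) // stride + 1
    out_w = (in_w - pool) // stride + 1
    values, indices = [], []
    for oy in range(out_h):
        value_row, index_row = [], []
        for ox in range(out_w):
            best = None
            for ky in range(pool):
                for kx in range(pool):
                    y, x = oy * stride + ky, ox * stride + kx
                    if best is None or src[y][x] > best[0]:
                        kernel_idx = ky * pool + kx   # position inside the window
                        source_idx = y * in_w + x     # position inside the input
                        best = (src[y][x], kernel_idx, source_idx)
            value_row.append(best[0])
            index_row.append(best[1] if use_kernel_indices else best[2])
        values.append(value_row)
        indices.append(index_row)
    return values, indices


image = [[9, 2, 3, 4],
         [5, 6, 7, 8],
         [1, 10, 11, 12],
         [13, 0, 15, 16]]
values, kernel_indices = max_pool_2d_with_indices(image, pool=2, stride=2)
```

Kernel indices stay in the range `0 .. pool * pool - 1` regardless of the input size, whereas source indices grow with the input.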
2023-02-14 | Add absolute tolerance to f16 CLConv3D validation tests | SiCong Li
This fixes faulty mismatch issues. In addition, this aligns with the methodology used by f32, as well as that of cpu f16 tests Resolves COMPMID-5897 Change-Id: Id4e2088a9fc5444265c69444cfa90961dd84047e Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9146 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
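For context on why an absolute tolerance helps, relative error grows without bound as the reference value approaches zero, so low-precision results near zero can be flagged as mismatches even when they are numerically fine. The sketch below shows one common way to combine the two tolerances; the exact rule used by the library's validation framework is not stated in this log, so the logic and names are assumptions.

```python
def within_tolerance(target, reference, rel_tol, abs_tol):
    """Accept a value if it is close in absolute OR relative terms."""
    diff = abs(target - reference)
    if diff <= abs_tol:
        return True          # small absolute error, e.g. values near zero
    if reference == 0:
        return False         # relative error is undefined here
    return diff / abs(reference) <= rel_tol


def mismatch_ratio(targets, references, rel_tol, abs_tol):
    """Fraction of elements that fail the combined tolerance check."""
    bad = sum(not within_tolerance(t, r, rel_tol, abs_tol)
              for t, r in zip(targets, references))
    return bad / len(references)
```

For example, `within_tolerance(0.0004, 0.0, rel_tol=0.01, abs_tol=0.001)` passes only because of the absolute term.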
2023-02-01 | Add Subtraction operator to Dynamic Fusion interface | Ramy Elgammal
Partially-Resolves: COMPMID-5518 Change-Id: I8358784815bcac461d50e384fa7bc96f476d3983 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9045 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Dynamic-Fusion: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-31 | Sync tolerance number of dynamic fusion direct conv2d with the current library | SiCong Li
Add descriptions and pointers in the tests to document the differences in test coverage between dynamic fusion and the current library, and most importantly, why the differences. This will come in handy when we want to quickly check if all old tests have been migrated so that we can safely deprecate / remove them. Resolves COMPMID-5840 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ie6227098979e51d7921810288f594beac19bce6f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9043 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-17 | Fix ClGemm crashes on unsupported data types | SiCong Li
Resolves COMPMID-5814 Change-Id: I09b206374cf3844c09aebd3c664daec9c2335e6d Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8953 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-10 | Fix CL DirectConvolutionLayer validate tests | SiCong Li
* Add missing activation infos * Remove faulty test "Shrink window" * Split the tests based on data layout * Fix ClDirectConv2dKernel::validate logic Fused activation in NCHW is not supported at all Resolves: COMPMID-5801 Change-Id: I64dfbd24b77bb02fb4a88b73d5ef84676d85b4fd Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8899 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-12 | Redistribute CL/Deconvolution QSYMM8_PER_CHANNEL tests | Gunes Bayir
This patch significantly reduces the pressure of CL Deconvolution tests for QSYMM8_PER_CHANNEL on the precommit and migrates some of them to nightly, while adding smaller tests for precommit. Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I3ba16cb3ebc11b5f6015f97423b0496ee2449cc7 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8782 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-12-09 | Implement the OpenCL kernel to compute the indirect convolution | Gian Marco Iodice
- Implement indirect convolution kernel - Add operator support - Add test Resolves COMPMID-5709 Change-Id: I9272304163471a5a40da7fdec204599f3c1d8e32 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8701 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-25 | Implement address precalculation for indirect conv2d - OpenCL | Gian Marco Iodice
- Implement kernel (ClIndirectConv2dAddressPrecalculationKernel) - Implement OpenCL kernel (indirect_convolution.cl) - Add test Resolves COMPMID-5708 Change-Id: If7408e37cbc6f9ad8506ff3334bc574e5d6763fb Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8661 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
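The idea of precalculating addresses for an indirect convolution can be sketched in plain Python. This is a conceptual illustration only: the table layout, the `-1` sentinel for padded taps, and all function names are assumptions of this sketch, not a description of `ClIndirectConv2dAddressPrecalculationKernel`.

```python
def precalculate_indices(in_h, in_w, k_h, k_w, stride, pad):
    """For every output pixel, list the flat input index of each kernel tap."""
    out_h = (in_h + 2 * pad - k_h) // stride + 1
    out_w = (in_w + 2 * pad - k_w) // stride + 1
    table = []
    for oy in range(out_h):
        for ox in range(out_w):
            taps = []
            for ky in range(k_h):
                for kx in range(k_w):
                    iy = oy * stride + ky - pad
                    ix = ox * stride + kx - pad
                    inside = 0 <= iy < in_h and 0 <= ix < in_w
                    # -1 marks a tap that falls into the zero padding.
                    taps.append(iy * in_w + ix if inside else -1)
            table.append(taps)
    return table


def indirect_conv2d(flat_input, flat_weights, table):
    """Convolve by looking addresses up in the table instead of recomputing them."""
    return [sum(w * flat_input[idx] for w, idx in zip(flat_weights, taps) if idx >= 0)
            for taps in table]


table = precalculate_indices(3, 3, 3, 3, stride=1, pad=1)
ones = indirect_conv2d([1] * 9, [1] * 9, table)
```

The table depends only on the shapes, stride and padding, so it can be built once and reused for every input with the same configuration.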
2022-11-22 | Remove dynamic fusion prototype with tests and examples | SiCong Li
Public headers of the new experimental dynamic fusion can be found in arm_compute/dynamic_fusion/ New examples on how to use the interface can be found in tests/validation/dynamic_fusion/gpu/Integration.cpp Resolves COMPMID-5683 Change-Id: I7ccb902a227fb487562df15fc3c30118d1d95bbd Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8671 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-15 | Fix GemmLowp BatchMatMul Tests to use quantized Outputs | Mohammed Suhail Munshi
- Fix includes int8/uint8 quantized inputs - Bias S32 value is limited to better allow detection of mismatches in gemmlowp kernel Resolves: [COMPMID-5659] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ie9cca430c6ab66253fe1d5252bd2c5396c7f38cf Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8514 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-14 | Optimize Transposed Convolution for CL backend (FP32/16) | Gunes Bayir
This patch optimizes transposed convolution for CL backend by rewriting it in a single kernel instead of three (flip_kernel + upsample + conv). The new kernel skips the upsampling step which reduces the input space of convolution by stride_x * stride_y, resulting in significant performance improvement. It also skips the kernel flipping by traversing the weights accordingly, thus reduces the memory footprint. Resolves: COMPMID-5676 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8a333212dc7c5f7f0597aa58b0d56d44814baa14 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8588 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
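The equivalence behind this optimization can be checked with a one-dimensional sketch in plain Python. It assumes no padding and uses hypothetical function names. The first function follows the three-step route (upsample, flip, convolve); the second computes each output directly, visiting only the input elements that are not inserted zeros and reading the weights in an order that makes flipping unnecessary.

```python
def transposed_conv1d_three_step(x, w, stride):
    """Upsample with zeros, flip the kernel, then run a plain convolution."""
    k = len(w)
    upsampled = []
    for i, value in enumerate(x):
        upsampled.append(value)
        if i != len(x) - 1:
            upsampled.extend([0] * (stride - 1))
    padded = [0] * (k - 1) + upsampled + [0] * (k - 1)
    flipped = w[::-1]
    out_len = len(padded) - k + 1
    return [sum(padded[o + j] * flipped[j] for j in range(k))
            for o in range(out_len)]


def transposed_conv1d_fused(x, w, stride):
    """Single pass over the output; skips zero taps and never flips weights."""
    k = len(w)
    out_len = (len(x) - 1) * stride + k
    out = []
    for o in range(out_len):
        acc = 0
        for j in range(k):
            pos = o - j
            # Only positions that land on a real input element contribute.
            if pos >= 0 and pos % stride == 0 and pos // stride < len(x):
                acc += x[pos // stride] * w[j]
        out.append(acc)
    return out


fused = transposed_conv1d_fused([1, 2, 3], [1, 10, 100], stride=2)
```

In two dimensions, the upsampled input has roughly `stride_x * stride_y` times as many elements as the original, which is the work the fused form avoids.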
2022-11-03 | Fix activation block in gemm.cl | Gian Marco Iodice
- Replace VEC_SIZE with N0. VEC_SIZE was used in the old gemm kernel and not used anymore in the existing ones Resolves COMPMID-5678 Change-Id: Ia770200b9d6e24c51c57347e4634fb8eadd10385 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8556 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-02 | Add Dynamic Fusion GpuConv2d FP32/FP16 testcase | Ramy Elgammal
Resolves: COMPMID-5511 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I0ac0acbf1de7da09f18f7b457307ec3cc99deb3b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8546 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-21 | Add test for ClGemmLowpMatrixMultiplyCore to test a batched matrix multiplication with variable input tensors | Ramy Elgammal
Resolves: COMPMID-5506 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I8345a3b7a83ef46f9ec7a77197cc65c933ec9ac6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8239 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-16 | Fix validation in validate_image2d_support_on_rhs | Gian Marco Iodice
Resolves COMPMID-5533 Change-Id: Ice3d9469c7486a700c58fb61fc692b13f368d202 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8148 Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-14 | Fix invalid memory access for dynamically fused Cl Elementwise kernels | SiCong Li
The M0 and N0 were incorrectly set for the case of broadcasting when the elementwise component is non-root. This is because we previously always use rhs tensor to derive the load M0, N0. But for non-root components, the addend/divisor tensor can be in the lhs or rhs. Thus this would fail in case the addend/divisor is in the lhs. - Also fixes broken Dynamic Fusion test Resolves COMPMID-5482 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I37f27ffa392781387db15739b1666f1dad28c554 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/445890 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Mohammed Suhail Munshi <mohammedsuhail.munshi@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8111 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-14 | Adding GELU activation | Murray Kornelsen
OpenCL implementation uses built in erf. NEON implementation requires new vectorized erf. Uses the following approximation: erf(x) = 1 - 1 / (1 + a1x + a2x^2 + a3x^3 + a4x^4)^4 a1 = 0.278393, a2 = 0.230389, a3 = 0.000972, a4 = 0.078108 From https://en.wikipedia.org/wiki/Error_function#Numerical_approximations Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca> Change-Id: I2d3964b2c26a4334166b17135f9104bc6324fad2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7921 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
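The approximation quoted above can be tried out in scalar form. The sketch below is plain Python rather than vectorized NEON code. The formula holds for non-negative `x`, so the sketch extends it to negative inputs through the odd symmetry erf(-x) = -erf(x); whether the NEON implementation handles the sign the same way is an assumption. GELU is taken in its exact form, 0.5 * x * (1 + erf(x / sqrt(2))).

```python
import math

# Coefficients as quoted in the commit message.
A1, A2, A3, A4 = 0.278393, 0.230389, 0.000972, 0.078108


def erf_approx(x):
    """erf(x) ~= 1 - 1 / (1 + a1*x + a2*x^2 + a3*x^3 + a4*x^4)^4 for x >= 0."""
    ax = abs(x)
    poly = 1.0 + A1 * ax + A2 * ax ** 2 + A3 * ax ** 3 + A4 * ax ** 4
    magnitude = 1.0 - 1.0 / poly ** 4
    # Odd symmetry handles negative inputs.
    return math.copysign(magnitude, x)


def gelu(x):
    return 0.5 * x * (1.0 + erf_approx(x / math.sqrt(2.0)))


# Largest deviation from the standard library erf on a grid over [-4, 4].
max_err = max(abs(erf_approx(i / 100) - math.erf(i / 100)) for i in range(-400, 401))
```

The cited reference gives a maximum error of about 5e-4 for this approximation, small enough for typical activation tolerances.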
2022-09-07 | Optimize depthwise convolution on OpenCL | Gian Marco Iodice
The optimization concerns the case where the depth multiplier is > 1. The depth multiplier for loop has been removed from the OpenCL kernel and the GWS has been mapped to the output shape. In this way, we can still perform a tile with N0 columns and improve the performance of depthwise conv over 80% when depth multiplier is > 1. Resolves COMPMID-5568 Change-Id: I604e287d4eeb31c54b9cc6c3072a698cd0e3e136 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8184 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
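The restructuring described above, dropping the depth-multiplier loop and indexing by output channel instead, can be sketched in plain Python. This is a one-dimensional simplification with hypothetical names, and it assumes the common channel ordering `oc = c * depth_multiplier + d`; it is not the OpenCL kernel.

```python
def correlate1d(signal, taps):
    k = len(taps)
    return [sum(signal[i + j] * taps[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def depthwise_by_input_channel(x, w, depth_multiplier):
    """Original structure: loop over input channels, inner loop over the multiplier."""
    out = []
    for c, channel in enumerate(x):
        for d in range(depth_multiplier):
            out.append(correlate1d(channel, w[c * depth_multiplier + d]))
    return out


def depthwise_by_output_channel(x, w, depth_multiplier):
    """Restructured: one work item per output channel, input channel derived from it."""
    return [correlate1d(x[oc // depth_multiplier], w[oc]) for oc in range(len(w))]


x = [[1, 2, 3], [4, 5, 6]]                  # 2 input channels
w = [[1, 0], [0, 1], [1, 1], [1, -1]]       # 4 output channels, multiplier 2
by_output = depthwise_by_output_channel(x, w, 2)
```

Both functions produce the same result. The second form has a flat iteration space that matches the output shape, which is what allows the work size to be mapped to it directly.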
2022-09-07 | Add test for CLGEMM to test a batched matrix multiplication with variable input tensors | Mohammed Suhail Munshi
Resolves : [COMPMID-5502] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ida001dc597973f9180468737a3e32e5022e6baee Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/450342 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Mohammed Suhail Munshi <mohammedsuhail.munshi@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8224 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-03 | Add Dynamic Fusion Tests with BugFixes | Mohammed Suhail Munshi
- Allow fusing arbitrary number of existing elementwise operators - Fix issues with 3D and 4D tensors in Elementwise Addition and Floor components - Collapse the 3D/4D window in the same way as that used by Conv2d, i.e. collapse dim 1 and dim 2 together - Fix Floor component issues when used after other components - Add Dynamic Fusion Tests (Floor + Div, Conv2d + Add + Div) - Add Addition ElementWise Broadcasting Test Resolves: [COMPMID-5356] Change-Id: I58b93a90175bb0440d43531d18cac94b5f5c2689 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/433956 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7957 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-22 | Add GemmLowp MMUL Reshaped Only Rhs Support for QASYMM8/QASYMM8_SIGNED | Freddie Liardet
This patch introduces a GEMMLowp routine that is optimized for Arm(R) Mali(TM)-G715 and Arm(R) Mali(TM)-G615 Resolves: COMPMID-5398 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8d06453645688f3658b6c7c06f1ebc25a2505661 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7932 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-13 | Add Gemm MMUL Reshaped Only Rhs Support for FP32/FP16 | Gunes Bayir
This patch introduces a GEMM routine that is optimized for Arm(R) Mali(TM)-G715 and Arm(R) Mali(TM)-G615 Resolves: COMPMID-5216 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I2e5d7806f5904347185bb3e250f73d73d6669dba Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7914 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-06-27 | Implement new Elementwise Dynamic Fusion Operators: Div, Floor | Michalis Spyrou
Resolves: COMPMID-5355 Change-Id: I92f73fbe885f28bbe7b07965b90cfd807c93602f Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7745 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com>
2022-05-11 | Fix inclusion guard for dynamic fusion module | SiCong Li
Resolves COMPMID-5318 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I59594632c9891b9569089764ae26cc7be6b78fcd Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7550 Reviewed-by: Nikhil Raj Arm <nikhil.raj@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-05-06 | Integrate Dynamic Fusion patches | SiCong Li
* Add public interfaces: * OperatorGraph: Describe a workload that could contain fused kernels * IWorkload: Generic interface for workloads built from OperatorGraph * ClWorkload: OpenCL workloads built from OperatorGraph * ClCompositeOperator: Runtime async operator to execute a ClWorkload * DependencyGraph (will likely be deprecated in later iterations) * Add example * cl_fused_conv2d_elementwise_add.cpp to explain how to use the new interfaces * Add internal translation layer * Refactor ClKernelBuildingAPI * Remove non-tile based gemm native kernel component * Minor interface changes * Add integration tests Resolves COMPMID-5161 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ib987ed79289ab0bcbd3130d54f5793408d9f1240 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7510 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-04-19 | Add CLPool3d Int8 Support | Mohammed Suhail Munshi
- Adds Qasymm8 and Qasymm8_signed support to the 3d pool operator Resolves: COMPMID-4669 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I36038c2b7c4f36baf67f7aae801356890e104538 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/410496 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7391 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-04-13 | Add DirectConvolution2D kernel component for dynamic fusion | Gunes Bayir
Resolves: COMPMID-5156 Change-Id: I438da924cb80d3bce72106b06ca7181e0606bd01 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7399 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-03-24 | [Dynamic Fusion] Implement build options generation | Giorgio Arena
Resolves: COMPMID-5153 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: Ic34cc1f0d092fafa7c2faa4dd705cf8f68eaf87e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7317 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-03-17 | Setup automatic kernel window for dynamic fusion | Gunes Bayir
The window of the fused kernels is equal to the root kernel's window. For this reason, the following kernels do not change the window and intermediate tensors have the same shape and info as the destination tensor. Resolves: COMPMID-5152 Change-Id: I25fe2fab8304ecaabfc2e4ade9bbf31a600a5033 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7316 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com>
2022-03-15 | Implementation of ClPooling3d | ramelg01
- For NDHWC layout - For F16 and F32 data types - Mixed Precision still not supported Resolves: COMPMID-4670 Signed-off-by: ramy.elgammal@arm.com Change-Id: I0e14a13e4625569e8e5ee67e6033bd1efe0da469 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7262 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-03-08 | Merge kernel prototype patch | Giorgio Arena
Resolves: COMPMID-5151 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: Ic4024d5cd4819fe917a1d49621f1866ae2e90a37 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7260 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>