aboutsummaryrefslogtreecommitdiff
path: root/tests
AgeCommit message (Collapse)Author
2022-12-30Add temporary tile support for dynamic fusionViet-Hoa Do
* Multiple intermediate tensors can share the same tile. - A simple operator can reuse the input tensor for the result if the input tensor has the same shape, data type and it is only consumed by that operator. - The special case is a simple operator and an output operator consume the same tensor. However as the output operator doesn't change the content of the input tensor, it doesn't count as "consuming" the input tensor. * These temporary tiles are declared automatically by the template writer. Individual operator doesn't need to generate output tile declaration. * Cast is now simple operator. Resolves: COMPMID-5778 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I232647ac976645e2d266a62e055b9eb48c356a8e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8877 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-28Fix various compilation errorsViet-Hoa Do
Partially resolves: COMPMID-5794 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I275d0401be978e86507990bdb7dc5b1538a108d8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8884 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-28Add Resize/Scale operator to Dynamic Fusion interfaceJakub Sujak
Resolves: COMPMID-5521 Change-Id: Id38a4ce18f9ea8805a151acb064e72795535d1a0 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8859 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-23Add multiple output support for dynamic fusionViet-Hoa Do
* The dependency graph now can schedule any acyclic graph into a sequential list of operators. This is needed as the output operators now form branches in the graph. * Fix the definition of input, output and intermediate tensors in GpuKernelComponentGroup to support non-linear but sequential list of operators. * Add constraint on GpuOperatorGroup to enforce strictly linear fusion style, but allow output operator as the only form of branch. Resolves: COMPMID-5771 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I68de3a31a2456145081f0a397e4e61dd66327682 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8823 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-12-16Add output operator for dynamic fusionViet-Hoa Do
* The output of the fused operator must be explicitly specified using GpuOutput operator. * Any temporary tensors used to connect the output of an operator to the input of another operator will be marked as no-alloc and won't be allocated as a tensor in the memory. Resolves: COMPMID-5771 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I5ae8e800f8f737db23a055a92b01c4f1d78c3bb8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8794 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-13Add CLAMP operator to Dynamic Fusion interfaceJakub Sujak
Add the CLAMP activation function for GPU backend with generic activation Component and TemplateWriter modules. CLAMP is internally implemented as LU_BOUNDED_RELU activation function with the alpha and beta variables swapped. We do NOT consider in-place computation cases in this patch. * CLAMP operator for GPU backend. * Activation Component and TemplateWriter for CL backend. * TemplateWriter generates tiled kernel code. * Supported data types: F16, F32. * Validation tests for CLAMP operation. Resolves: COMPMID-5519 Change-Id: Ieb097d6b1e6a7ed2b882518e88314454efb402f6 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8762 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-12Redistribute CL/Deconvolution QSYMM8_PER_CHANNEL testsGunes Bayir
This patch significantly reduces the pressure of CL Deconvolution tests for QSYMM8_PER_CHANNEL on the precommit and migrates some of them to nightly, while adding smaller tests for precommit. Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I3ba16cb3ebc11b5f6015f97423b0496ee2449cc7 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8782 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-12-09Implement Cast operator in dynamic fusionGunes Bayir
The operator is migrated into dynamic fusion for all data types supported Resolves: COMPMID-5693 Change-Id: I3c550d3d1cd04570f453beae678c3f60d4cb1a73 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8755 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-09Implement the OpenCL kernel to compute the indirect convolutionGian Marco Iodice
- Implement indirect convolution kernel - Add operator support - Add test Resolves COMPMID-5709 Change-Id: I9272304163471a5a40da7fdec204599f3c1d8e32 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8701 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-29Adding GpuAdd to dynamic fusion operatorsRamy Elgammal
- Provide support for Add operator - Auto initialize the destination tensor before testing fusion in conv2d and elementwise binary ops. Resolves: COMPMID-5518 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ibd815020f02b57f88eea7c2921bdcf98605d99c5 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8617 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-28Integrate SME2 kernelsViet-Hoa Do
* Add SME/SME2 detection. * Integrate SME2 implementation for: - Normal convolution - Winograd - Depthwise convolution - Pooling Resolves: COMPMID-5700 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I2f1ca1d05f8cfeee9309ed1c0a36096a4a6aad5c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8692 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-28Implement FP32/16 Depthwise Conv2d operator in dynamic fusionGunes Bayir
This patch adds Depthwise Conv2d operator into dynamic fusion interface and adds the associated tests. Resolves: COMPMID-5517 Change-Id: I385c94dff7fd40c72b8337ef797e508df4499a82 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8678 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-25Implement address precalculation for indirect conv2d - OpenCLGian Marco Iodice
- Implement kernel (ClIndirectConv2dAddressPrecalculationKernel) - Implement OpenCL kernel (indirect_convolution.cl) - Add test Resolves COMPMID-5708 Change-Id: If7408e37cbc6f9ad8506ff3334bc574e5d6763fb Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8661 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-22Remove dynamic fusion prototype with tests and examplesSiCong Li
Public headers of the new experimental dynamic fusion can be found in arm_compute/dynamic_fusion/ New examples on how to use the interface can be found in tests/validation/dynamic_fusion/gpu/Integration.cpp Resolves COMPMID-5683 Change-Id: I7ccb902a227fb487562df15fc3c30118d1d95bbd Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8671 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-22space-to-depth shape calculator fixRamy Elgammal
Resolves: ARMCL-592 Change-Id: If036a2b6d2842db2ed750e9d350bfaa3f7a76c9e Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8668 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-15Fix GemmLowp BatchMatMul Tests to use quantized OutputsMohammed Suhail Munshi
- Fix includes int8/uint8 quantized inputs - Bias S32 value is limited to better allow detection of mismatches in gemmlowp kernel Resolves: [COMPMID-5659] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ie9cca430c6ab66253fe1d5252bd2c5396c7f38cf Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8514 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-14Optimize Transposed Convolution for CL backend (FP32/16)Gunes Bayir
This patch optimizes transposed convolution for CL backend by rewriting it in a single kernel instead of three (flip_kernel + upsample + conv). The new kernel skips the upsampling step which reduces the input space of convolution by stride_x * stride_y, resulting in significant performance improvement. It also skips the kernel flipping by traversing the weights accordingly, thus reduces the memory footprint. Resolves: COMPMID-5676 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8a333212dc7c5f7f0597aa58b0d56d44814baa14 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8588 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-08SVE Hard-Swish via Lookup table for quantized inputPablo Marquez Tello
* Datatype supported qasymm8 and qasymm8_signed. * Resolves COMPMID-5211 Change-Id: Ib161af3af5ad11ec6b46de0bf4fbac172d2525fb Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7729 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-04Fix compiler warnings in dynamic fusionSiCong Li
Resolves: COMPMID-5686 Change-Id: I608c359583c44f2f04f29faddd1c6b38a381de60 Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8562 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-03Fix activation block in gemm.clGian Marco Iodice
- Replace VEC_SIZE with N0. VEC_SIZE was used in the old gemm kernel and not used anymore in the existing ones Resolves COMPMID-5678 Change-Id: Ia770200b9d6e24c51c57347e4634fb8eadd10385 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8556 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-02Add Dynamic Fusion GpuConv2d FP32/FP16 testcaseRamy Elgammal
Resolves: COMPMID-5511 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I0ac0acbf1de7da09f18f7b457307ec3cc99deb3b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8546 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-02Partially Revert "Add threshold for floating-point SOFT_RELU activation"Gunes Bayir
Revert the range removal in tests for soft relu and bring the former implementation in CL backend back. Resolves: COMPMID-5677 Change-Id: I35d5ac03a134299041ce97aabc9fff2d4380d09f Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8551 Reviewed-by: Milos Puzovic <milos.puzovic@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-01Updateable weights in depthwise convolutionMilos Puzovic
Check whether weights are defined as constant, if they are not constant then do not pack them if they are already packed so that they can be updated. Signed-off-by: Milos Puzovic <Milos.Puzovic@arm.com> Change-Id: I73447e31e3660b05f8f40e04ea4ea2003eb9b802 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8539 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-01Add threshold for floating-point SOFT_RELU activationMilos Puzovic
Added missing threshold for calculating SOFT_RELU when SVE and CL implementations are used. As a result removed from the testing bounds for input values that were set to be in the interval [-40, 40]. Resolves: COMPMID-5658 Signed-off-by: Milos Puzovic <Milos.Puzovic@arm.com> Change-Id: I3d14df60125e36e4eb85aeb222f4fb0cc5741521 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8536 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-01Rewrite dynamic fusionSiCong Li
The new version introduces the following major changes: * Change public interface to simplify and standardize the user experience - Use the term "Workload" uniformly - Simplify operator interface to be a set of static methods: validate_op(), create_op() * Separate the kernel writing into its own component (template_writer). This is to allow the co-development of GpuKernelWriter, and to allow easy replacement once GpuKernelWriter is mature. * Optimize the core fusion algorithm used by the component graph. The details can be found in GpuKernelComponentGraph::fuse() * Use Gpu instead of Cl prefixes for most of the Workload interfaces (except for runtime and kernel components, which have to be language specific) This allows the potential extension to other Gpu langauges in the future. * Refactor runtime memory interface so that auxiliary tensor handling is separate from the user tensor passing. This is because the former is less stable and may require extension in the future. * Hide source code object from the user as it is not required at the moment * Deprecate the old prototype entirely by disabling it in SCons build Resolves COMPMID-5510, COMPMID-5512, COMPMID-5513 Change-Id: If69d2362856f2de4503546b7b6cf48a525cf3079 Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8406 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-10-20Update reinterpret tensor as 1D for CPU addViet-Hoa Do
* Use the same implementation as other layers. Resolves: COMPMID-5108 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I5a50259b398b71ca1f61b5ee8daa539bf8263fac Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8501 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-20Add test in GEMMLowp for batch matmulMohammed Suhail Munshi
- Adds tests for batched matrix multiplication - Bugfix for issue : 3d tensors input tensors with offsets in GemmLowp results in mismatches Resolves : [COMPMID-5507] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I68e036fbca642c1841dd4321033045aadc8f5636 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/461298 Comments-Addressed: bsgcomp <bsgcomp@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8482 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-10-19Fix FFTConvolutionLayer testViet-Hoa Do
* Do not change the tensor info after configure stage. * By fixing this, the 1D optimization for activation layer can be applied to all data types and tensor layout. Resolves: COMPMID-5644 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I557f9bb84e5e456c28d6b423584887d7a3648ad4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8470 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Mohmun02 <MohammedSuhail.Munshi@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-06Rework DepthwiseConvolution heuristic on OpenCLGian Marco Iodice
Resolves COMPMID-5632 Change-Id: I2bdbe69a610ca2510fbd74d5d412842679299762 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8365 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-03Fix Batch Matmul nightly failureAdnan AlSinan
Resolves COMPMID-5601 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I1baf92c4751d784d017c0b2f7de1fc09e42ce69c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8309 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-03Optimize CPU add layer on quantized dataViet-Hoa Do
* Use fixed-point arithmetic where possible. * Various optimization for the FP32-based implementation. This implementation is kept as the fall-back solution in case of unrealistic quantization parameters that exceed the range of fixed-point solution. Resolves: COMPMID-5458 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I221d2d3801ecaae4fe0b7cf6ae8ef00ca3743665 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8317 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-26Add FP32 Neon™ swish activationJonathan Deakin
Change-Id: Id37b59adbc8c4cbe218d1652aeb02a0b4ce42c66 Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8256 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-22Fix unresolved symbol for target armv7a + AndroidPablo Marquez Tello
* Resolves COMPMID-5599 Change-Id: I4c1df48eda289c82ca567f6808fccd0b09065223 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8302 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-21Add test for ClGemmLowpMatrixMultiplyCore to test a batched matrix ↵Ramy Elgammal
multiplication with variable input tensors Resolves: COMPMID-5506 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I8345a3b7a83ef46f9ec7a77197cc65c933ec9ac6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8239 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-16Fix validation in validate_image2d_support_on_rhsGian Marco Iodice
Resolves COMPMID-5533 Change-Id: Ice3d9469c7486a700c58fb61fc692b13f368d202 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8148 Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-16Fix bug in QASYMM8_SIGNED to F32 cast layerViet-Hoa Do
Resolves: COMPMID-5580 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ia731560c23a6ab2e0ead5a857fbabb9cbc25154c Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/452428 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8268 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2022-09-16Optimize Quantized/Integer Bilinear Scale for Neon™Gunes Bayir
This patch introduces several performance optimizations regarding the Bilinear Scale operator with REPLICATE Border mode. Changes apply only to NHWC. This patch - Reduces the memory footprint by disabling precomputation of indices and weights when they're not used - Rewrites the kernels for QASYMM8/QASYMM8_SIGNED/U8(Uint8) - Adds S8(Int8) Bilinear Scale for Border mode REPLICATE - Removes Bilinear Scale SVE kernels for Quantized and Integer types and adjust the heuristics to choose the Neon™ implementation - Adds new test cases where the input and output of the Bilinear Scale operator have different quantization scale and offset Resolves: COMPMID-5453, COMPMID-5454 Change-Id: I3d251e76e0c6978fd5a0a1795ec62ab536bec93c Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8250 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-14Fix invalid memory access for dynamically fused Cl Elementwise kernelsSiCong Li
The M0 and N0 were incorrectly set for the case of broadcasting when the elementwise component is non-root. This is because we previously always use rhs tensor to derive the load M0, N0. But for non-root components, the addend/divisor tensor can be in the lhs or rhs. Thus this would fail in case the addend/divisor is in the lhs. - Also fixes broken Dynamic Fusion test Resolves COMPMID-5482 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I37f27ffa392781387db15739b1666f1dad28c554 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/445890 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Mohammed Suhail Munshi <mohammedsuhail.munshi@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8111 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-14Adding GELU activationMurray Kornelsen
OpenCL implementation uses built in erf. NEON implementation requires new vectorized erf. Uses the following approximation: erf(x) = 1 - 1 / (1 + a1x + a2x^2 + a3x^3 + a4x^4)^4 a1 = 0.278393, a2 = 0.230389, a3 = 0.000972, a4 = 0.078108 From https://en.wikipedia.org/wiki/Error_function#Numerical_approximations Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca> Change-Id: I2d3964b2c26a4334166b17135f9104bc6324fad2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7921 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-14INT8 Quantized MeanStdDevNorm (LayerNorm)Murray Kornelsen
Implements LayerNorm for qasymm8 tensors. Uses uint8x16 loads and stores. Summation is performed in integer arithmetic (vpaddl) Normalization is performed in float32 before requantizing back to int8. Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca> Change-Id: I2407c8b34717fb47adab98791bd76fb8a3c62f4a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7922 Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-13Add test case for disable Winograd on fp16 if fast-math = falseRamy Elgammal
Resolves: COMPMID-5531 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Id1a5f909d64e897cdef2e920240ac5ae244166b4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8242 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-12Add test for NEGEMM to test a batched matrix multiplication with variable ↵Adnan AlSinan
input tensors - Add a test for CPU to batched matrix multiplication with variable input tensors - Disable assembly kernel when using _reshape_b_only_on_first_run flag Resolves COMPMID-5501 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: If96b182584617806a9dfe597dbfaf05241b123c2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8234 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-07Optimize depthwise convolution on OpenCLGian Marco Iodice
The optimization concerns the case where the depth multiplier is > 1. The depth multiplier for loop has been removed from the OpenCL kernel and the GWS has been mapped to the output shape. In this way, we can still perform a tile with N0 columns and improve the performance of depthwise conv over 80% when depth multiplier is > 1. Resolves COMPMID-5568 Change-Id: I604e287d4eeb31c54b9cc6c3072a698cd0e3e136 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8184 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-07Add test for CLGEMM to test a batched matrix multiplication with variable ↵Mohammed Suhail Munshi
input tensors Resolves : [COMPMID-5502] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ida001dc597973f9180468737a3e32e5022e6baee Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/450342 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Mohammed Suhail Munshi <mohammedsuhail.munshi@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8224 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-04[ONCPUML-970] Fast math mode for fixed format kernelsPablo Marquez Tello
Minor tweaks and test for running fixed format kernels with BF16 operations when specified by the user. Change-Id: Ic8167f67b86b1298da65e46cfebed9f3b86940e4 Signed-off-by: Milos Puzovic <milos.puzovic@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8000 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-03Add Dynamic Fusion Tests with BugFixesMohammed Suhail Munshi
- Allow fusing arbitrary number of existing elementwise operators - Fix issues with 3D and 4D tensors in Elementwise Addition and Floor components - Collapse the 3D/4D window in the same way as that used by Conv2d, i.e. collapse dim 1 and dim 2 together - Fix Floor component issues when used after other components - Add Dynamic Fusion Tests (Floor + Div, Conv2d + Add + Div) - Add Addition ElementWise Broadcasting Test Resolves: [COMPMID-5356] Change-Id: I58b93a90175bb0440d43531d18cac94b5f5c2689 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/433956 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7957 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-02Update the GPUTarget listGian Marco Iodice
Resolves COMPMID-5405 Change-Id: I995b5e79bef13529097ed17f7854763a4cf89272 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7986 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-01Optimize add layer by considering the input tensors as 1D arrayGunes Bayir
Resolves: COMPMID-5108 Change-Id: I544f8160fbe5b4ffbef348d1fbd3dd626a6e1bdb Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8002 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-08-01Fix building failure with validate_examples set to trueMichalis Spyrou
Resolves: COMPMID-5402 Change-Id: Ic4abda36c475c3a7d86fb469d7ed1b23a62ea182 Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7991 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-07-26Fix for inclusion of "arm_gemm" from src into "Types.h" from coreRamy Elgammal
- Added arm_compute::WeightFormat and converted to/from arm_gemm::WeightFormat when needed through two map function. - Moved to_string(WeightFormat) to TypePrinter.h Resolves: COMPMID-5415 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I65f7942100bcd4dbf2c5cf6c07f26c8e1e3bf86e Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/438511 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-by: Sicong Li <sicong.li@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7985 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>