path: root/src
Age | Commit message | Author
2023-02-15 | Fix Intermittent Neon™ ReduceMean QASYMM8 Mismatch [v23.02, branches/arm_compute_23_02] | Mohammed Suhail Munshi
- Dividing the scale by the number of elements causes accuracy loss due to the limits of the float data type and truncation to int.
- Adds rounding after the division on aarch64 to counteract this.
Resolves: [COMPMID-5839] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I54ef0f7e56f39da1fa5f30378f551b5ca419a61d Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/492456 Tested-by: bsgcomp <bsgcomp@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9110 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
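The difference between truncating and rounding after the division can be sketched in a few lines. This is only an illustration, not the library's kernel: the function names are hypothetical, and Python's double precision stands in for the kernel's float arithmetic.

```python
import math


def mean_truncating(values):
    """Mean via a reciprocal multiply, then truncation to int."""
    inv_count = 1.0 / len(values)        # the reciprocal is itself rounded to a float
    return int(sum(values) * inv_count)  # int() simply drops the fractional part


def mean_rounding(values):
    """Same computation, but round to nearest (half up) before converting."""
    inv_count = 1.0 / len(values)
    return int(math.floor(sum(values) * inv_count + 0.5))


samples = [1, 2, 2]  # the exact mean is 5/3
print(mean_truncating(samples), mean_rounding(samples))
```

Rounding also absorbs the case where a product that is mathematically an integer lands just below it in floating point, which truncation would turn into an off-by-one result.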
2023-02-09 | Fix performance regression in Transposed Convolution | Gunes Bayir
Resolves: COMPMID-5849 Change-Id: I86f8bbc1f3a7c12c66d5ad8fcd74dd9e69629aa0 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9102 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Dynamic-Fusion: Jakub Sujak <jakub.sujak@arm.com>
2023-02-03 | Fix armv7a failing GEMMConvolutionLayer tests | Mohammed Suhail Munshi
Resolves: [COMPMID-5854] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ib0228409be5e816acca7e123f2660eb01a79e38f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9078 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-02-02 | Fix GEMMLowp/Batched MatMul mismatches on CPU | Mohammed Suhail Munshi
- Fixes the Column Offset matrix not being iterated through in the y dimension.
Resolves : COMPMID-5795 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I0190474be404b4f0e171855739cfd0a48cbed5bc Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9020 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-01 | Add new operator AddMulAdd for Neon™ backend for Float/Quantized types | Gunes Bayir
This is a fused operator that merges Add + Mul + Add [+ Relu-based-Activation] layers and has an intermediate output after the first Add. It is supported for the FP16/32/QASYMM8/QASYMM8_SIGNED data types. The subsequent Add and Mul are intended for scaling, and the coefficients have only one dimension (per channel).
The inputs are:
- input1 : nD tensor [X, Y, Z, W, ..]
- input2 : nD tensor [X, Y, Z, W, ..]
- add_coef : 1D tensor [X]
- mul_coef : 1D tensor [X]
The outputs are:
- out1 : nD tensor (intermediate output) [X, Y, Z, W, ..]
- out2 : nD tensor (final output) [X, Y, Z, W, ..]
The operation can be summarized as follows:
out1 <- input1 + input2
out2 <- Act(out1 * mul_coef + add_coef)
The activation function can be Identity, Relu, Bounded Relu or Lower/Upper Bounded Relu. The intermediate output can be skipped by providing a nullptr.
The reason for providing this operator is to be able to fuse Residual network patterns and to save computation by reducing back-and-forth memory traffic.
Resolves: COMPMID-5463 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8ef577aa623b036e9a9f655cc088493fd19a6109 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9055 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
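The two-line summary of the operation can be written out as a small reference model. This is a hedged sketch in plain Python, not the Neon™ kernel: the function name `add_mul_add`, the list-of-rows tensor layout (innermost list is the X/channel dimension) and the float-only arithmetic are assumptions made for illustration, and quantization is not modelled.

```python
def relu(value):
    """Plain Relu, one of the activations listed in the commit message."""
    return max(0.0, value)


def add_mul_add(input1, input2, add_coef, mul_coef, act=lambda v: v):
    """Reference model: out1 = input1 + input2, out2 = act(out1 * mul_coef + add_coef).

    Tensors are lists of rows; each row holds one value per channel (X).
    The coefficients are 1D and applied per channel. `act` defaults to Identity.
    """
    out1 = [[a + b for a, b in zip(row1, row2)]
            for row1, row2 in zip(input1, input2)]
    out2 = [[act(v * m + c) for v, m, c in zip(row, mul_coef, add_coef)]
            for row in out1]
    return out1, out2


out1, out2 = add_mul_add(
    input1=[[1.0, -2.0], [3.0, 4.0]],
    input2=[[0.5, 0.5], [-4.0, 1.0]],
    add_coef=[0.0, 1.0],
    mul_coef=[2.0, 1.0],
    act=relu,
)
print(out1)
print(out2)
```

A caller that does not need the intermediate result would simply ignore `out1`; in the real operator the commit message says this is expressed by passing a nullptr.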
2023-02-01 | Add Subtraction operator to Dynamic Fusion interface | Ramy Elgammal
Partially-Resolves: COMPMID-5518 Change-Id: I8358784815bcac461d50e384fa7bc96f476d3983 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9045 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Dynamic-Fusion: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-01 | Remove fixed format strides hack | Jonathan Deakin
- Remove the hack in CpuGemmAssemblyDispatch.cpp which tried to guess strides for fixed format kernels. Instead, expect that strides will have been correctly set on the weights externally.
- Update fixed format test fixtures to set the strides.
- If the fixed format uses fast math mode, then weights should be of type BFLOAT16. Change the validation logic to accept this.
Resolves: [ONCPUML-1131] Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com> Change-Id: I0f18d8b86b0f639be25fd122fa06a591e90645f2 Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8985 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-31 | Fixed clang-cl linker errors | Pablo Tello
* The linker errors were caused by the declarations of the DWC functions not matching their implementations. Changed the function declarations to match the implementations.
* Partially resolves MLCE-996
Change-Id: Ie6458c80bc425deaa6c239828b9f4a2a6646f503 Signed-off-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9056 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-31 | Add Multiplication operator (FP only) to Dynamic Fusion Interface | Jakub Sujak
Note: we use a separate test fixture for Multiplication op instead of reusing ElementwiseBinaryFixture to avoid exposing the internal enum ElementwiseOp to the public utils/TypePrinters.h as required by the data test case macros to print the test data. We also do not consider modifying the enum ArithmeticOp in the standard interface to include MUL without an implementation. Future work should consider refactoring this test fixture into the ElementwiseBinaryFixture to reduce the total number of fixtures/code duplication. Resolves: COMPMID-5779 Change-Id: I84207658ce0407095b028fca0ab7bfa2950255ec Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9013 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-31 | Bazel and CMake builds | David Svantesson
Resolves: ONCPUML-1110, ONCPUML-1109 Co-authored-by: Georgios Pinitas <georgios.pinitas@arm.com> Co-authored-by: Joe Ramsay <joe.ramsay@arm.com> Signed-off-by: David Svantesson <david.svantesson@arm.com> Change-Id: Iea693dbe53bf0af87867d6a9e0d1fd9fbe59ef3a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8981 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-01-30 | Skip upsampling for deconvolution when not needed | Annop Wongwathanarat
If the input tensor's stride is 1 and the kernel size is 1x1, skip the upsampling step and pass the input tensor pointer to the convolution directly.
Partially resolve: [ONCPUML-1137] Change-Id: I9de9444ff99cf35d44a51ccbe0fa6facc1035d27 Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8994 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
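A one-dimensional sketch shows why the upsampling step is a no-op under that condition. It assumes the common zero-insertion scheme (stride - 1 zeros between neighbours, kernel_size - 1 zeros at each border, deconvolution padding ignored); the helper names are hypothetical and the library's actual upsampling code is not reproduced here.

```python
def upsample_1d(values, stride, kernel_size):
    """Zero-insertion upsampling applied before a stride-1 convolution.

    Assumption: (stride - 1) zeros between neighbours and (kernel_size - 1)
    zeros on each border; any deconvolution padding is ignored.
    """
    border = [0] * (kernel_size - 1)
    out = list(border)
    for index, value in enumerate(values):
        if index > 0:
            out.extend([0] * (stride - 1))
        out.append(value)
    out.extend(border)
    return out


def can_skip_upsampling(stride, kernel_size):
    """Condition from the commit message, reduced to one dimension."""
    return stride == 1 and kernel_size == 1


print(upsample_1d([1, 2, 3], stride=1, kernel_size=1))  # unchanged input
print(upsample_1d([1, 2], stride=2, kernel_size=2))     # zeros inserted
```

With stride 1 and a 1x1 kernel no zeros are inserted anywhere, so the upsampled tensor equals the input and the copy can be avoided.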
2023-01-26 | Fix num_threads_hint() on macos. | Pablo Marquez Tello
* Query the number of big cores rather than the total number of cores on the system
* Resolves MLCE-994
Change-Id: I88cb6a4fd2ece9a035edd4cc5c0f5cf4aef93468 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9006 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-25 | Implement dynamic fusion softmax operator | Ramy Elgammal
- Return the aux tensorInfo from get_aux_tensors() at runtime to initialize the aux tensor with the right size.
- Keep softmax unfusable for this commit.
- Hence, added Tensor3D to the template writer's argument declarations, to keep the dynamic fusion softmax components' kernels matching their CL counterparts.
Resolves: COMPMID-5523 Change-Id: I667f39545db925f667036ef448302c79a0330373 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/483924 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8986 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-24 | Change dynamic fusion API to return destination tensor info | Gunes Bayir
The new dynamic fusion API is introduced in the following patch: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8906
For each operator (except Conv2D, which is migrated in the above patch), we
- remove the destination tensor from the is_supported, validate and create calls
- make create_op return ITensorInfo* to the intermediate destination object
Affected operators:
- DepthwiseConv2D
- Cast
- Elementwise Ops
- Clamp
- Reshape
- Resize
Resolves: COMPMID-5777 Change-Id: Ib60ec8a5f081752808455d7a7d790f2ed0627059 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8991 Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Dynamic-Fusion: Ramy Elgammal <ramy.elgammal@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-23 | Fix performance regression when stride equal to kernel size | Gunes Bayir
This patch prefers the Gemm-based transposed deconvolution algorithm when the kernel size and the stride are equal to each other in each dimension.
Resolves: COMPMID-5815 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I22052e48341f3284d6bafbdbcce4a48399dc8e87 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8970 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
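The commit message does not spell out why a Gemm-based path suits this case, but one way to see it is that a transposed convolution whose stride equals its kernel size writes non-overlapping output blocks, so it reduces to a product of inputs and weights followed by a reshape. The 1D sketch below illustrates that property only; the function names are hypothetical and it is not the library's algorithm selection logic.

```python
def deconv_1d_scatter(x, w, stride):
    """General transposed convolution: each input scatters a scaled kernel copy."""
    out = [0] * ((len(x) - 1) * stride + len(w))
    for i, x_value in enumerate(x):
        for j, w_value in enumerate(w):
            out[i * stride + j] += x_value * w_value
    return out


def deconv_1d_blocks(x, w):
    """Shortcut valid only when stride == len(w): outer product, then flatten."""
    return [x_value * w_value for x_value in x for w_value in w]


x = [1, 2, 3]
w = [10, 20]
print(deconv_1d_scatter(x, w, stride=2))  # blocks do not overlap
print(deconv_1d_blocks(x, w))             # same values from a plain product
print(deconv_1d_scatter(x, w, stride=1))  # overlapping contributions are summed
```

When the stride is smaller than the kernel, neighbouring blocks overlap and must be accumulated, so the flattening shortcut no longer applies.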
2023-01-20 | Add missing direct conv2d tests to dynamic fusion | SiCong Li
* Add direct conv2d tests as a separate fixture so that we can enable future direct conv2d specific tests
* Move Conv2dAttributes to its own file
Partially resolves COMPMID-5736 Change-Id: I530649488faf3bbed1a4fc7d16a74063bfdf33db Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8928 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-01-20 | Improve the strided_slice layer on all data types | Omar Al Khatib
Resolves : [COMPMID-5110] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I3889a79c311b697c56d7369305c862433e856487 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8903 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-20 | Add Auxiliary tensors | SiCong Li
The assign_memory_descriptors method could not automatically assign Auxiliary tensors. Therefore changes are made to allow developers to explicitly mark auxiliary tensors.
However, to avoid ambiguity between auxiliary and "intermediate" tensors, we solidify the definitions of both:
Intermediate tensors are a strictly topological term. They are defined as "inner" tensors within a workload, hidden from the user, as opposed to input and output tensors exposed to the users.
Auxiliary tensors are a subcategory of Intermediate tensors, and are also about memory allocation. They are intermediate tensors that need real memory backing.
For more details please see the documentation of the MemoryType enum.
Rename MemoryType::NoAlloc to MemoryType::Virtual
Partially resolves: COMPMID-5523 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ibde44c2ec1570be9423e0fb38b53bb136ffc36dd Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8940 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
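The two definitions can be restated as a small decision rule. The sketch below is a Python stand-in, not the library's C++ enum: the member names mirror `MemoryType::Virtual` from the commit message, while `USER`, `AUXILIARY` and the `classify` helper are assumptions introduced for illustration.

```python
from enum import Enum


class MemoryType(Enum):
    """Hypothetical mirror of the memory categories described above."""
    USER = "user"            # input/output tensors exposed to the user (assumed name)
    AUXILIARY = "auxiliary"  # intermediate tensors that need real memory backing
    VIRTUAL = "virtual"      # intermediate tensors with no allocation (was NoAlloc)


def classify(is_exposed_to_user, needs_real_memory):
    """Map the topological role and the allocation need to a memory type."""
    if is_exposed_to_user:
        return MemoryType.USER
    # Everything below is an intermediate tensor; auxiliary is the subset
    # of intermediate tensors that must be backed by real memory.
    return MemoryType.AUXILIARY if needs_real_memory else MemoryType.VIRTUAL


print(classify(is_exposed_to_user=False, needs_real_memory=True))
```

The point of the distinction is that "intermediate" says where a tensor sits in the workload, while "auxiliary" additionally says that it must be allocated.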
2023-01-20 | Dynamic Fusion Pooling Layer 2d | Mohammed Suhail Munshi
- Adds Dynamic fusion PoolingLayer2D as Unfusable Operator
- Indices are not supported
- Adds tests for F32/F16 Datatypes
Resolves : [COMPMID-5520] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I0d112545eb9209c836bf9ea153069f8627531e0a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8893 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-20 | Add enable_fast_math for NEDeconvolutionLayer | Annop Wongwathanarat
Resolves: [ONCPUML-1128] Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com> Change-Id: I287a71222d3f0289d8cccfcb15383b0a930a55e6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8952 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-18 | Add broadcast batched matmul validation cases | SiCong Li
Related to: COMPMID-5660 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I2314c8b21acc638402c77080d59db2f3fed58fe2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8911 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Mohmun02 <MohammedSuhail.Munshi@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-01-18 | Revert "Update the heuristic for CLDepthwiseConvolutionNative kernel" | Gian Marco Iodice
Resolves COMPMID-5813 Change-Id: I5ef6fe9fb6a54db18e41a71085896fd08bc08dbb Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8975 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-01-18 | Revert "Update CPU kernels to remove x19" | Michael Tyler
This reverts commit 3c59f01c209d2732a15d97d65565ead964787a8b. Resolves: COMPMID-5817 Change-Id: Ie2443a21854a95db1e3d0cafa2121c0187a5e237 Signed-off-by: Michael Tyler <michael.tyler@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8974 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-17 | Fix ClGemm crashes on unsupported data types | SiCong Li
Resolves COMPMID-5814 Change-Id: I09b206374cf3844c09aebd3c664daec9c2335e6d Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8953 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-16 | Update CPU kernels to remove x19 | Michael Tyler
Resolves: COMPMID-5805 Signed-off-by: Michael Tyler <michael.tyler@arm.com> Change-Id: I250f64531e209625e4ff176dd5a552c1c34bc484 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8909 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-13 | Remove unused code in arm_conv/depthwise/ | Pablo Marquez Tello
* Removed header files in arm_conv/depthwise
* Resolves MLCE-990
Change-Id: Iacddd80e2d83ff0fbafb817014f90c5bc80dab3c Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8946 Reviewed-by: Andrew Mundy <Andrew.mundy@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-01-12 | Update the heuristic for CLDepthwiseConvolutionNative kernel | Gian Marco Iodice
- Use T_LOAD2D_INDIRECT macro instead of T_LOAD_NHWC_WITH_DILATION in the depthwise convolution opencl kernels
- Update the heuristic for Arm® Mali™-G77
Resolves COMPMID-5716 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: I32d375b220e04bf05f5d8f0af2231bde600f0665 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8930 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-01-11 | Deprecated BF16 support in DepthConvert | Pablo Marquez Tello
* Removed BF16 validation tests for DepthConvert
* Revert back to using inline assembly to convert to/from BF16
* Resolves COMPMID-5800
Change-Id: I803b2ad19ead297417f780c97c5b724cca6b394c Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8929 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-01-10 | Fix CL DirectConvolutionLayer validate tests | SiCong Li
* Add missing activation infos
* Remove faulty test "Shrink window"
* Split the tests based on data layout
* Fix ClDirectConv2dKernel::validate logic: fused activation in NCHW is not supported at all
Resolves: COMPMID-5801 Change-Id: I64dfbd24b77bb02fb4a88b73d5ef84676d85b4fd Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8899 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-10 | Extend cl image support to input and output tensors | Gian Marco Iodice
- Add support for texture image to input and output of direct convolution
- Extend T_LOAD2D_INDIRECT macro to read values from cl image storages
Resolves COMPMID-5715 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: Idb0410f53f6d0763cd9e39895a7cbf9bc826d33a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8904 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-09 | Add extend padding lock flag | Ramy Elgammal
- ITensorInfo's padding cannot be extended if its lock_paddings flag is set to True. Resolves: COMPMID-5714 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I6bca9bbf7172822af60562310578c438b9e15f46 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8875 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
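The locking rule can be sketched with a toy class. Everything here is hypothetical: `TensorInfoSketch` is not the library's ITensorInfo, padding is reduced to a single integer, and raising an error on a locked extension is an assumption, since the commit message only says the padding cannot be extended.

```python
class TensorInfoSketch:
    """Toy stand-in for a tensor info object with a padding lock."""

    def __init__(self, padding=0):
        self.padding = padding
        self.lock_paddings = False

    def extend_padding(self, requested):
        """Grow the padding to `requested`; return True if it changed."""
        if requested <= self.padding:
            return False  # nothing to extend
        if self.lock_paddings:
            # Assumed behaviour: refuse loudly rather than silently ignore.
            raise RuntimeError("padding is locked and cannot be extended")
        self.padding = requested
        return True


info = TensorInfoSketch(padding=2)
print(info.extend_padding(4))  # allowed while unlocked
info.lock_paddings = True
print(info.extend_padding(3))  # no growth requested, so no error
```

A lock of this kind lets a caller that has already allocated memory for a tensor guarantee that later configuration steps do not ask for a larger buffer.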
2023-01-06 | Implement dynamic fusion reshape operator | Ramy Elgammal
Resolves: COMPMID-5522 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: If4e5736a2f7ff42e70276d7f4e0f3ebcb38414e6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8881 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-06 | Handle Intermediate tensors within the sketch | Gunes Bayir
- Intermediate tensor info objects are not created by the user anymore. They're returned from create_op and reused. This will prevent allocation of the intermediate tensors in case of possible interface misuse.
- Sketch object handles intermediate tensor info pointers inside its implementation class via a unique pointer vector
- Conv2d operator is migrated into the new interface
Resolves: COMPMID-5776 Change-Id: I9422e3681eef4f2d2922f6d0a5d7786380837c6d Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8906 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-06 | LHS broadcasting addition for dynamic fusion | Viet-Hoa Do
* Binary elementwise operator now can have broadcasting in either X dimension, Y+Z dimension, or both, in either LHS or RHS operand.
* Fix bug in CL code to support batching.
Resolves: COMPMID-5704 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I51b04986d30861f255ca9f754adffa0e6c85a26b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8898 Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Dynamic-Fusion: Ramy Elgammal <ramy.elgammal@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-30 | Add temporary tile support for dynamic fusion | Viet-Hoa Do
* Multiple intermediate tensors can share the same tile.
  - A simple operator can reuse the input tensor for the result if the input tensor has the same shape and data type and is only consumed by that operator.
  - A special case is when a simple operator and an output operator consume the same tensor. However, as the output operator doesn't change the content of the input tensor, it doesn't count as "consuming" the input tensor.
* These temporary tiles are declared automatically by the template writer. Individual operators don't need to generate output tile declarations.
* Cast is now a simple operator.
Resolves: COMPMID-5778 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I232647ac976645e2d266a62e055b9eb48c356a8e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8877 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
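The reuse rule for tiles can be restated as a predicate. This is a hedged sketch of the rule as worded in the commit message, not the template writer's implementation; `TensorDesc`, `can_reuse_input_tile` and the `(name, is_output_op)` consumer tuples are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorDesc:
    """Minimal tensor description: only what the rule looks at."""
    shape: tuple
    dtype: str


def can_reuse_input_tile(src, dst, consumers):
    """Return True if a simple operator may write its result into src's tile.

    `consumers` lists every operator reading `src` as (name, is_output_op).
    Output operators only store the tensor, so they are not counted.
    """
    real_consumers = [name for name, is_output_op in consumers if not is_output_op]
    same_layout = src.shape == dst.shape and src.dtype == dst.dtype
    return same_layout and len(real_consumers) == 1


src = TensorDesc(shape=(8, 4), dtype="F32")
dst = TensorDesc(shape=(8, 4), dtype="F32")
print(can_reuse_input_tile(src, dst, [("clamp", False), ("store", True)]))
```

If a second non-output operator also read `src`, overwriting the tile would corrupt its input, which is why the consumer count matters.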
2022-12-30 | Move DWC native heuristic into the heuristic folder | Gian Marco Iodice
- Move the DWC native heuristic from CLDepthwiseConvolutionLayer to heuristic/
- Update the heuristic for Arm® Mali™-G77. Use a smaller block size (4x2) for Fp16
- Call the new heuristic in GpuDepthwiseConv2d
Resolves COMPMID-5798 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: I6bfd30cea76bea2e98202a7a5c1d51709f3382a4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8889 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-29 | Optimize CL Scale/Resize Quantized by removing (de)quant. code | Gunes Bayir
This patch removes the quant/dequant code in CLScale and the Resize operator in dynamic fusion. We don't support different quantization information for input and output, so in this case the quantization and dequantization are not necessary. The very same optimization was delivered for CPU.
It also moves the SCALE_X and SCALE_Y arguments from build options to a look-up table in the template writer of Resize.
Change-Id: Icd043c8671220c8feea935dd4b24a5b17c6c4ea4 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8888 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
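One way to see why the dequantize and requantize steps are redundant when input and output share the same quantization information: dequantization is an affine map, and an affine map commutes with interpolation. The sketch below checks this for a single linear interpolation; it is an illustration of the arithmetic only, with hypothetical helper names, and does not reproduce the OpenCL kernel.

```python
def dequantize(q, scale, offset):
    """Affine dequantization: real value = (q - offset) * scale."""
    return (q - offset) * scale


def lerp(a, b, t):
    """Linear interpolation between a and b with weight t in [0, 1]."""
    return a + (b - a) * t


scale, offset = 0.5, 3
q_left, q_right, t = 10, 20, 0.25

# Interpolate the raw quantized values, then dequantize once.
in_quantized_domain = dequantize(lerp(q_left, q_right, t), scale, offset)

# Dequantize both neighbours first, then interpolate in float.
in_float_domain = lerp(dequantize(q_left, scale, offset),
                       dequantize(q_right, scale, offset), t)

print(in_quantized_domain, in_float_domain)
```

Because both paths give the same real value, requantizing with the same scale and offset would return the value interpolated directly from the quantized inputs, so the round trip can be dropped.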
2022-12-29 | Update the ClConv2d heuristic | Gian Marco Iodice
- Update the ClConv2d heuristic to call indirect convolution on Arm® Mali™-G77 Gpus
- Implement the indirect conv2d heuristic for selecting the block size
Resolves COMPMID-5713 Change-Id: If6ad49124561207153685c6abd4f54950a376fbc Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8886 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-12-29 | Extend Transposed Conv. for tiles with N0>1 | Gunes Bayir
Partially Resolves: COMPMID-5724 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I0aeddddcdd87c8c79f6dae9a76ffdc2ba0c08e17 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8883 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-12-29 | Use CPU quantized addition kernel for quantized subtraction | Omar Al Khatib
Resolves : [COMPMID-5629] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I061ea5bdafa3a01e66ff869d158f26a38d19e125 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8835 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-12-28 | Fix company name on copyright notice | Viet-Hoa Do
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I85731bb688864a29b95adc729083e0c8e2ab61f8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8885 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-12-28 | Fix various compilation errors | Viet-Hoa Do
Partially resolves: COMPMID-5794 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I275d0401be978e86507990bdb7dc5b1538a108d8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8884 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-28 | Implement Logits1DMaxShiftExpSum kernel component in dynamic fusion | Gunes Bayir
Resolves: COMPMID-5719 Change-Id: I2f0911ffccce2b42a9a63fe6826eaa5d2cad06ba Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8831 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-12-28 | Add Resize/Scale operator to Dynamic Fusion interface | Jakub Sujak
Resolves: COMPMID-5521 Change-Id: Id38a4ce18f9ea8805a151acb064e72795535d1a0 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8859 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-23 | Make CLReshape kernel window based on dst instead of src | Ramy Elgammal
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Partially-Resolves: COMPMID-5522 Change-Id: I1d90003079c3f24d081cc49f7b110eda753f6995 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8838 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-12-23 | Add multiple output support for dynamic fusion | Viet-Hoa Do
* The dependency graph now can schedule any acyclic graph into a sequential list of operators. This is needed as the output operators now form branches in the graph.
* Fix the definition of input, output and intermediate tensors in GpuKernelComponentGroup to support non-linear but sequential list of operators.
* Add constraint on GpuOperatorGroup to enforce strictly linear fusion style, but allow output operator as the only form of branch.
Resolves: COMPMID-5771 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I68de3a31a2456145081f0a397e4e61dd66327682 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8823 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
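Scheduling an acyclic graph into a sequential list is a topological sort. The sketch below uses Kahn's algorithm as a generic illustration; the commit message does not say which algorithm the library's dependency graph uses, and the operator names and the `schedule` helper are hypothetical.

```python
from collections import deque


def schedule(ops, edges):
    """Order `ops` so that every edge (producer, consumer) points forward.

    Raises ValueError if the graph contains a cycle.
    """
    indegree = {op: 0 for op in ops}
    successors = {op: [] for op in ops}
    for producer, consumer in edges:
        successors[producer].append(consumer)
        indegree[consumer] += 1

    ready = deque(op for op in ops if indegree[op] == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for consumer in successors[op]:
            indegree[consumer] -= 1
            if indegree[consumer] == 0:
                ready.append(consumer)

    if len(order) != len(ops):
        raise ValueError("graph contains a cycle")
    return order


# A linear chain conv -> add, with output operators branching off it.
ops = ["conv", "add", "store_a", "store_b"]
edges = [("conv", "add"), ("conv", "store_a"), ("add", "store_b")]
print(schedule(ops, edges))
```

The example mirrors the shape the commit describes: the fused operators stay strictly linear, and only the output operators form branches.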
2022-12-22 | Add is_supported_op interface | SiCong Li
* This retains the behaviors of the current library's validate() method, i.e. is_supported_op() is used to check the support level and validity of an operator configuration without any consideration of fusion.
  - validate_op() interface still checks both the op validity and the fusion validity
  - This arrangement ensures that any users of the original validate() interface can expect to use is_supported_op() in exactly the same way with no performance or behavioral difference.
* Force adding const tensors to ArgumentPack when adding to OperatorGroup. This is because OperatorGroup is only for validating fusion, and does not mutate TensorInfos at all
Partially resolves COMPMID-5736 Change-Id: I4157677f55848d66a08ec00e6a76d13a24b722b7 Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8687 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2022-12-21 | Optimize MeanReduce by integer acc. and removing upfront dequant. | Omar Al Khatib
Resolves: [COMPMID-5466] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I68af0bb54580bebd2ace1fba30aa73f7f68a4dbb Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8804 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-12-21 | Update direct conv2d kernel in dynamic fusion | Gian Marco Iodice
Resolves COMPMID-5780 Change-Id: I34c764cd1df652f8a938772924dc49baf6ac16db Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8825 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-12-21 | Optimize SVE natural exponential function | Viet-Hoa Do
Resolves: COMPMID-5664 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ica2fd82645d95bd64226a1950a013d8a9b9035eb Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8833 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>