aboutsummaryrefslogtreecommitdiff
path: root/filelist.json
AgeCommit message (Collapse)Author
2023-05-02Removes `experimental` from `experimental_fixed_format_kernels` flagNathan John Sircombe
Renames `experimental_fixed_format_kernels` build option to `fixed_format_kernels`. Adds documentation for the flag covering basics: - What fixed-format kernels are - Why they're needed - Which backend they're for (i.e. CPU) - Some pointers on how to use them. Resolves: ONCPUML-1253 Change-Id: I428c98614c309c9ffc32d0f32daa24740f7cb967 Signed-off-by: Nathan John Sircombe <nathan.sircombe@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9523 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-05-01Add Reorder to changelogDavid Svantesson
Partially resolves ONCPUML-1232 Signed-off-by: David Svantesson <david.svantesson@arm.com> Change-Id: I258d03524c50dd24975b473aede061f80bf9d91b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9534 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-04-28Reorder addedDavid Svantesson
Adds Reorder kernel exposing blocking reorders from arm_gemm Resolves ONCPUML-1232 Change-Id: I42bf4166311fe1771565134d3ed7039fc8e30230 Signed-off-by: David Svantesson <david.svantesson@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9500 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-27Fix filelist issue with CPU MatMulViet-Hoa Do
Resolves: COMPMID-6024 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I210ca5577eeb0b2c91d959fc37dcc77b3847abb3 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9517 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Dynamic-Fusion: Ramy Elgammal <ramy.elgammal@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-04-26Add FP16 depthwise kernels for SME2David Mansell
Resolves: COMPMID-5988 Change-Id: I93e78edf31c9eec8242ccbb8c3c768f46a7c7c38 Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9456 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-17Add quantized CL MatMul kernels for Lhs NT/T, Rhs NTGunes Bayir
Implement OpenCL kernels for batched Matrix Multiplication for the quantized data types QASYMM8 and QASYMM8_SIGNED. Quantized MatMul is supported with the following MatMul attributes: * adj_x = false, adj_y = false * adj_x = true, adj_y = false We consider native format kernels only. In other words, no reshaping of the operand matrices is done. Resolves: COMPMID-5921, COMPMID-5922 Change-Id: I99e0f68054a2bd635c60ec2641acc2e7ff398473 Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9435 Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-14Align naming convention of ClMatMulJakub Sujak
Ensure naming of MatMul on GPU conforms to the naming convention <backend><operator><config> i.e. ClMatMul operator with the backend ClMatMulNativeKernel. Resolves: COMPMID-6015 Change-Id: I021d235b023ad17fe97bd6913e6a50d0ba4b194e Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9443 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-04-13Implement MatMul Function and Operator with Floating Point support for CPUMohammed Suhail Munshi
- Implements MatMul function and operator for floating point datatype FP16/FP32 - Includes support for transposing dynamic tensors prior to matrix multiplication. - Adds tests for 2D/3D/4D+ tensors in MatMul with F32/F16 datatype (with all combinations of transposed/not-transposed tensors) - Updates fixture to allow for testing fused activation in MatMul - Adds tests for matmul with and without fused activation Resolved: [COMPMID-5898] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Iefa84b26dd723c9a51e6c3f91023152c6c31ace2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9411 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-11Enable quantized data types for CpuElementwiseUnary on Armv7aRamy Elgammal
- Adding fallback functions neon_qasymm8_signed_elementwise_unary() and neon_qasymm8_elementwise_unary() - They would be called in case target is not aarch64 Resolves: COMPMID-5994 Change-Id: Id0db1e7cb0fe92f1eaef0b3a9ed2bea01b3f2a15 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9416 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-03Implement MatMul FunctionRamy Elgammal
Resolves: COMPMID-5949 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Idd8cfe6ea94a14f0b23178f6781251b5f0955563 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9390 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-29Add quantized support for unary elementwise in CPUViet-Hoa Do
* Add quantized unary elementwise in CPU using LUT. * Widen the input data range of the test suite. - Fix CPU exponential function overflow/underflow range. - Fix saturation issue of CL round operator. Resolves: COMPMID-5763 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I41445de2b4a33ec6b01e0ab701516c240c852d0b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9367 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-17Implement OpenCL MatMul for Lhs NT Rhs T/NT FP32/16Ramy Elgammal
- Implement ClNativeMatMulKernel class - Implement opencl kernel for LHS non-transposed and RHS non-transposed - Implement opencl kernel for LHS non-transposed and RHS transposed - Add test fixture and dataset for matmul - Implement transpose_tensor() for reference implementation to transpose high dimensional tensors Resolves: COMPMID-5944, COMPMID-5951 Co-authored-by: Gunes Bayir <gunes.bayir@arm.com> Co-authored-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I1d5b8978f41be27baddb3153ade880472141573f Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9333 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-13arm_gemm: Add SME2 FP16 kernels.David Mansell
Resolves: COMPMID-5966 Change-Id: Ic0d694493178da029a297643855bd0cff01b174f Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9302 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-03-07Add sigmoid and tanh for dynamic fusionViet-Hoa Do
* Add sigmoid and tanh activation functions for dynamic fusion. * Add corresponding tests, but both activation functions share the same fixture implementation. Resolves: COMPMID-5939 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I0aae0eaa18b746ce89680d2773c66e09b0f854ce Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9257 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-08Update CPU kernels to remove x19 and w19Michael Tyler
Resolves: COMPMID-5805 Change-Id: Idf720bbb136474810086f5089c5ed23b3f79835a Signed-off-by: Michael Tyler <michael.tyler@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9081 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2023-02-08Add support for dilation > 1 in assembly DepthwiseConvolutionPablo Marquez Tello
* Resolve COMPMID-5689 Change-Id: I81a3791ad054db59562b76d1c729f2b2168aee8b Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Signed-off-by: Andrew Mundy <andrew.mundy@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8919 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-01Add new operator AddMulAdd for Neon™ backend for Float/Quantized typesGunes Bayir
This is a fused operator that merges Add + Mul + Add [+ Relu-based-Activation] layers and have an intermediate output after the first Add. It's supported for FP16/32/QASYMM8/QASYMM8_SIGNED data types. The subsequent Add and Mul are intended for scaling and the coefficients only have one dimension (per channel). The inputs are - input1 : nD tensor [X, Y, Z, W, ..] - input2 : nD tensor [X, Y, Z, W, ..] - add_coef : 1D tensor [X] - mul_coef : 1D tensor [X] The outputs are - out1 : nD tensor (intermediate output) [X, Y, Z, W, ..] - out2 : nD tensor (final output) [X, Y, Z, W, ..] The operation can be summarized as follows: out1 <- input1 + input2 out2 <- Act(out1 * mul_coef + add_coef) The activation function can be Identity, Relu, Bounded Relu or Lower/Upper Bounded Relu. The intermediate output can be skipped by providing a nullptr. The reason of providing this operator is to be able to fuse in case of Residual network patterns and save computations by reducing memory back and forward. Resolves: COMPMID-5463 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8ef577aa623b036e9a9f655cc088493fd19a6109 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9055 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-01Add Subtraction operator to Dynamic Fusion interfaceRamy Elgammal
Partially-Resolves: COMPMID-5518 Change-Id: I8358784815bcac461d50e384fa7bc96f476d3983 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9045 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Dynamic-Fusion: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-31Add Multiplication operator (FP only) to Dynamic Fusion InterfaceJakub Sujak
Note: we use a separate test fixture for Multiplication op instead of reusing ElementwiseBinaryFixture to avoid exposing the internal enum ElementwiseOp to the public utils/TypePrinters.h as required by the data test case macros to print the test data. We also do not consider modifying the enum ArithmeticOp in the standard interface to include MUL without an implementation. Future work should consider refactoring this test fixture into the ElementwiseBinaryFixture to reduce the total number of fixtures/code duplication. Resolves: COMPMID-5779 Change-Id: I84207658ce0407095b028fca0ab7bfa2950255ec Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9013 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-25Implement dynamic fusion softmax operatorRamy Elgammal
- Return aux tensorInfo by get_aux_tensors() at runtime to init the aux tensor with the right size. - Keep softmax unfusable for this commit - Hence, added Tensor3D to template writer arguments declaration, for sake of keeping dynamic fusion softmax componenets' kernels matching their cl counterparts. Resolves: COMPMID-5523 Change-Id: I667f39545db925f667036ef448302c79a0330373 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/483924 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8986 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-20Add missing direct conv2d tests to dynamic fusionSiCong Li
* Add direct conv2d tests as a separate fixture so that we can enable future direct conv2d specific tests * Move Conv2dAttributes to its own file Partially resolves COMPMID-5736 Change-Id: I530649488faf3bbed1a4fc7d16a74063bfdf33db Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8928 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-01-20Dynamic Fusion Pooling Layer 2dMohammed Suhail Munshi
- Adds Dynamic fusion PoolingLayer2D as Unfusable Operator - Indices are not supported - Adds tests for F32/F16 Datatypes Resolves : [COMPMID-5520] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I0d112545eb9209c836bf9ea153069f8627531e0a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8893 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-18Revert "Update CPU kernels to remove x19"Michael Tyler
This reverts commit 3c59f01c209d2732a15d97d65565ead964787a8b. Resolves: COMPMID-5817 Change-Id: Ie2443a21854a95db1e3d0cafa2121c0187a5e237 Signed-off-by: Michael Tyler <michael.tyler@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8974 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-16Update CPU kernels to remove x19Michael Tyler
Resolves: COMPMID-5805 Signed-off-by: Michael Tyler <michael.tyler@arm.com> Change-Id: I250f64531e209625e4ff176dd5a552c1c34bc484 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8909 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-06Implement dynamic fusion reshape operatorRamy Elgammal
Resolves: COMPMID-5522 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: If4e5736a2f7ff42e70276d7f4e0f3ebcb38414e6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8881 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-30Move DWC native heuristic into the heuristic folderGian Marco Iodice
- Move the DWC native heuristic from CLDepthwiseConvolutionLayer to heuristic/ - Update the heuristic for Arm® Mali™-G77. Use a smaller block size (4x2) for Fp16 - Call the new heuristic in GpuDepthwiseConv2d Resolves COMPMID-5798 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: I6bfd30cea76bea2e98202a7a5c1d51709f3382a4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8889 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-29Update the ClConv2d heuristicGian Marco Iodice
- Update the ClConv2d heuristic to call indirect convolution on Arm® Mali™-G77 Gpus - Implement the indirect conv2d heuristic for selecting the block size Resolves COMPMID-5713 Change-Id: If6ad49124561207153685c6abd4f54950a376fbc Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8886 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-12-28Implement Logits1DMaxShiftExpSum kernel component in dynamic fusionGunes Bayir
Resolves: COMPMID-5719 Change-Id: I2f0911ffccce2b42a9a63fe6826eaa5d2cad06ba Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8831 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-12-28Add Resize/Scale operator to Dynamic Fusion interfaceJakub Sujak
Resolves: COMPMID-5521 Change-Id: Id38a4ce18f9ea8805a151acb064e72795535d1a0 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8859 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-16Add output operator for dynamic fusionViet-Hoa Do
* The output of the fused operator must be explicitly specified using GpuOutput operator. * Any temporary tensors used to connect the output of an operator to the input of another operator will be marked as no-alloc and won't be allocated as a tensor in the memory. Resolves: COMPMID-5771 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I5ae8e800f8f737db23a055a92b01c4f1d78c3bb8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8794 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-13Add CLAMP operator to Dynamic Fusion interfaceJakub Sujak
Add the CLAMP activation function for GPU backend with generic activation Component and TemplateWriter modules. CLAMP is internally implemented as LU_BOUNDED_RELU activation function with the alpha and beta variables swapped. We do NOT consider in-place computation cases in this patch. * CLAMP operator for GPU backend. * Activation Component and TemplateWriter for CL backend. * TemplateWriter generates tiled kernel code. * Supported data types: F16, F32. * Validation tests for CLAMP operation. Resolves: COMPMID-5519 Change-Id: Ieb097d6b1e6a7ed2b882518e88314454efb402f6 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8762 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-09Implement Cast operator in dynamic fusionGunes Bayir
The operator is migrated into dynamic fusion for all data types supported Resolves: COMPMID-5693 Change-Id: I3c550d3d1cd04570f453beae678c3f60d4cb1a73 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8755 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-09Use heuristics for setting dynamic fusion direct conv2d tile sizesRamy Elgammal
Resolves: COMPMID-5735 Change-Id: I9958413b69c5052cfa205dd0e9457cc4953aaf35 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/474818 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8724 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-09Implement the OpenCL kernel to compute the indirect convolutionGian Marco Iodice
- Implement indirect convolution kernel - Add operator support - Add test Resolves COMPMID-5709 Change-Id: I9272304163471a5a40da7fdec204599f3c1d8e32 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8701 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-29Adding GpuAdd to dynamic fusion operatorsRamy Elgammal
- Provide support for Add operator - Auto initialize the destination tensor before testing fusion in conv2d and elementwise binary ops. Resolves: COMPMID-5518 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ibd815020f02b57f88eea7c2921bdcf98605d99c5 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8617 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-28Integrate SME2 kernelsViet-Hoa Do
* Add SME/SME2 detection. * Integrate SME2 implementation for: - Normal convolution - Winograd - Depthwise convolution - Pooling Resolves: COMPMID-5700 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I2f1ca1d05f8cfeee9309ed1c0a36096a4a6aad5c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8692 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-28Implement FP32/16 Depthwise Conv2d operator in dynamic fusionGunes Bayir
This patch adds Depthwise Conv2d operator into dynamic fusion interface and adds the associated tests. Resolves: COMPMID-5517 Change-Id: I385c94dff7fd40c72b8337ef797e508df4499a82 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8678 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-25Implement address precalculation for indirect conv2d - OpenCLGian Marco Iodice
- Implement kernel (ClIndirectConv2dAddressPrecalculationKernel) - Implement OpenCL kernel (indirect_convolution.cl) - Add test Resolves COMPMID-5708 Change-Id: If7408e37cbc6f9ad8506ff3334bc574e5d6763fb Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8661 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-14Optimize Transposed Convolution for CL backend (FP32/16)Gunes Bayir
This patch optimizes transposed convolution for CL backend by rewriting it in a single kernel instead of three (flip_kernel + upsample + conv). The new kernel skips the upsampling step which reduces the input space of convolution by stride_x * stride_y, resulting in significant performance improvement. It also skips the kernel flipping by traversing the weights accordingly, thus reduces the memory footprint. Resolves: COMPMID-5676 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8a333212dc7c5f7f0597aa58b0d56d44814baa14 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8588 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-08SVE Hard-Swish via Lookup table for quantized inputPablo Marquez Tello
* Datatype supported qasymm8 and qasymm8_signed. * Resolves COMPMID-5211 Change-Id: Ib161af3af5ad11ec6b46de0bf4fbac172d2525fb Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7729 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-01Rewrite dynamic fusionSiCong Li
The new version introduces the following major changes: * Change public interface to simplify and standardize the user experience - Use the term "Workload" uniformly - Simplify operator interface to be a set of static methods: validate_op(), create_op() * Separate the kernel writing into its own component (template_writer). This is to allow the co-development of GpuKernelWriter, and to allow easy replacement once GpuKernelWriter is mature. * Optimize the core fusion algorithm used by the component graph. The details can be found in GpuKernelComponentGraph::fuse() * Use Gpu instead of Cl prefixes for most of the Workload interfaces (except for runtime and kernel components, which have to be language specific) This allows the potential extension to other Gpu langauges in the future. * Refactor runtime memory interface so that auxiliary tensor handling is separate from the user tensor passing. This is because the former is less stable and may require extension in the future. * Hide source code object from the user as it is not required at the moment * Deprecate the old prototype entirely by disabling it in SCons build Resolves COMPMID-5510, COMPMID-5512, COMPMID-5513 Change-Id: If69d2362856f2de4503546b7b6cf48a525cf3079 Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8406 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-16Optimize Quantized/Integer Bilinear Scale for Neon™Gunes Bayir
This patch introduces several performance optimizations regarding the Bilinear Scale operator with REPLICATE Border mode. Changes apply only to NHWC. This patch - Reduces the memory footprint by disabling precomputation of indices and weights when they're not used - Rewrites the kernels for QASYMM8/QASYMM8_SIGNED/U8(Uint8) - Adds S8(Int8) Bilinear Scale for Border mode REPLICATE - Removes Bilinear Scale SVE kernels for Quantized and Integer types and adjust the heuristics to choose the Neon™ implementation - Adds new test cases where the input and output of the Bilinear Scale operator have different quantization scale and offset Resolves: COMPMID-5453, COMPMID-5454 Change-Id: I3d251e76e0c6978fd5a0a1795ec62ab536bec93c Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8250 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-14INT8 Quantized MeanStdDevNorm (LayerNorm)Murray Kornelsen
Implements LayerNorm for qasymm8 tensors. Uses uint8x16 loads and stores. Summation is performed in integer arithmetic (vpaddl) Normalization is performed in float32 before requantizing back to int8. Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca> Change-Id: I2407c8b34717fb47adab98791bd76fb8a3c62f4a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7922 Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-17Add LUT for quantized sigmoid functionViet-Hoa Do
* Move LUT implementation to a seperate file. It will be used for both QASYMM8 and QASYMM8_SIGNED. * Fix wrong constant value related to QASYMM8_SIGNED leaky ReLU in 32-bit build. Resolves: COMPMID-5464 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I2b24d52409a38f1b66fd532f431eff8a9e4547b6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8066 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-22Add GemmLowp MMUL Reshaped Only Rhs Support for QASYMM8/QASYMM8_SIGNEDFreddie Liardet
This patch introduces a GEMMLowp routine that is optimized for Arm(R) Mali(TM)-G715 and Arm(R) Mali(TM)-G615 Resolves: COMPMID-5398 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8d06453645688f3658b6c7c06f1ebc25a2505661 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7932 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-18Fix multi_isa build failure after Winograd integrationramelg01
Partially Resolves: COMPMID-5400 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I9ed6b85b105a15b1a35c6a921f63ea94c665438b Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/437221 Tested-by: bsgcomp <bsgcomp@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7933 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
2022-07-14Integrate new winograd APIs from MLTechramelg01
Resolves: COMPMID-5400 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ib4428436dd7a6e40d8b2d8a2f8dac1b079154551 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7894 Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-13Add Gemm MMUL Reshaped Only Rhs Support for FP32/FP16Gunes Bayir
This patch introduces a GEMM routine that is optimized for Arm(R) Mali(TM)-G715 and Arm(R) Mali(TM)-G615 Resolves: COMPMID-5216 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I2e5d7806f5904347185bb3e250f73d73d6669dba Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7914 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-08Extended direct conv 2d interface for tuning the OpenCl kernelGian Marco Iodice
Resolves COMPMID-5298 Change-Id: Ie9b907e5dcf86aa6add8d08799fa7ba7c264edea Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7888 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-06-27Implement new Elementwise Dynamic Fusion Operators: Div, FloorMichalis Spyrou
Resolves: COMPMID-5355 Change-Id: I92f73fbe885f28bbe7b07965b90cfd807c93602f Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7745 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com>