aboutsummaryrefslogtreecommitdiff
path: root/src/core
AgeCommit message (Collapse)Author
2023-07-05Rewrote CLArgMinMax for axis 0Pablo Marquez Tello
* Simpler implementation without stages for axis 0 * Removed considerable amount of code. Resolves COMPMID-6271 Change-Id: Ie8bcb2f0b55f87472f44b38872a23a922619a211 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9849 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-05Fix unused function warningMichael Tyler
Resolves: COMPMID-6337 Signed-off-by: Michael Tyler <michael.tyler@arm.com> Change-Id: Id8e9b39e55ab3e13beda720e24ba9ea7e6f97762 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9868 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-07-04Depthwise channel pre-multiplicationMichael Tyler
Resolves: COMPMID-6337 Change-Id: Ie9097b3f56e8071426c621386a5988bd7f7e8ef2 Signed-off-by: Michael Tyler <michael.tyler@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9852 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-04Add Kernel Writer driver code to dynamic fusionSiCong Li
* Partially port ElementwiseBinary component to ckw (broadcast not supported yet) * Port Store component to ckw * Move KernelArgumentsHelpers to ckw_driver/ as it's only used by the driver ckw_driver is a middle layer between dynamic fusion and Compute Kernel Writer (CKW). It consumes the fused kernel component stream produced by Dynamic Fusion and uses CKW to write the kernel code complete with all meta info needed by the runtime to enqueue the kernel. It consists of two parts: * Kernel writing: This resides in dynamic_fusion/sketch * Runtime utilities: This resides in dynamic_fusion/runtime The integration (separation between DF and CKW) occurs in two places: * Inside GpuCKWDriver global driver that coordinates how the final fused kernel code is assembled together alongwith other meta info needed by runtime. * Inside each instantiated IGpuCKWComponentDriver component driver that drives CKW to write component-specific code or do component-specific configurations Partially resolves: COMPMID-5792 COMPMID-6282 COMPMID-6260 COMPMID-6266 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ib57a080a65fe8cfee1a8df1529fe572005a6d2f2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9847 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-29Implement FP32/16 MatMul Lhs T Rhs T/NT kernel using MMUL extensionGunes Bayir
Resolves: COMPMID-6196, COMPMID-6197 Change-Id: I22a1c32686eb70e7676c8b4d64a76dbaeb638cb3 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9798 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-26Add helpers to set CKW tensor components as OpenCL kernel argumentsJakub Sujak
* Define ckw::TensorStorage. The tensor storage represents the type of tensor memory object. * Add helper functions for setting the CKW TensorComponent and TensorStorage as OpenCL kernel arguments. * Refactor CL Image2D method for simpler image object creation. Resolves: COMPMID-5784 Change-Id: I2d37d06783c1dc55f3b5692b44eb49b151f2401c Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9807 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-26Use MatMul in fully connected layer with dynamic weights when supportedMohammed Suhail Munshi
- Use MatMul kernels in FC layer when using dynamic weights without broadcasting or bias. - Fix minor typo in IClMatMulNativeKernelConfig.h Partially Resolves : [COMPMID-6193] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Id494062b5b4f4e75ff9714c202dde941955afa52 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9797 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-23Implement FP32/FP16 MatMul NT/T kernel using the MMUL extensionRamy Elgammal
Resolves COMPMID-6195 Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com> Change-Id: I8e85fe73308ed84ebb142d6d6d1562b62dddfaa5 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9819 Reviewed-by: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-06-23Fix doxygen warningsramy.elgammal@arm.com
Resolves: COMPMID-6312 Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com> Change-Id: I9f68ccd2edb8c4d03fec19e6b9c29609d4833342 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9806 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-21Enable vmfa in arm7va/aarch32 when presentPablo Marquez Tello
* vfma is an extension on armv7a and it can be enabled with -mfpu=neon-vfpv4 * Resolves MLCE-1079 Change-Id: Id455c39ee4feb8d3cdc4515c8307eb8a5d6e093b Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9795 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-06-19Implement FP32/FP16 MatMul NT/NT kernel using the MMUL extensionSiCong Li
Resolves COMPMID-6194 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ie45e2aa9533948b2e5235563cef1d3834494eccf Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9739 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-16Add Fused Activation to OpenCL MatMulMohammed Suhail Munshi
- Added fused activation to MatMul function interface - Added fused activation to CL backend - Includes tests for supported Activation Functions in MatMul Resolves: [COMPMID-6192] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ie103212b600b60699eaf6a6394d609e6e1f5aba6 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/522465 Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9714 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-15Break up Utils.h a bit to reduce unused code being included everywhereMatthew Bentham
Move some maths-related things from Utils.h to new Math.h header in utils/math. Move some routines used for Tensor shape validation to Validate.h Change-Id: I8ce89fe03ec3ae1b61d1a80c282b8b91eea0cfb3 Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/524783 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9743 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-06-15Break up arm_compute/core/Types.h a bitMatthew Bentham
Split some of the larger types with inlined code into their own header files, so that the implementation of them needn't be included everywhere. Change-Id: Id3ec2d42efbd33cedb55705a5a24e1b90c8b7a01 Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/524782 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-07Fix build error for armv7aPablo Marquez Tello
* Removed the BF16 code related to the linker error for armv7a * Resolves COMPMID-6288 Change-Id: I2dcedf5c0ba684f8e31865985899bf00b9390c9e Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9736 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: <michael.tyler@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-06-07Fix guards for FP16 depthwise kernelsMichael Tyler
Resolves COMPMID-6291 Change-Id: Ibe5da8cfcf6d7fd994ddf7759efd0e773accdeb2 Signed-off-by: Michael Tyler <michael.tyler@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9731 Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-06-05Update CPU kernel implementations and guard directivesMichael Tyler
Resolves COMPMID-6023 Change-Id: I868975d14c4f98af6716726feda22405a6a4c891 Signed-off-by: Michael Tyler <michael.tyler@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9686 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-17Revert "Check for nullptr when failing to load OpenCL libraries"Omar Al Khatib
This reverts commit 9d254610ef3072263ac5c20eed906157e8869b3d. Reason for revert: Causing High Impact failure in Coverity Change-Id: I7a3b963cc13c0eea856a15451ffed8c4d4a62e75 Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9643 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: SiCong Li <sicong.li@arm.com>
2023-05-16Check for nullptr when failing to load OpenCL librariesJakub Sujak
Resolves: COMPMID-6272 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Change-Id: I4b4ff4eeb909ac95c63d1f2d2081f2be9ade2295 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9662 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-05-11Fix invalid vector length in CLViet-Hoa Do
Resolves: COMPMID-6252 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I97ddf8a6c83bc2621abc712094db6bc0fe3d97b1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9620 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-05-10Re-enable dyanmic weights in Neon™ depthwise convolutionRamy Elgammal
- Call Neon™ depthwise convolution validation inside in its configure() method. Resolves: COMPMID-6188 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ib2ae4d995ff2bbc92ce4496d4ab93cf09113e3e9 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9594 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-03Support multi-dimensional indices in the CL Gather Layer up to ↵Omar Al Khatib
four-dimensional output tensors Resolves [COMPMID-5775] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I6f6c12ac08f0b0ad070ca5d715c531c2c3762c30 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9498 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-03Guards to make NEReorder aarch64 onlyDavid Svantesson
Resolves COMPMID-6151 Signed-off-by: David Svantesson <david.svantesson@arm.com> Change-Id: I0e8c957f3460633c32ef57be0cdc44a53b8c3e88 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9553 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-03Update a64_transpose_interleave_16.hppDavid Svantesson
Updates a64_transpose_interleave_16 transform, syncing with current version in gemm-linux. These fixes ensure that zero padding is done on out of range columns. Partially resolves ONCPUML-1232 Signed-off-by: David Svantesson <david.svantesson@arm.com> Change-Id: Ic2d30b622e4a3b0099d6f07037336a2aaaa64e13 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9550 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-02Fix fully connected and matmul mismatchesViet-Hoa Do
* There is an issue with quantized fully connected and matmul when the lower bound of bounded ReLU is negative. * Use int32_t for the calculation of min/max quantized value rather than PixelValue to avoid this issue. Partially resolves: COMPMID-5996 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I7b22e9d56a2441fc6a4c5c4e627f57d6e00d6ff1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9502 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-28Reorder addedDavid Svantesson
Adds Reorder kernel exposing blocking reorders from arm_gemm Resolves ONCPUML-1232 Change-Id: I42bf4166311fe1771565134d3ed7039fc8e30230 Signed-off-by: David Svantesson <david.svantesson@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9500 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-28Fix the gather layer indices checkViet-Hoa Do
* If the index is out-of-bound, both CPU and GPU implementations of the gather layer will output 0. Resolves: COMPMID-5964 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ib029b3acfb31452f2097c8c75448fb2697cfa332 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9487 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-27Avoid printing error message for each not found OpenCl libraryRamy Elgammal
- Only print can't load any opencl library if none of "libOpenCL.so", "libGLES_mali.so", "libmali.so" are found. - If any of them is found, then no need to print any error message. Resolves: COMPMID-5973 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I9d62bd33545bbbf54d69836a4dca58a6294bc479 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/511804 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9483 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-27Add quantized CL MatMul kernel for LHS NT, RHS TJakub Sujak
Implement a native kernel for batched Matrix Multiplication for the quantized data types QASYMM8 and QASYMM8_SIGNED and with the MatMul attributes `adj_x = false, adj_y = true`. Resolves: COMPMID-5923 Change-Id: I477b2dd886edfe83beaba9efc7d6b05ed19f5da4 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9467 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-26Improve Winograd performance on OpenCLGian Marco Iodice
- Performs more output elements per work-item in the case of Fp16 computation in Winograd Input/Output transform Resolves COMPMID-6018 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: If5e6f5182eff8c1f05a3505c437d0a997490f0bd Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9447 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-26Add FP16 depthwise kernels for SME2David Mansell
Resolves: COMPMID-5988 Change-Id: I93e78edf31c9eec8242ccbb8c3c768f46a7c7c38 Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9456 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-26Integrate multi-threaded pretranspose_B_arraySiCong Li
This is required for the case where rhs (B) is dynamic and needs to be pretransposed in every run. In a multi-threaded setting, this means the previously single-threaded pretranspose_B_array would become the bottleneck Resolves COMPMID-5896 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Id508c46992188a0f76a505152931d4955d04c16d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9455 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-25Fix rounding to nearest even for armv7aRamy Elgammal
- If input value to round has the decimal point beyond the fraction part (23 bit) then simply return the rounded value = input value. Resolves: COMPMID-6025 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I1994e49a9bca7daeaeec7681aec099c63a97b53f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9473 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-04-20Implement CL kernel for a native batched matmul Quantized - LHS transposed, ↵Omar Al Khatib
RHS transposed Resolves: [COMPMID-5924] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I9ba657737eb1e3a096c8341ad4ad311571f8edeb Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9454 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-04-19Add quantized support for CPU MatMulViet-Hoa Do
Resolves: COMPMID-5899 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I89d96e292c3492ba9b1900a3e5683f9dcd11dfc6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9440 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-17Add quantized CL MatMul kernels for Lhs NT/T, Rhs NTGunes Bayir
Implement OpenCL kernels for batched Matrix Multiplication for the quantized data types QASYMM8 and QASYMM8_SIGNED. Quantized MatMul is supported with the following MatMul attributes: * adj_x = false, adj_y = false * adj_x = true, adj_y = false We consider native format kernels only. In other words, no reshaping of the operand matrices is done. Resolves: COMPMID-5921, COMPMID-5922 Change-Id: I99e0f68054a2bd635c60ec2641acc2e7ff398473 Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9435 Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-13Implement MatMul Function and Operator with Floating Point support for CPUMohammed Suhail Munshi
- Implements MatMul function and operator for floating point datatype FP16/FP32 - Includes support for transposing dynamic tensors prior to matrix multiplication. - Adds tests for 2D/3D/4D+ tensors in MatMul with F32/F16 datatype (with all combinations of transposed/not-transposed tensors) - Updates fixture to allow for testing fused activation in MatMul - Adds tests for matmul with and without fused activation Resolved: [COMPMID-5898] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Iefa84b26dd723c9a51e6c3f91023152c6c31ace2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9411 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-03Add Cropping to CLBatchToSpaceOmar Al Khatib
- Deprecate dynamic block shape interface - Iterate over output window instead of input window for simpler implementation and better performance. - Add cropping support and cropping tests Resolves [COMPMID-5865] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: Ic67d44a6a39299ecdafc507f12e3dc5d517dfb62 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9385 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-30Add cropping support to NEBatchToSpaceSiCong Li
- Deprecate dynamic block shape interface - Iterate over output window instead of input window for simpler implementation and better performance - Add cropping support and cropping tests Resolves COMPMID-5918 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ifea0f5f7760ffd0f4d5d4f3a5ae8d14d0b98b790 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9378 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-29Fix GCC13 compiler errorsPablo Marquez Tello
* Removed namespace arm_compute::utils::requires to fix the build error ‘requires’ is a keyword in C++20 [-Wc++20-compat] * Added missing includes for cstdint.h * Resolves MLCE-1040 Change-Id: I08842a273a4422f8e9b10daded680f521efe26e0 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9388 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-29Add quantized support for unary elementwise in CPUViet-Hoa Do
* Add quantized unary elementwise in CPU using LUT. * Widen the input data range of the test suite. - Fix CPU exponential function overflow/underflow range. - Fix saturation issue of CL round operator. Resolves: COMPMID-5763 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I41445de2b4a33ec6b01e0ab701516c240c852d0b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9367 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-27Fix BatchToSpaceFixtureSiCong Li
* Use a vector to represent the (static) block shape instead of an N-D Tensor. The previous use of ND Tensor as block shape was wrong, not adhering to the specification, and non-functional (only first dim was used anyway). * The fixture now accepts a static block shape, because the dynamic case is not properly implemented and will be deprecated for now. * Fix an assertion error in reference implementation. Partially resolves COMPMID-5918 Change-Id: I5221e52ccc05e7c1249dec3a42426f954a73729a Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9357 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Omar Al Khatib <omar.alkhatib@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-24Add Texture Pipe Support for Matmul Lhs T/NT Rhs T kernelsRamy Elgammal
Resolves: COMPMID-5952, COMPMID-5956 Change-Id: Idbd14538e7660792254072fa9631a6f03966f89b Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9371 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-24Add Texture Pipe Support for Matmul Lhs T/NT Rhs NT kernelsGunes Bayir
Resolves: COMPMID-5945, COMPMID-5954 Change-Id: I7b27021d21f8e08c4896f6b1f595a75125064f9e Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9356 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-03-23Round to nearest with ties to away from zero in ReluPablo Marquez Tello
* This patch adds support for rounding modes in vmlaq_qasymm8_signed which is used to compute Relu for quantized types * Partially resolves MLCE-1018 Change-Id: I2a267b84745430e1ffe92b8bc79828a39332db18 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9354 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-21gemm_interleaved: Set up the accumulation buffer properly in alternateDavid Mansell
threading mode. Resolves: COMPMID-5844 Change-Id: Iceb0018114bbca2bfdac4d4406936f9b260539e9 Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9070 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
2023-03-20Implement OpenCL MatMul for Lhs T Rhs T/NT FP32/16Gunes Bayir
- Implement opencl kernel for LHS transposed and RHS non-transposed - Implement opencl kernel for LHS transposed and RHS transposed - Add validation tests Resolves: COMPMID-5953, COMPMID-5955 Change-Id: I55589acbffe86c44e29807574975978a1ec09bad Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9345 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-03-17Implementation of RSQRT for quantized int8Ramy Elgammal
Resolves: COMPMID-5863 Change-Id: I9ff67face62826c1d335a6b941e8516be39bdac8 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/488768 Tested-by: bsgcomp <bsgcomp@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9225 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-17Implement OpenCL MatMul for Lhs NT Rhs T/NT FP32/16Ramy Elgammal
- Implement ClNativeMatMulKernel class - Implement opencl kernel for LHS non-transposed and RHS non-transposed - Implement opencl kernel for LHS non-transposed and RHS transposed - Add test fixture and dataset for matmul - Implement transpose_tensor() for reference implementation to transpose high dimensional tensors Resolves: COMPMID-5944, COMPMID-5951 Co-authored-by: Gunes Bayir <gunes.bayir@arm.com> Co-authored-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I1d5b8978f41be27baddb3153ade880472141573f Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9333 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-13arm_gemm: Add SME2 FP16 kernels.David Mansell
Resolves: COMPMID-5966 Change-Id: Ic0d694493178da029a297643855bd0cff01b174f Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9302 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>