aboutsummaryrefslogtreecommitdiff
path: root/src/cpu/kernels
AgeCommit message (Collapse)Author
2023-07-04Depthwise channel pre-multiplicationMichael Tyler
Resolves: COMPMID-6337 Change-Id: Ie9097b3f56e8071426c621386a5988bd7f7e8ef2 Signed-off-by: Michael Tyler <michael.tyler@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9852 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-26Remove dependency on fp16 definitions from some core include filesMatthew Bentham
This significantly improves the compilation times for parts of the core library that just need a definition of float16_t rather than access to all of the fp16 intrinsics. Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Change-Id: I5da1c6b0df0dd87d1d17948cd2e9b7375874f455 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/529385 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9781 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-21Fix CPU depthwise convolution in case of large paddingViet-Hoa Do
* Avoid the assembly kernels to be used when the padding is greater than the kernel shape. Resolves: COMPMID-6280 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ibe0820018c97f4481bf318397b797ec7b351a1d5 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9802 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-06-15Break up Utils.h a bit to reduce unused code being included everywhereMatthew Bentham
Move some maths-related things from Utils.h to new Math.h header in utils/math. Move some routines used for Tensor shape validation to Validate.h Change-Id: I8ce89fe03ec3ae1b61d1a80c282b8b91eea0cfb3 Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/524783 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9743 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-06-15Break up arm_compute/core/Types.h a bitMatthew Bentham
Split some of the larger types with inlined code into their own header files, so that the implementation of them needn't be included everywhere. Change-Id: Id3ec2d42efbd33cedb55705a5a24e1b90c8b7a01 Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/524782 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-12Refactor activation LUT computationPablo Marquez Tello
* Moving the code out of Types.h will help with the compilation time. * Added LUT support for all other activation functions. * Resolves COMPMID-6292 Change-Id: I1b5f0b21f03237447163276b8796b2aeb3fdd45c Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9749 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-06Fix ScaleKernel validate method.Pablo Marquez Tello
* Validate returns an error if the number of channels of the input tensor is not 1. With this change we generate an error if scale is called with any of these formats: Format::UV88, Format::RGB888, Format::RGBA8888,Format::YUV444, Format::YUYV422, Format::NV12, Format::NV21,Format::IYUV, Format::UYVY422 * Resolves ARMCL-631 Change-Id: If9d8b9d95332994920def55d8faae9dbf4213f79 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9579 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-17Move lut kernel to sve2 categorySiCong Li
This specific Lut kernel uses sve2 instructions Resolves: COMPMID-6268 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I44fa3812e96fa79b3d1e1e3a31d587581f59f0e1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9675 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-05-10Re-enable dyanmic weights in Neon™ depthwise convolutionRamy Elgammal
- Call Neon™ depthwise convolution validation inside in its configure() method. Resolves: COMPMID-6188 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ib2ae4d995ff2bbc92ce4496d4ab93cf09113e3e9 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9594 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-03Fix im2col for fast-maths mode with padding.Renato Arantes
Following the investigation proposed by ONCPUML-1193, padding is implemented in im2col when the input channel is not a multiple of blocks requested by the weight format. Partially resolves: ONCPUML-1193 Signed-off-by: Renato Arantes <renato.arantes@arm.com> Change-Id: I350c7a1b2dcae63f8d94f5b6f1f86e948eab1f09 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9508 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-26Integrate multi-threaded pretranspose_B_arraySiCong Li
This is required for the case where rhs (B) is dynamic and needs to be pretransposed in every run. In a multi-threaded setting, this means the previously single-threaded pretranspose_B_array would become the bottleneck Resolves COMPMID-5896 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Id508c46992188a0f76a505152931d4955d04c16d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9455 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-19NETranspose 8x8 kernel for 32-bit elementsEthan Doe
The existing 4x4 tiling for 32-bit transpose is not efficient on aarch64, given that there are a lot more Neon registers available. So making the tile size to 8x8 will greatly improve NETranspose latency. For example, on AWS Graviton3 processors, with this change I have observed transposing a 768x768 matrix improves latency from 0.32ms down to 0.19ms. Improvement can also be seen across different matrix sizes. Further enlarging the tile size to 8x16 or 16x16 won't make it perform as good as 8x8 due to register pressure. This change is to mitigate the issue reported at: https://github.com/ARM-software/ComputeLibrary/issues/1045 Signed-off-by: Ethan Doe <yidoe@amazon.com> Change-Id: Ia09859cdf2f6d312e67219a9d95a3a3bf1db1999 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9448 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
2023-04-17Fix unhandled case in ElementwiseUnaryRamy Elgammal
- Case: when the dequantized float value < 0.f the unary op was not called if operator is not LOG or RSQRT Resolves: COMPMID-5994 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I24d69db22042701f506188ace91ea4ab3dafeccf Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9437 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2023-04-11Enable quantized data types for CpuElementwiseUnary on Armv7aRamy Elgammal
- Adding fallback functions neon_qasymm8_signed_elementwise_unary() and neon_qasymm8_elementwise_unary() - They would be called in case target is not aarch64 Resolves: COMPMID-5994 Change-Id: Id0db1e7cb0fe92f1eaef0b3a9ed2bea01b3f2a15 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9416 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-29Fix GCC13 compiler errorsPablo Marquez Tello
* Removed namespace arm_compute::utils::requires to fix the build error ‘requires’ is a keyword in C++20 [-Wc++20-compat] * Added missing includes for cstdint.h * Resolves MLCE-1040 Change-Id: I08842a273a4422f8e9b10daded680f521efe26e0 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9388 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-29Add quantized support for unary elementwise in CPUViet-Hoa Do
* Add quantized unary elementwise in CPU using LUT. * Widen the input data range of the test suite. - Fix CPU exponential function overflow/underflow range. - Fix saturation issue of CL round operator. Resolves: COMPMID-5763 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I41445de2b4a33ec6b01e0ab701516c240c852d0b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9367 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-23Round to nearest with ties to away from zero in ReluPablo Marquez Tello
* This patch adds support for rounding modes in vmlaq_qasymm8_signed which is used to compute Relu for quantized types * Partially resolves MLCE-1018 Change-Id: I2a267b84745430e1ffe92b8bc79828a39332db18 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9354 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-01Add support for kernel indices in MaxpoolAdnan AlSinan
- Add a max pooling implementation that returns kernel indices. - Add a parameter in pooling info object to pick kernel indices impl. - Add validation tests. Resolves: [ONCPUML-1187] Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I485ef1604f676ee14d5f7f62d33699e49c38e4d3 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9192 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-28Add an option to use lowest for max-poolingAdnan AlSinan
- Add a parameter in PoolingLayerInfo class to pick which value to use as min for max-pooling. Resolves: [ONCPUML-1166] Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I34e1cccc15176bbf31523c61e99f3188ddca23e1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8989 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-08Update CPU kernels to remove x19 and w19Michael Tyler
Resolves: COMPMID-5805 Change-Id: Idf720bbb136474810086f5089c5ed23b3f79835a Signed-off-by: Michael Tyler <michael.tyler@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9081 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2023-02-08Add support for dilation > 1 in assembly DepthwiseConvolutionPablo Marquez Tello
* Resolve COMPMID-5689 Change-Id: I81a3791ad054db59562b76d1c729f2b2168aee8b Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Signed-off-by: Andrew Mundy <andrew.mundy@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8919 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-03Fix armv7a failing GEMMConvolutionLayer testsMohammed Suhail Munshi
Resolves: [COMPMID-5854] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ib0228409be5e816acca7e123f2660eb01a79e38f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9078 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-02-01Fix GEMMLowp/Batched MatMul mismatches on CPUMohammed Suhail Munshi
- Fixes Column Offset matrix is not being iterated through in y dimension Resolves : COMPMID-5795 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I0190474be404b4f0e171855739cfd0a48cbed5bc Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9020 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-01Add new operator AddMulAdd for Neon™ backend for Float/Quantized typesGunes Bayir
This is a fused operator that merges Add + Mul + Add [+ Relu-based-Activation] layers and have an intermediate output after the first Add. It's supported for FP16/32/QASYMM8/QASYMM8_SIGNED data types. The subsequent Add and Mul are intended for scaling and the coefficients only have one dimension (per channel). The inputs are - input1 : nD tensor [X, Y, Z, W, ..] - input2 : nD tensor [X, Y, Z, W, ..] - add_coef : 1D tensor [X] - mul_coef : 1D tensor [X] The outputs are - out1 : nD tensor (intermediate output) [X, Y, Z, W, ..] - out2 : nD tensor (final output) [X, Y, Z, W, ..] The operation can be summarized as follows: out1 <- input1 + input2 out2 <- Act(out1 * mul_coef + add_coef) The activation function can be Identity, Relu, Bounded Relu or Lower/Upper Bounded Relu. The intermediate output can be skipped by providing a nullptr. The reason of providing this operator is to be able to fuse in case of Residual network patterns and save computations by reducing memory back and forward. Resolves: COMPMID-5463 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8ef577aa623b036e9a9f655cc088493fd19a6109 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9055 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-18Revert "Update CPU kernels to remove x19"Michael Tyler
This reverts commit 3c59f01c209d2732a15d97d65565ead964787a8b. Resolves: COMPMID-5817 Change-Id: Ie2443a21854a95db1e3d0cafa2121c0187a5e237 Signed-off-by: Michael Tyler <michael.tyler@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8974 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-16Update CPU kernels to remove x19Michael Tyler
Resolves: COMPMID-5805 Signed-off-by: Michael Tyler <michael.tyler@arm.com> Change-Id: I250f64531e209625e4ff176dd5a552c1c34bc484 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8909 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-11Deprecated BF16 support in DepthConvertPablo Marquez Tello
* Removed BF16 validation tests for DepthConvert * Revert back to using inline assembly to convert to/from BF16 * Resolves COMPMID-5800 Change-Id: I803b2ad19ead297417f780c97c5b724cca6b394c Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8929 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-12-29Use CPU quantized addition kernel for quantized subtractionOmar Al Khatib
Resolves : [COMPMID-5629] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I061ea5bdafa3a01e66ff869d158f26a38d19e125 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8835 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-12-21Fixed various mismatches in CpuCastKernelPablo Marquez Tello
* Fixes various mismatches when converting FP32 to BF16 and BF16 to FP32 * Fixed segfault when trying logging=1 and trying to log BF16 * Resolves MLCE-979 Change-Id: Ie517d0b7411b4e3a7fecdee588f0e073d290625a Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8830 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-30Fix build error for unused variables in data type specific buildsGunes Bayir
The fp32 mws selection variables are guarded because they are flagged as unused when data type support does not include fp32. Resolves: COMPMID-5761 Change-Id: I7ac783e3d5ca51868b562ab879d03e02140b51a1 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8707 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-23ONCPUML-1072: Remove double definition of get_mws for Mul kernelfadara01
Signed-off-by: fadara01 <fadi.arafeh@arm.com> Change-Id: Ieaa2fa4a6a69e7e0a48633967dabe91c786b42b7 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8682 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-22ONCPUML-1072: Tuned MWS values (for N1, V1) for binary operators used by oneDNNFadi Arafeh
Added approximate values for MWS for the following binary operators: Add, Sub, Mul, Min, Max, Div Change-Id: I5c4c75511129982a3f44c038ee272f09598469de Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/459609 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Signed-off-by: fadara01 <fadi.arafeh@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8392 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-15Fix regression caused by mws in ActivationLayerMohammed Suhail Munshi
- Regression is caused by the small default mws in ActivationLayer - Syncronization of threads takes longer than the workload on small sized tensors. - Size 1536 is chosen arbitrarily based on the size of tensors in benchmarked networks Resolves: [COMPMID-5655] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I02e865b578399e75484f471e67806dd4cf7502c0 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/468454 Comments-Addressed: bsgcomp <bsgcomp@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8615 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-15Fixed Arm NN unit test failure caused by quantised multiplication patch.Omar Al Khatib
Resolves : [COMPMID-5698] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I21ce976473e0e8807c14989e98e68aca69c7f1f3 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8603 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-09Fix CPU multiplication layer threading overheadViet-Hoa Do
* When the tensors are reinterpreted as 1D, any thing smaller than 10KB won't be splitted into different thread. Resolves: COMPMID-5630 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Icff7089e37c85c8b325f099008a080a5805d36a2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8581 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-08SVE Hard-Swish via Lookup table for quantized inputPablo Marquez Tello
* Datatype supported qasymm8 and qasymm8_signed. * Resolves COMPMID-5211 Change-Id: Ib161af3af5ad11ec6b46de0bf4fbac172d2525fb Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7729 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-07Optimize CPU mul layer on quantized dataOmar Al Khatib
Resolves : [COMPMID-5461] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I89b99d267c32b00ef44f9bb6e7c714dfe4a0d29d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8420 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-01Fix fixed-point quantized additionViet-Hoa Do
* The range check condition is incorrect. The maximum value of 8-bit quantized data is 255, not 1023. * Use 256 to make the check slightly stricter than it should. Resolves: COMPMID-5458 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I5c071680531574b409a7ce71a732f6480caa10d8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8537 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-01Add threshold for floating-point SOFT_RELU activationMilos Puzovic
Added missing threshold for calculating SOFT_RELU when SVE and CL implementations are used. As a result removed from the testing bounds for input values that were set to be in the interval [-40, 40]. Resolves: COMPMID-5658 Signed-off-by: Milos Puzovic <Milos.Puzovic@arm.com> Change-Id: I3d14df60125e36e4eb85aeb222f4fb0cc5741521 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8536 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-10-27Fix fixed-point quantized additionViet-Hoa Do
* Use the same rounding function for the left-over part with the vectorized part. Resolves: COMPMID-5640 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I07450b2a43390b77539b78cd5d3e6772bdc38548 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8520 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-20Update reinterpret tensor as 1D for CPU addViet-Hoa Do
* Use the same implementation as other layers. Resolves: COMPMID-5108 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I5a50259b398b71ca1f61b5ee8daa539bf8263fac Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8501 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-20Add test in GEMMLowp for batch matmulMohammed Suhail Munshi
- Adds tests for batched matrix multiplication - Bugfix for issue : 3d tensors input tensors with offsets in GemmLowp results in mismatches Resolves : [COMPMID-5507] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I68e036fbca642c1841dd4321033045aadc8f5636 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/461298 Comments-Addressed: bsgcomp <bsgcomp@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8482 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-10-19Fix FFTConvolutionLayer testViet-Hoa Do
* Do not change the tensor info after configure stage. * By fixing this, the 1D optimization for activation layer can be applied to all data types and tensor layout. Resolves: COMPMID-5644 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I557f9bb84e5e456c28d6b423584887d7a3648ad4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8470 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Mohmun02 <MohammedSuhail.Munshi@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-12Optimize Neon™ Logistic ActivationMohammed Suhail Munshi
- Use a 1d execution window to improve memory access pattern. Resolves: [COMPMID-5465] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ida30669ffa06eb002ca43a6edf15e25a6eaad2f6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8344 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-10-10Fix LUT-based activation layerViet-Hoa Do
* Use the window instead of the tensor shape to determine the number of elements in the x-dimension. * Remove the LUT implementation in 32-bit build. Resolves: COMPMID-5641 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I0a79aa38d8f6a105ad01785bd94571f5a2ecb348 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8380 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
2022-10-07Optimize Neon™ SUB operator by squashing execution windowJakub Sujak
Resolves: COMPMID-5462 Change-Id: I2c7151c8faf4016cc33592fff04d492d7cbc8fd6 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8366 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-10-03Fix Batch Matmul nightly failureAdnan AlSinan
Resolves COMPMID-5601 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I1baf92c4751d784d017c0b2f7de1fc09e42ce69c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8309 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-03Optimize CPU add layer on quantized dataViet-Hoa Do
* Use fixed-point arithmetic where possible. * Various optimization for the FP32-based implementation. This implementation is kept as the fall-back solution in case of unrealistic quantization parameters that exceed the range of fixed-point solution. Resolves: COMPMID-5458 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I221d2d3801ecaae4fe0b7cf6ae8ef00ca3743665 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8317 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-26Add FP32 Neon™ swish activationJonathan Deakin
Change-Id: Id37b59adbc8c4cbe218d1652aeb02a0b4ce42c66 Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8256 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-22Fix unresolved symbol for target armv7a + AndroidPablo Marquez Tello
* Resolves COMPMID-5599 Change-Id: I4c1df48eda289c82ca567f6808fccd0b09065223 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8302 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>