aboutsummaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2022-09-09Add a macro guard in all OpenCL kernels in gemmlowp.clGian Marco Iodice
Resolves COMPMID-5498 Change-Id: I474f3f963257014255d082aab0ccbe3efe5aa067 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8222 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-08Disable Winograd on fp16 if fast-math = falseRamy Elgammal
- That would force CpuConv2d::get_convolution_method() choose GEMM_CONV2D or GEMM methods instead. Resolves: COMPMID-5531 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I4dec8772e8c150da003d9a89c1d036057c4d28b0 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8233 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2022-09-07Optimize depthwise convolution on OpenCLGian Marco Iodice
The optimization concerns the case where the depth multiplier is > 1. The depth multiplier for loop has been removed from the OpenCL kernel and the GWS has been mapped to the output shape. In this way, we can still perform a tile with N0 columns and improve the performance of depthwise conv over 80% when depth multiplier is > 1. Resolves COMPMID-5568 Change-Id: I604e287d4eeb31c54b9cc6c3072a698cd0e3e136 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8184 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-02F16 Specialization for MeanStdDevNormMurray Kornelsen
Ran into issues with f16 meanstddevnorm. Essentially, with large enough tensors and/or large values in tensors, output becomes all 0. This is due to the variance computation. In f16, it reaches infinity quite easily, then the division results in 0. This change modifies the OpenCL and NEON implementations to compute the sum of squares and the variance using f32, while other operations remain f16. Update: Found that the square operation also benefits from f32, rather than squaring in f16 and accumulating f32. Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca> Change-Id: Ide00afd84ec6d26fec4d53b073e295814f08ba46 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7959 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-02Enable Winograd-based conv2d when IFM>=8 on GpuGian Marco Iodice
From an internal performance evaluation, it seems that Winograd-based Conv2D offers better performance than alternative methods such as direct convolution and gemm-based conv already from IFM=8. Before the condition was for IFM>=16 Resolves COMPMID-5532 Change-Id: I9ff04835d6fd07f5f0abeec9645c9d9cc913b6b7 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8147 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-01Use parent buffer in CLSubTensor. This avoids calling enqueueMapBuffer ↵Murray Kornelsen
repeatedly when mapping multiple children of the same parent. Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca> Change-Id: I7ff554915a37320dfd94f15a6fb01a72a235cf39 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7920 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-24Fix add for tensors with non-matching stridesJonathan Deakin
Previously, the add_as_1d_array kernels were used on tensors with non-matching strides which caused the wrong elements to be added. The fix is to check that the strides are equal when selecting the addition kernel. Change-Id: I914ca2b95e5b8ed1875ec5ebe129bdfe2845496b Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8120 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-24Fix validation problem in CLQLSTMLayerPablo Marquez Tello
* Resolves MLCE-799 Change-Id: I3d5b2afbf65d159aa5e645743c1e139110dfc20e Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8093 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-23Fix macos build errorsPablo Marquez Tello
* Resolves MLCE-903 Change-Id: I39fb3b4b395a37c0f32481830f94d85ec15e205f Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8124 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-08-18Use Neon™ kernels for FP Bilinear Resize for SVEGunes Bayir
Removes FP Bilinear SVE kernels and uses Neon™ kernels instead Resolves: COMPMID-5449 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8e01de44bd884cb6578ca0b9358509b69bc31ca2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8100 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-08-17Revert "Fix performance regression in ClConv2D"Ramy Elgammal
- Reverting commit e54d8c07e75d70baeb80fecbb43088027ea45658 because it has caused unexpected regressions. Resolves: COMPMID-5504 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I2a0bcc6a311009a81f20a146079758ad138fff5b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8092 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-08-17Add LUT for quantized sigmoid functionViet-Hoa Do
* Move LUT implementation to a seperate file. It will be used for both QASYMM8 and QASYMM8_SIGNED. * Fix wrong constant value related to QASYMM8_SIGNED leaky ReLU in 32-bit build. Resolves: COMPMID-5464 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I2b24d52409a38f1b66fd532f431eff8a9e4547b6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8066 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-16Fix performance regression in ClConv2DGian Marco Iodice
- Call gemm-based convolution when the kernel is large and the stride is unit Resolves: COMPMID-5504 Change-Id: I8dd83bd012000ed76824b96a8e37c98c861c59e4 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8081 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-08-11Fix performance regression in Conv2D on OpenCLAdnan AlSinan
- Affect CL backend. - For FP32 datatype, affect different platforms. Resolves COMPMID-5479 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I705d718bc9b7def218034958f7ef86f2c2abe06d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8064 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-08-11Disable unsafe FP optimizations in Winograd Output TransformGunes Bayir
The unsafe FP optimizations flag causes accuracy issues when certain conditions are met regarding the hardware type, data type and the activation function. Resolves: COMPMID-5375 Change-Id: I1b0b06549b8c108617962d006a20dd263d5e3c21 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8061 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-11Fix CTS/SLTS failure related to Depthwise ConvolutionGunes Bayir
The issue is caused by GPUTarget not being set explicitly for the Depthwise convolution kernel, but it's being used in its build configuration. This causes the default value to be used and enables some unsafe FP optimizations. Resolves: COMPMID-5490 Change-Id: I5300a1168962cacb62cf49db795f052cf6740c7e Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8059 Reviewed-by: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-08-08Fix for AI benchmark ResNet regressionViet-Hoa Do
* For 3x3 kernel, only choose the implementation with larger tile size if the input tensor is larger than the tile. Resolves: COMPMID-5467 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I2cf95ddb25f477cb05da3b3501e0afe9548fc33a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8022 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-05Fix LeNet-f16 convolution regressionAdnan AlSinan
Resolves COMPMID-5420 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I24dca916f49f82e7e5ec809500ae5fe32c8adc97 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8020 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-04[ONCPUML-970] Fast math mode for fixed format kernelsPablo Marquez Tello
Minor tweaks and test for running fixed format kernels with BF16 operations when specified by the user. Change-Id: Ic8167f67b86b1298da65e46cfebed9f3b86940e4 Signed-off-by: Milos Puzovic <milos.puzovic@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8000 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-03Add Dynamic Fusion Tests with BugFixesMohammed Suhail Munshi
- Allow fusing arbitrary number of existing elementwise operators - Fix issues with 3D and 4D tensors in Elementwise Addition and Floor components - Collapse the 3D/4D window in the same way as that used by Conv2d, i.e. collapse dim 1 and dim 2 together - Fix Floor component issues when used after other components - Add Dynamic Fusion Tests (Floor + Div, Conv2d + Add + Div) - Add Addition ElementWise Broadcasting Test Resolves: [COMPMID-5356] Change-Id: I58b93a90175bb0440d43531d18cac94b5f5c2689 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/433956 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7957 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-03[ONCPUML-968] Fixed format kernel support in additional APIsMilos Puzovic
Implements required plumbing in order to be able to ask and execute fixed format kernels from NEFullyConnected, NEGEMM and NEGEMMConv2d. These APIs are used to accelerate oneDNN primitives (inner product, matrix multiplication and indirect GEMM respectively) and without changes it would not be possible to call fixed format kernels from those oneDNN primitives. Change-Id: I27534f0491ce28d0ccb98c19f318bd33dcdf2ff5 Signed-off-by: Milos Puzovic <milos.puzovic@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7999 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-02Update the GPUTarget listGian Marco Iodice
Resolves COMPMID-5405 Change-Id: I995b5e79bef13529097ed17f7854763a4cf89272 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7986 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-01Optimize add layer by considering the input tensors as 1D arrayGunes Bayir
Resolves: COMPMID-5108 Change-Id: I544f8160fbe5b4ffbef348d1fbd3dd626a6e1bdb Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8002 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-08-01Fix for OpenMP scheduler work breakdownMilos Puzovic
If number of work items is greater than number of available threads then OpenMP scheduler will only execute as many work items as there are threads. This fix makes sure that we iterate through all work items and execute all of them. Change-Id: I3ad4b732c01fadc70dacaf09af3007d2b31086c7 Signed-off-by: Milos Puzovic <milos.puzovic@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8001 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
2022-07-27Fix compilation error rasied in Nightly_NEWRamy Elgammal
- related to 91780021e2 Fix for inclusion of "arm_gemm" - arm_compute::WeightFormat was passed to a function expecting arm_gemm::WeightFormat Resolves: COMPMID-5415 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ib53fad2eab0148f466a5e2f11b931754569da3d6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7989 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-07-26Fix for inclusion of "arm_gemm" from src into "Types.h" from coreRamy Elgammal
- Added arm_compute::WeightFormat and converted to/from arm_gemm::WeightFormat when needed through two map function. - Moved to_string(WeightFormat) to TypePrinter.h Resolves: COMPMID-5415 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I65f7942100bcd4dbf2c5cf6c07f26c8e1e3bf86e Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/438511 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-by: Sicong Li <sicong.li@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7985 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-25Enable march=armv8.6-a in non multi-isa buildsPablo Marquez Tello
* scons arch=armv8.6-a translates to -march=armv8.6-a * scons arch=armv8.6-a-sve translates to -march=armv8.6-a+sve * scons arch=armv8.6-a-sve2 translates to -march=armv8.6-a+sve2 * Resolves COMPMID-5408 Change-Id: I0901e1de864d00109759509af7cc2b5c9ae1cd75 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7943 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-22Add GemmLowp MMUL Reshaped Only Rhs Support for QASYMM8/QASYMM8_SIGNEDFreddie Liardet
This patch introduces a GEMMLowp routine that is optimized for Arm(R) Mali(TM)-G715 and Arm(R) Mali(TM)-G615 Resolves: COMPMID-5398 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8d06453645688f3658b6c7c06f1ebc25a2505661 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7932 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-22Update ClConv2D heuristic to use direct convolutionAdnan AlSinan
Resolves COMPMID-5298 Change-Id: I4a7d788bc1f5f568bedcc22e7aca47ede6de71bf Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7891 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-21Fix direct convolution cases that were failing on OdroidAdnan AlSinan
- Affects OpenCL backend. - Resolves COMPMID-5416 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I8953f9ac5c1ec9edf99399a651a544df4276ccf1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7951 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-19[ONCPUML-951] Variable weight support for Convolution.Francesco Petrogalli
API changes for NEGEMMConvolutionLayer and CpuGemmConv2d Built with: scons neon=1 opencl=0 os=linux arch=armv8.2-a multi_isa=1 \ build=native -j32 Werror=false validation_tests=1 build_dir=opt \ standalone=1 asserts=1 experimental_fixed_format_kernels=1 . Tested with: ./build/opt/tests/arm_compute_validation Hardware where the test executable was run: Neoverse N1 Test coverage: * NEGEMMConvolutionLayer, CpuGemmConv2d * NHWC (the only one supported by the fixed-format kernels) * F16, F32 * Shapes: RunSmall Change-Id: I4fd3e495a7cbf61210ea02d37440ba9652934e99 Signed-off-by: Francesco Petrogalli <francesco.petrogalli@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7632 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-18Fix Neoverse V1 heuristics for FP32 fast moderamelg01
Resolves: COMPMID-5401 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I432b304483236efd392dfc47d541e6759c135104 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7934 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
2022-07-14Integrate new winograd APIs from MLTechramelg01
Resolves: COMPMID-5400 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ib4428436dd7a6e40d8b2d8a2f8dac1b079154551 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7894 Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-13Add Gemm MMUL Reshaped Only Rhs Support for FP32/FP16Gunes Bayir
This patch introduces a GEMM routine that is optimized for Arm(R) Mali(TM)-G715 and Arm(R) Mali(TM)-G615 Resolves: COMPMID-5216 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I2e5d7806f5904347185bb3e250f73d73d6669dba Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7914 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-13Fixed clang-cl errors on Windows native builds.Pablo Tello
Partially resolves MLCE-739 Change-Id: Ice06a96d6a8a26b31e334ba4e697cd41d352b026 Signed-off-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7364 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-07-08Extended direct conv 2d interface for tuning the OpenCl kernelGian Marco Iodice
Resolves COMPMID-5298 Change-Id: Ie9b907e5dcf86aa6add8d08799fa7ba7c264edea Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7888 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-07Add missing flag when building cl graph examples and fixMichalis Spyrou
incorrect cl cache behaviour Resolves: COMPMID-5286 Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Change-Id: I1aa82825ef7d31d7830a00282d1a3523ccf1d746 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7883 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-07-05Add G57 to GPUTargetSiCong Li
Relates to: COMPMID-5299 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I19f820f698cf11020da019f4a1334cccb1e40c7e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7880 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-07-04Fix build errors on armv8.6 SVE2 with NDK 23 and 24Michalis Spyrou
Extensive use of templates resulted in a compiler crash on NDK 23 and 24. This rework solves the issue and also reduces the library size by 101Kb. Resolves: COMPMID-5384 Change-Id: I9c5c68c5e36f236b0891e44d25478743417fb16d Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7871 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-01Fix OpenBSD build errorsPablo Marquez Tello
* Partially resolves COMPMID-5396 Change-Id: I7f3e692e73da548a901a07ed6d7212df40b8c26f Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7870 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-06-30Wrong arguments for running activation function in CpuGemmDirectConv2dMichalis Spyrou
Resolves: COMPMID-5350 Change-Id: I2f5021ebe9fe4c71371f2cd6cb65cc9c0b2110b5 Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7861 Reviewed-by: Francesco Petrogalli <francesco.petrogalli@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-06-29Add LUT-based leaky relu for QASYMM8 on CPUViet-Hoa Do
* Add LUT generation function for Leaky ReLU. * Some additional changes in the existing LUT implementation: + Bring back the NEON implementation of hard swish for 32-bit build. Library size of 64-bit build is not affected. + Add some extra #ifdef to remove unnecessary code in 32-bit build. Resolves: COMPMID-5386 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I1ea49611cc922765ee741e31138c888401d33e9b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7845 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-06-28Fix OpenCL Winograd output transformGian Marco Iodice
- num_tiles_x was not initialized when we were passing the argument at compilation time Resolves COMPMID-5394 Change-Id: I6004a2d47656edda3c832f980c78622045de7068 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7857 Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-06-27Implement new Elementwise Dynamic Fusion Operators: Div, FloorMichalis Spyrou
Resolves: COMPMID-5355 Change-Id: I92f73fbe885f28bbe7b07965b90cfd807c93602f Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7745 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com>
2022-06-24Improve LUT Neon Hard-SwishPablo Marquez Tello
* Changed window_step from 16 to tensor_shape().x() when calling into the assembly byte substitution code. * Resolve COMPMID-5211 Change-Id: I5c1f5273455999bb35f94c76a8afb4290e728858 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7843 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2022-06-23Select neon LUT Hard-Swish kernel on all devicesPablo Marquez Tello
* Resolves COMPMID-5211 Change-Id: I560ab2992c6089774c7ebee3538847905521607d Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7840 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2022-06-23Enable loading OpenCL symbols in Android App native code, NDK environment.ohadagoogle
* Based on ArmNN work: IVGCVSW-6960 Device not found running MLTS with Support Library Change-Id: I8cb8acc30c8a4afa60bc1cf80eb5b6b2c43dfdc1 Signed-off-by: Ohad Almagor <ohada@google.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7836 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-06-16Fix SVE2 implementation of quantized SoftMax 1DViet-Hoa Do
* Fix integer overflow in substraction step. * Fix incorrect vector when convert the result to qasymm8_signed. Resolves: COMPMID-5389 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Id745f2eb2a1b0823b02b136560351b5f8fb85624 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7738 Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-06-15Fix build error v8.2-a-svePablo Marquez Tello
* bf16 instrinsics should be used when __ARM_FEATURE_SVE_BF16 is present * Fixed NDK14 compiler warning declaring copy ctor for Window explicitly * Resolves MLCE-867 Change-Id: I84ac5f213d9700e2fda7da55d83bba7cf79ad52c Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7728 Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-06-15Add support OpenCL 3.0 non-uniform workgroupViet-Hoa Do
* Add OpenCL version 3 detection. * Use -cl-std=CL3.0 build option to support non-uniform workgroup when OpenCL 3 is detected and the feature is supported. Resolves: COMPMID-5208 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ifd8cbae6b34228c07e761bcb94ee8f35bdf1bace Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7655 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>