ComputeLibrary.git -

Age	Commit message (Collapse)	Author
2022-09-13	Add test case for disable Winograd on fp16 if fast-math = false	Ramy Elgammal
	Resolves: COMPMID-5531 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Id1a5f909d64e897cdef2e920240ac5ae244166b4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8242 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-12	Add test for NEGEMM to test a batched matrix multiplication with variable ↵	Adnan AlSinan
	input tensors - Add a test for CPU to batched matrix multiplication with variable input tensors - Disable assembly kernel when using _reshape_b_only_on_first_run flag Resolves COMPMID-5501 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: If96b182584617806a9dfe597dbfaf05241b123c2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8234 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-09	Rework heuristic in ClConv2d	Gian Marco Iodice
	The heuristic has been tweaked to call direct convolution when we think it can be faster than gemm-based convolution. The main change is affecting the selection of the convolution method on the first layer. In general, the question we should ask for the first convolution layer of a model is: when the execution time of im2col + gemm < direct?. Since im2col does not depend on the OFM, it means that when OFM is big enough, the contribution of im2col is small and the GEMM approach is preferable. From internal experiments, the OFM threshold is 64. Resolves COMPMID-5504, COMPMID-5504, COMPMID-5477 Change-Id: If1bd1fa93c185ffa874388e29866244e62ca3494 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8231 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-09-09	Optimize FP32/16 Bilinear Scale Kernel for Neon™	Gunes Bayir
	This patch removes index and weight pre-computations where it's not used and reduces some calculations inside the inner-most loop of Scale. Resolves: COMPMID-5452 Change-Id: Ie149b1b76a90a8cb659ada0f97aef78caf69932f Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8220 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-09-09	Add a macro guard in all OpenCL kernels in gemmlowp.cl	Gian Marco Iodice
	Resolves COMPMID-5498 Change-Id: I474f3f963257014255d082aab0ccbe3efe5aa067 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8222 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-08	Disable Winograd on fp16 if fast-math = false	Ramy Elgammal
	- That would force CpuConv2d::get_convolution_method() choose GEMM_CONV2D or GEMM methods instead. Resolves: COMPMID-5531 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I4dec8772e8c150da003d9a89c1d036057c4d28b0 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8233 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2022-09-07	Optimize depthwise convolution on OpenCL	Gian Marco Iodice
	The optimization concerns the case where the depth multiplier is > 1. The depth multiplier for loop has been removed from the OpenCL kernel and the GWS has been mapped to the output shape. In this way, we can still perform a tile with N0 columns and improve the performance of depthwise conv over 80% when depth multiplier is > 1. Resolves COMPMID-5568 Change-Id: I604e287d4eeb31c54b9cc6c3072a698cd0e3e136 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8184 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-07	Add test for CLGEMM to test a batched matrix multiplication with variable ↵	Mohammed Suhail Munshi
	input tensors Resolves : [COMPMID-5502] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ida001dc597973f9180468737a3e32e5022e6baee Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/450342 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Mohammed Suhail Munshi <mohammedsuhail.munshi@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8224 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-02	F16 Specialization for MeanStdDevNorm	Murray Kornelsen
	Ran into issues with f16 meanstddevnorm. Essentially, with large enough tensors and/or large values in tensors, output becomes all 0. This is due to the variance computation. In f16, it reaches infinity quite easily, then the division results in 0. This change modifies the OpenCL and NEON implementations to compute the sum of squares and the variance using f32, while other operations remain f16. Update: Found that the square operation also benefits from f32, rather than squaring in f16 and accumulating f32. Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca> Change-Id: Ide00afd84ec6d26fec4d53b073e295814f08ba46 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7959 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-02	Enable Winograd-based conv2d when IFM>=8 on Gpu	Gian Marco Iodice
	From an internal performance evaluation, it seems that Winograd-based Conv2D offers better performance than alternative methods such as direct convolution and gemm-based conv already from IFM=8. Before the condition was for IFM>=16 Resolves COMPMID-5532 Change-Id: I9ff04835d6fd07f5f0abeec9645c9d9cc913b6b7 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8147 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-01	Compute Hard-Swish with a Lookup table for qasymm8_signed.	Pablo Marquez Tello
	* Resolves COMPMID-5211 Change-Id: I7cc72662bb1cf52bf112685639d3dbba33d1333f Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7993 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-09-01	Use parent buffer in CLSubTensor. This avoids calling enqueueMapBuffer ↵	Murray Kornelsen
	repeatedly when mapping multiple children of the same parent. Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca> Change-Id: I7ff554915a37320dfd94f15a6fb01a72a235cf39 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7920 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-24	Fix add for tensors with non-matching strides	Jonathan Deakin
	Previously, the add_as_1d_array kernels were used on tensors with non-matching strides which caused the wrong elements to be added. The fix is to check that the strides are equal when selecting the addition kernel. Change-Id: I914ca2b95e5b8ed1875ec5ebe129bdfe2845496b Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8120 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-24	Fix validation problem in CLQLSTMLayer	Pablo Marquez Tello
	* Resolves MLCE-799 Change-Id: I3d5b2afbf65d159aa5e645743c1e139110dfc20e Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8093 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-23	Fix macos build errors	Pablo Marquez Tello
	* Resolves MLCE-903 Change-Id: I39fb3b4b395a37c0f32481830f94d85ec15e205f Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8124 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-08-18	Use Neon™ kernels for FP Bilinear Resize for SVE	Gunes Bayir
	Removes FP Bilinear SVE kernels and uses Neon™ kernels instead Resolves: COMPMID-5449 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8e01de44bd884cb6578ca0b9358509b69bc31ca2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8100 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-08-17	Revert "Fix performance regression in ClConv2D"	Ramy Elgammal
	- Reverting commit e54d8c07e75d70baeb80fecbb43088027ea45658 because it has caused unexpected regressions. Resolves: COMPMID-5504 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I2a0bcc6a311009a81f20a146079758ad138fff5b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8092 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-08-17	Add LUT for quantized sigmoid function	Viet-Hoa Do
	* Move LUT implementation to a seperate file. It will be used for both QASYMM8 and QASYMM8_SIGNED. * Fix wrong constant value related to QASYMM8_SIGNED leaky ReLU in 32-bit build. Resolves: COMPMID-5464 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I2b24d52409a38f1b66fd532f431eff8a9e4547b6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8066 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-16	Fix performance regression in ClConv2D	Gian Marco Iodice
	- Call gemm-based convolution when the kernel is large and the stride is unit Resolves: COMPMID-5504 Change-Id: I8dd83bd012000ed76824b96a8e37c98c861c59e4 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8081 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-08-12	Fix note in guidelines doc	Ramy Elgammal
	Partially Resolves: COMPMID-5346 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I26212bfb46cd451ef01956ade0a306bbdbf5940e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8069 Reviewed-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-08-12	Update release notes about armv8.6 build flag change	Ramy Elgammal
	Partially Resolves: COMPMID-5346 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ida1ad7ff19837203b3366466ba8018386cd6b18f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8065 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-08-11	Fix performance regression in Conv2D on OpenCL	Adnan AlSinan
	- Affect CL backend. - For FP32 datatype, affect different platforms. Resolves COMPMID-5479 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I705d718bc9b7def218034958f7ef86f2c2abe06d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8064 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-08-11	Disable unsafe FP optimizations in Winograd Output Transform	Gunes Bayir
	The unsafe FP optimizations flag causes accuracy issues when certain conditions are met regarding the hardware type, data type and the activation function. Resolves: COMPMID-5375 Change-Id: I1b0b06549b8c108617962d006a20dd263d5e3c21 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8061 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-11	Update README	Ramy Elgammal
	Partially Resolves: COMPMID-5346 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I41755295450b6ee698d8998d8a6d6bf9d4e4e7a9 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/443006 Comments-Addressed: bsgcomp <bsgcomp@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Sicong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8052 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-11	Fix CTS/SLTS failure related to Depthwise Convolution	Gunes Bayir
	The issue is caused by GPUTarget not being set explicitly for the Depthwise convolution kernel, but it's being used in its build configuration. This causes the default value to be used and enables some unsafe FP optimizations. Resolves: COMPMID-5490 Change-Id: I5300a1168962cacb62cf49db795f052cf6740c7e Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8059 Reviewed-by: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-08-08	Fix for AI benchmark ResNet regression	Viet-Hoa Do
	* For 3x3 kernel, only choose the implementation with larger tile size if the input tensor is larger than the tile. Resolves: COMPMID-5467 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I2cf95ddb25f477cb05da3b3501e0afe9548fc33a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8022 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-08	Update Errata	Ramy Elgammal
	Resolves: COMPMID-5345 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I2fe4cb16bddf0f6a25906ab471e0a684d9ddf0ec Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8050 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2022-08-05	Update SONAME_VERSION in SConscript	Ramy Elgammal
	Partially Resolves: COMPMID-5346 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I1740d3244d7fcaf58613bbb508db7b46ba13f19e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8045 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-08-05	Fix LeNet-f16 convolution regression	Adnan AlSinan
	Resolves COMPMID-5420 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I24dca916f49f82e7e5ec809500ae5fe32c8adc97 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8020 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-04	[ONCPUML-970] Fast math mode for fixed format kernels	Pablo Marquez Tello
	Minor tweaks and test for running fixed format kernels with BF16 operations when specified by the user. Change-Id: Ic8167f67b86b1298da65e46cfebed9f3b86940e4 Signed-off-by: Milos Puzovic <milos.puzovic@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8000 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-03	Add Dynamic Fusion Tests with BugFixes	Mohammed Suhail Munshi
	- Allow fusing arbitrary number of existing elementwise operators - Fix issues with 3D and 4D tensors in Elementwise Addition and Floor components - Collapse the 3D/4D window in the same way as that used by Conv2d, i.e. collapse dim 1 and dim 2 together - Fix Floor component issues when used after other components - Add Dynamic Fusion Tests (Floor + Div, Conv2d + Add + Div) - Add Addition ElementWise Broadcasting Test Resolves: [COMPMID-5356] Change-Id: I58b93a90175bb0440d43531d18cac94b5f5c2689 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/433956 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7957 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-03	[ONCPUML-968] Fixed format kernel support in additional APIs	Milos Puzovic
	Implements required plumbing in order to be able to ask and execute fixed format kernels from NEFullyConnected, NEGEMM and NEGEMMConv2d. These APIs are used to accelerate oneDNN primitives (inner product, matrix multiplication and indirect GEMM respectively) and without changes it would not be possible to call fixed format kernels from those oneDNN primitives. Change-Id: I27534f0491ce28d0ccb98c19f318bd33dcdf2ff5 Signed-off-by: Milos Puzovic <milos.puzovic@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7999 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-02	Update the GPUTarget list	Gian Marco Iodice
	Resolves COMPMID-5405 Change-Id: I995b5e79bef13529097ed17f7854763a4cf89272 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7986 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-08-01	Optimize add layer by considering the input tensors as 1D array	Gunes Bayir
	Resolves: COMPMID-5108 Change-Id: I544f8160fbe5b4ffbef348d1fbd3dd626a6e1bdb Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8002 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-08-01	Fix for OpenMP scheduler work breakdown	Milos Puzovic
	If number of work items is greater than number of available threads then OpenMP scheduler will only execute as many work items as there are threads. This fix makes sure that we iterate through all work items and execute all of them. Change-Id: I3ad4b732c01fadc70dacaf09af3007d2b31086c7 Signed-off-by: Milos Puzovic <milos.puzovic@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8001 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
2022-08-01	Fix building failure with validate_examples set to true	Michalis Spyrou
	Resolves: COMPMID-5402 Change-Id: Ic4abda36c475c3a7d86fb469d7ed1b23a62ea182 Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7991 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-07-29	Updated documentation	Pablo Marquez Tello
	* Fixed an error caused by a newline missing in line 187 * Added section about building natively on Windows on ARM * Resolves MLCE-739 Change-Id: I2f452d77247b2a264e7f122d97cbb8f288716971 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7992 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-07-27	Fix compilation error rasied in Nightly_NEW	Ramy Elgammal
	- related to 91780021e2 Fix for inclusion of "arm_gemm" - arm_compute::WeightFormat was passed to a function expecting arm_gemm::WeightFormat Resolves: COMPMID-5415 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ib53fad2eab0148f466a5e2f11b931754569da3d6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7989 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-07-26	Fix build android build error	Pablo Tello
	* Resolves COMPMID-5422 Change-Id: Ib59d85aead3f8f9559392957fbc42a4ddbb27b07 Signed-off-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/440126 Comments-Addressed: bsgcomp <bsgcomp@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7987 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-07-26	Fix for inclusion of "arm_gemm" from src into "Types.h" from core	Ramy Elgammal
	- Added arm_compute::WeightFormat and converted to/from arm_gemm::WeightFormat when needed through two map function. - Moved to_string(WeightFormat) to TypePrinter.h Resolves: COMPMID-5415 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I65f7942100bcd4dbf2c5cf6c07f26c8e1e3bf86e Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/438511 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-by: Sicong Li <sicong.li@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7985 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-25	Enable integrated assembler for Android™ 13 and onwards only	SiCong Li
	Since Android™ 13, the underlying NDK has deprecated system assembler and favors integrated assembler instead Please also see https://github.com/android/ndk/wiki/Changelog-r23 Resolves COMPMID-5410 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ib9fa8a22f6cdb5cb4f0095e15bf332484f709630 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7956 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-07-25	Enable march=armv8.6-a in non multi-isa builds	Pablo Marquez Tello
	* scons arch=armv8.6-a translates to -march=armv8.6-a * scons arch=armv8.6-a-sve translates to -march=armv8.6-a+sve * scons arch=armv8.6-a-sve2 translates to -march=armv8.6-a+sve2 * Resolves COMPMID-5408 Change-Id: I0901e1de864d00109759509af7cc2b5c9ae1cd75 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7943 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-25	Mention Arm® Neoverse® in supported systems.	Pablo Marquez Tello
	* Resolves MLCE-888 Change-Id: I90291cb5f6eddbb889e86dd3295ff8b05b1e1311 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7953 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2022-07-22	Add GemmLowp MMUL Reshaped Only Rhs Support for QASYMM8/QASYMM8_SIGNED	Freddie Liardet
	This patch introduces a GEMMLowp routine that is optimized for Arm(R) Mali(TM)-G715 and Arm(R) Mali(TM)-G615 Resolves: COMPMID-5398 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8d06453645688f3658b6c7c06f1ebc25a2505661 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7932 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-22	Update ClConv2D heuristic to use direct convolution	Adnan AlSinan
	Resolves COMPMID-5298 Change-Id: I4a7d788bc1f5f568bedcc22e7aca47ede6de71bf Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7891 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-21	Fix direct convolution cases that were failing on Odroid	Adnan AlSinan
	- Affects OpenCL backend. - Resolves COMPMID-5416 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I8953f9ac5c1ec9edf99399a651a544df4276ccf1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7951 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-21	Added CONTRIBUTING.md	Pablo Marquez Tello
	* Pointing out that development and contribution must be done on linaro * Asking not to submit pull requests on github * Resolves MLCE-774 Change-Id: Ifbcf4a7ab8e714b5c77425f0e4174520e0ece09f Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7946 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-07-20	Remove data extraction scripts	Pablo Marquez Tello
	* Resolved MLCE-886 Change-Id: I3b8fbe662c715b82c08c63fa27892471a572fdd8 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7945 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-07-19	[ONCPUML-951] Variable weight support for Convolution.	Francesco Petrogalli
	API changes for NEGEMMConvolutionLayer and CpuGemmConv2d Built with: scons neon=1 opencl=0 os=linux arch=armv8.2-a multi_isa=1 \ build=native -j32 Werror=false validation_tests=1 build_dir=opt \ standalone=1 asserts=1 experimental_fixed_format_kernels=1 . Tested with: ./build/opt/tests/arm_compute_validation Hardware where the test executable was run: Neoverse N1 Test coverage: * NEGEMMConvolutionLayer, CpuGemmConv2d * NHWC (the only one supported by the fixed-format kernels) * F16, F32 * Shapes: RunSmall Change-Id: I4fd3e495a7cbf61210ea02d37440ba9652934e99 Signed-off-by: Francesco Petrogalli <francesco.petrogalli@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7632 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-18	Fix multi_isa build failure after Winograd integration	ramelg01
	Partially Resolves: COMPMID-5400 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I9ed6b85b105a15b1a35c6a921f63ea94c665438b Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/437221 Tested-by: bsgcomp <bsgcomp@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7933 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>