ComputeLibrary.git -

Age	Commit message (Collapse)	Author
4 days	Change reorder implementation to be vector length agnostic for OHWIo8 reorder	Radu Salavat
	As the reorder kernel is called with WeightFormat OHWIo8 for hardware that does not support it e.g. vector length 128, adapt the test case and add kernel implementation for this edge case. This fixes the mismatching values that appear when OHWIo8 fixture was run with 128 vector length. Resolves: ONCPUML-1523, COMPMID-6281 Signed-off-by: Radu Salavat <radu.salavat@arm.com> Change-Id: Iaa1a3b486d1725a2d6031051aa544082c1bbe913 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11421 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
6 days	Add fp16 and integer data type support for ScatterNd in Gpu	Gunes Bayir
	Resolves: COMPMID-6899 Change-Id: I3743f2c9e5c21e1ec9f4c81d08c148666afad33a Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11505 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Sang Won Ha <sangwon.ha@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
7 days	Disable SME2 Gemmlowp s8f32 kernel selection in case results needs to be ↵	Gunes Bayir
	accumulated Similar to https://review.mlplatform.org/c/ml/ComputeLibrary/+/11500, s8f32 kernels do not support accumulate mode. This patch modifies the kernel selection and also adds more tests to stress these test cases better. Partially Resolves: COMPMID-6995 Change-Id: I40e19446c012eb7334e4511e254cce0d635aa234 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11503 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Radu Salavat <radu.salavat@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
11 days	Add update/index/output (m+1)/2d/(m+n) support for CLScatter	Gunes Bayir
	Resolves: COMPMID-6894, COMPMID-6896 Change-Id: I9d29fd3701a7e0f28d83f81a6c42a7234c2587c3 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11477 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com> Dynamic-Fusion: Ramy Elgammal <ramy.elgammal@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
11 days	Move s32 to f32 conversion in reference layers from quantization to ↵	Radu Salavat
	dequantization Signed-off-by: Radu Salavat <radu.salavat@arm.com> Change-Id: Ib17946b526d35deeca94b5d2f163b92101e313c4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11420 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
11 days	Add memory stress tests for per channel quantized convolution	Gunes Bayir
	Partially Resolves: MLCE-1255 Change-Id: Ibadcfedd43530232c65f05e571bc8b4568a63e67 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11499 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
12 days	Fix compiler error in the validation tests	Pablo Marquez Tello
	* Building with openmp=1 cppthreads=0 caused a linker error in the validation suite Change-Id: I16d8a49e9190cd1288237d82583a0034e20a9f38 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11483 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
14 days	Multi-Dimensional and Batched Scatter Reference and Dataset Implementation.	Mohammed Suhail Munshi
	Resolves: [COMPMID-6893, COMPMID-6895, COMPMID-6898] Change-Id: I355f46aeba2213cd8d067cac7643d8d96e713c93 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11430 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
14 days	Scatter GPU Kernel Implementation for 1D tensors.	Mohammed Suhail Munshi
	Resolves: [COMPMID-6891, COMPMID-6892] Change-Id: I5b094fff1bff4c4c59cc44f7d6beab0e40133d8e Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11394 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-04-16	Fix v7 test failure when core matmul result is dequantized into fp32	Gunes Bayir
	Partially Resolves: ONCPUML-1444, MLINFSW-439 Change-Id: Ic7498d6944df2848f3e82eaf4e11cc5cb6ef5754 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11424 Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-04-15	Add s8f32 kernels and dynamic QuantizationInfo	Jonathan Deakin
	- Add support for QASYMM_SIGNED*QASYMM8_SIGNED->F32 in CpuGemmLowpMatrixMultiplyCore - Add s8f32 kernel using existing s8->s32 kernels with a new DequantizeFloat OutputStage, the structure is similar to Requantize32 but the opposite way around. - Add SME s8f32 kernels with integrated support for DequantizeFloat. - Add scale to CpuGemmLowpOffsetContributionKernel. - Add virtual dequantize scale to gemm_common, only implemented for gemm_interleaved. - Update year to 2024 in generate_build_files. - Add dynamic flag to QuantizationInfo which signals to operators that it can change after configuration - Add support for dynamic quantization in NEGEMMLowpMatrixMultiplyCore - Add dynamic quantization fixture by extending GEMMLowpGenericMatrixMultiplyCoreValidationFixture - Add GEMMLowpDequantizedMatrixMultiplyValidationFixture - Store k (number of cols of A) rather than k_offset in the offset contribution kernels so that we can recompute it when the other offsets change relates to: ONCPUML-1444 MLINFSW-439 Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com> Co-authored-by: David Mansell <David.Mansell@arm.com> Change-Id: I58a3acf2c09289a303e52eea6b336a696a5bc8da Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11022 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-04-15	Add guarding for accumulation validation test in aarch32	Radu Salavat
	Partially Resolves: ONCPUML-1442 Signed-off-by: Radu Salavat <radu.salavat@arm.com> Change-Id: I681df5e9c399996fbc7dc362b906af151588ca44 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11416 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2024-04-12	Runtime checks for bf16 fixed format tests	David Svantesson-Yeung
	Add checks for bf16 support for bf16 fixed format tests. This ensures tests pass in multi_isa setting where library was compiled with bf16 support, even on systems that do not support bf16. Also adds runtime check to GEMMConvolutionLayer/Float/BFLOAT16/RunSmall. Resolves: COMPMID-6922 Signed-off-by: David Svantesson-Yeung <david.svantesson-yeung@arm.com> Change-Id: Ic0f09ba34b5a2c64be8bfc848a4457a6b1c4d1c3 Signed-off-by: David Svantesson-Yeung <david.svantesson-yeung@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11408 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-04-12	Accumulation in Cpu Gemm kernels is not supported for quantized kernels in ↵	Radu Salavat
	aarch32. This patch guards the relevant tests. Partially Resolves: ONCPUML-1442 Signed-off-by: Radu Salavat <radu.salavat@arm.com> Change-Id: I8eed80db4b522185c3c50c13f0f701aa48961057 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11410 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-04-11	Add SME2 implementation of softmax for FP16	Gunes Bayir
	In addition to the softmax kernel, this patch fixes minor issues in the fp32 implementation. Resolves: COMPMID-6920 Change-Id: Ibbd9f0af5f2a93fba0e92d72ba437279c34149d3 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11402 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-04-11	Add in place summation to CPU GEMM kernels	Radu Salavat
	Instead of dispatching the sum postop for GEMM kernels to a separate kernel + add, that requires an extra destination sized allocation, plus 3 extra load/stores per element, just do it in the GEMM kernel. Resolves: ONCPUML-1442 Signed-off-by: Radu Salavat <radu.salavat@arm.com> Co-authored-by: Milos Puzovic <milos.puzovic@arm.com> Change-Id: I7a1f2da3300875fa1ac88b705a34390969518077 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11298 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-04-09	Specify absolute tolerance	Sangwon Ha
	- Add absolute tolerance value of 0.001f to DFT validation test to to disregard cases of floating-point round off error, in line with other tests. Resolves: COMPMID-6945 Change-Id: Id92c22d31cf3045968ae813985c49713354ea955 Signed-off-by: Sangwon Ha <sangwon.ha@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11399 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-03-25	Adds Tests and reference implementation for scatter operator with 1D tensors.	Mohammed Suhail Munshi
	Resolves: [COMPMID-6890] Change-Id: Ie4a8db24fc6387afa9ddf42b3607e040cdf8df67 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11339 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-03-22	[ONCPUML-1451] Guard bf16 to bf16 tests with ↵	Renato Arantes
	ARM_COMPUTE_ENABLE_FIXED_FORMAT_KERNELS Change-Id: I6a01fe1e19a9d3e38908309d766fe7fc43775490 Signed-off-by: Renato Arantes <renato.arantes@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11338 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2024-03-22	Fix for nightly build failures for android	Mohammed Suhail Munshi
	- Remove unused variables. Change-Id: I92b3a03e03517bedb26e60ad04b914359acf42f3 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11337 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-03-21	Add skeleton for CLScatter op, reference and tests	Mohammed Suhail Munshi
	- Adds dataset for tests - Adds skeleton for function, operator, reference and tests Resolves: [COMPMID-6889] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I7e57e8b4577fef6aa7421e672894c249cad6c5fa Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11234 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-03-21	[ONCPUML-1451] Add matmul kernel to enable bf16 to bf16 operations via ↵	Renato Arantes
	PyTorch® autocast() function The full range of tests must be added with [MLINFSW-482] epic due to the lack of reordering kernels implemented in Acl. Co-Authored-By: David Mansell <David.Mansell@arm.com> Change-Id: I820d316295a1ec94fdc89c37e4144a268f914c36 Signed-off-by: Renato Arantes <renato.arantes@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11169 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-03-20	Make Cpu/Gpu/Ref scalar/vectoral S32 division consistent	Gunes Bayir
	- Neon(TM) implementation converts integers to float and performs the division because there is no vector integer division instructions. However, leftover loop still uses integer division, which makes results inconsistent depending on where we are in the tensor. - SVE path does it in integer domain. - OpenCL(TM) does it similar to Neon(TM) vector path. - Reference implementation does it in integer domain. These differences cause intermittent mismatches. This patch ensures all follow the same logic. On the other hand, the provided Neon(TM) implementation is faster than the Fp32 converted version. Resolves: COMPMID-6925 Change-Id: Ia12606d57f40a7d331b9b698f87fd4321496b275 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11316 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-03-20	Increase tolerance_num of Cpu RNNLayer tests	Gunes Bayir
	Instead of increasing the tolerance amount, we increase the number of elements we tolerate to 0.02 % of the whole tensor. This ensures we do not affect the tolerance for smaller tests. This amount is set according to the number of elements above the threshold. it was 1 over 512 elements, 1/512 ~ 0.02 %. Resolves: COMPMID-6932 Change-Id: I9d3ce29a3972aa8b9daea5288005a0a41a266328 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11321 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-03-19	Increase MatMul and DilatedConv test Q8 thresholds to 1	Gunes Bayir
	Tolerance for quantized tests is better to be 1 due to possible rounding differences between the Acl and reference implementations. Resolves: COMPMID-6929 Change-Id: I6f317631322b702e6a9579593befff65bbf46151 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11319 Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-03-14	Fix validation in pool2d assembly wrapper	Pablo Marquez Tello
	* Validate output shape in CpuPool2dAssemblyWrapperKernel * Resolves ARMCL-625 Change-Id: I4fd91c1b15ecb17efc39fd3e82a92210e4f182b2 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11290 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-03-11	Prefer indirect Gemm vs. Direct convolution if supported	Gunes Bayir
	Indirect GEMM uses optimized assembly path while Direct Conv uses the fallback Acl kernel for convolution. In certain cases, where the input tensor is large and filter size is greater than 7 (e.g. 9x9 filters), heuristics fall back to Direct Conv algorithm where it could still prefer the assembly path if the data layout is NHWC. This is more important when SME2 kernels are present. Resolves: COMPMID-6900 Change-Id: Ia611c975eee0423615113fcaeaa8f9eef0421456 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11254 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-03-11	Set int8 test tolerance in FullyConnected to int8	Gunes Bayir
	Int8 data types should be compared with Int8 tolerances. Same holds for UInt8 as well. O/w the differences between reference and target can be larger than it is because of casting. Resolves: COMPMID-6930 Change-Id: I4940d821b7fecc21cf6b167e161dffceb764b909 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11269 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-02-20	Requantization cases for offset changes only	Mohammed Suhail Munshi
	Resolves: [COMPMID-6681] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I325b9d478dd1d04a45533bb7708cf76e98ee0cee Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11058 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-02-15	Fix linker errors in validation suite for WoA	Pablo Marquez Tello
	* Cleaned up the ActivationLayerFixture and removed the context becuase it was causing compiler errors with clang-cl * The runtime context is an experimental feature that is only tested in the ActivationLayerFixture and not used anywhere else in the codebase * Resolves MLCE-1209 Change-Id: Id0ca71d60e78772dccbd02db407f87c94a087eb1 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11145 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-02-15	Fix validation suite on WoA	Pablo Marquez Tello
	* The missing parser option caused a segfault in the validation tests. * Resolves MLCE-1209 Change-Id: I59621241bb66f300c0c581741727f3abf0dbe43e Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11139 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-02-14	[QTest] Use dynamic output quantization in Depthwise Conv tests	Omar Al Khatib
	Resolves: COMPMID-6483 Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I512102f5e27743098168101b5e02382f4ad4a22a Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11068 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-02-12	Disable some DirectConv2d tests in Dynamic Fusion	Gunes Bayir
	There is a potential implementation issue in DirectConv2d CKW kernel. It's been revealed when we moved from template writer to compute kernel writer. This patch disables the offending tests. It also disables tests for an unimplemented operator. Relates to: COMPMID-6715 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I40e6256cf377ebf8b1c0d0d0c4788de19ec410e4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11123 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-02-09	Add support for QSYMM8 in ClCastKernel	Pablo Marquez Tello
	* Resolves ARMCL-1123 Change-Id: I4f8432ba41fa50bf787fb068c3672ac06b858bdd Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11117 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-02-09	Remove CKW prototype and Template Writer	Gunes Bayir
	Gpu code in dynamic fusion is now written by stable CKW. We do not need CKW protoype and the older writer implementation, i.e. TemplateWriter. It also removes the need for the flag -DACL_INTERNAL_TEST_CKW_IN_DF to compile and test dynamic fusion operator. Resolves: COMPMID-6715 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I9f9453311e79d9be612bd4754240d832f98503e8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11116 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-02-08	Fix the bug in GpuTanh operator in dynamic fusion	Gunes Bayir
	Tanh in dynamic fusion is a simple operator with no A and B coefficients, as its public interface implies. Tanh operator follows the TOSA specification. Customization of tanh calculation with a and b can be achieved via fusion as below: out = a * tanh(b in) --> x = b in y = tanh(x) out = a * y; Resolves: COMPMID-6873 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I818765192f631ae82c2094b0fc376fb87bae4fa4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11109 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-02-06	Disable FP16 tests compilation on Multi-Isa v8a	Mohammed Suhail Munshi
	Resolves: [COMPMID-6791] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Idae6ddb0e9655ec096f25917f0a44eb57aaef908 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11076 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
2024-02-01	Use the stable CKW API in the GPU dynamic fusion backend	Gunes Bayir
	- Refactor all kernels to work with the CKW stable API - Add support for sub-tile in the op_load/op_store CKW operator - Fix mismatch in resize - Add comments in all kernels written with CKW to help developers understand the structure of the code - Add texture image support in depthwise convolution written with CKW - Add support for different block sizes in depthwise convolution - Remove the use of the dynamic fusion helper functions. - Add support for floor in the op_unary() of CKW Resolves: COMPMID-6708, COMPMID-6743, COMPMID-6530 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Change-Id: I8104ce4d04a3138a1aeb0b84940e1f1c89e76069 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10914 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-01-24	Fix tolerance issue in BF16 MatMul tests	Gunes Bayir
	BF16 kernels are not expected to have the same tolerance/accuracy standards as full float kernels. The reference implementation is a standard floating point implementation, thus resulting in small mismatches. We increase the tolerance of the MatMul BF16 tests, and add more tests to cover more shapes. Previously, the only tested bf16 kernel was a64_hybrid_fp32bf16fp32_mmla_4x24. With the inclusion of new shapes, heuristics also choose a64_hybrid_fp32bf16fp32_mmla_6x16 and stress this kernel as well, covering every implementation. Resolves: COMPMID-6654 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I15342606912013c123b94c7e0ea2e6bbb25680d7 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11014 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-01-23	Make GpuWorkloadContext own all tensor info objects	Viet-Hoa Do
	* The tensor info objects created by calling create_tensor_info is now solely owned by the context object. The user only receives pointers to those objects. - Internally pointers to tensor info objects are used in various places. It's safer for dynamic fusion to manage these objects directly rather than relying on the users. - The validation test is updated to use the modified API. * Make various changes in dynamic fusion API to make it more friendly (e.g. making some of the objects moveable). Partially resolves: COMPMID-6707 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ifee70e53c05f8e7b72bf9ef123701ff291c5ee80 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10990 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-01-11	Fix test compilation error on GCC 13.2	Jakub Sujak
	Remove a std::move flagged by -Wpessimizing-move Resolves: COMPMID-6777 Change-Id: Ie082dc2eab0cb11e9a29f6f6fc98866306fd2cfa Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10957 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-01-04	Implement dynamic quantization for GEMMLowp tests	SiCong Li
	This patch calculates the output quantization info based on the inputs' quantization information. The previous approach was using the same quantization information for input, weights and output. Remove QSYMM8_PER_CHANNEL path from the fixture as there are no related tests Remove repeated shapes from the dataset now that we get rid of the quantization info from the dataset. Combine signed and unsigned SmallGEMMLowpFusedBatchedMatMulDataset into one as they become identical Resolves COMPMID-6481, COMPMID-6634 Change-Id: I9f5a20f4bb45c3e5adab388564135ae8a5c0a9ea Signed-off-by: SiCong Li <sicong.li@arm.com> Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10680 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-22	Add Mali™-G720 and Mali™-G620 as GpuTargets	Gunes Bayir
	This patch adds adds the latest Gpus as Gpu Target and sets up kernel selection heuristics for MatMul to address some nightly issues. Resolves: COMPMID-6766 Change-Id: I29dbb08c5ecfb3fcd63230b0b1675ab557074aca Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10902 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-14	Fix Run Example in Validate Tests	Mohammed Suhail Munshi
	Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Icee8b38db1f219d66ac22a6e0980f4325fd21fbd Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10868 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-12-08	Adjust NEReduceMean test tolerance	SiCong Li
	Resolves COMPMID-6728 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ic0682550a09db9aa420057a90ee65386e16e6034 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10853 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-05	Optimize CpuSoftmaxKernel for axis=0	Gunes Bayir
	Implement a single kernel instead of having two consecutive ones. In the previous setup, one kernel was calculating the maximum value in the axis, and this maximum was being subtracted from each data while calculating the softmax, i.e. softmax(x_i) = exp(x_i - max) / sum_i( exp(x_i - max) ) This patch integrates these two stages into a single kernel for Neon™ for all data types. This will save some memory because we don't need to hold the max values in a separate auxiliary tensor. It also introduces some other optimizations that will ease memory pressure when the data type is float/half, by using the dst tensor as temporary storage for already exponentiated inputs. It removes the references to SVE and SVE2 implementations, and most of the associated files; but, it leaves the implementations as these may be used in the future. Resolves: COMPMID-6500 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: Icff9976d1214c4c6cbe15a62ca60b8a77d3784cc Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10688 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-23	Remove the legacy core library	Jakub Sujak
	Stop building and linking to the legacy libarm_compute_core artifact. This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose. Users should link only to the main libarm_compute library for core functionality. Resolves: COMPMID-6329 Change-Id: Ife9d2c25d275e7c676deb09632ae461f697efde9 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10728 Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Sang Won Ha <sangwon.ha@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-14	Update comments to suppress doxygen warnings.	Anitha Raj
	Resolved COMPMID-6367 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Change-Id: I96f244811a81a4e278f0c5e47d5014229cad3a25 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10727 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-08	Optimize CpuGemmConv2d start-up time	SiCong Li
	When weight has no holes, we can replace CpuWeightsReshapeKernel with: - Collapse by reinterpreting weight's 3 spatial dimensions - Perform CpuTranspose For more details see the documentation in src/cpu/operators/CpuGemmConv2d.cpp This is one optimization since the CpuTranspose is better performing than CpuWeightsReshapeKernel A second optimization is to fuse this transpose with other weight transformations (e.g. pretranspose_B_array in CpuGemmAssemblyDispatch) However this second optimization depends on how the underlying gemm methods (the fall back path: CpuGemmMatrixMultiplyKernel or the assembly path: CpuGemmAssemblyDispatch) chooses to fuse the transpose. Therefore, this patch moves the transpose down from CpuGemmConv2d, to the individual gemm operators where the fusion decision needs to be made, by passing an extra "transpose_b" flag to CpuGemm New transpose_b flag in different scopes (they are all the same, but with different names because pretranspose_b has a different meaning in GemmAssemblyDispatch): GEMMInfo::pretranspose_B -> AsmGemmInfo::transpose_b New auxilliary tensors holding the transposed b result: - CpuGemm optimized path: CpuGemmAssemblyDispatch::PrePretransposedB - CpuGemm fallback path: CpuGemm::PreTransposedRHS Note that this patch does not yet have the second optimization (COMPMID-6595), but it prepares for it. Relates to COMPMID-6595 Resolves COMPMID-6499 Change-Id: I999a2da9da4b2b15369a3cc06d7872c86e0190ea Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10526 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-06	Fix Elementwise Division Dynamic Shape tests	Anitha Raj
	- Enable use_dynamic_shape ArithmeticDivisionDynamicShapeValidationFixture Signed-off-by: Anitha Raj <anitha.raj@arm.com> Change-Id: I42ddf5b604d728eda91fa45b239abf8caf2cda0f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10586 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>