ComputeLibrary.git -

Age	Commit message (Collapse)	Author
2023-12-08	Fix unit tests failing in CL/UNIT/TensorAllocator	Gunes Bayir
	The function pointer for clImportMemoryARM should be loaded in a portable way as recommended by Khronos® as outlined here: https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#getting-opencl-api-extension-function-pointers using clGetExtensionFunctionAddressForPlatform() call. All extensions should ideally be loaded using the above mentioned function. Resolves: COMPMID-6732 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I482b6bde721267d5e8c08301e5780d28a9c5ba85 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10852 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-12-07	Optimize CPU depth-to-space	Viet-Hoa Do
	Resolves: COMPMID-6622 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ibac276618bdda125dcbb9c851c547f12739b15b4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10749 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-06	Revert "thread_local _custom_scheduler"	Pablo Marquez Tello
	This reverts commit ded5b182675e3166e947a8eb637b5b1e925816ab. Resolves COMPMID-6735 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Change-Id: I9b69ca1ec80a671171d3f52081c4b8c61a676617 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10838 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-05	Optimize CpuSoftmaxKernel for axis=0	Gunes Bayir
	Implement a single kernel instead of having two consecutive ones. In the previous setup, one kernel was calculating the maximum value in the axis, and this maximum was being subtracted from each data while calculating the softmax, i.e. softmax(x_i) = exp(x_i - max) / sum_i( exp(x_i - max) ) This patch integrates these two stages into a single kernel for Neon™ for all data types. This will save some memory because we don't need to hold the max values in a separate auxiliary tensor. It also introduces some other optimizations that will ease memory pressure when the data type is float/half, by using the dst tensor as temporary storage for already exponentiated inputs. It removes the references to SVE and SVE2 implementations, and most of the associated files; but, it leaves the implementations as these may be used in the future. Resolves: COMPMID-6500 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: Icff9976d1214c4c6cbe15a62ca60b8a77d3784cc Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10688 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-28	Changes to enable FP16 in armv8a multi_isa	Pablo Marquez Tello
	* This is the initial patch to start working on enabling fp16 in all multi_isa builds. More changes are required in the way we register the kernels using the macro REGISTER_FP16_NEON. * In this patch we add the capability to build the fp16 files in listed in filelist.json with the correct arch option to enable FP16 * This patch is required towards building an universal multi_isa binary where fp16 is enable. * Enable REGISTER_FP16_NEON macro for all builds by removing __ARM_FEATURE_FP16_VECTOR_ARITHMETIC guard from the macro definition. The macro has to be used across all types of builds. Change-Id: I99f4c273f6ee04cad3c097e5e374200f48568fa9 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10682 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-27	BatchNorm changes to enable fp16 in armv8a multi_isa builds	Pablo Marquez Tello
	* Moved NCHW kernels fp16 and fp32 to their corresponding files src/cpu/kernels/fuse_batch_normalization/nchw/neon/fp16.cpp and src/cpu/kernels/fuse_batch_normalization/nchw/neon/fp32.cpp * Changes in filelist.json to include the new fp16 and fp32 files * Moved the template batch_normalization_nchw to impl.h as we need to instantiate it from fp16.cpp and fp32.cpp * Pooling layer: removed the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC that prevented the FP16 kernel execution. * Partially resolves MLCE-1102 Change-Id: Ia8c85e9ffb76c9e387f9ae2685e5df5e52c8dc27 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10777 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-27	CpuMul changes to enable fp16 in armv8a multi_isa builds	Pablo Marquez Tello
	* Moved fp16 and fp32 to their corresponding files src/cpu/kernels/mul/generic/neon/fp16.cpp and src/cpu/kernels/mul/generic/neon/fp32.cpp * Changes in filelist.json: added a new fp16.cpp file for the float16_t kernels * Partially resolves MLCE-1102 Change-Id: I88f24cf034c11b55ff84644b182ba76c7cb94296 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10778 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2023-11-24	thread_local _custom_scheduler	David Svantesson
	Resolves ONCPUML-1331 This patch adds an option to make _custom_scheduler thread_local to support usage of multiple schedulers handled outside of ACL. It also adds num_threads() function to Scheduler which reverts to querying CPUInfo if no scheduler has been set. Change-Id: Iff706165d8d091895331a5bb3a76f6cabe048912 Signed-off-by: David Svantesson-Yeung <david.svantesson-yeung@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10748 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-16	NormalizationLayer changes to enable fp16 in armv8a multi_isa builds	Pablo Marquez Tello
	* Moved the template arm_compute::normalize_float to impl.h because we need to instantiate it from both NENormalizationLayerKernel.cpp and src/cpu/kernels/norm_layer/generic/neon/fp16.cpp * Changes in filelist.json: added a new fp16.cpp file for the float16_t kernels * Replaced the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC in NENormalizationLayerKernel by ARM_COMPUTE_ENABLE_FP16 so that the fp16 kernels can be compiled in for multi_isa builds * Moved fp32 kernels to the corresponding file src/cpu/kernels/norm_layer/generic/neon/fp32.cpp * Partially resolves MLCE-1102 Change-Id: I3f2eb2ed0b6c7f68092b17872b85082fbb5f39e2 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10739 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-15	Fix various coverity issues	SiCong Li
	Resolves COMPMID-6677 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I99bf2385f6edc0836faacb31f5c66ed4fb051e40 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10729 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-15	Fix device issue with CL softmax	Viet-Hoa Do
	* Performing the second pass in reverse order doesn't seem to work reliably in some specific devices. This patch introduces another approach to workaround the device issue. Resolves: COMPMID-6669 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I591f05ff06f8439ebe4d32093441ae871a292f4c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10730 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-14	Update comments to suppress doxygen warnings.	Anitha Raj
	Resolved COMPMID-6367 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Change-Id: I96f244811a81a4e278f0c5e47d5014229cad3a25 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10727 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-10	Fix CpuGemmConv2d int8 segfault	SiCong Li
	Bypass importation of memory of the original weights into the reinterpreted_weights auxiliary tensor if other weight transformation path is selected (which would've freed the original weights and its tensor info) Resolves COMPMID-6635 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ib8a345c3ac542bc3745d6a67db822b55df37e827 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10698 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-09	Remove duplicate definitions of BF16 fixed format kernels.	David Mansell
	Change-Id: Ie68b0a19040cc6b5bf47fca406989f39aa8d7b81 Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10687 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-09	Pooling changes to enable fp16 in armv8a multi_isa builds	Pablo Marquez Tello
	* Changes in filelist.json moving fp16 file from common to fp16 attribute * Changes in kernel CpuPool2dAssemblyWrapperKernel, replaced __ARM_FEATURE_FP16_VECTOR_ARITHMETIC by ENABLE_FP16_KERNELS to make sure the fp16 kernels are compiled in for multi_isa=1 * Partially resolves MLCE-1102 Change-Id: I327154ec5b1ddfb9f54d9096f00c35b3e05c678a Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10662 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-09	DepthwiseConvolution changes to enable fp16 in armv8a multi_isa builds	Pablo Marquez Tello
	* Changes in filelist.json moving fp16 file from common to fp16 attribute * Removed the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC in CpuDepthwiseConv2dAssemblyWrapperKernel to always create the assembly kernel * Partially resolves MLCE-1102 Change-Id: I2f88d5e54a94042cfb3cb4ea0386338a7c444866 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10626 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-08	Optimize CpuGemmConv2d start-up time	SiCong Li
	When weight has no holes, we can replace CpuWeightsReshapeKernel with: - Collapse by reinterpreting weight's 3 spatial dimensions - Perform CpuTranspose For more details see the documentation in src/cpu/operators/CpuGemmConv2d.cpp This is one optimization since the CpuTranspose is better performing than CpuWeightsReshapeKernel A second optimization is to fuse this transpose with other weight transformations (e.g. pretranspose_B_array in CpuGemmAssemblyDispatch) However this second optimization depends on how the underlying gemm methods (the fall back path: CpuGemmMatrixMultiplyKernel or the assembly path: CpuGemmAssemblyDispatch) chooses to fuse the transpose. Therefore, this patch moves the transpose down from CpuGemmConv2d, to the individual gemm operators where the fusion decision needs to be made, by passing an extra "transpose_b" flag to CpuGemm New transpose_b flag in different scopes (they are all the same, but with different names because pretranspose_b has a different meaning in GemmAssemblyDispatch): GEMMInfo::pretranspose_B -> AsmGemmInfo::transpose_b New auxilliary tensors holding the transposed b result: - CpuGemm optimized path: CpuGemmAssemblyDispatch::PrePretransposedB - CpuGemm fallback path: CpuGemm::PreTransposedRHS Note that this patch does not yet have the second optimization (COMPMID-6595), but it prepares for it. Relates to COMPMID-6595 Resolves COMPMID-6499 Change-Id: I999a2da9da4b2b15369a3cc06d7872c86e0190ea Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10526 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-07	Update heuristic for MatMul Native U8	Gian Marco Iodice
	Resolves COMPMID-6479 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: I13aa0ef944a75ba8b5e4df183d52df57b9aba90f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10659 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-01	Add support for Arm® Cortex®-A520 and Arm® Cortex®-R82	Viet-Hoa Do
	Resolves: COMPMID-6599 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Id91185871f0dc30c08c7c38379acd5a3c1056473 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10575 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-01	Fix compilation error with clang and multi-isa	Viet-Hoa Do
	* Fix SVE header being included in non-SVE file. Resolves: COMPMID-6613 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ic7f662a239b761b83e67e11b6cc03f7d5f5cd051 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10573 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-10-31	[GPU] Update Reverse layer to allow negative axis and reversed axis order	Adnan AlSinan
	- Adds option to use negative axis and inverted axis. - Adds validation tests for the above. Resolves COMPMID-6459 Change-Id: I88afd845d078f92c82ec8529ce7241fccd4c417e Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10523 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-31	Extend CKW MatMul with nt_t	Adnan AlSinan
	- Add the kernel variant: (nt_t) to GpuCKWMatMul. - Extend CKW MatMul validation test with nt_t. - Fixes a bug in CKW where z-dim = 1. Resolves: COMPMID-6435 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I4c5e8791e55f21ffff3c11eca7802c51a4259977 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10525 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-31	Fix SVE kernel using SVE2 instruction	Viet-Hoa Do
	Resolves: COMPMID-6493 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I038d91ba266e1e8bf124336bcd272ec77e92038c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10490 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-31	Optimize CL softmax	Viet-Hoa Do
	* The new softmax implementation consists of only a single kernel. - There are 2 versions of softmax, one for the x dimension and one for any other dimensions. - Softmax kernel handles both native and quantized data type. Resolves: COMPMID-6447 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I4a9ae5bc63f78aebeaa85ee48a0d102c9c245eda Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10489 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-30	DirectConv and Im2Col changes to enable fp16 in armv8a multi_isa builds	Pablo Marquez Tello
	* FP16 kernels must be instantiated in fp16.cpp. * Partially resolves MLCE-1102 Change-Id: Iab9c29dbfd89358f2f663862ff5010c88aeccf8c Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10496 Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-26	Add check to disable dynamic bias with quantized datatypes in Conv2D layer	Mohammed Suhail Munshi
	Resolves: COMPMID-6397 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Id4404f75dae03fd529db1adac5ab9ca48d08ec46 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10498 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-20	FuseBatchNorm changes to enable fp16 in armv8a multi_isa builds	Pablo Marquez Tello
	* FP16 kernels must be instantiated in fp16.cpp. * Partially resolves MLCE-1102 Change-Id: Ie652203876a0ac12b025e96d20990b6efb21e772 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10477 Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-17	arm_gemm: Add SME2 FP16 GEMV using FP16->FP32 dot product.	David Mansell
	Signed-off-by: David Mansell <David.Mansell@arm.com> Change-Id: If02f7809f9b6e84979121698c5e7a62cbb41e2c3 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10487 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-17	Revert "arm_gemm: Add SME2 FP16 GEMV."	David Mansell
	This reverts commit aeced744b854758768243833bcdf999c0c3c1a5b. Reason for revert: Incorrect SME architecture checks. Change-Id: I23fe78178041a544a8791a4655bf6fe4aa375e38 Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10501 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-10-13	Connect MatMul MMUL kernels to ClMatMul operator	Gunes Bayir
	Resolves: COMPMID-6478 Change-Id: I5bc220c3bd00a316776fe14454438cc0dc9049b3 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10469 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-13	Fix build error in CpuScale	Pablo Marquez Tello
	* Build error when using data_layout_support=nhwc * Some kernels need to be guarded by ENABLE_NCHW_KERNELS Change-Id: I0414084b458360c7e8d2842f4734c39aad80852e Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10476 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-12	Scale changes to enable fp16 in armv8a multi_isa builds	Pablo Marquez Tello
	* Partially resolves MLCE-1102 Change-Id: If050608e56d75649b8d07757604ae10d6fc4269b Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10461 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-12	arm_gemm: Add SME2 FP16 GEMV.	David Mansell
	Change-Id: I1f73819c25c66e4d13198e9c79755808d92b343d Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10466 Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-12	Remove padding from CL comparison operator	Viet-Hoa Do
	* Add support for processing left-over vector to comparison kernel. * Combine native and quantized versions of CL comparison. Resolves: COMPMID-6424 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I31d43bdf0eab999cee6fa8144b5d8e921a1093e8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10467 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-11	Optimize CL reduction operation	Viet-Hoa Do
	* Batch dimension is added to reduction operation. - All the dimensions higher than the batch dimension are collapsed so that the input and output tensors are always 3-4D. - CL kernel is called once instead of being repeatedly called to process each sliding window. Resolves: COMPMID-6443 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Icd99939d52d3bb648f08537e5f52ef27e894061b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10456 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-10	arm_gemm: fix 2D threading mode for SME2	David Mansell
	"2D" threading mode was not setting the result pointer correctly for SME2 kernels with K blocking - for non-final blocks the result pointer should be NULL so that the intermediate results get written in the accumulator buffer by the kernel. Signed-off-by: David Mansell <David.Mansell@arm.com> Change-Id: Idefa538e190a086e1e44a91998ab7e949e3989e4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10342 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-10	Fix build error	Pablo Marquez Tello
	* Build error when using data_layout_support=nhwc * Some kernels need to be guarded by ENABLE_NCHW_KERNELS Change-Id: I9fb6cf0e204531f81b0dff3572a1740ba94cde0e Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10460 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-10-10	Fix NEReorderKernel validation	David Svantesson
	Partially resolves: ONCPUML-1251 Reorders of block size 8 are only available when SVE is enabled. This fixes an issue with these reorders being checked even when SVE is not enabled. Signed-off-by: David Svantesson <david.svantesson@arm.com> Change-Id: I287763a547d2ab303c3f8ce10164178b680e551e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10464 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-10-10	Port MatMul to Dynamic Fusion + CKW boilerplate code	Adnan AlSinan
	- Port Matmaul to to Dynamic Fusion. - Prepare a CKW boilerplate code. - Implement the following classes: - MatMulAttributes - GPUMatMulSettings - GpuMatMul - ClComponentMatMul - GpuCkwMatMul Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I5a7c183b293973e8a4233b554b2affe0bb28f44d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10453 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-10	Optimize NEStackLayer	Gunes Bayir
	Optimize the stack operation in Cpu by leveraging block memcpy. Resolves: COMPMID-6498 Change-Id: I49d79d179f0375a73d654edd59fb33072112569b Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10451 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-10	Fix compilation error caused by ambiguous std::abs call	Gunes Bayir
	Resolves: COMPMID-6583 Change-Id: Icb760b57df1e4573e5009e41ee13ada869acc687 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10457 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-10-10	CpuSubKernel changes to enable fp16 in armv8a multi_isa builds	Pablo Marquez Tello
	* FP16 kernels must be instantiated in fp16.cpp. * Partially resolves MLCE-1102 Change-Id: I497fe0ba6e84493a5072c3e80bbba7ecd5de8095 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10448 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-09	Change heuristics for FP16 Deconv	Sangwon Ha
	- For FP16, disable direct Deconv method when oiutput channel count is greater than 32. Resolves COMPMID-6311 Change-Id: I14d9dbf1a1b95736ccd09488d633df4775a01dcb Signed-off-by: Sangwon Ha <sangwon.ha@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10446 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-09	Pool2d changes to enable fp16 in armv8a multi_isa builds	Pablo Marquez Tello
	* FP16 kernels must be moved from src/cpu/kernels/pool2d/neon/nchw/all.cpp to src/cpu/kernels/pool2d/neon/fp16.cpp. * In src/cpu/kernels/pool2d/neon/list.h when we declare the kernels we need to remove defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC) so that in std::vector<CpuPool2dKernel::PoolingKernel> available_kernels * Partially resolves MLCE-1102 Change-Id: I000380f8eccca17e6219c4f3453980d67a2c9dd8 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10444 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-05	Optimize CLTranspose operator	Jakub Sujak
	* Transpose higher dimensional tensors (>2D) by collapsing higher dimensions into the third dimension thus avoiding multiple dispatches of the CL kernel * Maximize tile size without register spilling Resolves: COMPMID-6448 Change-Id: Iac094b8c428bdf319d9c28a8334cb55d58e2d14b Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10443 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-04	Port DepthwiseConv2d operator to Ckw	ramy.elgammal@arm.com
	- Only support 1x1 blocks, i.e. n0=1, m0=1. - Dilation not supported yet. Resolves: COMPMID-6258 Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com> Change-Id: I1dcfd7640fb40e112736dedc81847f7b1b50dba2 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10411 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-10-04	NEDeconvolutionLayer validation fix	Pablo Marquez Tello
	* Added a new test to make sure we support the following configuration: NCHW InputInfo=Shape=2,2 WeightsInfo=Shape=3,3 OutputInfo=Shape=4,4, PadStrideInfo=1,1;0,0,0,0' * Fixed the validate() method to allow this configuration * Resolves MLCE-1120 Change-Id: I6874ad57bb81384185984741b983bf5e19ba150c Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10417 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-03	Fix nightly NEON Reverse reference failure	Adnan AlSinan
	- Fix the reference axis vector to be the right size. - Update typos in the error messages. Resolves COMPMID-6574 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I9572365b8173b92d0fffd557e4db261b2969109c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10423 Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com>
2023-10-02	Optimize CL and Neon Winograd tests	Gunes Bayir
	Several test optimizations have been introduced into Winograd tests for Gpu and Cpu backends. The testing strategy has been detailed as a comment header in the test design files. In summary - Very large shapes in the nightly are made smaller - If the underlying kernel is the same for different data types, we only need to stress some key aspects of the kernels (e.g. read/write lengths in case of fp32/fp16). - In case the underlying kernel is the same (OpenCL), Fp16 is tested on a subset of the shapes - In Cpu, there is no need to test every combination for both NCHW and NHWC as we just permute the inputs and use NHWC kernels anyways - All activations does not need to be tested for each and every shape Resolves: COMPMID-6464 Change-Id: Ie25fded85c65b9c7386dc21b23f9b695b1e77b07 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10393 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-29	Implement Quantized Matmul T/T and T/Nt kernels using MMUL extension	Gunes Bayir
	Resolves: COMPMID-6476, COMPMID-6477 Change-Id: Ied37c269d5a108ff72f70e3ad932cf372bda5562 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10346 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>