aboutsummaryrefslogtreecommitdiff
path: root/src/runtime
AgeCommit message (Collapse)Author
2024-03-25Adds Tests and reference implementation for scatter operator with 1D tensors.Mohammed Suhail Munshi
Resolves: [COMPMID-6890] Change-Id: Ie4a8db24fc6387afa9ddf42b3607e040cdf8df67 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11339 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-03-21Add skeleton for CLScatter op, reference and testsMohammed Suhail Munshi
- Adds dataset for tests - Adds skeleton for function, operator, reference and tests Resolves: [COMPMID-6889] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I7e57e8b4577fef6aa7421e672894c249cad6c5fa Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11234 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-01-10Fix compilation error on GCC 13.2Jakub Sujak
Suppress a false positive compiler warning caused by a bug in GCC https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104165 This issue is known to be reproducible in some versions of GCC 11, 12 and 13. Remove a redundant std::move flagged by -Werror=redundant-move Resolves: COMPMID-6777 Change-Id: I782e87b5e3df4c09195e67a37f49d122dc918224 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10950 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Comments-Addressed: <felixjohnny.thomasmathibalan@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-12-22Add Mali™-G720 and Mali™-G620 as GpuTargetsGunes Bayir
This patch adds adds the latest Gpus as Gpu Target and sets up kernel selection heuristics for MatMul to address some nightly issues. Resolves: COMPMID-6766 Change-Id: I29dbb08c5ecfb3fcd63230b0b1675ab557074aca Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10902 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-07Optimize CPU depth-to-spaceViet-Hoa Do
Resolves: COMPMID-6622 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ibac276618bdda125dcbb9c851c547f12739b15b4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10749 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-06Revert "thread_local _custom_scheduler"Pablo Marquez Tello
This reverts commit ded5b182675e3166e947a8eb637b5b1e925816ab. Resolves COMPMID-6735 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Change-Id: I9b69ca1ec80a671171d3f52081c4b8c61a676617 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10838 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-05Optimize CpuSoftmaxKernel for axis=0Gunes Bayir
Implement a single kernel instead of having two consecutive ones. In the previous setup, one kernel was calculating the maximum value in the axis, and this maximum was being subtracted from each data while calculating the softmax, i.e. softmax(x_i) = exp(x_i - max) / sum_i( exp(x_i - max) ) This patch integrates these two stages into a single kernel for Neon™ for all data types. This will save some memory because we don't need to hold the max values in a separate auxiliary tensor. It also introduces some other optimizations that will ease memory pressure when the data type is float/half, by using the dst tensor as temporary storage for already exponentiated inputs. It removes the references to SVE and SVE2 implementations, and most of the associated files; but, it leaves the implementations as these may be used in the future. Resolves: COMPMID-6500 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: Icff9976d1214c4c6cbe15a62ca60b8a77d3784cc Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10688 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-24thread_local _custom_schedulerDavid Svantesson
Resolves ONCPUML-1331 This patch adds an option to make _custom_scheduler thread_local to support usage of multiple schedulers handled outside of ACL. It also adds num_threads() function to Scheduler which reverts to querying CPUInfo if no scheduler has been set. Change-Id: Iff706165d8d091895331a5bb3a76f6cabe048912 Signed-off-by: David Svantesson-Yeung <david.svantesson-yeung@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10748 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-15Fix various coverity issuesSiCong Li
Resolves COMPMID-6677 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I99bf2385f6edc0836faacb31f5c66ed4fb051e40 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10729 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-07Update heuristic for MatMul Native U8Gian Marco Iodice
Resolves COMPMID-6479 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: I13aa0ef944a75ba8b5e4df183d52df57b9aba90f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10659 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-10-31[GPU] Update Reverse layer to allow negative axis and reversed axis orderAdnan AlSinan
- Adds option to use negative axis and inverted axis. - Adds validation tests for the above. Resolves COMPMID-6459 Change-Id: I88afd845d078f92c82ec8529ce7241fccd4c417e Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10523 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-26Add check to disable dynamic bias with quantized datatypes in Conv2D layerMohammed Suhail Munshi
Resolves: COMPMID-6397 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Id4404f75dae03fd529db1adac5ab9ca48d08ec46 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10498 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-13Connect MatMul MMUL kernels to ClMatMul operatorGunes Bayir
Resolves: COMPMID-6478 Change-Id: I5bc220c3bd00a316776fe14454438cc0dc9049b3 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10469 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-10Optimize NEStackLayerGunes Bayir
Optimize the stack operation in Cpu by leveraging block memcpy. Resolves: COMPMID-6498 Change-Id: I49d79d179f0375a73d654edd59fb33072112569b Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10451 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-09Change heuristics for FP16 DeconvSangwon Ha
- For FP16, disable direct Deconv method when oiutput channel count is greater than 32. Resolves COMPMID-6311 Change-Id: I14d9dbf1a1b95736ccd09488d633df4775a01dcb Signed-off-by: Sangwon Ha <sangwon.ha@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10446 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-04NEDeconvolutionLayer validation fixPablo Marquez Tello
* Added a new test to make sure we support the following configuration: NCHW InputInfo=Shape=2,2 WeightsInfo=Shape=3,3 OutputInfo=Shape=4,4, PadStrideInfo=1,1;0,0,0,0' * Fixed the validate() method to allow this configuration * Resolves MLCE-1120 Change-Id: I6874ad57bb81384185984741b983bf5e19ba150c Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10417 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-28Apply clang-format on repositoryFelix Thomasmathibalan
Code is formatted as per a revised clang format configuration file(not part of this delivery). Version 14.0.6 is used. Exclusion List: - files with .cl extension - files that are not strictly C/C++ (e.g. Android.bp, Sconscript ...) And the following directories - compute_kernel_writer/validation/ - tests/ - include/ - src/core/NEON/kernels/convolution/ - src/core/NEON/kernels/arm_gemm/ - src/core/NEON/kernels/arm_conv/ - data/ There will be a follow up for formatting of .cl files and the files under tests/ and compute_kernel_writer/validation/. Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Change-Id: Ib7eb1fcf4e7537b9feaefcfc15098a804a3fde0a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10391 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2023-09-27Implement tflite compliant reverse for CPUAdnan AlSinan
- Add support for negative axis values. - Add option to use opposite ACL convention for dimension addressing. - Add validation tests for the mentioned additions. Resolves COMPMID-6497 Change-Id: I9174b201c3adc070766cc6cffcbe4ec1fe5ec1c3 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10335 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-27Enable job-chaining with incremental job_chaining_size.Anitha Raj
Resolves COMPMID-6458 Change-Id: I1068da3dee6b6f58e4179f5a92521a6d6457e6c4 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10380 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-09-04Remove legacy PostOps codeJakub Sujak
PostOps was the experimental interface for Dynamic Fusion. It is now replaced by the new Dynamic Fusion interface with code generation using the Compute Kernel Writer. Resolves: COMPMID-6190 Change-Id: I813b48facef2fd6f3aee332588886b4f9b3d33d8 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10219 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-15Fix out-of-scope CLBufferMemoryRegion's buffer still in queue issueSiCong Li
When a CLBufferMemoryRegion is freed, it also frees its cl::Buffer object. At this point we need to flush the queue to ensure all prior commands that may use this buffer are completed before the buffer's deallocation. Previously a CommandQueue object is owned as a member inside CLBufferMemoryRegion. Whenever CLBufferMemoryRegion is freed it causes the queue to be released, which implicitly flushes the queue. Now we need to explicitly flush the queue, without the excessive releasing of the queue Resolves COMPMID-6492 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I799507bcff8526d1381cde53d7c6298684c6d3ee Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10126 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-08Add support for S64 output in NEArgMinMaxLayerPablo Marquez Tello
* NEArgMinMaxLayer uses NEReductionOperation to compute its result in S32 * We need to call NECast to convert from S32 to S64 * Resolves MLCE-1089 Change-Id: I6fded869b6076d7af1b9b3e70eb384f4ee82fd8a Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10054 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-03Fix ReduceMean validate issueViet-Hoa Do
Resolves: COMPMID-6406 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ic638616f4cb228673928815b759caee3d094780d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10043 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-28Retain back-compatibility for arm_compute/core/Types.hSiCong Li
* Some symbols have been moved from core/Types.h. This patch retains back compatibility so that the user can still include this header for those symbols * A new header core/CoreTypes.h is created to avoid circular dependency. This header includes essential small types that are used across functions * Move all function info types into function_info folder for easier tracking Resolves COMPMID-6330 Related to https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I4739175c2d4d184a9bc8e28b881b497fab03ca60 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9979 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-07-25Fix problem with exception handling in CPPSchedulerMatthew Bentham
If an exception was thrown in the main thread, the child threads were not being synchronised, leading to undefined behaviour (and probably the program crashing instead of correctly propagating the exception). Add support to the build system for enabling ThreadSanitizer. This needs a build system option rather than simply passing extra_cxx/link_flags because the sanitizer options are incompatible with -Wl,-undefined,error. Add a unit test for throwing an exception within a CPPScheduler workload. Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Change-Id: I7638272a5d43a24a861f3e6d63f3ee7b099483b5 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/538048 Comments-Addressed: bsgcomp <bsgcomp@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9957 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-21Enable S64 output in CLArgMinMaxPablo Marquez Tello
Resolves MLCE-1089 Change-Id: I8b385ef8a00ec5de60299bc7a359766ba5417e68 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9918 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-07-18Break up core/Utils.h to reduce unused code being included everywhereMatthew Bentham
Makes a small difference to compile times and opens up other opportunities to simplify code. Change-Id: I232876910bbe4fa9719f4a0ce4a54c090faeb5ef Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/532429 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9856 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-13Fix excessive calls to clReleaseCommandQueueSiCong Li
Use the queue object from the singleton Scheduler to avoid repeatedly freeing cl::CommandQueue object on the stack. Resolves COMPMID-6021 Change-Id: I0baf5891a7974cf4c7efad1b13cc5f28e49a2745 Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9896 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-07-11Add Bias to MatMul Kernels and add support for use in Fully Connected LayerMohammed Suhail Munshi
Resolves: [COMPMID-6316] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I08e6bac9e6b46b76978da0dc6a48ccfe3dde5086 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9833 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-10Do not include headers necessary for logging when logging is disabledMatthew Bentham
Speeds up compilation by 30% for some files when logging is disabled. Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Change-Id: Ia479bd50a80616a34e33ead13db8558f8dbaa1aa Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/534480 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9880 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-07Enable transpose convolution with non-square kernelsViet-Hoa Do
Resolves: COMPMID-6319 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I49a17ff973efc88b7ce0334c47ecf076c03f4cc3 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9829 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-05Rewrote CLArgMinMax for axis 0Pablo Marquez Tello
* Simpler implementation without stages for axis 0 * Removed considerable amount of code. Resolves COMPMID-6271 Change-Id: Ie8bcb2f0b55f87472f44b38872a23a922619a211 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9849 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-26Use MatMul in fully connected layer with dynamic weights when supportedMohammed Suhail Munshi
- Use MatMul kernels in FC layer when using dynamic weights without broadcasting or bias. - Fix minor typo in IClMatMulNativeKernelConfig.h Partially Resolves : [COMPMID-6193] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Id494062b5b4f4e75ff9714c202dde941955afa52 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9797 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-16Add Fused Activation to OpenCL MatMulMohammed Suhail Munshi
- Added fused activation to MatMul function interface - Added fused activation to CL backend - Includes tests for supported Activation Functions in MatMul Resolves: [COMPMID-6192] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ie103212b600b60699eaf6a6394d609e6e1f5aba6 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/522465 Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9714 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-15Break up arm_compute/core/Types.h a bitMatthew Bentham
Split some of the larger types with inlined code into their own header files, so that the implementation of them needn't be included everywhere. Change-Id: Id3ec2d42efbd33cedb55705a5a24e1b90c8b7a01 Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/524782 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-09Reorder destructor in srcDavid Svantesson
Signed-off-by: David Svantesson <david.svantesson@arm.com> Change-Id: Iaed0933d665bd98829be49b9df11653d4d74081c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9746 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-05-12Fix performance regression in FP16 DeconvolutionJakub Sujak
The previous heuristic for selecting the Deconvolution method with FP32 input data introduced a performance regression for FP16. A simple fix ensures the previous heuristic applies to FP32 types only. Resolves: COMPMID-6027 Change-Id: I77ca6c9c72534057a3967db58924a972b0efb09f Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9616 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-05-11Remove check for bias in CPU Depthwise ConvolutionJakub Sujak
The Depthwise convolution operation is not required to have a bias, hence the check may fail unexpectedly. Resolves: COMPMID-6250 Change-Id: I2844ffde6139f79ade118d756c930318f16fbe50 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9615 Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-by: Sang Won Ha Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-05-10Remove inclusion of NEReorderKernel header from NEReorderLayerRamy Elgammal
Resolves: COMPMID-6235 Change-Id: I7a094a23244286090415ee2788632cfa7bd6c037 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9608 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-05-10Re-enable dyanmic weights in Neon™ depthwise convolutionRamy Elgammal
- Call Neon™ depthwise convolution validation inside in its configure() method. Resolves: COMPMID-6188 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ib2ae4d995ff2bbc92ce4496d4ab93cf09113e3e9 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9594 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-05Disable dynamic weights in unsupported operatorsViet-Hoa Do
Resolves: COMPMID-6185 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Icfd9d177083ecdf41dc13e5b2ae982ff67492f8a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9577 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-05Make NECast::validate take args by const pointerMatthew Bentham
Change-Id: I5d343e959942cb2ce48442d95d7c62aecd6a34d0 Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9573 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-05-04Implement OpenCL MatMul heuristic for Arm® Mali™-G710Gian Marco Iodice
- Add heuristic for f32/f16 and int8 quantized data types - Include MatMul configuration selection in the CLMatMul operator Resolves COMPMID-5950, COMPMID-5957, COMPMID-5959, COMPMID-5925, COMPMID-5926, COMPMID-5927, COMPMID-5928 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: Ic222148da0337b88d4d8c960e3b6ac31003d8bcb Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9564 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-03Guards to make NEReorder aarch64 onlyDavid Svantesson
Resolves COMPMID-6151 Signed-off-by: David Svantesson <david.svantesson@arm.com> Change-Id: I0e8c957f3460633c32ef57be0cdc44a53b8c3e88 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9553 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-02Add fp16 GeMM heuristic for Arm® Mali™-G710Gian Marco Iodice
- Performance improvements on various networks between 5-20% Resolves COMPMID-6030 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: Idcf7de57e6f5a94a6a94ec78229dd53c24de44f4 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/514481 Tested-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9524 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-05-01Add Reorder to changelogDavid Svantesson
Partially resolves ONCPUML-1232 Signed-off-by: David Svantesson <david.svantesson@arm.com> Change-Id: I258d03524c50dd24975b473aede061f80bf9d91b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9534 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-04-28Reorder addedDavid Svantesson
Adds Reorder kernel exposing blocking reorders from arm_gemm Resolves ONCPUML-1232 Change-Id: I42bf4166311fe1771565134d3ed7039fc8e30230 Signed-off-by: David Svantesson <david.svantesson@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9500 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-28Fix OMPScheduler run_workloads single thread issueSiCong Li
Resolves COMPMID-6032 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Icca60deac7308173fc3a8282af91434b4d1c0b06 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9520 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-04-26Change fp16 GeMM heuristic for Arm® Mali™-G77Gian Marco Iodice
- Replace existing heuristic with look-up tables - Expected performance improvement is between 5-15% on various models Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: Ie26ddf66895ede131aa06fde7b200ef94d2dd467 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9472 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-26Integrate multi-threaded pretranspose_B_arraySiCong Li
This is required for the case where rhs (B) is dynamic and needs to be pretransposed in every run. In a multi-threaded setting, this means the previously single-threaded pretranspose_B_array would become the bottleneck Resolves COMPMID-5896 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Id508c46992188a0f76a505152931d4955d04c16d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9455 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>