aboutsummaryrefslogtreecommitdiff
path: root/docs/user_guide/release_version_and_change_log.dox
AgeCommit message (Collapse)Author
2024-03-15Remove documentation reference to v24.04v24.02.1branches/arm_compute_24_02_1Felix Thomasmathibalan
Change-Id: I5996b0ea27579266543b53259bbbcd5b852d95f9 Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11304 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Jakub Sujak <jakub.sujak@arm.com>
2024-03-15Update documentation for 24.02.1 releaseFelix Thomasmathibalan
Co-authored-by: Michael Kozlov <michael.kozlov@arm.com> Signed-off-by: Michael Kozlov <michael.kozlov@arm.com> Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Change-Id: I40b68789fd363e0a15a0939550df8b067920ad6e
2024-02-12Update documentation for 24.02 releasev24.02branches/arm_compute_24_02Felix Thomasmathibalan
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Change-Id: I12511d9fb799a11fc9a889aeb0ad1739b9d1954d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11121 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Gunes Bayir <gunes.bayir@arm.com>
2024-02-07Parallelize CPU depthwise over batch if only 1 rowJonathan Deakin
This patch also fixes a bug where the split dimension was wrong in CpuDepthwiseConv2dAssemblyDispatch::run. It was set to DimY, which is cols, but it should have been DimZ. This was rarely an issue in practice because typically the number of cols are greater than the number of threads anyway. Relates to: ONCPUML-1443 Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com> Change-Id: Ifed2fce22ddeb7cd77e6a6ae1083694427f91e04 Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11083 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-01-10Use look up table for fp16 activationMohammed Suhail Munshi
- Enables FP16 lut for logistic activation - Adds LUTManager to re-use lut where appropriate. Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I94667b63b452a8e58a1eb59cb0b5866178954523 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10864 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-22Add Mali™-G720 and Mali™-G620 as GpuTargetsGunes Bayir
This patch adds adds the latest Gpus as Gpu Target and sets up kernel selection heuristics for MatMul to address some nightly issues. Resolves: COMPMID-6766 Change-Id: I29dbb08c5ecfb3fcd63230b0b1675ab557074aca Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10902 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-07Optimize CPU depth-to-spaceViet-Hoa Do
Resolves: COMPMID-6622 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ibac276618bdda125dcbb9c851c547f12739b15b4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10749 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-05Optimize CpuSoftmaxKernel for axis=0Gunes Bayir
Implement a single kernel instead of having two consecutive ones. In the previous setup, one kernel was calculating the maximum value in the axis, and this maximum was being subtracted from each data while calculating the softmax, i.e. softmax(x_i) = exp(x_i - max) / sum_i( exp(x_i - max) ) This patch integrates these two stages into a single kernel for Neon™ for all data types. This will save some memory because we don't need to hold the max values in a separate auxiliary tensor. It also introduces some other optimizations that will ease memory pressure when the data type is float/half, by using the dst tensor as temporary storage for already exponentiated inputs. It removes the references to SVE and SVE2 implementations, and most of the associated files; but, it leaves the implementations as these may be used in the future. Resolves: COMPMID-6500 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: Icff9976d1214c4c6cbe15a62ca60b8a77d3784cc Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10688 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-28Changes to enable FP16 in armv8a multi_isaPablo Marquez Tello
* This is the initial patch to start working on enabling fp16 in all multi_isa builds. More changes are required in the way we register the kernels using the macro REGISTER_FP16_NEON. * In this patch we add the capability to build the fp16 files in listed in filelist.json with the correct arch option to enable FP16 * This patch is required towards building an universal multi_isa binary where fp16 is enable. * Enable REGISTER_FP16_NEON macro for all builds by removing __ARM_FEATURE_FP16_VECTOR_ARITHMETIC guard from the macro definition. The macro has to be used across all types of builds. Change-Id: I99f4c273f6ee04cad3c097e5e374200f48568fa9 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10682 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-23Remove the legacy core libraryJakub Sujak
Stop building and linking to the legacy libarm_compute_core artifact. This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose. Users should link only to the main libarm_compute library for core functionality. Resolves: COMPMID-6329 Change-Id: Ife9d2c25d275e7c676deb09632ae461f697efde9 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10728 Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Sang Won Ha <sangwon.ha@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-14Update Release notes for 23.11Anitha Raj
Resolves COMPMID-6369 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Change-Id: I997a48c4e8efb67fbe53efd6fc498df4e6b41eea Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10726 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-08Optimize CpuGemmConv2d start-up timeSiCong Li
When weight has no holes, we can replace CpuWeightsReshapeKernel with: - Collapse by reinterpreting weight's 3 spatial dimensions - Perform CpuTranspose For more details see the documentation in src/cpu/operators/CpuGemmConv2d.cpp This is one optimization since the CpuTranspose is better performing than CpuWeightsReshapeKernel A second optimization is to fuse this transpose with other weight transformations (e.g. pretranspose_B_array in CpuGemmAssemblyDispatch) However this second optimization depends on how the underlying gemm methods (the fall back path: CpuGemmMatrixMultiplyKernel or the assembly path: CpuGemmAssemblyDispatch) chooses to fuse the transpose. Therefore, this patch moves the transpose down from CpuGemmConv2d, to the individual gemm operators where the fusion decision needs to be made, by passing an extra "transpose_b" flag to CpuGemm New transpose_b flag in different scopes (they are all the same, but with different names because pretranspose_b has a different meaning in GemmAssemblyDispatch): GEMMInfo::pretranspose_B -> AsmGemmInfo::transpose_b New auxilliary tensors holding the transposed b result: - CpuGemm optimized path: CpuGemmAssemblyDispatch::PrePretransposedB - CpuGemm fallback path: CpuGemm::PreTransposedRHS Note that this patch does not yet have the second optimization (COMPMID-6595), but it prepares for it. Relates to COMPMID-6595 Resolves COMPMID-6499 Change-Id: I999a2da9da4b2b15369a3cc06d7872c86e0190ea Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10526 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-01Add support for Arm® Cortex®-A520 and Arm® Cortex®-R82Viet-Hoa Do
Resolves: COMPMID-6599 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Id91185871f0dc30c08c7c38379acd5a3c1056473 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10575 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-10-31[GPU] Update Reverse layer to allow negative axis and reversed axis orderAdnan AlSinan
- Adds option to use negative axis and inverted axis. - Adds validation tests for the above. Resolves COMPMID-6459 Change-Id: I88afd845d078f92c82ec8529ce7241fccd4c417e Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10523 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-31Extend CKW MatMul with nt_tAdnan AlSinan
- Add the kernel variant: (nt_t) to GpuCKWMatMul. - Extend CKW MatMul validation test with nt_t. - Fixes a bug in CKW where z-dim = 1. Resolves: COMPMID-6435 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I4c5e8791e55f21ffff3c11eca7802c51a4259977 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10525 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-31Optimize CL softmaxViet-Hoa Do
* The new softmax implementation consists of only a single kernel. - There are 2 versions of softmax, one for the x dimension and one for any other dimensions. - Softmax kernel handles both native and quantized data type. Resolves: COMPMID-6447 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I4a9ae5bc63f78aebeaa85ee48a0d102c9c245eda Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10489 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-11Optimize CL reduction operationViet-Hoa Do
* Batch dimension is added to reduction operation. - All the dimensions higher than the batch dimension are collapsed so that the input and output tensors are always 3-4D. - CL kernel is called once instead of being repeatedly called to process each sliding window. Resolves: COMPMID-6443 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Icd99939d52d3bb648f08537e5f52ef27e894061b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10456 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-10Optimize NEStackLayerGunes Bayir
Optimize the stack operation in Cpu by leveraging block memcpy. Resolves: COMPMID-6498 Change-Id: I49d79d179f0375a73d654edd59fb33072112569b Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10451 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-05Optimize CLTranspose operatorJakub Sujak
* Transpose higher dimensional tensors (>2D) by collapsing higher dimensions into the third dimension thus avoiding multiple dispatches of the CL kernel * Maximize tile size without register spilling Resolves: COMPMID-6448 Change-Id: Iac094b8c428bdf319d9c28a8334cb55d58e2d14b Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10443 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-04Port DepthwiseConv2d operator to Ckwramy.elgammal@arm.com
- Only support 1x1 blocks, i.e. n0=1, m0=1. - Dilation not supported yet. Resolves: COMPMID-6258 Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com> Change-Id: I1dcfd7640fb40e112736dedc81847f7b1b50dba2 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10411 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-09-27Implement tflite compliant reverse for CPUAdnan AlSinan
- Add support for negative axis values. - Add option to use opposite ACL convention for dimension addressing. - Add validation tests for the mentioned additions. Resolves COMPMID-6497 Change-Id: I9174b201c3adc070766cc6cffcbe4ec1fe5ec1c3 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10335 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-09-18Add CL command buffer classViet-Hoa Do
* Two implementations of the command buffer are added: - CLMutableCommandBuffer uses mutable dispatch command buffer extension. - CLCompatCommandBuffer is the compatibility class for platform without the CL extension. Resolves: COMPMID-6454 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I15b370a50168ca940bd8fb2b5fae26230da3f472 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10298 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-09-15Remove deprecated support for BF16 in CpuCastAdnan AlSinan
Resolves : [COMPMID-6212] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I29bbd9a3d96af462faf7f0ee13b9849f75e05356 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10319 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2023-09-14Add skeleton of ClMatMulLowpNativeMMULKernelGunes Bayir
The skeleton code consists of modifications - to build the library with the quantized matmul kernel - refactoring of some common utilities - empty OpenCL Kernels for four configurations ([Lhs, Rhs] X [Nt, t]) - some validation tests and skeleton for functional tests Resolves: COMPMID-6473 Change-Id: Id8401f789d34277dceb1f91afd68c9c88275618a Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10273 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-04Remove legacy PostOps codeJakub Sujak
PostOps was the experimental interface for Dynamic Fusion. It is now replaced by the new Dynamic Fusion interface with code generation using the Compute Kernel Writer. Resolves: COMPMID-6190 Change-Id: I813b48facef2fd6f3aee332588886b4f9b3d33d8 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10219 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-31Port ClTemplatePool2d to ckwAdnan AlSinan
- Fixes a bug when using FP16 constant in some cases. - Adds op_write_raw_code to handle some special cases. - Ports MxN pooling 2d layer into ckw. - Adds unary function 'negate' to ckw. - Updates pool2d validation tests to include store op. Resovles COMPMID-6263 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: If8c683761fead79bd519aef28cc65de78d3ec629 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10172 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-30Port Resize operator to CKWGunes Bayir
Use Compute Kernel Writer (CKW) to generate code for Resize operator in the Dynamic Fusion interface. Supports Nearest Neighbor and Bilinear interpolation methods. Resolves: COMPMID-6265 Change-Id: Ib0a5158bd4208123c84f6a1dc54f29d82fd55dcd Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10174 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-22CPU: Depthwise: Generate correct size for input indirection array.David Mansell
Signed-off-by: David Mansell <David.Mansell@arm.com> Change-Id: I359ed0703f4036e017b34b622f76b630cefac973 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10183 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-22Optimize CpuReshapeKernelAnitha Raj
Resolves COMPMID-5279 Change-Id: Id9b007eed62c200702bbfcc83b94dab7b5de1714 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9962 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-14Update OpenCL headers to v2023.04.17Viet-Hoa Do
Resolves: COMPMID-6452 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I0bfab12b79090cea57ab908def9cd5c202ec0a50 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10123 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-10Update Release Notesramy.elgammal@arm.com
Resolves: COMPMID-6179 Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com> Change-Id: I9c3e9ae3ed05010824472cdb86172b5f38d87a8d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10094 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-08Add support for S64 output in NEArgMinMaxLayerPablo Marquez Tello
* NEArgMinMaxLayer uses NEReductionOperation to compute its result in S32 * We need to call NECast to convert from S32 to S64 * Resolves MLCE-1089 Change-Id: I6fded869b6076d7af1b9b3e70eb384f4ee82fd8a Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10054 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-07Port DirectConv2d to CKW backendJakub Sujak
Ports the direct convolution 2D kernel from the experimental Dynamic Fusion interface to use the new Compute Kernel Writer backend for OpenCL code generation. Support is for FP16/FP32 only. Resolves: COMPMID-6259 Change-Id: Ia8d7b9cb789737b22b1d877cd798a73eda0ce4ab Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10059 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-08-03Fix CL Tile operatorViet-Hoa Do
Resolves: COMPMID-6404 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I75aebe620567ed50817747589bbe8cfb63715a7b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10036 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-13Added S64/U64 support for the input in CLCastPablo Marquez Tello
* Partially resolves MLCE-1089 Change-Id: Ie3d2fc2f755ae99cdb17b57cc90bb3f99a1843e0 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9909 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-07Enable transpose convolution with non-square kernelsViet-Hoa Do
Resolves: COMPMID-6319 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I49a17ff973efc88b7ce0334c47ecf076c03f4cc3 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9829 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-03Update README for patch release 23.05.1ramy.elgammal@arm.com
Partially Resolves: COMPMID-6328 Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com> Change-Id: I1dd460857da973e65e80326ab7c92b47ba1937eb Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9853 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-06-23Implement FP32/FP16 MatMul NT/T kernel using the MMUL extensionRamy Elgammal
Resolves COMPMID-6195 Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com> Change-Id: I8e85fe73308ed84ebb142d6d6d1562b62dddfaa5 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9819 Reviewed-by: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-06-21Fix CPU depthwise convolution in case of large paddingViet-Hoa Do
* Avoid the assembly kernels to be used when the padding is greater than the kernel shape. Resolves: COMPMID-6280 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ibe0820018c97f4481bf318397b797ec7b351a1d5 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9802 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-06-20Deprecate legacy libarm_compute_coreJakub Sujak
This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose. Users must no longer link their applications to this library and instead link only to the main `libarm_compute` library to maintain full functionality. Resolves: COMPMID-6300 Change-Id: I7c222f511174ce9b0cfde94cf8f90da4a89f2a8b Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9760 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-12Update release notes for the 23.05 release.Omar Al Khatib
Partially resolves: [COMPMID-5888] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I0313af488050d2babb47b3e164c111e11d84021f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9605 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-03Guards to make NEReorder aarch64 onlyDavid Svantesson
Resolves COMPMID-6151 Signed-off-by: David Svantesson <david.svantesson@arm.com> Change-Id: I0e8c957f3460633c32ef57be0cdc44a53b8c3e88 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9553 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-01Add Reorder to changelogDavid Svantesson
Partially resolves ONCPUML-1232 Signed-off-by: David Svantesson <david.svantesson@arm.com> Change-Id: I258d03524c50dd24975b473aede061f80bf9d91b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9534 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-04-03Add Cropping to CLBatchToSpaceOmar Al Khatib
- Deprecate dynamic block shape interface - Iterate over output window instead of input window for simpler implementation and better performance. - Add cropping support and cropping tests Resolves [COMPMID-5865] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: Ic67d44a6a39299ecdafc507f12e3dc5d517dfb62 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9385 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-30Add cropping support to NEBatchToSpaceSiCong Li
- Deprecate dynamic block shape interface - Iterate over output window instead of input window for simpler implementation and better performance - Add cropping support and cropping tests Resolves COMPMID-5918 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ifea0f5f7760ffd0f4d5d4f3a5ae8d14d0b98b790 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9378 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-14Update release log for 23.02.1 patch releaseJakub Sujak
Partially resolves: COMPMID-5969 Change-Id: I09f41abbda378cd6551bdf5bf4866b2bf4ca3096 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9332 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-21Add Microsoft Windows® trademarksJakub Sujak
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Change-Id: Ie2332d197f274f6f44fa71093f4aff83e91c2a20 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9182 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-02-10Update release version and change log documentationJakub Sujak
Resolves: COMPMID-5565 Change-Id: I9dca679f57f6c3cc9489669b80a5da2aba500d34 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9122 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-10Update release version and change log documentationJakub Sujak
Partially resolves: COMPMID-5565 Change-Id: I058bb7b0ee2ba246bf0f0c67d68e35b57326e802 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9098 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Omar Al Khatib <omar.alkhatib@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-02Update recommended NDK to r20b in the documentationSiCong Li
Resolves COMPMID-5846 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I0afbcda72879ccf8e9ef486d3c4af7a3c75ba270 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9071 Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>