Age | Commit message (Collapse) | Author |
|
Resolves: COMPMID-6501
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Change-Id: I0abd3cbb5f861301f407c443988fb7efaa205b5d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11056
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Co-authored-by: Michael Kozlov <michael.kozlov@arm.com>
Signed-off-by: Michael Kozlov <michael.kozlov@arm.com>
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Change-Id: I40b68789fd363e0a15a0939550df8b067920ad6e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11280
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Indirect GEMM uses optimized assembly path while Direct Conv uses the fallback Acl kernel for convolution.
In certain cases, where the input tensor is large and filter size is greater than 7 (e.g. 9x9 filters), heuristics fall back to Direct Conv algorithm where it could still prefer the assembly path if the data layout is NHWC. This is more important when SME2 kernels are present.
Resolves: COMPMID-6900
Change-Id: Ia611c975eee0423615113fcaeaa8f9eef0421456
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11254
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch fuses the transposition taking place in Acl with the transformations done in arm_gemm (called pretranspose_b_array) if the underlying kernel and transform supports it. This should improve start-up time (as it's for constant Rhs matrices) and memory footprint. The transformations in arm_gemm are kernel specific. The Rhs matrix is transformed into certain layouts to improve the performance.
Resolves: COMPMID-6595
Change-Id: Id2932dd966e59f903c279417bebcea83d9a42464
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11144
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Change-Id: I12511d9fb799a11fc9a889aeb0ad1739b9d1954d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11121
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Gunes Bayir <gunes.bayir@arm.com>
|
|
This patch also fixes a bug where the split dimension was wrong in
CpuDepthwiseConv2dAssemblyDispatch::run. It was set to DimY, which is
cols, but it should have been DimZ. This was rarely an issue in practice
because typically the number of cols are greater than the number of
threads anyway.
Relates to: ONCPUML-1443
Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com>
Change-Id: Ifed2fce22ddeb7cd77e6a6ae1083694427f91e04
Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11083
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add python script to update the Supported Operators documentation page
* Add a simple shell script to build Doxygen pages
Towards: COMPMID-6630
Change-Id: I8f29c7c3c54d5aa56af0fbc6bede03813df06aaa
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10836
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Documented the use of the compiler directive .inst
* Updated the multi_isa section
* Resolves MLCE-1156
Change-Id: I6a04ac66bc244c3adc010e1f8545f2045992d6db
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10981
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Enables FP16 lut for logistic activation
- Adds LUTManager to re-use lut where appropriate.
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I94667b63b452a8e58a1eb59cb0b5866178954523
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10864
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch adds adds the latest Gpus as Gpu Target and sets up kernel selection heuristics for MatMul to address some nightly issues.
Resolves: COMPMID-6766
Change-Id: I29dbb08c5ecfb3fcd63230b0b1675ab557074aca
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10902
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6622
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ibac276618bdda125dcbb9c851c547f12739b15b4
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10749
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Implement a single kernel instead of having two consecutive ones. In the previous setup, one kernel was calculating the maximum value in the axis, and this maximum was being subtracted from each data while calculating the softmax, i.e.
softmax(x_i) = exp(x_i - max) / sum_i( exp(x_i - max) )
This patch integrates these two stages into a single kernel for Neon™ for all data types. This will save some memory because we don't need to hold the max values in a separate auxiliary tensor.
It also introduces some other optimizations that will ease memory pressure when the data type is float/half, by using the dst tensor as temporary storage for already exponentiated inputs.
It removes the references to SVE and SVE2 implementations, and most of the associated files; but, it leaves the implementations as these may be used in the future.
Resolves: COMPMID-6500
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: Icff9976d1214c4c6cbe15a62ca60b8a77d3784cc
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10688
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* This is the initial patch to start working on enabling fp16 in all
multi_isa builds. More changes are required in the way we register
the kernels using the macro REGISTER_FP16_NEON.
* In this patch we add the capability to build the fp16 files in listed in
filelist.json with the correct arch option to enable FP16
* This patch is required towards building an universal multi_isa binary
where fp16 is enable.
* Enable REGISTER_FP16_NEON macro for all builds by removing
__ARM_FEATURE_FP16_VECTOR_ARITHMETIC guard from the macro definition.
The macro has to be used across all types of builds.
Change-Id: I99f4c273f6ee04cad3c097e5e374200f48568fa9
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10682
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Stop building and linking to the legacy libarm_compute_core artifact.
This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose. Users should link only to the main libarm_compute library for core functionality.
Resolves: COMPMID-6329
Change-Id: Ife9d2c25d275e7c676deb09632ae461f697efde9
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10728
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Reviewed-by: Sang Won Ha <sangwon.ha@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6369
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Change-Id: I997a48c4e8efb67fbe53efd6fc498df4e6b41eea
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10726
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6633
Change-Id: I1e78df468876ec3569fa46597734e7de328b06f4
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10663
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6471
Change-Id: I5add2af4292ff2eeafcde85f3bff8e98b2069b13
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10660
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
When weight has no holes, we can replace CpuWeightsReshapeKernel with:
- Collapse by reinterpreting weight's 3 spatial dimensions
- Perform CpuTranspose
For more details see the documentation in
src/cpu/operators/CpuGemmConv2d.cpp
This is one optimization since the CpuTranspose is better performing
than CpuWeightsReshapeKernel
A second optimization is to fuse this transpose with other weight
transformations (e.g. pretranspose_B_array in CpuGemmAssemblyDispatch)
However this second optimization depends on how the underlying gemm
methods (the fall back path: CpuGemmMatrixMultiplyKernel or the assembly
path: CpuGemmAssemblyDispatch) chooses to fuse the transpose.
Therefore, this patch moves the transpose down from CpuGemmConv2d, to
the individual gemm operators where the fusion decision needs to be
made, by passing an extra "transpose_b" flag to CpuGemm
New transpose_b flag in different scopes (they are all the same, but
with different names because pretranspose_b has a different meaning in
GemmAssemblyDispatch):
GEMMInfo::pretranspose_B -> AsmGemmInfo::transpose_b
New auxilliary tensors holding the transposed b result:
- CpuGemm optimized path: CpuGemmAssemblyDispatch::PrePretransposedB
- CpuGemm fallback path: CpuGemm::PreTransposedRHS
Note that this patch does not yet have the second optimization
(COMPMID-6595), but it prepares for it.
Relates to COMPMID-6595
Resolves COMPMID-6499
Change-Id: I999a2da9da4b2b15369a3cc06d7872c86e0190ea
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10526
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6599
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Id91185871f0dc30c08c7c38379acd5a3c1056473
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10575
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Adds option to use negative axis and inverted axis.
- Adds validation tests for the above.
Resolves COMPMID-6459
Change-Id: I88afd845d078f92c82ec8529ce7241fccd4c417e
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10523
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add the kernel variant: (nt_t) to GpuCKWMatMul.
- Extend CKW MatMul validation test with nt_t.
- Fixes a bug in CKW where z-dim = 1.
Resolves: COMPMID-6435
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I4c5e8791e55f21ffff3c11eca7802c51a4259977
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10525
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* The new softmax implementation consists of only a single kernel.
- There are 2 versions of softmax, one for the x dimension
and one for any other dimensions.
- Softmax kernel handles both native and quantized data type.
Resolves: COMPMID-6447
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I4a9ae5bc63f78aebeaa85ee48a0d102c9c245eda
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10489
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Batch dimension is added to reduction operation.
- All the dimensions higher than the batch dimension are collapsed
so that the input and output tensors are always 3-4D.
- CL kernel is called once instead of being repeatedly called
to process each sliding window.
Resolves: COMPMID-6443
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Icd99939d52d3bb648f08537e5f52ef27e894061b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10456
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Optimize the stack operation in Cpu by leveraging block memcpy.
Resolves: COMPMID-6498
Change-Id: I49d79d179f0375a73d654edd59fb33072112569b
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10451
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Transpose higher dimensional tensors (>2D) by collapsing higher
dimensions into the third dimension thus avoiding multiple dispatches
of the CL kernel
* Maximize tile size without register spilling
Resolves: COMPMID-6448
Change-Id: Iac094b8c428bdf319d9c28a8334cb55d58e2d14b
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10443
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Only support 1x1 blocks, i.e. n0=1, m0=1.
- Dilation not supported yet.
Resolves: COMPMID-6258
Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com>
Change-Id: I1dcfd7640fb40e112736dedc81847f7b1b50dba2
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10411
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add support for negative axis values.
- Add option to use opposite ACL convention for dimension addressing.
- Add validation tests for the mentioned additions.
Resolves COMPMID-6497
Change-Id: I9174b201c3adc070766cc6cffcbe4ec1fe5ec1c3
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10335
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Two implementations of the command buffer are added:
- CLMutableCommandBuffer uses mutable dispatch command buffer
extension.
- CLCompatCommandBuffer is the compatibility class for platform
without the CL extension.
Resolves: COMPMID-6454
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I15b370a50168ca940bd8fb2b5fae26230da3f472
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10298
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves : [COMPMID-6212]
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I29bbd9a3d96af462faf7f0ee13b9849f75e05356
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10319
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
|
|
The skeleton code consists of modifications
- to build the library with the quantized matmul kernel
- refactoring of some common utilities
- empty OpenCL Kernels for four configurations ([Lhs, Rhs] X [Nt, t])
- some validation tests and skeleton for functional tests
Resolves: COMPMID-6473
Change-Id: Id8401f789d34277dceb1f91afd68c9c88275618a
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10273
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
PostOps was the experimental interface for Dynamic Fusion. It is now
replaced by the new Dynamic Fusion interface with code generation using
the Compute Kernel Writer.
Resolves: COMPMID-6190
Change-Id: I813b48facef2fd6f3aee332588886b4f9b3d33d8
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10219
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Fixes a bug when using FP16 constant in some cases.
- Adds op_write_raw_code to handle some special cases.
- Ports MxN pooling 2d layer into ckw.
- Adds unary function 'negate' to ckw.
- Updates pool2d validation tests to include store op.
Resovles COMPMID-6263
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: If8c683761fead79bd519aef28cc65de78d3ec629
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10172
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Use Compute Kernel Writer (CKW) to generate code for Resize operator in
the Dynamic Fusion interface.
Supports Nearest Neighbor and Bilinear interpolation methods.
Resolves: COMPMID-6265
Change-Id: Ib0a5158bd4208123c84f6a1dc54f29d82fd55dcd
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10174
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: David Mansell <David.Mansell@arm.com>
Change-Id: I359ed0703f4036e017b34b622f76b630cefac973
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10183
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5279
Change-Id: Id9b007eed62c200702bbfcc83b94dab7b5de1714
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9962
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Report the issue of SVE2 build crashes when run on an SVE only device.
Resolves: COMPMID-6178
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I4e9e63b6064d4d5db9f8f3b99a38ec60828e9607
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10141
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6452
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I0bfab12b79090cea57ab908def9cd5c202ec0a50
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10123
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6179
Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com>
Change-Id: I202b0ad66cebfebe2a07766b068a211a7597f035
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10097
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6179
Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com>
Change-Id: I9c3e9ae3ed05010824472cdb86172b5f38d87a8d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10094
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* NEArgMinMaxLayer uses NEReductionOperation to compute its result in S32
* We need to call NECast to convert from S32 to S64
* Resolves MLCE-1089
Change-Id: I6fded869b6076d7af1b9b3e70eb384f4ee82fd8a
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10054
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch
- includes our code formatting scripts used in our precommit pipeline
- sets up pre-commit framework to help contributors validate their patches
This has several benefits:
- our repository becomes more inclusive to external contributions
- we can use several hooks available online efficiently, w/o implementing our own
- it speeds up our development flow
and, it is completely optional.
The pre-commit configuration includes running the following:
- our code formatting scripts
- CMake and Bazel build file generation scripts
- hooks that check trailing whitespace, end of files, committed large files etc.
The number of checks can easily be extended using pre-commit framework.
Change-Id: I06bf1259715579d321f89820726a8301504c36d9
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10064
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Ports the direct convolution 2D kernel from the experimental Dynamic Fusion interface to use the new Compute Kernel Writer backend for OpenCL code generation.
Support is for FP16/FP32 only.
Resolves: COMPMID-6259
Change-Id: Ia8d7b9cb789737b22b1d877cd798a73eda0ce4ab
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10059
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add a new section in the documentation to describe how the conv2D
heuristic works on Arm® Cortex®-based CPUs and Arm® Mali™-based GPUs
- Add CKW_UNUSED in compute_kernel_writer/src/cl/CLTile.cpp to avoid
the compilation error due to an unused variable
- Remove FFT from the list of algorithms to be selected by the CPU Conv2d
heuristic.
Resolves COMPMID-6163
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Change-Id: I51384d7749451b2562642683e8b2429a355166bb
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10065
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6404
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I75aebe620567ed50817747589bbe8cfb63715a7b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10036
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Some symbols have been moved from core/Types.h. This patch retains
back compatibility so that the user can still include this header
for those symbols
* A new header core/CoreTypes.h is created to avoid circular dependency.
This header includes essential small types that are used across
functions
* Move all function info types into function_info folder for easier
tracking
Resolves COMPMID-6330
Related to https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I4739175c2d4d184a9bc8e28b881b497fab03ca60
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9979
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Partially resolves MLCE-1089
Change-Id: Ie3d2fc2f755ae99cdb17b57cc90bb3f99a1843e0
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9909
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Only kernel sizes supported are 2, 4, 8 and 16.
* Resolves COMPMID-6349
Change-Id: I30c85dcb3505d47fe56a2d2a08e9221ff426ee93
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9890
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6319
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I49a17ff973efc88b7ce0334c47ecf076c03f4cc3
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9829
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially Resolves: COMPMID-6328
Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com>
Change-Id: I1dd460857da973e65e80326ab7c92b47ba1937eb
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9853
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6195
Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com>
Change-Id: I8e85fe73308ed84ebb142d6d6d1562b62dddfaa5
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9819
Reviewed-by: SiCong Li <sicong.li@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|