Age | Commit message (Collapse) | Author |
|
* The new softmax implementation consists of only a single kernel.
- There are 2 versions of softmax, one for the x dimension
and one for any other dimensions.
- Softmax kernel handles both native and quantized data type.
Resolves: COMPMID-6447
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I4a9ae5bc63f78aebeaa85ee48a0d102c9c245eda
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10489
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* FP16 kernels must be instantiated in fp16.cpp.
* Partially resolves MLCE-1102
Change-Id: Iab9c29dbfd89358f2f663862ff5010c88aeccf8c
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10496
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch calculates the output quantization info based on the inputs'
quantization information. The previous approach was using the same
quantization information for input, weights and output.
This implementation does not cover the cases where we have fused
activation function.
Note that no Neon™ tests are changed since there were not any quantized
Neon Direct Convolution tests.
Resolves: COMPMID-6485
Change-Id: Id32241320acae0b58552546d6d6f665cd5c63044
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10470
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6397
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Id4404f75dae03fd529db1adac5ab9ca48d08ec46
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10498
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* FP16 kernels must be instantiated in fp16.cpp.
* Partially resolves MLCE-1102
Change-Id: Ie652203876a0ac12b025e96d20990b6efb21e772
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10477
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: David Mansell <David.Mansell@arm.com>
Change-Id: If02f7809f9b6e84979121698c5e7a62cbb41e2c3
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10487
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6581
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I0a634e064377e54b9190241c01fc75c212522ba7
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10481
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This reverts commit aeced744b854758768243833bcdf999c0c3c1a5b.
Reason for revert: Incorrect SME architecture checks.
Change-Id: I23fe78178041a544a8791a4655bf6fe4aa375e38
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10501
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6478
Change-Id: I5bc220c3bd00a316776fe14454438cc0dc9049b3
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10469
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Build error when using data_layout_support=nhwc
* Some kernels need to be guarded by ENABLE_NCHW_KERNELS
Change-Id: I0414084b458360c7e8d2842f4734c39aad80852e
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10476
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Partially resolves MLCE-1102
Change-Id: If050608e56d75649b8d07757604ae10d6fc4269b
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10461
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I1f73819c25c66e4d13198e9c79755808d92b343d
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10466
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add support for processing left-over vector to comparison kernel.
* Combine native and quantized versions of CL comparison.
Resolves: COMPMID-6424
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I31d43bdf0eab999cee6fa8144b5d8e921a1093e8
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10467
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Batch dimension is added to reduction operation.
- All the dimensions higher than the batch dimension are collapsed
so that the input and output tensors are always 3-4D.
- CL kernel is called once instead of being repeatedly called
to process each sliding window.
Resolves: COMPMID-6443
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Icd99939d52d3bb648f08537e5f52ef27e894061b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10456
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
"2D" threading mode was not setting the result pointer correctly for
SME2 kernels with K blocking - for non-final blocks the result
pointer should be NULL so that the intermediate results get written
in the accumulator buffer by the kernel.
Signed-off-by: David Mansell <David.Mansell@arm.com>
Change-Id: Idefa538e190a086e1e44a91998ab7e949e3989e4
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10342
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Build error when using data_layout_support=nhwc
* Some kernels need to be guarded by ENABLE_NCHW_KERNELS
Change-Id: I9fb6cf0e204531f81b0dff3572a1740ba94cde0e
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10460
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves: ONCPUML-1251
Reorders of block size 8 are only available when SVE is enabled. This
fixes an issue with these reorders being checked even when SVE is not enabled.
Signed-off-by: David Svantesson <david.svantesson@arm.com>
Change-Id: I287763a547d2ab303c3f8ce10164178b680e551e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10464
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Port Matmaul to to Dynamic Fusion.
- Prepare a CKW boilerplate code.
- Implement the following classes:
- MatMulAttributes
- GPUMatMulSettings
- GpuMatMul
- ClComponentMatMul
- GpuCkwMatMul
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I5a7c183b293973e8a4233b554b2affe0bb28f44d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10453
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Optimize the stack operation in Cpu by leveraging block memcpy.
Resolves: COMPMID-6498
Change-Id: I49d79d179f0375a73d654edd59fb33072112569b
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10451
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6466
Change-Id: I916871c9d7880107985337782e2cfd280a62cdeb
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10458
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6583
Change-Id: Icb760b57df1e4573e5009e41ee13ada869acc687
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10457
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6465
Change-Id: I5bbf4596dd5e34e806dc51de9be14df9b6fa320a
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10452
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* FP16 kernels must be instantiated in fp16.cpp.
* Partially resolves MLCE-1102
Change-Id: I497fe0ba6e84493a5072c3e80bbba7ecd5de8095
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10448
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- For FP16, disable direct Deconv method when oiutput channel count is
greater than 32.
Resolves COMPMID-6311
Change-Id: I14d9dbf1a1b95736ccd09488d633df4775a01dcb
Signed-off-by: Sangwon Ha <sangwon.ha@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10446
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* FP16 kernels must be moved from src/cpu/kernels/pool2d/neon/nchw/all.cpp
to src/cpu/kernels/pool2d/neon/fp16.cpp.
* In src/cpu/kernels/pool2d/neon/list.h when we declare the kernels
we need to remove defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC) so that
in std::vector<CpuPool2dKernel::PoolingKernel> available_kernels
* Partially resolves MLCE-1102
Change-Id: I000380f8eccca17e6219c4f3453980d67a2c9dd8
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10444
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Transpose higher dimensional tensors (>2D) by collapsing higher
dimensions into the third dimension thus avoiding multiple dispatches
of the CL kernel
* Maximize tile size without register spilling
Resolves: COMPMID-6448
Change-Id: Iac094b8c428bdf319d9c28a8334cb55d58e2d14b
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10443
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Resolves COMPMID-6560
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I5778a0d620bed3fea05d868c0a775fbaa6dcc9dd
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10442
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Only support 1x1 blocks, i.e. n0=1, m0=1.
- Dilation not supported yet.
Resolves: COMPMID-6258
Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com>
Change-Id: I1dcfd7640fb40e112736dedc81847f7b1b50dba2
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10411
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Added a new test to make sure we support the following configuration:
NCHW
InputInfo=Shape=2,2
WeightsInfo=Shape=3,3
OutputInfo=Shape=4,4,
PadStrideInfo=1,1;0,0,0,0'
* Fixed the validate() method to allow this configuration
* Resolves MLCE-1120
Change-Id: I6874ad57bb81384185984741b983bf5e19ba150c
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10417
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Fix the reference axis vector to be the right size.
- Update typos in the error messages.
Resolves COMPMID-6574
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I9572365b8173b92d0fffd557e4db261b2969109c
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10423
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
|
|
The linker option '-noall_load' is obsolete.
Resolves: COMPMID-6578
Change-Id: I2d386c25b50705a25600dd612406470e1d3871be
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10421
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6574
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I6b23e2a2f7b2839f038dad538dfc5ebda62891a6
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10412
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
|
|
Several test optimizations have been introduced into Winograd tests for Gpu and Cpu backends. The testing strategy has been detailed as a comment header in the test design files.
In summary
- Very large shapes in the nightly are made smaller
- If the underlying kernel is the same for different data types, we only need to stress some key aspects of the kernels (e.g. read/write lengths in case of fp32/fp16).
- In case the underlying kernel is the same (OpenCL), Fp16 is tested on a subset of the shapes
- In Cpu, there is no need to test every combination for both NCHW and NHWC as we just permute the inputs and use NHWC kernels anyways
- All activations does not need to be tested for each and every shape
Resolves: COMPMID-6464
Change-Id: Ie25fded85c65b9c7386dc21b23f9b695b1e77b07
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10393
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6476, COMPMID-6477
Change-Id: Ied37c269d5a108ff72f70e3ad932cf372bda5562
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10346
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Clang-format options now match those in clang-format version 14.
Remove Astyle checks as the same code style checks are provided by clang-format.
Resolves: COMPMID-6576
Change-Id: Iefa9bb719826242a3276e9ca058d0c84624f7302
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10399
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6474
Change-Id: Iaff5b512cf77975f2df02dcdf848711b13bf97a6
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10341
Reviewed-by: Mohmun02 <MohammedSuhail.Munshi@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6576
Change-Id: I91ec2afd68d19e26f4c31a978176f1fdf4b7f6cc
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10400
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* The current implementation has signfinicant inaccuracy
and the issue cascades to GELU.
* Use the implementation from Arm® Optimized Routines.
The maximum error is 1.93 ULP.
Resolves: COMPMID-6554
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: If80131e164b7a078e34dd8e05b1506698f31d17a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10395
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I460cf3e47582658a1ee03b769a1ad47426630dcb
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10394
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: <felixjohnny.thomasmathibalan@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Code is formatted as per a revised clang format configuration
file(not part of this delivery). Version 14.0.6 is used.
Exclusion List:
- files with .cl extension
- files that are not strictly C/C++ (e.g. Android.bp, Sconscript ...)
And the following directories
- compute_kernel_writer/validation/
- tests/
- include/
- src/core/NEON/kernels/convolution/
- src/core/NEON/kernels/arm_gemm/
- src/core/NEON/kernels/arm_conv/
- data/
There will be a follow up for formatting of .cl files and the
files under tests/ and compute_kernel_writer/validation/.
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Change-Id: Ib7eb1fcf4e7537b9feaefcfc15098a804a3fde0a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10391
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
|
|
- Add support for negative axis values.
- Add option to use opposite ACL convention for dimension addressing.
- Add validation tests for the mentioned additions.
Resolves COMPMID-6497
Change-Id: I9174b201c3adc070766cc6cffcbe4ec1fe5ec1c3
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10335
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6458
Change-Id: I1068da3dee6b6f58e4179f5a92521a6d6457e6c4
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10380
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Inclusion order of header is changed as preparatory step
for applying clang-format
Change-Id: I0c529f896ba802dfc6f30a573cdc9d9a24f3081c
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10379
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template select_op() which had to be moved from impl.cpp to fp16.cpp
* Partially resolves MLCE-1102
Change-Id: Ic9e73e121482fcc5e4fcbe8ae1ecd23649cbd3d1
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10359
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template max_unpooling() which had to be moved from impl.cpp to impl.h
* Partially resolves MLCE-1102
Change-Id: Iabf9a9ba9d2441032f931f33aad97acc3e332575
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10362
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
|
|
Signed-off-by: Paolo Tricerri <paolo.tricerri@arm.com>
Change-Id: If4e2944e25e48c8b7a1a6713e57838d449a987ea
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10366
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add the concept of tile view which refers to a specific rectangular
area of the tile object.
- The active area is added to TileOperand so that the user can access
part of the tile.
- Currently only row vector and scalar access are exposed to the user.
- All writing operations except load/store op support sub-tile.
* Add tests for sub-tile access.
Resolves: COMPMID-6557
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ica3f9eaf17f06e080c495d36c572f623b62c2910
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10354
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Switch the loop unrolling order and reuse the pre-computed vectors
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I636c0530d6b21dae4dbb371c57d18b1f7c7246a8
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10355
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template l2_normalize_x() and
l2_normalize_yz which had to be moved from impl.cpp to impl.h
* Removed impl.cpp
* Partially resolves MLCE-1102
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Change-Id: Id00a823730108293fc712295a178dad80588af30
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10344
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the templates vector_matrix_multiply_f16() and
matrix_matrix_multiply_f16 which had to be moved from impl.cpp to fp16.cpp
* Partially resolves MLCE-1102
Change-Id: Ic87440797d6f1653c815ab6565972206f5afd0ad
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10345
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|