Age | Commit message (Collapse) | Author |
|
When weight has no holes, we can replace CpuWeightsReshapeKernel with:
- Collapse by reinterpreting weight's 3 spatial dimensions
- Perform CpuTranspose
For more details see the documentation in
src/cpu/operators/CpuGemmConv2d.cpp
This is one optimization since the CpuTranspose is better performing
than CpuWeightsReshapeKernel
A second optimization is to fuse this transpose with other weight
transformations (e.g. pretranspose_B_array in CpuGemmAssemblyDispatch)
However this second optimization depends on how the underlying gemm
methods (the fall back path: CpuGemmMatrixMultiplyKernel or the assembly
path: CpuGemmAssemblyDispatch) chooses to fuse the transpose.
Therefore, this patch moves the transpose down from CpuGemmConv2d, to
the individual gemm operators where the fusion decision needs to be
made, by passing an extra "transpose_b" flag to CpuGemm
New transpose_b flag in different scopes (they are all the same, but
with different names because pretranspose_b has a different meaning in
GemmAssemblyDispatch):
GEMMInfo::pretranspose_B -> AsmGemmInfo::transpose_b
New auxilliary tensors holding the transposed b result:
- CpuGemm optimized path: CpuGemmAssemblyDispatch::PrePretransposedB
- CpuGemm fallback path: CpuGemm::PreTransposedRHS
Note that this patch does not yet have the second optimization
(COMPMID-6595), but it prepares for it.
Relates to COMPMID-6595
Resolves COMPMID-6499
Change-Id: I999a2da9da4b2b15369a3cc06d7872c86e0190ea
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10526
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6479
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Change-Id: I13aa0ef944a75ba8b5e4df183d52df57b9aba90f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10659
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Enable use_dynamic_shape ArithmeticDivisionDynamicShapeValidationFixture
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Change-Id: I42ddf5b604d728eda91fa45b239abf8caf2cda0f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10586
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch calculates the output quantization info based on the inputs'
quantization information.
The previous approach was using the same quantization information for
input, weights and output.
This implementation does not cover the cases where we have fused
activation function.
Resolves: [COMPMID-6484]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Ib58143165191e82ae8547e661ac7c8d077bda200
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10539
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6585
Change-Id: Ibfc5412ab23f10026c872eab99a056eb682d77a1
Signed-off-by: Sangwon Ha <sangwon.ha@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10577
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6599
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Id91185871f0dc30c08c7c38379acd5a3c1056473
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10575
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Fix SVE header being included in non-SVE file.
Resolves: COMPMID-6613
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ic7f662a239b761b83e67e11b6cc03f7d5f5cd051
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10573
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Adds option to use negative axis and inverted axis.
- Adds validation tests for the above.
Resolves COMPMID-6459
Change-Id: I88afd845d078f92c82ec8529ce7241fccd4c417e
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10523
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve the following clang-tidy errors:
* use of undeclared identifier 'ARM_COMPUTE_ASSERT'
* no template named 'AbsoluteTolerance'
* no template named 'RelativeTolerance'
These errors are a result of missing include headers in test fixtures.
Resolves: COMPMID-6604
Change-Id: I8058c5848bb52a44925b2f99c9e8edf84dc79acc
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10561
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch calculates the output quantization info based on the inputs'
quantization information.
The previous approach was using the same quantization information for
input, weights and output.
This implementation does not cover the cases where we have fused
activation function.
Resolves: COMPMID-6482
Change-Id: I4a9d87cfef8ad18ef241d457d23f44c8519a1389
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10541
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add the kernel variant: (nt_t) to GpuCKWMatMul.
- Extend CKW MatMul validation test with nt_t.
- Fixes a bug in CKW where z-dim = 1.
Resolves: COMPMID-6435
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I4c5e8791e55f21ffff3c11eca7802c51a4259977
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10525
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6493
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I038d91ba266e1e8bf124336bcd272ec77e92038c
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10490
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* The new softmax implementation consists of only a single kernel.
- There are 2 versions of softmax, one for the x dimension
and one for any other dimensions.
- Softmax kernel handles both native and quantized data type.
Resolves: COMPMID-6447
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I4a9ae5bc63f78aebeaa85ee48a0d102c9c245eda
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10489
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* FP16 kernels must be instantiated in fp16.cpp.
* Partially resolves MLCE-1102
Change-Id: Iab9c29dbfd89358f2f663862ff5010c88aeccf8c
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10496
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch calculates the output quantization info based on the inputs'
quantization information. The previous approach was using the same
quantization information for input, weights and output.
This implementation does not cover the cases where we have fused
activation function.
Note that no Neon™ tests are changed since there were not any quantized
Neon Direct Convolution tests.
Resolves: COMPMID-6485
Change-Id: Id32241320acae0b58552546d6d6f665cd5c63044
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10470
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6397
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Id4404f75dae03fd529db1adac5ab9ca48d08ec46
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10498
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* FP16 kernels must be instantiated in fp16.cpp.
* Partially resolves MLCE-1102
Change-Id: Ie652203876a0ac12b025e96d20990b6efb21e772
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10477
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: David Mansell <David.Mansell@arm.com>
Change-Id: If02f7809f9b6e84979121698c5e7a62cbb41e2c3
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10487
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6581
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I0a634e064377e54b9190241c01fc75c212522ba7
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10481
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This reverts commit aeced744b854758768243833bcdf999c0c3c1a5b.
Reason for revert: Incorrect SME architecture checks.
Change-Id: I23fe78178041a544a8791a4655bf6fe4aa375e38
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10501
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6478
Change-Id: I5bc220c3bd00a316776fe14454438cc0dc9049b3
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10469
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Build error when using data_layout_support=nhwc
* Some kernels need to be guarded by ENABLE_NCHW_KERNELS
Change-Id: I0414084b458360c7e8d2842f4734c39aad80852e
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10476
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Partially resolves MLCE-1102
Change-Id: If050608e56d75649b8d07757604ae10d6fc4269b
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10461
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I1f73819c25c66e4d13198e9c79755808d92b343d
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10466
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add support for processing left-over vector to comparison kernel.
* Combine native and quantized versions of CL comparison.
Resolves: COMPMID-6424
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I31d43bdf0eab999cee6fa8144b5d8e921a1093e8
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10467
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Batch dimension is added to reduction operation.
- All the dimensions higher than the batch dimension are collapsed
so that the input and output tensors are always 3-4D.
- CL kernel is called once instead of being repeatedly called
to process each sliding window.
Resolves: COMPMID-6443
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Icd99939d52d3bb648f08537e5f52ef27e894061b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10456
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
"2D" threading mode was not setting the result pointer correctly for
SME2 kernels with K blocking - for non-final blocks the result
pointer should be NULL so that the intermediate results get written
in the accumulator buffer by the kernel.
Signed-off-by: David Mansell <David.Mansell@arm.com>
Change-Id: Idefa538e190a086e1e44a91998ab7e949e3989e4
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10342
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Build error when using data_layout_support=nhwc
* Some kernels need to be guarded by ENABLE_NCHW_KERNELS
Change-Id: I9fb6cf0e204531f81b0dff3572a1740ba94cde0e
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10460
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves: ONCPUML-1251
Reorders of block size 8 are only available when SVE is enabled. This
fixes an issue with these reorders being checked even when SVE is not enabled.
Signed-off-by: David Svantesson <david.svantesson@arm.com>
Change-Id: I287763a547d2ab303c3f8ce10164178b680e551e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10464
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Port Matmaul to to Dynamic Fusion.
- Prepare a CKW boilerplate code.
- Implement the following classes:
- MatMulAttributes
- GPUMatMulSettings
- GpuMatMul
- ClComponentMatMul
- GpuCkwMatMul
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I5a7c183b293973e8a4233b554b2affe0bb28f44d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10453
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Optimize the stack operation in Cpu by leveraging block memcpy.
Resolves: COMPMID-6498
Change-Id: I49d79d179f0375a73d654edd59fb33072112569b
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10451
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6466
Change-Id: I916871c9d7880107985337782e2cfd280a62cdeb
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10458
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6583
Change-Id: Icb760b57df1e4573e5009e41ee13ada869acc687
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10457
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6465
Change-Id: I5bbf4596dd5e34e806dc51de9be14df9b6fa320a
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10452
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* FP16 kernels must be instantiated in fp16.cpp.
* Partially resolves MLCE-1102
Change-Id: I497fe0ba6e84493a5072c3e80bbba7ecd5de8095
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10448
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- For FP16, disable direct Deconv method when oiutput channel count is
greater than 32.
Resolves COMPMID-6311
Change-Id: I14d9dbf1a1b95736ccd09488d633df4775a01dcb
Signed-off-by: Sangwon Ha <sangwon.ha@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10446
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* FP16 kernels must be moved from src/cpu/kernels/pool2d/neon/nchw/all.cpp
to src/cpu/kernels/pool2d/neon/fp16.cpp.
* In src/cpu/kernels/pool2d/neon/list.h when we declare the kernels
we need to remove defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC) so that
in std::vector<CpuPool2dKernel::PoolingKernel> available_kernels
* Partially resolves MLCE-1102
Change-Id: I000380f8eccca17e6219c4f3453980d67a2c9dd8
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10444
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Transpose higher dimensional tensors (>2D) by collapsing higher
dimensions into the third dimension thus avoiding multiple dispatches
of the CL kernel
* Maximize tile size without register spilling
Resolves: COMPMID-6448
Change-Id: Iac094b8c428bdf319d9c28a8334cb55d58e2d14b
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10443
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Resolves COMPMID-6560
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I5778a0d620bed3fea05d868c0a775fbaa6dcc9dd
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10442
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Only support 1x1 blocks, i.e. n0=1, m0=1.
- Dilation not supported yet.
Resolves: COMPMID-6258
Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com>
Change-Id: I1dcfd7640fb40e112736dedc81847f7b1b50dba2
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10411
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Added a new test to make sure we support the following configuration:
NCHW
InputInfo=Shape=2,2
WeightsInfo=Shape=3,3
OutputInfo=Shape=4,4,
PadStrideInfo=1,1;0,0,0,0'
* Fixed the validate() method to allow this configuration
* Resolves MLCE-1120
Change-Id: I6874ad57bb81384185984741b983bf5e19ba150c
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10417
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Fix the reference axis vector to be the right size.
- Update typos in the error messages.
Resolves COMPMID-6574
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I9572365b8173b92d0fffd557e4db261b2969109c
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10423
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
|
|
The linker option '-noall_load' is obsolete.
Resolves: COMPMID-6578
Change-Id: I2d386c25b50705a25600dd612406470e1d3871be
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10421
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6574
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I6b23e2a2f7b2839f038dad538dfc5ebda62891a6
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10412
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
|
|
Several test optimizations have been introduced into Winograd tests for Gpu and Cpu backends. The testing strategy has been detailed as a comment header in the test design files.
In summary
- Very large shapes in the nightly are made smaller
- If the underlying kernel is the same for different data types, we only need to stress some key aspects of the kernels (e.g. read/write lengths in case of fp32/fp16).
- In case the underlying kernel is the same (OpenCL), Fp16 is tested on a subset of the shapes
- In Cpu, there is no need to test every combination for both NCHW and NHWC as we just permute the inputs and use NHWC kernels anyways
- All activations does not need to be tested for each and every shape
Resolves: COMPMID-6464
Change-Id: Ie25fded85c65b9c7386dc21b23f9b695b1e77b07
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10393
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6476, COMPMID-6477
Change-Id: Ied37c269d5a108ff72f70e3ad932cf372bda5562
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10346
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Clang-format options now match those in clang-format version 14.
Remove Astyle checks as the same code style checks are provided by clang-format.
Resolves: COMPMID-6576
Change-Id: Iefa9bb719826242a3276e9ca058d0c84624f7302
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10399
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6474
Change-Id: Iaff5b512cf77975f2df02dcdf848711b13bf97a6
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10341
Reviewed-by: Mohmun02 <MohammedSuhail.Munshi@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6576
Change-Id: I91ec2afd68d19e26f4c31a978176f1fdf4b7f6cc
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10400
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* The current implementation has signfinicant inaccuracy
and the issue cascades to GELU.
* Use the implementation from Arm® Optimized Routines.
The maximum error is 1.93 ULP.
Resolves: COMPMID-6554
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: If80131e164b7a078e34dd8e05b1506698f31d17a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10395
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|