Age | Commit message (Collapse) | Author |
|
* Changes in filelist.json: moved fp16 code from common to fp16
* Replaced the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
with ENABLE_FP16_KERNELS.
* Resolves COMPMID-6755
Change-Id: I4da1c53d3f9e4734e5e67125265ab4e3fc0dcbe4
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10865
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
This reverts commit ded5b182675e3166e947a8eb637b5b1e925816ab.
Resolves COMPMID-6735
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Change-Id: I9b69ca1ec80a671171d3f52081c4b8c61a676617
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10838
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: <felixjohnny.thomasmathibalan@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves ONCPUML-1331
This patch adds an option to make _custom_scheduler thread_local to
support usage of multiple schedulers handled outside of ACL.
It also adds num_threads() function to Scheduler which reverts to
querying CPUInfo if no scheduler has been set.
Change-Id: Iff706165d8d091895331a5bb3a76f6cabe048912
Signed-off-by: David Svantesson-Yeung <david.svantesson-yeung@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10748
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
When weight has no holes, we can replace CpuWeightsReshapeKernel with:
- Collapse by reinterpreting weight's 3 spatial dimensions
- Perform CpuTranspose
For more details see the documentation in
src/cpu/operators/CpuGemmConv2d.cpp
This is one optimization since the CpuTranspose is better performing
than CpuWeightsReshapeKernel
A second optimization is to fuse this transpose with other weight
transformations (e.g. pretranspose_B_array in CpuGemmAssemblyDispatch)
However this second optimization depends on how the underlying gemm
methods (the fall back path: CpuGemmMatrixMultiplyKernel or the assembly
path: CpuGemmAssemblyDispatch) chooses to fuse the transpose.
Therefore, this patch moves the transpose down from CpuGemmConv2d, to
the individual gemm operators where the fusion decision needs to be
made, by passing an extra "transpose_b" flag to CpuGemm
New transpose_b flag in different scopes (they are all the same, but
with different names because pretranspose_b has a different meaning in
GemmAssemblyDispatch):
GEMMInfo::pretranspose_B -> AsmGemmInfo::transpose_b
New auxilliary tensors holding the transposed b result:
- CpuGemm optimized path: CpuGemmAssemblyDispatch::PrePretransposedB
- CpuGemm fallback path: CpuGemm::PreTransposedRHS
Note that this patch does not yet have the second optimization
(COMPMID-6595), but it prepares for it.
Relates to COMPMID-6595
Resolves COMPMID-6499
Change-Id: I999a2da9da4b2b15369a3cc06d7872c86e0190ea
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10526
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Code is formatted as per a revised clang format configuration
file(not part of this delivery). Version 14.0.6 is used.
Exclusion List:
- files with .cl extension
- files that are not strictly C/C++ (e.g. Android.bp, Sconscript ...)
And the following directories
- compute_kernel_writer/validation/
- tests/
- include/
- src/core/NEON/kernels/convolution/
- src/core/NEON/kernels/arm_gemm/
- src/core/NEON/kernels/arm_conv/
- data/
There will be a follow up for formatting of .cl files and the
files under tests/ and compute_kernel_writer/validation/.
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Change-Id: Ib7eb1fcf4e7537b9feaefcfc15098a804a3fde0a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10391
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
|
|
- That would force CpuConv2d::get_convolution_method()
choose GEMM_CONV2D or GEMM methods instead.
Resolves: COMPMID-5531
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I4dec8772e8c150da003d9a89c1d036057c4d28b0
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8233
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
* For 3x3 kernel, only choose the implementation with larger tile
size if the input tensor is larger than the tile.
Resolves: COMPMID-5467
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I2cf95ddb25f477cb05da3b3501e0afe9548fc33a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8022
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5400
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: Ib4428436dd7a6e40d8b2d8a2f8dac1b079154551
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7894
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially Resolves: COMPMID-4718
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I02eabdd6bce8cd561ab2fdfd644a686a3762b817
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6253
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Legacy structure contained two libraries core/runtime with two backends
in each.
We reduce the core/runtime libraries to a single library thus merging
the backend files
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I69545765fe7a730368105cdbd067d3135ec7a174
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6155
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|