Age | Commit message | Author |
|
- Enables FP16 LUT for logistic activation
- Adds LUTManager to reuse LUTs where appropriate.
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I94667b63b452a8e58a1eb59cb0b5866178954523
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10864
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
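The reuse idea behind a LUT manager can be sketched as below. This is a minimal standalone illustration under stated assumptions, not the library's implementation: `ActivationSketch`, `LutTable`, `LutCacheSketch` and `half_bits_to_float` are hypothetical names, the table is indexed by the raw 16-bit FP16 pattern, and results are stored as `float` to keep the sketch portable.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <map>
#include <memory>
#include <vector>

// Hypothetical helper: decode an IEEE 754 binary16 bit pattern into a float.
inline float half_bits_to_float(std::uint16_t h)
{
    const std::uint32_t sign = (h >> 15) & 0x1u;
    const std::uint32_t exp  = (h >> 10) & 0x1Fu;
    const std::uint32_t man  = h & 0x3FFu;
    float value = 0.0f;
    if (exp == 0u) // zero or subnormal: man * 2^-24
        value = std::ldexp(static_cast<float>(man), -24);
    else if (exp == 31u) // infinity or NaN
        value = (man != 0u) ? std::numeric_limits<float>::quiet_NaN()
                            : std::numeric_limits<float>::infinity();
    else // normal: (1024 + man) * 2^(exp - 25)
        value = std::ldexp(static_cast<float>(man | 0x400u), static_cast<int>(exp) - 25);
    return (sign != 0u) ? -value : value;
}

enum class ActivationSketch { Logistic };
using LutTable = std::vector<float>; // one entry per FP16 bit pattern

// Hypothetical cache: hands out one shared table per activation type and
// rebuilds it only when no user holds the previous one any more.
class LutCacheSketch
{
public:
    std::shared_ptr<LutTable> get(ActivationSketch act)
    {
        auto it = _tables.find(act);
        if (it != _tables.end())
        {
            if (auto existing = it->second.lock())
                return existing; // reuse the table that is still alive
        }
        auto table   = std::make_shared<LutTable>(build(act));
        _tables[act] = table;
        return table;
    }

private:
    static LutTable build(ActivationSketch act)
    {
        LutTable table(65536);
        for (std::uint32_t bits = 0; bits < 65536u; ++bits)
        {
            const float x = half_bits_to_float(static_cast<std::uint16_t>(bits));
            switch (act)
            {
                case ActivationSketch::Logistic:
                    table[bits] = 1.0f / (1.0f + std::exp(-x));
                    break;
            }
        }
        return table;
    }

    std::map<ActivationSketch, std::weak_ptr<LutTable>> _tables;
};
```

Holding `std::weak_ptr` entries is a design choice of this sketch only: two layers that request the same activation share one table, and the memory is released once the last layer drops it.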
|
|
This patch adds the latest GPUs as GPU targets and sets up kernel selection heuristics for MatMul to address some nightly issues.
Resolves: COMPMID-6766
Change-Id: I29dbb08c5ecfb3fcd63230b0b1675ab557074aca
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10902
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6622
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ibac276618bdda125dcbb9c851c547f12739b15b4
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10749
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Moved NCHW kernels fp16 and fp32 to their corresponding files
src/cpu/kernels/fuse_batch_normalization/nchw/neon/fp16.cpp and
src/cpu/kernels/fuse_batch_normalization/nchw/neon/fp32.cpp
* Changes in filelist.json to include the new fp16 and fp32 files
* Moved the template batch_normalization_nchw to impl.h as we
need to instantiate it from fp16.cpp and fp32.cpp
* Pooling layer: removed the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC that
prevented the FP16 kernel execution.
* Partially resolves MLCE-1102
Change-Id: Ia8c85e9ffb76c9e387f9ae2685e5df5e52c8dc27
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10777
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
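The file layout described above (a shared template in impl.h, instantiated from fp16.cpp and fp32.cpp) can be sketched in a single listing. The names `scale_impl`, `scale_fp32` and `scale_fp16` are hypothetical and only illustrate the pattern; the comment markers show which file each part would live in, and the FP16 part assumes a compiler that provides the `__fp16` type when built with FP16 support.

```cpp
#include <cassert>
#include <cstddef>

// ---- impl.h (sketch) ------------------------------------------------------
// The template is defined in a header so that each per-type translation unit
// can instantiate it with its own compiler flags.
template <typename T>
void scale_impl(const T *src, T *dst, std::size_t n, float factor)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        dst[i] = static_cast<T>(static_cast<float>(src[i]) * factor);
    }
}

// ---- fp32.cpp (sketch) ----------------------------------------------------
// Built with the baseline architecture flags.
void scale_fp32(const float *src, float *dst, std::size_t n, float factor)
{
    scale_impl<float>(src, dst, n, factor);
}

// ---- fp16.cpp (sketch) ----------------------------------------------------
// Only this file would be built with -march=armv8.2-a+fp16, so the FP16
// instantiation never leaks into objects compiled for older cores.
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
void scale_fp16(const __fp16 *src, __fp16 *dst, std::size_t n, float factor)
{
    scale_impl<__fp16>(src, dst, n, factor);
}
#endif // defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
```

On a target without FP16 vector arithmetic the last section compiles to nothing, which is the property the split relies on.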
|
|
* Moved fp16 and fp32 to their corresponding files
src/cpu/kernels/mul/generic/neon/fp16.cpp and
src/cpu/kernels/mul/generic/neon/fp32.cpp
* Changes in filelist.json: added a new fp16.cpp file for the float16_t kernels
* Partially resolves MLCE-1102
Change-Id: I88f24cf034c11b55ff84644b182ba76c7cb94296
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10778
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
|
|
* Moved the template arm_compute::normalize_float to impl.h because
we need to instantiate it from both NENormalizationLayerKernel.cpp
and src/cpu/kernels/norm_layer/generic/neon/fp16.cpp
* Changes in filelist.json: added a new fp16.cpp file for the float16_t kernels
* Replaced the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC in
NENormalizationLayerKernel by ARM_COMPUTE_ENABLE_FP16 so that
the fp16 kernels can be compiled in for multi_isa builds
* Moved fp32 kernels to the corresponding file
src/cpu/kernels/norm_layer/generic/neon/fp32.cpp
* Partially resolves MLCE-1102
Change-Id: I3f2eb2ed0b6c7f68092b17872b85082fbb5f39e2
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10739
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* The new softmax implementation consists of only a single kernel.
- There are 2 versions of softmax: one for the x dimension
and one for any other dimension.
- The softmax kernel handles both native and quantized data types.
Resolves: COMPMID-6447
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I4a9ae5bc63f78aebeaa85ee48a0d102c9c245eda
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10489
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
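For reference, the x-dimension case can be sketched as one function that does the max, exponent, sum and normalization steps in a single pass over a row. The commit message does not describe the kernel internals, so this is the standard numerically stable formulation rather than the library's code, and `softmax_x` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax over one row (the x dimension of a tensor).
std::vector<float> softmax_x(const std::vector<float> &row)
{
    assert(!row.empty());
    // Subtracting the row maximum keeps exp() from overflowing.
    const float max_val = *std::max_element(row.begin(), row.end());

    std::vector<float> out(row.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < row.size(); ++i)
    {
        out[i] = std::exp(row[i] - max_val);
        sum += out[i];
    }
    // sum >= 1 because the maximum element contributes exp(0) = 1.
    for (float &v : out)
    {
        v /= sum;
    }
    return out;
}
```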
|
|
* FP16 kernels must be instantiated in fp16.cpp.
* Partially resolves MLCE-1102
Change-Id: Iab9c29dbfd89358f2f663862ff5010c88aeccf8c
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10496
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: David Mansell <David.Mansell@arm.com>
Change-Id: If02f7809f9b6e84979121698c5e7a62cbb41e2c3
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10487
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
This reverts commit aeced744b854758768243833bcdf999c0c3c1a5b.
Reason for revert: Incorrect SME architecture checks.
Change-Id: I23fe78178041a544a8791a4655bf6fe4aa375e38
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10501
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I1f73819c25c66e4d13198e9c79755808d92b343d
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10466
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Port MatMul to Dynamic Fusion.
- Prepare a CKW boilerplate code.
- Implement the following classes:
- MatMulAttributes
- GPUMatMulSettings
- GpuMatMul
- ClComponentMatMul
- GpuCkwMatMul
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I5a7c183b293973e8a4233b554b2affe0bb28f44d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10453
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* FP16 kernels must be instantiated in fp16.cpp.
* Partially resolves MLCE-1102
Change-Id: I497fe0ba6e84493a5072c3e80bbba7ecd5de8095
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10448
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Only support 1x1 blocks, i.e. n0=1, m0=1.
- Dilation not supported yet.
Resolves: COMPMID-6258
Signed-off-by: ramy.elgammal@arm.com <ramy.elgammal@arm.com>
Change-Id: I1dcfd7640fb40e112736dedc81847f7b1b50dba2
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10411
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* The current implementation has significant inaccuracy
and the issue cascades to GELU.
* Use the implementation from Arm® Optimized Routines.
The maximum error is 1.93 ULP.
Resolves: COMPMID-6554
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: If80131e164b7a078e34dd8e05b1506698f31d17a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10395
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
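To make a figure such as 1.93 ULP concrete, the error of a single-precision result can be expressed in units of the float spacing at the reference value. The helper below is a hypothetical sketch (`ulp_error` is not a library function) and the commit message does not say how the stated bound was measured; it assumes a finite reference computed in double precision.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Error of `approx` against a higher-precision `reference`, measured in
// float ULPs at the magnitude of the reference. Assumes a finite reference.
double ulp_error(float approx, double reference)
{
    const float  ref_abs = std::fabs(static_cast<float>(reference));
    const float  next    = std::nextafter(ref_abs, std::numeric_limits<float>::infinity());
    const double ulp     = static_cast<double>(next) - static_cast<double>(ref_abs);
    return std::fabs(static_cast<double>(approx) - reference) / ulp;
}
```

A result that lands on the float directly above an exactly representable reference is off by one ULP under this definition; a fractional figure arises when the reference itself falls between two floats.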
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template select_op() which had to be moved from impl.cpp to fp16.cpp
* Partially resolves MLCE-1102
Change-Id: Ic9e73e121482fcc5e4fcbe8ae1ecd23649cbd3d1
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10359
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template max_unpooling() which had to be moved from impl.cpp to impl.h
* Partially resolves MLCE-1102
Change-Id: Iabf9a9ba9d2441032f931f33aad97acc3e332575
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10362
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the templates l2_normalize_x() and
l2_normalize_yz(), which had to be moved from impl.cpp to impl.h
* Removed impl.cpp
* Partially resolves MLCE-1102
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Change-Id: Id00a823730108293fc712295a178dad80588af30
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10344
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Two implementations of the command buffer are added:
- CLMutableCommandBuffer uses the mutable dispatch command buffer
extension.
- CLCompatCommandBuffer is the compatibility class for platforms
without the CL extension.
Resolves: COMPMID-6454
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I15b370a50168ca940bd8fb2b5fae26230da3f472
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10298
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
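The split between a native and a compatibility implementation can be sketched without any OpenCL dependency. Everything below is hypothetical (`CommandBufferSketch`, `MutableSketch`, `CompatSketch`, `make_command_buffer`) and does not mirror the real class interfaces. It assumes that the extension-backed variant submits the whole recorded buffer in one go, while the fallback replays each recorded kernel individually.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical common interface for both variants.
class CommandBufferSketch
{
public:
    virtual ~CommandBufferSketch() = default;
    void add_kernel(std::function<void()> kernel) { _kernels.push_back(std::move(kernel)); }
    virtual void enqueue()        = 0;
    int          submissions() const { return _submissions; }

protected:
    std::vector<std::function<void()>> _kernels;
    int                                _submissions = 0;
};

// Stand-in for the extension-backed variant: one submission per enqueue.
class MutableSketch final : public CommandBufferSketch
{
public:
    void enqueue() override
    {
        for (auto &k : _kernels)
            k(); // stands in for the driver executing the recorded buffer
        ++_submissions;
    }
};

// Stand-in for the fallback: every recorded kernel is submitted on its own.
class CompatSketch final : public CommandBufferSketch
{
public:
    void enqueue() override
    {
        for (auto &k : _kernels)
        {
            k();
            ++_submissions;
        }
    }
};

// Picks the variant from a capability flag the caller has already queried.
std::unique_ptr<CommandBufferSketch> make_command_buffer(bool has_mutable_dispatch)
{
    if (has_mutable_dispatch)
        return std::make_unique<MutableSketch>();
    return std::make_unique<CompatSketch>();
}
```

Callers only see the common interface, so the choice between the two variants stays in one place.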
|
|
Resolves: COMPMID-6212
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I29bbd9a3d96af462faf7f0ee13b9849f75e05356
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10319
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template fused_batch_normalization_dwc_nhwc() that
had to be moved from impl.cpp to impl.h
* Removed impl.cpp
* Partially resolves MLCE-1102
Change-Id: Idaaa113c71729e32e565acf5fb5694c76c36d76d
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10308
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
The skeleton code consists of modifications
- to build the library with the quantized matmul kernel
- refactoring of some common utilities
- empty OpenCL Kernels for four configurations ([Lhs, Rhs] X [Nt, t])
- some validation tests and skeleton for functional tests
Resolves: COMPMID-6473
Change-Id: Id8401f789d34277dceb1f91afd68c9c88275618a
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10273
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template in_bounds_crop_window so it had to be moved from
impl.cpp to impl.h
* Removed the file src/cpu/kernels/crop/generic/neon/impl.cpp
* Partially resolves MLCE-1102
Change-Id: I1953849153e672ff7938f54c877c7498117dcca4
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10282
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* Partially resolves MLCE-1102
Change-Id: I5ecfc8f6c0d84f92d80bec2cde6e7338794b9788
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10240
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add a test case with src and dst having same row size
- Remove inline from has_holes() util function
Related to COMPMID-6504
Change-Id: Iead1f17692dc57b66c5d9f01eed30169efaee0a5
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10190
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
PostOps was the experimental interface for Dynamic Fusion. It is now
replaced by the new Dynamic Fusion interface with code generation using
the Compute Kernel Writer.
Resolves: COMPMID-6190
Change-Id: I813b48facef2fd6f3aee332588886b4f9b3d33d8
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10219
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use various templates that had to be moved from
impl.cpp to impl.h
* Removed src/cpu/kernels/pool3d/neon/impl.cpp
* Partially resolves MLCE-1102
Change-Id: I71e6a54a27fd8f04ae2a67231709aad723b09fa3
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10220
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Fixes a bug when using FP16 constant in some cases.
- Adds op_write_raw_code to handle some special cases.
- Ports MxN pooling 2d layer into ckw.
- Adds unary function 'negate' to ckw.
- Updates pool2d validation tests to include store op.
Resolves COMPMID-6263
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: If8c683761fead79bd519aef28cc65de78d3ec629
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10172
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Use Compute Kernel Writer (CKW) to generate code for Resize operator in
the Dynamic Fusion interface.
Supports Nearest Neighbor and Bilinear interpolation methods.
Resolves: COMPMID-6265
Change-Id: Ib0a5158bd4208123c84f6a1dc54f29d82fd55dcd
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10174
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
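The two interpolation methods named above differ only in how a fractional source coordinate is turned into a value. The CPU-side sketch below illustrates that difference on a row-major single-channel image; it is not the generated OpenCL code, the function names are hypothetical, and the mapping from destination to source coordinates (sampling policy, corner alignment) is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Clamp a coordinate into the valid range [0, size - 1].
inline float clamp_coord(float v, int size)
{
    return std::min(std::max(v, 0.0f), static_cast<float>(size - 1));
}

// Nearest neighbor: take the pixel closest to (x, y).
float sample_nearest(const std::vector<float> &img, int w, int h, float x, float y)
{
    const int xi = static_cast<int>(std::lround(clamp_coord(x, w)));
    const int yi = static_cast<int>(std::lround(clamp_coord(y, h)));
    return img[static_cast<std::size_t>(yi) * w + xi];
}

// Bilinear: blend the four surrounding pixels by their fractional distances.
float sample_bilinear(const std::vector<float> &img, int w, int h, float x, float y)
{
    const float cx = clamp_coord(x, w);
    const float cy = clamp_coord(y, h);
    const int   x0 = static_cast<int>(std::floor(cx));
    const int   y0 = static_cast<int>(std::floor(cy));
    const int   x1 = std::min(x0 + 1, w - 1);
    const int   y1 = std::min(y0 + 1, h - 1);
    const float fx = cx - static_cast<float>(x0);
    const float fy = cy - static_cast<float>(y0);

    auto at = [&](int xi, int yi) { return img[static_cast<std::size_t>(yi) * w + xi]; };
    const float top    = at(x0, y0) * (1.0f - fx) + at(x1, y0) * fx;
    const float bottom = at(x0, y1) * (1.0f - fx) + at(x1, y1) * fx;
    return top * (1.0f - fy) + bottom * fy;
}
```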
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template roi_align() so it had to be moved from
impl.cpp to impl.h
* Removed the file src/cpu/kernels/roialign/generic/neon/impl.cpp
* Partially resolves MLCE-1102
Change-Id: If78371479042725723cea6f6c65aac76d68a1c1d
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10213
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Enable fp16 in armv8a multi_isa builds
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template add_same_neon() so it had to be moved from
impl.cpp to impl.h
* Partially resolves MLCE-1102
Change-Id: Ia51007f5e663b708071958bb94bfab4535e4b2f8
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10191
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Ports the direct convolution 2D kernel from the experimental Dynamic Fusion interface to use the new Compute Kernel Writer backend for OpenCL code generation.
Support is for FP16/FP32 only.
Resolves: COMPMID-6259
Change-Id: Ia8d7b9cb789737b22b1d877cd798a73eda0ce4ab
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10059
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* The kernel now supports the following conversions:
S64 -> F32
U64 -> F32
* Resolves MLCE-1089
Change-Id: I277cf58b78d919fde25947520d2056e1412c7f82
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9935
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Makes a small difference to compile times and opens up other opportunities
to simplify code.
Change-Id: I232876910bbe4fa9719f4a0ce4a54c090faeb5ef
Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/532429
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9856
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6257
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I3e56ff1f1109924da02d0abd0354a3f1fa095ee7
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9914
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Nikolaj Jensen <nikolaj.jensen@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6256
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I48f6a9dfadefced20802bec1ab4ab843a9deba6e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9912
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* The information is extracted from the prototype argument
registry.
Partially resolves: COMPMID-6283
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ia6d69b7c2a2e411597e76a7e03b7c92199a16990
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9848
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves: COMPMID-6283
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I7596e3dc357d6f0b9cbe66534523943a73c26d81
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9864
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6337
Change-Id: Ie9097b3f56e8071426c621386a5988bd7f7e8ef2
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9852
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Partially port ElementwiseBinary component to ckw (broadcast not
supported yet)
* Port Store component to ckw
* Move KernelArgumentsHelpers to ckw_driver/ as it's only used by the
driver
ckw_driver is a middle layer between dynamic fusion and Compute Kernel
Writer (CKW). It consumes the fused kernel component stream produced by
Dynamic Fusion and uses CKW to write the kernel code complete with all
meta info needed by the runtime to enqueue the kernel.
It consists of two parts:
* Kernel writing: This resides in dynamic_fusion/sketch
* Runtime utilities: This resides in dynamic_fusion/runtime
The integration (separation between DF and CKW) occurs in two places:
* Inside GpuCKWDriver
global driver that coordinates how the
final fused kernel code is assembled together along with other meta
info needed by runtime.
* Inside each instantiated IGpuCKWComponentDriver
component driver that drives CKW to write component-specific code
or do component-specific configurations
Partially resolves: COMPMID-5792 COMPMID-6282 COMPMID-6260 COMPMID-6266
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Ib57a080a65fe8cfee1a8df1529fe572005a6d2f2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9847
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
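The two integration points described above, a global driver and per-component drivers, can be sketched as follows. All class names and the emitted strings are hypothetical placeholders: the sketch appends plain text where the real drivers would call into CKW, and it leaves out the runtime meta info entirely.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-component driver: writes only its own piece of the kernel.
class ComponentDriverSketch
{
public:
    virtual ~ComponentDriverSketch() = default;
    virtual void write_component_code(std::string &kernel_body) const = 0;
};

class ElementwiseBinarySketch final : public ComponentDriverSketch
{
public:
    void write_component_code(std::string &kernel_body) const override
    {
        kernel_body += "    dst = lhs + rhs;\n";
    }
};

class StoreSketch final : public ComponentDriverSketch
{
public:
    void write_component_code(std::string &kernel_body) const override
    {
        kernel_body += "    store(dst);\n";
    }
};

// Hypothetical global driver: walks the fused component stream in order and
// assembles the final kernel source around the component-specific code.
class GlobalDriverSketch
{
public:
    void add_component(std::unique_ptr<ComponentDriverSketch> component)
    {
        _components.push_back(std::move(component));
    }

    std::string assemble(const std::string &kernel_name) const
    {
        std::string code = "__kernel void " + kernel_name + "()\n{\n";
        for (const auto &component : _components)
        {
            component->write_component_code(code);
        }
        code += "}\n";
        return code;
    }

private:
    std::vector<std::unique_ptr<ComponentDriverSketch>> _components;
};
```

Keeping each component behind the same small interface is what lets the global driver stay unaware of operator-specific details.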
|
|
* Add the public API for compute kernel writer.
* Use the prototype as the implementation of the public API.
Resolves: COMPMID-5790
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I9d80e15325e1d953feb87c1f2eb61a587bb9ab5e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9814
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Define ckw::TensorStorage. The tensor storage represents the type of tensor memory object.
* Add helper functions for setting the CKW TensorComponent and TensorStorage as OpenCL kernel arguments.
* Refactor CL Image2D method for simpler image object creation.
Resolves: COMPMID-5784
Change-Id: I2d37d06783c1dc55f3b5692b44eb49b151f2401c
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9807
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6318
Change-Id: I447cd46a7e86da5858d264fc10c72d194d728085
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9804
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
|
|
Resolves COMPMID-6194
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Ie45e2aa9533948b2e5235563cef1d3834494eccf
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9739
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6023
Change-Id: I868975d14c4f98af6716726feda22405a6a4c891
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9686
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add heuristic for f32/f16 and int8 quantized data types
- Include MatMul configuration selection in the CLMatMul operator
Resolves COMPMID-5950, COMPMID-5957, COMPMID-5959, COMPMID-5925,
COMPMID-5926, COMPMID-5927, COMPMID-5928
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Change-Id: Ic222148da0337b88d4d8c960e3b6ac31003d8bcb
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9564
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves ONCPUML-1232
Signed-off-by: David Svantesson <david.svantesson@arm.com>
Change-Id: I258d03524c50dd24975b473aede061f80bf9d91b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9534
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Adds Reorder kernel exposing blocking reorders from arm_gemm
Resolves ONCPUML-1232
Change-Id: I42bf4166311fe1771565134d3ed7039fc8e30230
Signed-off-by: David Svantesson <david.svantesson@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9500
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6024
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I210ca5577eeb0b2c91d959fc37dcc77b3847abb3
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9517
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Dynamic-Fusion: Ramy Elgammal <ramy.elgammal@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5988
Change-Id: I93e78edf31c9eec8242ccbb8c3c768f46a7c7c38
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9456
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|