Age | Commit message (Collapse) | Author |
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the templates vector_matrix_multiply_f16() and
matrix_matrix_multiply_f16 which had to be moved from impl.cpp to fp16.cpp
* Partially resolves MLCE-1102
Change-Id: Ic87440797d6f1653c815ab6565972206f5afd0ad
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10345
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6558
Change-Id: I015d504aaa9b8a1a232b01e49ab373d415ea1de9
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10340
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves : [COMPMID-6212]
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I29bbd9a3d96af462faf7f0ee13b9849f75e05356
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10319
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template compute_all_anchors() that
had to be moved from impl.cpp to impl.h
* Partially resolves MLCE-1102
Change-Id: Iaff6da32d0b9789ef87ba3f95bef99343612bd01
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10309
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template fused_batch_normalization_dwc_nhwc() that
had to be moved from impl.cpp to impl.h
* Removed impl.cpp
* Partially resolves MLCE-1102
Change-Id: Idaaa113c71729e32e565acf5fb5694c76c36d76d
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10308
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch fixes some include dependencies in certain files that caused build failures in https://review.mlplatform.org/c/ml/ComputeLibrary/+/10287.
It also circumvents some clang-format glitches.
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I8e9d3307edd2d1afd17c685c9bc9429624130e5a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10313
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: <felixjohnny.thomasmathibalan@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use various templates that had to be moved from
impl.cpp to impl.h
* Partially resolves MLCE-1102
Change-Id: I2e5e68fbcf5279de1ffc1be4def4f96ed05593e9
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10224
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* Partially resolves MLCE-1102
Change-Id: If53ff1927948b3ad7c9e3c9347bc2af38764e342
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10243
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template in_bounds_crop_window so it had to be moved from
impl.cpp to impl.h
* Removed the file src/cpu/kernels/crop/generic/neon/impl.cpp
* Partially resolves MLCE-1102
Change-Id: I1953849153e672ff7938f54c877c7498117dcca4
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10282
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* Partially resolves MLCE-1102
Change-Id: I7e6d998e427982d4a037dbce6d17ca378665e07f
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10241
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* Partially resolves MLCE-1102
Change-Id: I04822b043d9f87bc666750a8d95a8be8a6cc194d
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10239
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* Partially resolves MLCE-1102
Change-Id: I5ecfc8f6c0d84f92d80bec2cde6e7338794b9788
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10240
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
PostOps was the experimental interface for Dynamic Fusion. It is now
replaced by the new Dynamic Fusion interface with code generation using
the Compute Kernel Writer.
Resolves: COMPMID-6190
Change-Id: I813b48facef2fd6f3aee332588886b4f9b3d33d8
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10219
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template run_depthwise_float() so it had to be moved from
impl.cpp to impl.h
* Partially resolves MLCE-1102
Change-Id: I428a79c4ab3a990331f20f5bd6b9fea88b0836b9
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10218
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use various templates that had to be moved from
impl.cpp to impl.h
* Removed src/cpu/kernels/pool3d/neon/impl.cpp
* Partially resolves MLCE-1102
Change-Id: I71e6a54a27fd8f04ae2a67231709aad723b09fa3
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10220
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template roi_align() so it had to be moved from
impl.cpp to impl.h
* Removed the file src/cpu/kernels/roialign/generic/neon/impl.cpp
* Partially resolves MLCE-1102
Change-Id: If78371479042725723cea6f6c65aac76d68a1c1d
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10213
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Enable fp16 in armv8a multi_isa builds
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template add_same_neon() so it had to be moved from
impl.cpp to impl.h
* Partially resolves MLCE-1102
Change-Id: Ia51007f5e663b708071958bb94bfab4535e4b2f8
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10191
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
to be moved to an fp16.cpp file to allow compilation with
-march=armv8.2-a+fp16
* fp16.cpp needs to use the template add_same_neon() so it had to be moved from
impl.cpp to impl.h
Change-Id: I9e64a3101958fcb9c3d5c8e9b148b498b2bee05f
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10154
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Following CpuReshapeKernel Optimizations, update the CpuGemmConv2D and CpuFlatten
to use CpuReshape operator instead of CpuReshapeKernel
- Minor changes to comment in NEReorgLayerKernel.h
Resolves COMPMID-6504
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Change-Id: Ib6ee1fdc313d91249f9fe41c81e73324031c1ff4
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10186
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5279
Change-Id: Id9b007eed62c200702bbfcc83b94dab7b5de1714
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9962
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Take dilation into account when checking padding.
Resolves: COMPMID-6348
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I897a13ba7f37382733c35c1701d1ec310ed55331
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10147
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6495
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I916829222a6211fa096a833a2afc5fab5eb34ea4
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10143
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* NEArgMinMaxLayer uses NEReductionOperation to compute its result in S32
* We need to call NECast to convert from S32 to S64
* Resolves MLCE-1089
Change-Id: I6fded869b6076d7af1b9b3e70eb384f4ee82fd8a
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10054
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add a new section in the documentation to describe how the conv2D
heuristic works on Arm® Cortex®-based CPUs and Arm® Mali™-based GPUs
- Add CKW_UNUSED in compute_kernel_writer/src/cl/CLTile.cpp to avoid
the compilation error due to an unused variable
- Remove FFT from the list of algorithms to be selected by the CPU Conv2d
heuristic.
Resolves COMPMID-6163
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Change-Id: I51384d7749451b2562642683e8b2429a355166bb
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10065
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Some symbols have been moved from core/Types.h. This patch retains
back compatibility so that the user can still include this header
for those symbols
* A new header core/CoreTypes.h is created to avoid circular dependency.
This header includes essential small types that are used across
functions
* Move all function info types into function_info folder for easier
tracking
Resolves COMPMID-6330
Related to https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I4739175c2d4d184a9bc8e28b881b497fab03ca60
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9979
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* The kernel now supports the following conversions:
S64 -> F32
U64 -> F32
* Resolves MLCE-1089
Change-Id: I277cf58b78d919fde25947520d2056e1412c7f82
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9935
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Makes a small difference to compile times and opens up other opportunities
to simplify code.
Change-Id: I232876910bbe4fa9719f4a0ce4a54c090faeb5ef
Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/532429
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9856
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Speeds up compilation by 30% for some files when logging is disabled.
Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com>
Change-Id: Ia479bd50a80616a34e33ead13db8558f8dbaa1aa
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/534480
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9880
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6337
Change-Id: Ie9097b3f56e8071426c621386a5988bd7f7e8ef2
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9852
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This significantly improves the compilation times for parts of the core library that just need
a definition of float16_t rather than access to all of the fp16 intrinsics.
Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com>
Change-Id: I5da1c6b0df0dd87d1d17948cd2e9b7375874f455
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/529385
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9781
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Renato Arantes <renato.arantes@arm.com>
Change-Id: I98de659d1289c930e366727d4799f0dacc8121ab
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9782
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Avoid the assembly kernels to be used when the padding is greater
than the kernel shape.
Resolves: COMPMID-6280
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ibe0820018c97f4481bf318397b797ec7b351a1d5
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9802
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Added fused activation to MatMul function interface
- Added fused activation to CL backend
- Includes tests for supported Activation Functions in MatMul
Resolves: [COMPMID-6192]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Ie103212b600b60699eaf6a6394d609e6e1f5aba6
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/522465
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9714
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Move some maths-related things from Utils.h to new Math.h header in utils/math.
Move some routines used for Tensor shape validation to Validate.h
Change-Id: I8ce89fe03ec3ae1b61d1a80c282b8b91eea0cfb3
Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/524783
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9743
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Split some of the larger types with inlined code into their own
header files, so that the implementation of them needn't be included
everywhere.
Change-Id: Id3ec2d42efbd33cedb55705a5a24e1b90c8b7a01
Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/524782
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Moving the code out of Types.h will help
with the compilation time.
* Added LUT support for all other activation functions.
* Resolves COMPMID-6292
Change-Id: I1b5f0b21f03237447163276b8796b2aeb3fdd45c
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9749
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Validate returns an error if the number of channels of the input tensor is
not 1. With this change we generate an error if scale is called with any of
these formats:
Format::UV88, Format::RGB888, Format::RGBA8888,Format::YUV444,
Format::YUYV422, Format::NV12, Format::NV21,Format::IYUV,
Format::UYVY422
* Resolves ARMCL-631
Change-Id: If9d8b9d95332994920def55d8faae9dbf4213f79
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9579
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This specific Lut kernel uses sve2 instructions
Resolves: COMPMID-6268
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I44fa3812e96fa79b3d1e1e3a31d587581f59f0e1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9675
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Call Neon™ depthwise convolution validation inside in its configure() method.
Resolves: COMPMID-6188
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: Ib2ae4d995ff2bbc92ce4496d4ab93cf09113e3e9
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9594
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
int_8 failures
* Adapt the CLMatMul function and ClMatMul operator to use quantized kernels.
* Add function-level tests.
Resolves: COMPMID-5929 and COMPMID-5811
Change-Id: I5348cdcf07b8074c138e04dfef0a73399377accd
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9575
Reviewed-by: Mohmun02 <MohammedSuhail.Munshi@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6185
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Icfd9d177083ecdf41dc13e5b2ae982ff67492f8a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9577
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Following the investigation proposed by ONCPUML-1193, padding
is implemented in im2col when the input channel is not a multiple of
blocks requested by the weight format.
Partially resolves: ONCPUML-1193
Signed-off-by: Renato Arantes <renato.arantes@arm.com>
Change-Id: I350c7a1b2dcae63f8d94f5b6f1f86e948eab1f09
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9508
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6155
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ie651be65404b0b737464d7a79ebcc58475863ba0
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9555
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* There is an issue with quantized fully connected and matmul
when the lower bound of bounded ReLU is negative.
* Use int32_t for the calculation of min/max quantized value
rather than PixelValue to avoid this issue.
Partially resolves: COMPMID-5996
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I7b22e9d56a2441fc6a4c5c4e627f57d6e00d6ff1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9502
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This is required for the case where rhs (B) is dynamic and needs to be
pretransposed in every run.
In a multi-threaded setting, this means the previously single-threaded
pretranspose_B_array would become the bottleneck
Resolves COMPMID-5896
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Id508c46992188a0f76a505152931d4955d04c16d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9455
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
The existing 4x4 tiling for 32-bit transpose is not efficient on aarch64, given that there are a lot more Neon registers available. So making the tile size to 8x8 will greatly improve NETranspose latency.
For example, on AWS Graviton3 processors, with this change I have observed transposing a 768x768 matrix improves latency from 0.32ms down to 0.19ms. Improvement can also be seen across different matrix sizes.
Further enlarging the tile size to 8x16 or 16x16 won't make it perform as good as 8x8 due to register pressure.
This change is to mitigate the issue reported at:
https://github.com/ARM-software/ComputeLibrary/issues/1045
Signed-off-by: Ethan Doe <yidoe@amazon.com>
Change-Id: Ia09859cdf2f6d312e67219a9d95a3a3bf1db1999
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9448
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
|
|
Resolves: COMPMID-5899
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I89d96e292c3492ba9b1900a3e5683f9dcd11dfc6
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9440
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Case: when the dequantized float value < 0.f the unary op was
not called if operator is not LOG or RSQRT
Resolves: COMPMID-5994
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I24d69db22042701f506188ace91ea4ab3dafeccf
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9437
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
|
|
Resolves: COMPMID-5995
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I707b8918bebee7e70d4de5207ef555c806e7a305
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9405
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Implements MatMul function and operator for floating point datatype FP16/FP32
- Includes support for transposing dynamic tensors prior to matrix multiplication.
- Adds tests for 2D/3D/4D+ tensors in MatMul with F32/F16 datatype (with all combinations of transposed/not-transposed tensors)
- Updates fixture to allow for testing fused activation in MatMul
- Adds tests for matmul with and without fused activation
Resolved: [COMPMID-5898]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Iefa84b26dd723c9a51e6c3f91023152c6c31ace2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9411
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|