Age | Commit message (Collapse) | Author |
|
Resolve COMPMID-4416
Change-Id: I83cdf0de7adaf4d465ffebd494ab913182072485
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5788
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Remove dedicated kernels for NCHW. Now we only use NHWC with permute
- Remove specialized kernels for 3x3 NHWC
- Simplify CLDepthwiseConvolutionLayer.cpp to call just the native
implementation for both floating-point and quantized data types
- Develop two parametric opencl kernels for depthwise convolution layer NHWC
(floating-point and quantized)
- Add support to export the weights to cl_image
- Extend test for depthwise convolution on opencl
Resolves COMPMID-4417
Change-Id: I253dd5d959a70783c82e62b1771a5e9f91621cb0
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5806
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
|
|
Add FP16 support to CLRemap when data layout is NHWC.
Add relevant tests for FP16 and validation.
Update NERemap function level to be consistent with CLRemap.
Add depreciation notice for uint_8 only function level methods.
Resolves: COMPMID-4335
Signed-off-by: Freddie Liardet <frederick.liardet@arm.com>
Change-Id: If05f06801aef7a169b73ff1ebe760a42f11ca05c
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5816
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Port CLWinogradInputTransformKernel
Port CLWinogradFilterTransformKernel
Port CLWinogradOutputTransformKernel
Resolves: COMPMID-4504
Change-Id: I3177dda0b9c2f56b36cb317027e94abe8d47229e
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5680
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Add NHWC support to CLRemap, also add relevant tests.
Partially resolves COMPMID-4335.
Change-Id: I119bea99be497fb85d5cd83a10f8d4e8e1f97f17
Signed-off-by: Freddie Liardet <frederick.liardet@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5773
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Resolves COMPMID-4544
This reverts commit 50929ef951880469b9d579323d2f9c9f5025327d.
Signed-off-by: Pablo Tello <pablo.tello@arm.com>
Change-Id: Ibc40054cecc2fe99bff48154f11fe5b602b4a64a
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/330380
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5774
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Fix issue where gpu elementwise operation kernel would not compile when
input is 1xN size and prelu is the chosen operator.
Add relevant tests for 1xN input.
Signed-off-by: Freddie Liardet <frederick.liardet@arm.com>
Change-Id: If0651cfa399ca1d9c65f2632b75536c7931f27d4
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5760
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-4430
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I9a40033e09223d601460a7e52cc297c58c9a2737
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5757
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Rename CpuPooling to CpuPool2d
Rename CpuPoolingKernel to CpuPool2dKernel
Rename CpuPoolingAssemblyWrapperKernel to CpuPool2dAssemblyWrapperKernel
Move CpuPool2dAssemblyWrapperKernel in internal subfolder
Rename CpuDepthwiseConvolutionNativeKernel to CpuDepthwiseConv2dNativeKernel
Rename CpuDepthwiseConvolutionAssemblyDispatch to CpuDepthwiseConv2dAssemblyDispatch
Rename CpuDepthwiseConvolution to CpuDepthwiseConv2d
Rename CpuDirectConvolutionKernel to CpuDirectConv2dKernel
Rename CpuDirectConvolutionOutputStageKernel to CpuDirectConv2dOutputStageKernel
Rename CpuDirectConvolution to CpuDirectConv2d
Rename ClPoolingKernel to ClPool2dKernel
Rename ClPooling to ClPool2d
Rename ClDirectConvolutionKernel to ClDirectConv2dKernel
Resolves: COMPMID-4405
Change-Id: I8e48f015e4e492a76a7512f5679cb3eb0cd028f6
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5708
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: I12608eb4d681ea07d9d508226600a11365238a00
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5682
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: Id98a107d512369d3799961011a84e9cc4d99e775
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5679
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Create right vector for size of 16 elements
Resolves: COMPMID-4532
Change-Id: I74dc89d158a6aa3f50db710cbc002a92156e2263
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5673
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Renames DepthConvert to Cast
- Ports both NEDepthConverLayer and CLDepthConvert variants
- Removes legacy shift capability from DepthConvert, allowing only
shifts of 0
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I806a0f8eb23d23502b632c529fda7edde19c8176
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5565
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Moves the following kernels:
- CLGEMMMatrixMultiplyKernel
- CLGEMMMatrixMultiplyNativeKernel
- CLGEMMMatrixMultipluReshapedKernel
- CLGEMMMatrixMultiplyReshapedOnlyRHSKernel
Moves the following functions
- CLGEMM
Introduces facilities to easy handling of auxiliary temporary buffers
under then new run interface. Such are:
- CLAuxTensorHandler: That allows wrapping of workspace buffers memory
to CLBuffer objects
- Ability to inject TensorInfo to allocator without transferring
ownership. This reduce the copy overhead if needed.
Resolves: COMPMID-4188
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I7055435d831b05b749b26302082e4ac45f26dfb0
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5498
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I092d10534816f5b3717325952033c351b8231380
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5643
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Fix corner case in which the quantization is per channel and OFM == 1
The function can safely set the per_channel quantization flag to false since there is only one output channel
This way, the kernel will avoid adding useless padding to the output multipliers and shifts
Resolve COMPMID-4384
Change-Id: Ic03452bfaf52d1be536cd371721adedd2e580a08
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5648
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
- Bring the epsilon up to 1e-3 for FP16 (both backends) since it was causing the reference's variance being negative and its square root being NaN
- Bring the epsilon up to 1e-7 for FP16 NEON test for the same problem on the NEON kernel
- Adjust the CL kernel's vec_size when input tensor's width < 16 and use macros agnostic of vector size for sum reduction
- Add previously mismatching tensor shapes
Resolve COMPMID-4354
Change-Id: I823c871aacb72326f90c86b24cb16c3e2d4bd15e
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5630
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Resolves: COMPMID-4527
Change-Id: If038d2477d8898d3ee307fe58301fb0b16b64c02
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5640
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Only for NHWC
Instead of starting from the input vector and insert the values in the output,
we now create an output vector by taking from the input tensor. This makes it
easier to store the result in the output with border awareness
Resolves: COMPMID-4445
Change-Id: I77cf2d2d5ff30a383cddb8a3c81c462d7a39fd2e
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5627
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
- Change vectors' fixed length of 4 in gemmlowp_output_stage_quantize_down_float with its kernel's adjusted vec_size
Resolve COMPMID-4355
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: I371ecf114356fb6a8d18c5c3727f09ae247484bd
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5631
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Fix issue where gpu select kernel would not compile when input was 1xN
size.
Also add relevant tests for 1xN input.
Resolves: COMPMID-4357
Signed-off-by: Freddie Liardet <frederick.liardet@arm.com>
Change-Id: Ib5f339379e9cc7ef05605312889dbad34cbfe5dd
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5620
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve COMPMID-4339
Change-Id: Ieacd288ca8f69e04d88692a093325a24198ebe3e
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5623
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
CLCoreRuntime context is currently unused and is planned to be replaced
by the Context infrastructure
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: Ic2874800960ca954f647e8867e7db951ce823e1c
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5571
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- DOT_PRODUCT8_INTEGER8 and DOT_PRODUCT16_INTEGER8 are calling
DOT_PRODUCT4_INTEGER8 without passing DST_DATA_TYPE
Resolves COMPMID-4491
Change-Id: I394bd2f9208489e820885e49ed40e607d6470620
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5594
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Remove TODO/FIXME either already done or won't do.
Partially resolve: COMPMID-4471
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: Iec8f25b9bf1b2a70c072dd17d44625fa93e84ed1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5591
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Moves all the kernels out of the legacy CLKernelLibrary into a separate
class that its planned responsibility is to only handle the kernel
strings.
Currently CLKernelLibrary mixes kernel inventory and building logic,
thus coupling multiple responsibilities.
By separating compilation logic, the kernel library removes any dependencies
from the OpenCL and simplifies maintainance but also increases flexibility.
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: Idcd491783b5a413bd944d385956075ead0204362
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5572
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Aleksandr Nikolaev <aleksandr.nikolaev@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves : COMPMID-3793
Signed-off-by: Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Id82e00c784f0a039017fd896f11671bdda2dd4ab
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5530
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Execution window along the X axis needs to be collapsed on the 3rd axis (rather than the 2nd) since there could be implicit padding added along the Y
Resolve COMPMID-4425
Change-Id: I9623a31749b737fea7c623cabdcfbf77cbe8f6dc
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5551
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
All data type and data layout information for the operators are store in the function header files
Signed-off-by: Teresa Charlin <teresa.charlinreyes@arm.com>
Change-Id: I30b564f7eda6bbd99bf3ad36ddb6639ac118eb8b
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/319829
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5531
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Simplify the implementation when the pooling size has the same spatial
dimensions of the input tensor
- Rework the heuristic for F32/F16
- Add test for validating the global pooling path
- Fix compare_dimensions in validation. The validation fails because
we have different
number of dimensions for NCHW and NHWC (e.g. 1,1,2,1(NCHW) ->
2,1,1,1(NHWC)
Resolves COMPMID-4426
Change-Id: Ia53ee659a9fbc3d011f286a8150d1be9d6d2cd05
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5533
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
It doesn't take into consideration the padding.
Change-Id: Ie7a2a0c7aee93fd7007deb6e46f6bec0e7f06066
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5529
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change the parallel implementation across the X, now every thread computes one row
Add missing test for MEAN_SUM
Make reduction on any axis != 0 work with num_channels > 1
Resolve COMPMID-3917
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: Ib0f99540104e3c253bcd1ea637833db533f5e76e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5522
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I5c440f4c6ca4186adcfa926e6b7d924086671f29
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5520
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Correctly calculate the filter size when we
exclude padding.
Resolves: COMPMID-4408
Change-Id: Ia96d6ddbda58d23a5a0e02854ecb22089d538f94
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5523
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
|
|
Queues are responsible for scheduling operators and performing other
runtime related activities like for example tuning.
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I0366d9048470d277b8cbf59fa42f95c0ae57c5c9
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5487
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Oclgrind flags this as a warning.
Resolves: COMPMID-4338
Change-Id: Id5a075894027d867bdd21937626f052acb05b531
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5512
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Simplify the implementation when the pooling size has the same spatial
dimensions of the input tensor
- Rework the heuristic for F32/F16
- Add test for validating the global pooling path
- Fix compare_dimensions in validation. The validation fails because we have different
number of dimensions for NCHW and NHWC (e.g. 1,1,2,1(NCHW) -> 2,1,1,1(NHWC)
Change-Id: Iba680cb30bf2a5d0952265a4cc9794f368549ca5
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5510
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-3915
Change-Id: I26c103507717f16588f19c07bf20df5657022dec
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5489
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Resolves COMPMID-4400
Change-Id: I54c33a017c735194fbf4437d1c7df465208bc0ca
Signed-off-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5505
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
|
|
Fixing Conv5x5, Conv5x1, Conv1x5
Resolves: COMPMID-4380
Change-Id: I5206d9b85b1d73f6010f02c119aae91266395ba7
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5485
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Aleksandr Nikolaev <aleksandr.nikolaev@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve COMPMID-4396
Change-Id: I9b16791f84d60bc4a5303a6393cdbe9db3a4f0e9
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5483
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
|
|
This patch enables CLVK through the graph API and inside the
CLScheduler. By default the Native platform is selected.
Selecting CLVK can be done via --target=clvk.
Resolves COMPMID-4205 and COMPMID-4206
Change-Id: Ic60744980c6b8a60e776627ea677ed46be88f656
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5475
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Change-Id: Ibab2095a1a5b525c8513f924cd5bcecd5172ed48
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5467
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Remove the reshaped variant for CLDepthwiseConvolutionLayer 3x3 NHWC Quantized
- Remove kernel selection by GPUTarget
- Remove unused quantized support from the NHWC kernel
- Remove CLDepthwiseConvolutionLayerReshapeWeightsKernel
- Remove OpenCL kernels for reshaped dwc 3x3 quantized and weights reshape
- Remove the "_bifrost" suffix in common OpenCL kernel
- Remove the ICLDepthwiseConvolutionLayer3x3Kernel common interface
Resolve COMPMID-3864, COMPMID-3907
Change-Id: Icfac0fb6c00e214985beb05dad7c0cdbbee7d830
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5447
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Change kernel's vec_size to 16 / sizeof(output)
- Change ICLKernel.cpp to handle broadcast without padding
Resolve COMPMID-3913
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: I03e884b250ef5784dc109bff8cf2c96b345d119f
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5450
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Computing the activation in FP32 and then converting in FP16
Resolves: COMPMID-4380
Change-Id: I8a857af65967c8017fb60a358b4f8f0d9fc2e1c2
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5457
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* This commit removes the tracing code which has not been maintained for a few releases.
* Resolves MLCE-445
Change-Id: I14793c82fe58ffef0cf936edf4af077b5dde85f8
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5455
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially Resolves : COMPMID-3793
Signed-off-by: Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I14d6884c34f33a6caee11fc1230f9d2d3ae6c4c1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5425
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Make changes to split the workload into two kernels. One kernel precomputes
mean and variance and the second kernel just loads these precomputed values.
* The new approach runs %30 faster than the original code for NHWC workloads
like 32x192x256.
* Resolves MLCE-337
Change-Id: I8356fcefa2d131ab4dcb32268ce7142421d073e4
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5355
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Resolves: COMPMID-4185
Change-Id: Ib5f22356356a022d567bb18d44ea272b62d10ebf
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5424
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|