Add QSYMM8_PER_CHANNEL support on weight input for CLDeconvolutionLayer.
When the weights are of a per-channel quantized type, the "Direct" method
is always used.
Also reduce the number of QSYMM8_PER_CHANNEL tests for NEDeconvolutionLayer.
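A minimal sketch of the selection rule described above. `DataType`, `DeconvMethod`, `select_deconv_method` and the `gemm_preferred` flag are hypothetical stand-ins, not the library's own types or heuristic:

```cpp
#include <cassert>

// Hypothetical stand-ins, not the Compute Library's own types.
enum class DataType { QASYMM8, QASYMM8_SIGNED, QSYMM8_PER_CHANNEL, F16, F32 };
enum class DeconvMethod { Direct, Gemm };

// Per-channel quantized weights bypass the heuristic and always take the
// direct path; for other types a placeholder flag stands in for the real
// heuristic.
DeconvMethod select_deconv_method(DataType weights_type, bool gemm_preferred)
{
    if(weights_type == DataType::QSYMM8_PER_CHANNEL)
    {
        return DeconvMethod::Direct;
    }
    return gemm_preferred ? DeconvMethod::Gemm : DeconvMethod::Direct;
}
```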
Resolves: COMPMID-3438
Signed-off-by: Freddie Liardet <frederick.liardet@arm.com>
Change-Id: I1330cac5142e19d21e322574fb8d912558745b02
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5484
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Oclgrind flags this as a warning.
Resolves: COMPMID-4338
Change-Id: Id5a075894027d867bdd21937626f052acb05b531
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5512
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Simplify the implementation when the pooling size has the same spatial
dimensions as the input tensor
- Rework the heuristic for F32/F16
- Add test for validating the global pooling path
- Fix compare_dimensions in validation. The validation failed because NCHW and NHWC
can have a different number of dimensions (e.g. 1,1,2,1 (NCHW) -> 2,1,1,1 (NHWC))
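The mismatch can be reproduced with the shapes quoted above. A sketch, assuming 4D shapes stored innermost-dimension-first and a dimension count that ignores trailing 1s; `Shape4`, `num_dimensions` and `nchw_to_nhwc` are hypothetical helpers, not the fixed `compare_dimensions`:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Shapes are stored innermost dimension first: NCHW as (W, H, C, N) and
// NHWC as (C, W, H, N).
using Shape4 = std::array<std::size_t, 4>;

// Number of dimensions with trailing 1s dropped, so (1,1,2,1) has 3
// dimensions while (2,1,1,1) has only 1.
std::size_t num_dimensions(const Shape4 &shape)
{
    std::size_t n = shape.size();
    while(n > 1 && shape[n - 1] == 1)
    {
        --n;
    }
    return n;
}

// Reorder an NCHW shape into NHWC order so the two can be compared
// element by element.
Shape4 nchw_to_nhwc(const Shape4 &nchw)
{
    return {{nchw[2], nchw[0], nchw[1], nchw[3]}};
}
```

Comparing the permuted shape element by element sidesteps the differing dimension counts (3 vs 1) for what is the same tensor.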
Change-Id: Iba680cb30bf2a5d0952265a4cc9794f368549ca5
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5510
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-3915
Change-Id: I26c103507717f16588f19c07bf20df5657022dec
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5489
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Resolves COMPMID-4400
Change-Id: I54c33a017c735194fbf4437d1c7df465208bc0ca
Signed-off-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5505
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
|
|
Resolves: COMPMID-4395
Change-Id: Ib3dfdc42e95998c1e5713d6ec1bdaa83299b0360
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5488
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Fix Conv5x5, Conv5x1 and Conv1x5
Resolves: COMPMID-4380
Change-Id: I5206d9b85b1d73f6010f02c119aae91266395ba7
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5485
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Aleksandr Nikolaev <aleksandr.nikolaev@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
This caused build failures on bare metal.
Resolves: COMPMID-4399
Change-Id: I151012740a440e8939b76b978aa9e96741229245
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5482
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve COMPMID-4396
Change-Id: I9b16791f84d60bc4a5303a6393cdbe9db3a4f0e9
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5483
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
|
|
This patch enables CLVK through the graph API and inside the
CLScheduler. By default the Native platform is selected.
Selecting CLVK can be done via --target=clvk.
Resolves COMPMID-4205 and COMPMID-4206
Change-Id: Ic60744980c6b8a60e776627ea677ed46be88f656
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5475
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Change-Id: Ibab2095a1a5b525c8513f924cd5bcecd5172ed48
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5467
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Check if biases are null before passing them to
CpuDepthwiseNativeKernel.
Resolves: COMPMID-4389
Change-Id: I29acbdefb75b9e81c293801a265e9b850d8d00b9
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5464
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch brings performance uplift on Cortex-A35.
Resolves: COMPMID-4316
Change-Id: I2b9c02a599373f780dd1b981b821e33bd59a3422
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5461
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Remove the reshaped variant for CLDepthwiseConvolutionLayer 3x3 NHWC Quantized
- Remove kernel selection by GPUTarget
- Remove unused quantized support from the NHWC kernel
- Remove CLDepthwiseConvolutionLayerReshapeWeightsKernel
- Remove OpenCL kernels for reshaped dwc 3x3 quantized and weights reshape
- Remove the "_bifrost" suffix in common OpenCL kernel
- Remove the ICLDepthwiseConvolutionLayer3x3Kernel common interface
Resolve COMPMID-3864, COMPMID-3907
Change-Id: Icfac0fb6c00e214985beb05dad7c0cdbbee7d830
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5447
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Remove includes of NEConvertFullyConnectedWeightsKernel.h
Resolves partially: COMPMID-4187
Signed-off-by: Teresa Charlin <teresa.charlinreyes@arm.com>
Change-Id: I1bf246546d3ef53edb4c5a8bc05a0db92d2d3bff
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5418
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Change kernel's vec_size to 16 / sizeof(output)
- Change ICLKernel.cpp to handle broadcast without padding
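The new vector size keeps each vector at 16 bytes whatever the output type: a 16-byte vector holds 4 F32, 8 F16 or 16 8-bit elements. A compile-time sketch, with `vec_size_for` as a hypothetical helper and `std::uint16_t` standing in for a 2-byte half type:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of elements of type T that fit in one 16-byte vector.
template <typename T>
constexpr std::size_t vec_size_for()
{
    static_assert(sizeof(T) <= 16, "element type wider than a 16-byte vector");
    return 16 / sizeof(T);
}
```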
Resolve COMPMID-3913
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: I03e884b250ef5784dc109bff8cf2c96b345d119f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5450
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Compute the activation in FP32 and then convert the result to FP16
Resolves: COMPMID-4380
Change-Id: I8a857af65967c8017fb60a358b4f8f0d9fc2e1c2
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5457
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* This commit removes the tracing code which has not been maintained for a few releases.
* Resolves MLCE-445
Change-Id: I14793c82fe58ffef0cf936edf4af077b5dde85f8
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5455
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves: COMPMID-3793
Signed-off-by: Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I14d6884c34f33a6caee11fc1230f9d2d3ae6c4c1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5425
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Split the workload into two kernels: one kernel precomputes the
mean and variance, and the second kernel just loads these precomputed values.
* The new approach runs 30% faster than the original code for NHWC workloads
like 32x192x256.
* Resolves MLCE-337
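The precompute-then-load split can be illustrated on the CPU with two passes over one NHWC plane. This is a sketch of the idea only, assuming per-channel statistics; `Stats`, `precompute_stats` and `normalize` are hypothetical, and the commit does not say which layer or reduction axes are involved:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Stats
{
    float mean;
    float var;
};

// Pass 1 ("precompute" kernel): one mean/variance pair per channel.
// data holds one plane in NHWC order, i.e. data[pixel * channels + c].
std::vector<Stats> precompute_stats(const std::vector<float> &data, std::size_t channels)
{
    assert(channels > 0 && data.size() % channels == 0);
    const std::size_t  pixels = data.size() / channels;
    std::vector<Stats> stats(channels);
    for(std::size_t c = 0; c < channels; ++c)
    {
        float sum    = 0.0f;
        float sum_sq = 0.0f;
        for(std::size_t p = 0; p < pixels; ++p)
        {
            const float v = data[p * channels + c];
            sum += v;
            sum_sq += v * v;
        }
        const float mean = sum / static_cast<float>(pixels);
        // E[x^2] - E[x]^2: simple, but less numerically robust than Welford.
        stats[c] = {mean, sum_sq / static_cast<float>(pixels) - mean * mean};
    }
    return stats;
}

// Pass 2 (normalization kernel): only loads the precomputed statistics.
void normalize(std::vector<float> &data, std::size_t channels, const std::vector<Stats> &stats, float epsilon)
{
    for(std::size_t i = 0; i < data.size(); ++i)
    {
        const Stats &s = stats[i % channels];
        data[i]        = (data[i] - s.mean) / std::sqrt(s.var + epsilon);
    }
}
```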
Change-Id: I8356fcefa2d131ab4dcb32268ce7142421d073e4
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5355
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Resolves: COMPMID-4185
Change-Id: Ib5f22356356a022d567bb18d44ea272b62d10ebf
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5424
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve: COMPMID-3911
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: Id5615b6a8b52030fb611a1a04bcd4664b8232e90
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5451
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Add a couple of utility functions that return information
about tensors. The functions are placed in a separate
header file for better grouping.
Related test cases are also added.
Resolves: COMPMID-4376
Change-Id: I6bd09cbf60fddcf4fe651906982397afb0451392
Signed-off-by: Sang-Hoon Park <sang-hoon.park@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5405
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Fix the pooling kernel, which did not take into account
left padding that can be implicitly added by external
kernels.
Additionally, FP16 tests have been added for this logic.
Resolves: COMPMID-4363
Change-Id: I5655991cb80f749fb1ae9bbd3918b436a078f5d1
Signed-off-by: Sang-Hoon Park <sang-hoon.park@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5421
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Include paddings in address computation for input and output
Resolves: COMPMID-4362
Change-Id: I1b34cf47e3b80b98d55fc8fbdeecbfd850d33197
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5439
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
In these cases, no padding is introduced, so AccessWindows are
unnecessary and only make the code more confusing.
Change-Id: Id712cba35bb0440eb40c69fdc7ad0084dc9a5ab3
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5440
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Replace std::map with a basic container based on std::array
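A sketch of this kind of replacement, assuming the keys form a small contiguous enum: indexing a `std::array` avoids the node allocations and tree lookups of `std::map`. `TileSize`, `Config` and the table values are hypothetical; the commit does not say what the map stored.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical keys: a small, contiguous enum with a Count sentinel.
enum class TileSize : std::size_t
{
    Small,
    Medium,
    Large,
    Count
};

struct Config
{
    int m0;
    int n0;
};

// One slot per enum value, filled in enum order (hypothetical values).
constexpr std::array<Config, static_cast<std::size_t>(TileSize::Count)> configs{{
    {2, 4}, // Small
    {4, 4}, // Medium
    {8, 4}, // Large
}};

// Constant-time lookup by index: no allocations and no tree traversal.
Config lookup(TileSize key)
{
    return configs[static_cast<std::size_t>(key)];
}
```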
Change-Id: I76f53ca61676ca0e5136ce61a3f3adb10e22b4c3
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5441
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I9f8d0c6e17d58700cc01fc5134cd2dffd26bc742
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5430
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Give users the ability to specify an allocator to be used by all the
internal function tensors. As this allocator is global, it must outlive
all the tensors/functions that use it.
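A minimal sketch of a user-supplied global allocator and the lifetime rule above, assuming a simple allocate/release interface. `Allocator`, `CountingAllocator`, `set_global_allocator`, `allocate_internal` and `release_internal` are hypothetical names, not the library's API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical allocator interface.
struct Allocator
{
    virtual ~Allocator()                        = default;
    virtual void *allocate(std::size_t size)    = 0;
    virtual void  release(void *ptr)            = 0;
};

// Example user allocator that counts live allocations.
struct CountingAllocator final : Allocator
{
    std::size_t live = 0;
    void *allocate(std::size_t size) override
    {
        ++live;
        return std::malloc(size);
    }
    void release(void *ptr) override
    {
        --live;
        std::free(ptr);
    }
};

namespace
{
// Non-owning global pointer: the caller keeps ownership, so the allocator
// must outlive every tensor/function that allocated through it.
Allocator *g_allocator = nullptr;
} // namespace

void set_global_allocator(Allocator *allocator)
{
    g_allocator = allocator;
}

// Internal tensors route through the global allocator when one is set.
void *allocate_internal(std::size_t size)
{
    return g_allocator != nullptr ? g_allocator->allocate(size) : std::malloc(size);
}

// Must run while the same global allocator is still set and alive.
void release_internal(void *ptr)
{
    if(g_allocator != nullptr)
    {
        g_allocator->release(ptr);
    }
    else
    {
        std::free(ptr);
    }
}
```

Because only a raw pointer is stored, destroying the allocator while a tensor still holds memory from it would leave that pointer dangling.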
Resolves: COMPMID-4212, COMPMID-4213
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I251871c242879976819ebca1452404133a8e62d7
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5420
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves: COMPMID-4009
Change-Id: I19ffb61c5c4541134a5028677d2d81228740e454
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5419
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Only for NHWC data layout
Resolves: COMPMID-3910
Change-Id: Ie2d71482b3e3b55ac155e9af152032a5de8bbd50
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5388
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Replace ICLKernel by IClKernel in other unrelated kernels
Resolves partially: COMPMID-4187
Signed-off-by: Teresa Charlin <teresa.charlinreyes@arm.com>
Change-Id: I173b8f2ac645dbfd7d412f4b058c5c9655c229ee
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5402
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- The array initializer for the TILE object cannot always be used, so the
TILE must be initialized manually with the LOOP_UNROLLING macro
- Resolves COMPMID-4371
Change-Id: I2598354b9fae84c5e3bd11219fffdcdc297215e1
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5417
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
As the underlying enum type can be an int, also check the lower bound of
the data type when creating a tensor, to avoid creating a tensor with a
negative value.
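A sketch of the two-sided check, using a hypothetical `DataType` enum with an explicit `int` underlying type and a `Count` sentinel (neither is the library's definition):

```cpp
#include <cassert>

// Hypothetical enum with an explicit int underlying type; Count is a sentinel.
enum class DataType : int
{
    U8,
    S8,
    F16,
    F32,
    Count
};

// A raw value converted to DataType may be negative as well as too large,
// so both bounds are checked; an upper-bound-only check would accept -1.
bool is_valid_data_type(int raw)
{
    return raw >= 0 && raw < static_cast<int>(DataType::Count);
}
```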
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I00fa3cae988c5f20a56115b1c1b85b70e699c966
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5413
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve: COMPMID-4370
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: I4b2a8bf252405fe9006784fa1769ad5b6e708a71
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5414
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve COMPMID-4367
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: I5e65b62c2ca52cf65950c9c343864ef55b7122c3
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5407
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- The cl_image object can be used for the weights
- cl_image can only work for f32/f16
- Fix the implicit padding on the first dimension X
Resolves COMPMID-4341
Change-Id: I04e0901c69e7765c42afceca38c4a840645b9123
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5393
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
The issue depends on the Clang version: Clang 3.9 has the problem, while Clang 4.0 works. The workaround is to add an extra pair of braces {}.
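The commit does not say which construct needed the extra braces. One common case in which older compilers warn unless an extra pair of braces is added is brace elision when initializing `std::array`; the sketch below (with a hypothetical `kernel_sizes` table) shows the fully braced form, which every conforming compiler accepts:

```cpp
#include <array>
#include <cassert>

// Outer braces initialize std::array itself; inner braces initialize the
// built-in array it wraps. Writing {1, 3, 5} relies on brace elision, which
// some older compilers flag with -Wmissing-braces.
constexpr std::array<int, 3> kernel_sizes{{1, 3, 5}};
```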
Partially resolves: COMPMID-4348
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: Ia079cbb3c44d617b1b42cb2af758b5a8ba1a032e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5399
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- The output data type, shape, etc. were being validated even when the output had not been initialized yet
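A sketch of the usual guard, assuming that an uninitialized output reports a total size of zero. `TensorInfo` here is a hypothetical two-field struct, and comparing `total_size` stands in for a full shape check:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal tensor metadata; total_size == 0 means "not yet initialized".
struct TensorInfo
{
    int         data_type;
    std::size_t total_size;
};

// Output checks only run once the output has been configured; an
// uninitialized output is accepted here and initialized later.
bool validate(const TensorInfo &input, const TensorInfo &output)
{
    if(output.total_size != 0)
    {
        if(output.data_type != input.data_type)
        {
            return false;
        }
        if(output.total_size != input.total_size)
        {
            return false;
        }
    }
    return true;
}
```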
Change-Id: I71a3cda2aa2de500f5690ae8a1cfd05ece0c3858
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5398
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-4342
Change-Id: I468c6d68c0284e4ec76f22037a697fff7bc5638c
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5391
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve COMPMID-4140
Change-Id: I17db0ee596665598d08d4359a373160f21ab9acd
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5390
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
The issue depends on the Clang version: Clang 3.9 has the problem, while Clang 4.0 works. The workaround is to add an extra pair of braces {}.
Resolves: COMPMID-4348
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: I2d8fc6400f32af5406fbf2d2556127a53b2ce918
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5392
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch takes advantage of tile_helpers.h and uses a different
data layout for the input and tmp matrices.
Resolves: COMPMID-4142
Signed-off-by: Aleksandr Nikolaev <aleksandr.nikolaev@arm.com>
Change-Id: I5d10bd3f08137414ee7520eef1e6d0aef8cbf160
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5382
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Semantic fix that otherwise led to compilation errors when building for
SVE and when MMLA instruction was enabled for int8.
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I4852d806789d52c4ed1d3b9132b2f20c2f9b41fa
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5384
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
The OpenCL API Specification states:
> The behavior of OpenCL API functions called from global constructors
> or destructors is therefore implementation-defined.
This patch improves compatibility with OpenCL runtimes that use static
objects to hold their internal state.
Change-Id: I850be378e9c6f0b5aa8db926fe0c62833a936724
Signed-off-by: Marco Antognini <marco.antognini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5383
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Sheri Zhang <sheri.zhang@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Indirect hybrid kernels read the full width of the bias, so we need to detect the case where a partial block is being written and pad the bias for that block.
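A sketch of the padding step, assuming a kernel that always reads `block_width` bias values per block. `bias_for_block` is hypothetical and, for simplicity, copies every block rather than only the partial one:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// The kernel always reads block_width bias values. For a partial final block,
// copy the remaining values into a zero-filled buffer so the kernel never
// reads past the end of the real bias array.
std::vector<float> bias_for_block(const std::vector<float> &bias, std::size_t block_start, std::size_t block_width)
{
    assert(block_start < bias.size());
    std::vector<float> padded(block_width, 0.0f);
    const std::size_t  n = std::min(block_width, bias.size() - block_start);
    std::copy_n(bias.begin() + static_cast<std::ptrdiff_t>(block_start), n, padded.begin());
    return padded;
}
```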
Resolves: COMPMID-4321
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: Ib8d8637724e34d1eae6cc22223df8d81a6d0ded6
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5380
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Rework Winograd Input Transform 3x3 NHWC using the new macros
- Rework Winograd Input Transform 5x5 NHWC using the new macros
- Rework Winograd Input Transform 7x7 NHWC using the new macros
- The new implementation is also faster than before
- Winograd Input Transform 5x5/7x7 3x faster
Resolves COMPMID-4139
Change-Id: Ia9c8af23a2d47d2db60ec4c44650a63a34ffa0d5
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5358
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Resolves partially: COMPMID-4359 (1/2)
Signed-off-by: Teresa Charlin <teresa.charlinreyes@arm.com>
Change-Id: Id1859f3cd530eb05f027226e2004cf518778147e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5377
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves partially: COMPMID-4359 (2/2)
Signed-off-by: Teresa Charlin <teresa.charlinreyes@arm.com>
Change-Id: Id65ef04268575cc9d74be6114e82e116b8ed106d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5378
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
This new scheduler mode is implemented to reduce runtime overhead on
high thread counts by distributing the scheduling work to all threads.
The fanout mode should only be enabled on high thread counts
(e.g. > 8 threads).
Alternatively, the mode can be forced by setting the environment variable
ARM_COMPUTE_CPP_SCHEDULER_MODE to either "linear" (default) or
"fanout". Note that this functionality is turned off on bare-metal, which
has no practical effect because multi-threading is not supported on
bare-metal.
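A sketch of how the mode could be chosen from ARM_COMPUTE_CPP_SCHEDULER_MODE. `SchedulerMode` and `select_scheduler_mode` are hypothetical, and falling back to a "> 8 threads" cut-off when the variable is unset or unrecognized is an assumption based on the guidance above, not confirmed library behaviour:

```cpp
#include <cassert>
#include <cstring>

enum class SchedulerMode
{
    Linear,
    Fanout
};

// env_value is the value of ARM_COMPUTE_CPP_SCHEDULER_MODE, or nullptr if unset.
// An explicit "linear" or "fanout" wins; otherwise fall back to an assumed
// thread-count threshold.
SchedulerMode select_scheduler_mode(const char *env_value, unsigned int num_threads)
{
    if(env_value != nullptr)
    {
        if(std::strcmp(env_value, "fanout") == 0)
        {
            return SchedulerMode::Fanout;
        }
        if(std::strcmp(env_value, "linear") == 0)
        {
            return SchedulerMode::Linear;
        }
    }
    return num_threads > 8 ? SchedulerMode::Fanout : SchedulerMode::Linear;
}
```

A caller would pass `std::getenv("ARM_COMPUTE_CPP_SCHEDULER_MODE")` (from `<cstdlib>`); taking the value as a parameter keeps the selection logic testable without touching the process environment.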
Resolves COMPMID-4349
Signed-off-by: SiCongLi <sicong.li@arm.com>
Change-Id: I46e2fab83ea24e616c82ae94dca7b2e72a73c7b8
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5352
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|