Age | Commit message (Collapse) | Author |
|
This patch improves of ~30 % GEMM fp16 when the reshape is required
The results have been reported at the following confluence page:
https://confluence.arm.com/display/MLENG/GEMM+FP16+performance%3A+ACL+18.05
Change-Id: I8233095a7e9ab06f1f915782a25dd41653b49140
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/128254
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I4838f5a8e4c33ed646cd05e0bb682fca635a29a3
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/130469
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
|
|
Change-Id: I56d2a02b316f0c69ff1fd7220e732f775414fe69
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/129709
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Change-Id: I03f32c62350e5ea43e77bb15fc5a832d83719e3b
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/126657
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Change-Id: I3d91fde78b971aba3f6349f633cd9b1c50e5cacf
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/124712
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
COMPMID-1103 - CLWinogradConvolutionLayer mismatches
Change-Id: Iceaa9482a1790ec39d2720c220261aaea8043978
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/129398
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Change-Id: Ie37588f60b9cfc7b1d09b1e8628fcfb4b17e0717
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/123834
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: Id540490e5faf11c466ff039a20880eeedd6e5ec7
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/128612
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com>
|
|
The performance achieved can be found at the following confluence page:
https://confluence.arm.com/display/MLENG/GEMM-based+convolution+vs+Winograd-based+convolution+on+OpenCL
Change-Id: I4b690cfdd4eb4ff0cd17b14fdd49ccaa1d1dc85c
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/127729
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Change-Id: I0b126f03028f08687497b0d79d2e2764f7ed07c8
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/128001
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Currently we have beta and gamma compulsory in Batch normalization. There are
network that might not need one or both of those. Thus these should be optional
with beta(offset) defaulting to zero and gamma(scale) to 1. Will also reduce
some memory requirements.
Change-Id: I15bf1ec14b814be2acebf1be1a4fba9c4fbd3190
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/123237
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Results reported at:
https://confluence.arm.com/display/MLENG/GEMM+FP32+performance%3A+ACL+18.05
Change-Id: I3246c4f19c4d21a7d6a44e4593bc5caffc016f81
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/127838
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
This patch improves of ~20% GEMM fp16.
The results has been reported at the following confluence page:
https://confluence.arm.com/display/MLENG/GEMM+FP32+performance%3A+ACL+18.05
I am aware with few cases we have a bit of degradation. However this cases are
memory bound anyway (Fully connected layer cases)
Change-Id: I183cbb7fba55a0b5eb86532c4dca5efe096096b0
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/128044
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I89de432f3fbcba7abf9e1d4f8396a4334b4fa2c2
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118324
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Change-Id: Iac26936f46d0f7cdd9d2f8393b0092cd5a223c45
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/127675
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Change-Id: I6dd639bf5df9bc0c133996f75bdee767f70a6cfb
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/127469
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Change-Id: I67fe3fcea08704d9f4b04d22fe34db83b2697b87
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/110562
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: Ia36bd11ca27420d3059eea15df81b237900149ec
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/125175
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: John Richardson <john.richardson@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Introduced static and dynamic checks before using printf vendor extension features (callbacks and buffers)
Change-Id: Ib38cb3d8591bbb482d02a41918f4b52efde75267
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/126751
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I2af6544eab17004c5b3de56557cb2cc5efecc915
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/122181
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
|
|
Change-Id: I73231fc71c5166268e6c909b7930b7e034f3794e
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118876
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: If4626ec9e215e14dffe22e80812da5bac84a52e2
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/125734
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
The bug concerned the collapse of the window in CLGEMMMatrixMultiplyKernel
Change-Id: I5043bf37b72eeb615ebe7fb3f2c8e72d006bf341
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/126262
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: Ica17528bf6c812d9caf9d66c612c11434ec1dc69
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/125542
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Change-Id: Ie73d8771f85d1f5b059f3a56f1bbd73c98e94a38
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/124723
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I68c6453e0f192de659582404f109a89616b9fbb9
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/124811
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Implemented Winograd Output Transform (2x2,3x3) on OpenCL
Implemented CLWinogradConvolutionLayer on OpenCL
Change-Id: I6a113fc5f052ca07f878d2b800d2ab003f84af65
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/125148
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I287908f76af458ad4b4d865d353dc37e33877250
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/120839
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Implemented Winograd Filter Transform 3x3 on OpenCL
Change-Id: I8f2b2dd938c5c000ef7ce392a37fb7b8b4202a4e
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/122708
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I51f92f30602fb0a02314f344fa67061f448694bf
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/122793
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
This patch enables GEMM to execute multiple batches in parallel
https://confluence.arm.com/display/MLENG/Winograd%3A+batched+GEMM
Change-Id: I66222db041dd35e82af11fbb262fd1ebd3ca4b2f
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/120866
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: Ica047a92d3ab199ffc65a512b9ba10e865639dfe
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/121806
Reviewed-by: Les Bell <les.bell@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: Ie5f299c7a7fbe3062cee22bb2b4ae5df818fe490
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/121178
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I5022d02f06f9d849dad76e3d9b8e48632c236429
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/121191
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I91f6a0b057f5eb84c6ac7db5abbc05c7520ed5d2
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/120760
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I4404f91a270e0ba7bbb7451c4c43a485fd4a3f6c
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/121105
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Invalid conversions in oclgrind when clamp is used.
Removed call to clamp in CL kernel and replace with convert_sat.
Change-Id: I3cd9b87dc10c65d307fbf6eb0aec1b671fba6e97
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/121062
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
This new optimization allows to achieve 36.3 % of MAC utilisation on Mate 9 @ 1GHz.
The performance have been reported here
https://confluence.arm.com/display/MLENG/GEMMLowp+performance%3A+ACL+18.02
Change-Id: I71b6a217068763dfdc11bbf3574ee0eb94f93679
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118531
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I3512d67b8a72b17db1381842ca42780e39cc511c
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/120605
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Change-Id: Ifb4d27ba05aa618babb79b1f8e95fbfa689c5f3a
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/120792
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: Ic6097e7cf160e8b829fb521b7b99d9a57d9799d3
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118774
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: Ie0c5885a60771f728f80a8c4bdb7f1e4085fa3ee
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/120267
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
sizes - Part 2 (CL)
Change-Id: I004906b9b1f11158fe17b4aa2640a7f4685fb929
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118462
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I9a607fe620f795cdea1a99fdd3f5f8c2fc76f980
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119234
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
-Adds quantization info to the ActivationLayer benchmark fixture
-Replaces clamp with convert_sat in depthwise conv kernel
-Fixes ROIPooling execution slice
Change-Id: Ie9bbe08abcfb8278456964e476b0948247c7ecba
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118957
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
|
|
Change-Id: Ic26fed30f9a54e6adef7861c05c9d55d23ca52ca
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119913
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: Ifa74e2bf05546de9a49aa185e22fba50438d8ad6
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/113946
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
|
|
This patch makes col2im on OpenCL 2 times faster
Change-Id: I8d90f5a72a050355ca1fd13433d8c2c26e5e33f5
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119442
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
This patch brings the MACs utilisation up to 25 % when both stride_x and stride_y are equal to 1
Performance reported in the following confluence page:
https://confluence.arm.com/display/MLENG/Depthwise+convolution+3x3+FP32+performance%3A+ACL+18.02
Change-Id: Ida1b64be9a88805902a3d90194559b58eb1224a3
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119068
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Introduced optimizations for 1x1, 3x3, 5x5 and 11x11
Change-Id: Ibb7f7a9fbec01a7684746ed8513634078126e452
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118107
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
|