Age | Commit message (Collapse) | Author |
|
Skipped im2col in CLGEMMConvolutionLayer for 1x1 convolutions with NHWC data layout
Change-Id: I894e6b952ed8605e8f3ffc0ffc25c24730d4664c
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/141909
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Makes GEMM3D account top padding when jumping among planes.
Change-Id: Ia7c16cfa5498de106774ce42cbc4716e9f43195b
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/139612
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Removed fixed point related code.
Change-Id: I487acf138dace3b0450e0d72ca7071eaec254566
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/137678
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I64b09c692a1da44413a03a3abb4b4534d138dc3d
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/136986
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Change-Id: I8c4823a0d909e19e9ef548f00b9ae98c66de61dd
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/123569
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Added
* Compile time switches for kernels using FP16 extensions
* Validation for support of atomics extension
Change-Id: Ia88e601db054ff35f1508988b5e322bd27511ac5
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/133216
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I717ec4d0e483966c5de0148206b9eaabe81b9179
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/132417
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I22fe80393ec70e4501a4f9f9cad14014029d035d
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/129134
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
-Multiple definitions of COLS_MTX_B in gemm.cl one for FP32 and one for
FP16.
-GEMMTranspose1xWKernel invalid check fro small window sizes.
Change-Id: I9c7ddd3577aec9afc702731ca27a1e10d6eddb81
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/131023
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
This patch improves of ~30 % GEMM fp16 when the reshape is required
The results have been reported at the following confluence page:
https://confluence.arm.com/display/MLENG/GEMM+FP16+performance%3A+ACL+18.05
Change-Id: I8233095a7e9ab06f1f915782a25dd41653b49140
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/128254
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Results reported at:
https://confluence.arm.com/display/MLENG/GEMM+FP32+performance%3A+ACL+18.05
Change-Id: I3246c4f19c4d21a7d6a44e4593bc5caffc016f81
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/127838
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
This patch improves of ~20% GEMM fp16.
The results has been reported at the following confluence page:
https://confluence.arm.com/display/MLENG/GEMM+FP32+performance%3A+ACL+18.05
I am aware with few cases we have a bit of degradation. However this cases are
memory bound anyway (Fully connected layer cases)
Change-Id: I183cbb7fba55a0b5eb86532c4dca5efe096096b0
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/128044
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
The bug concerned the collapse of the window in CLGEMMMatrixMultiplyKernel
Change-Id: I5043bf37b72eeb615ebe7fb3f2c8e72d006bf341
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/126262
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Implemented Winograd Output Transform (2x2,3x3) on OpenCL
Implemented CLWinogradConvolutionLayer on OpenCL
Change-Id: I6a113fc5f052ca07f878d2b800d2ab003f84af65
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/125148
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
This patch enables GEMM to execute multiple batches in parallel
https://confluence.arm.com/display/MLENG/Winograd%3A+batched+GEMM
Change-Id: I66222db041dd35e82af11fbb262fd1ebd3ca4b2f
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/120866
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
This new optimization allows to achieve 36.3 % of MAC utilisation on Mate 9 @ 1GHz.
The performance have been reported here
https://confluence.arm.com/display/MLENG/GEMMLowp+performance%3A+ACL+18.02
Change-Id: I71b6a217068763dfdc11bbf3574ee0eb94f93679
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118531
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
This patch introduces a new GEMM capable to improve the mac utilisation
of 10% compared to the GEMM without reshape. However this implementation
is not faster in all cases as we need to take into account the time for
reshaping the matrices. For this reason an heuristic solution to select
the optimal GEMM to use has been added to the function. More information
about the heuristic implementation can be found at COMPMID-852.
With this new patch, GoogleNet, MobileNet, VGG16 and SqueezeNet can
improved the performance of 1.5x.
More information about the performance uplift can be found here:
https://confluence.arm.com/display/MLENG/GEMM+FP32+performance%3A+ACL+18.02
Change-Id: I024563c06b9aed02a211a974e452bae5c233b04c
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117140
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Reworked the interface of GemmLowp in order to make easy the integration
in Android NN
- Added support for different output stage
- Added validation for both matrix multiplication and output stage
- Added bounded relu support in the output stage
- Added in32_t bias support
- Added optimized path for vector by matrix case
This rework is required for:
- Convolution quantized
- Fully connected quantized
Change-Id: I512283d406099cf8c614dd89d0a97ed411143afc
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/110625
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: BSG Visual Compute Jenkins server to access repositories on http://mpd-gerrit.cambridge.arm.com <bsgcomp@arm.com>
|
|
Change-Id: Ie56ac88dff5ff339572cec562e8cd62dc7f0aa8b
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/109805
Tested-by: BSG Visual Compute Jenkins server to access repositories on http://mpd-gerrit.cambridge.arm.com <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I0f5396c8f32acc28914e2ff9fe953f977a3077b9
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/93405
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
|
|
Change-Id: Idd830cff054114123229c189e423b753b8064146
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/92623
Reviewed-by: Robert Hughes <robert.hughes@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I4ef18f49f1da0cb816aaa0762466b940792c15ed
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/84162
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
3RDPARTY_UPDATE
Change-Id: Iee572e18d5b1df71300d738cc8690f49d7203d5c
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/81353
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
to use 8 bit fixed point
Change-Id: I1cb1b4d7711ad7b569ee691e13a5df1b3430292b
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/79565
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
due to the non non representable numbers in half and float
Change-Id: I1590699e788a2cd3ed6e72de4afff75e6346f6d8
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/79800
Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
|
|
Change-Id: I30aef3c7ecd1ee740c2a7f2ce65a63c7dcd66e49
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/79630
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
|
|
Change-Id: I1353fd652ee180e3931e58b4ce13d651a48c7e2c
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/79567
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com>
|
|
Change-Id: I6c8bd69ae9715e4d83d128b2162fc15aa5561afb
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/78804
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com>
|
|
CLGEMMMatrixAccumulateBiasesKernel
Change-Id: Idba13b578dc564b8003ce2fa3392eea2af3ce806
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/78664
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
to support 8 bit fixed point
Change-Id: If236c9047ed536e808a0ed26e97e1799ca938e03
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/78529
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Change-Id: I32f7b84daa560e460b77216add529c8fa8b327ae
|