aboutsummaryrefslogtreecommitdiff
path: root/src/core/CL/kernels/CLDepthwiseConvolutionLayer3x3NHWCKernel.cpp
AgeCommit message (Collapse)Author
2019-05-03COMPMID-2108: Fuse Activation Layer in CLDepthwiseConvolutionLayer3x3Kernels ↵Manuel Bottini
for F32 Change-Id: I39dd23696b6d8573e172a59b9e327b6a69886f08 Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/973 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Usama Arif <usama.arif@arm.com> Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com>
2019-04-18COMPMID-2047: Add support for dilation in CLDepthwiseConvolution.Usama Arif
Change-Id: I3106aa34bd168985a56791613d95072756be6e9b Signed-off-by: Usama Arif <usama.arif@arm.com> Reviewed-on: https://review.mlplatform.org/c/958 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-03-08COMPMID-1882: Improve memory coalescence when reshaping the weights for ↵Pablo Tello
CLDepthwiseConvolution Change-Id: I97788d9e349f37fcd818d588d668e2d5e22fd568 Signed-off-by: giuros01 <giuseppe.rossini@arm.com> Reviewed-on: https://review.mlplatform.org/c/818 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2019-02-27COMPMID-1995: Fix output quantization CLDeptwiseConv3x3 when activation is fusedIsabella Gottardi
When quantization is fused in the optimazed Deptwise Convolution 3x3, we quatized twice the output. Change-Id: I1cae1f646045a456d6e2754917ddef0cda0679f4 Signed-off-by: Isabella Gottardi <isabella.gottardi@arm.com> Reviewed-on: https://review.mlplatform.org/c/796 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2019-01-30COMPMID-1691: Optimize CLDepthwiseConvolutionKernel (QASYMM8/NHWC) for 3x3 ↵giuros01
kernels (stride=1 and stride=2) Change-Id: I7d0d2dc350feeb40d253d17f9ffd5051a8fb42ef Reviewed-on: https://review.mlplatform.org/511 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-01-25COMPMID-1866: Revisit padding and window on CLDepthwiseConvolutionNHWCMichele Di Giorgio
Change-Id: I1fd20ebb5f3295722c833a599ea82798ceff6e01 Reviewed-on: https://review.mlplatform.org/471 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-12-28COMPMID-1860: Invalid arguments in CLDepthwiseConvolution3x3 for NHWCGeorgios Pinitas
-Alters the kernel/function selection process to use validate for selection. -Fixes border kernel input in case of permutation. Change-Id: Ia61df3a0ed661349114dc125f33ad53ee40d9c76 Reviewed-on: https://review.mlplatform.org/443 Reviewed-by: Anthony Barbier <Anthony.barbier@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2018-11-08COMPMID-1776: Revert QuantizeDownStage to use fixed-pointGeorgios Pinitas
Change-Id: I807ef84dbf893bd401dcac5c0fa3a4ee49aabc66
2018-11-06COMPMID-1703: Collapse the 4th dimensions in ↵Georgios Pinitas
CLDepthWiseConvolutionLayer3x3Kernel Change-Id: Ie274da79b15c03f86dfedc85bb721b3de34a0bb4
2018-11-02COMPMID-1413 - Improve the performance of GEMMLowp with 8 bit dot product on ↵Gian Marco Iodice
OpenCL COMPMID-1424 - Add dot product support for CLDepthwise QASYMM8 3x3 NHWC non-unit stride With this patch we are able to improve the performance of MobileNet v1-qasymm8 by 37 % Tried to use the dot product instruction in CLDepthwise QASYMM8 3x3 NHWC non-unit stride but I have not seen any benefit (maybe because we have few arithemtic operation and we do not have more load instructions). However Depthwise convolution has been improved by 30% Change-Id: Id768a99c2e53a04276707e427af5d0ec93419ada Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155082 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02COMPMID-1451: Fuse activation in DepthwiseConvolution.Georgios Pinitas
Change-Id: Id964d9068e18aaa13ab8adcbf7a9375b034ea6c3 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154651 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-1539 Implement YOLOLayer on CLGiorgio Arena
Change-Id: I332c0703e1399fca0c5b724529b54a28f49c88da Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/146842 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02COMPMID-1533 Implementing CLDepthWiseConvolutionLayer with FP16 (NHWC)Giorgio Arena
Change-Id: I46965aeb1fffba8cbf083cab7284c549b0e94d00 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/145334 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
2018-11-02COMPMID-1478: Stop relying on static default OpenCL objects in cl2.hppAnthony Barbier
This causes problems when ACL is used as a shared library on Android. Fixes some problems related to creation / destruction order between the Graph's CL backend and core / runtime Change-Id: I716d63fd42f4586df1ffbb6fa97e4db06d3a781b Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/143228 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-1316 Using 8 bit dot product instruction in CLDepthWiseConvolution ↵Giorgio Arena
with QASYMM8 Change-Id: I3fc37bdceaae8b4b1effa51129b71bf352388564 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/138374 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-1349: Add support for QASYMM8 LOGISTIC activation in CLActivationLayerMichele Di Giorgio
Change-Id: Ibabce61cf5427de80078a6468023bed05f5e7c2c Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/139006 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02COMPMID-970 : Remove QS8 / QS16 supportVidhya Sudhan Loganathan
Removed fixed point related code. Change-Id: I487acf138dace3b0450e0d72ca7071eaec254566 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/137678 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02COMPMID-811 Add NHWC data format support for CL depthwise convolutionGiorgio Arena
Change-Id: I574f7945f0be009c638d860028bce8b52b4120fd Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/136484 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-1246 CLDepthwiseConvolution QASYMM8 NHWC kernel cleanupGiorgio Arena
Change-Id: If9385e6bcbf2242b973f42d6979b16ebc39f2cb4 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/136159 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-1149: (OCLGrind) Invalid read dwc_3x3Georgios Pinitas
Updates bias padding in NHWC path. Change-Id: Ie986eaa91ad358ec1a3fe1f7b493fb2e93a40809 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/131027 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02COMPMID-926 Add depth multiplier support to NEON/CL/GLES depthwise convolutionGiorgio Arena
Change-Id: I03f32c62350e5ea43e77bb15fc5a832d83719e3b Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/126657 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02COMPMID-811 Add NHWC data format support for CL depthwise convolution QASYMM8Giorgio Arena
Change-Id: I89de432f3fbcba7abf9e1d4f8396a4334b4fa2c2 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118324 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>