path: root/src/core/CL/cl_kernels/gemmlowp.cl
Age  Commit message  Author
2019-07-18  COMPMID-2096: Refactor the CLGEMMLowp function selection (heuristic)  Gian Marco Iodice
Change-Id: I15a8b39e0354d3b6686ed4cc8c361782c0512037 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/1410 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: VidhyaSudhan Loganathan <vidhyasudhan.loganathan@arm.com>
2019-07-12  COMPMID-2468: (Nightly) Bug in CL QSYMM16  Michalis Spyrou
Change-Id: I08001e878520485d7281e5fcc60ea81686992961 Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/1534 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2019-07-11  COMPMID-2410: Create a new GEMMLowpQuantizeDownInt32ToInt16ScaleKernel for CL  Manuel Bottini
Change-Id: Iab74b72f7adf712a1baf16aab916ea7c8d2bf92f Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/1497 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2019-06-17  COMPMID-2401: Fix CLGemmLowp macro expansion on no-dot platforms  Georgios Pinitas
Change-Id: If707865ff13c96627816863cd05e09aaef247bbe Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-on: https://review.mlplatform.org/c/1361 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-06-10  COMPMID-2094: Implement CLGEMMLowpNative  Gian Marco Iodice
Change-Id: I2a80eec28baf9e83bfc67a930e2a140642e0b09e Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/1285 Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-06-03  COMPMID-2379: Use the macros available in gemm_helpers.h in GEMMLowp OpenCL kernels  Gian Marco Iodice
Change-Id: I09923a068bff36d42a3f2c1084ffa8bf218187b9 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/1260 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2019-05-30  COMPMID-2373: Remove unused gemmlowp opencl kernels  Gian Marco Iodice
Change-Id: Ie1fe6e80957007b41f6db860f073764e37d91b9f Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/1252 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2019-05-20  COMPMID-2338: Remove CLGEMMInterleave4x4 and CLGEMMTranspose1xW  Gian Marco Iodice
Change-Id: I527fc97eac51308de601e5d1d50e75e4d89c5ee5 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/1158 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2019-04-16  COMPMID-2110: Enable CLGEMMLowpMatrixMultiplyReshapeOnlyRHSKernel in CLGEMMLowp  Gian Marco Iodice
Change-Id: Ic32c803c3e2a067de10a7e46c85c962a970957b6 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/969 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-04-03  COMPMID-2099: Enable dummy threads in CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel  Gian Marco Iodice
Change-Id: Id108c537eda3b5cba6718745d072fe18ac338aa5 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/933 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com>
2019-04-01  COMPMID-2002: Implement CLGEMMLowpMatrixMultiplyReshapedOnlyRHS - Transposed  Gian Marco Iodice
Change-Id: I3907d151107766dc34749fe5710d7219e810b39f Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/875 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2019-03-20  COMPMID-2043: Add support for "dummy threads" in CLGEMMReshaped  Gian Marco Iodice
Change-Id: I89403b97503fbb99f6a32f5d62b8c535ab26a7be Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/877 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2019-01-25  COMPMID-1698: Implementing CLGEMMLowpMatrixMultiplyReshapedKernel  Gian Marco Iodice
Change-Id: Ia4db21b394a0b9235393202ce3c00b11cceb94ea Reviewed-on: https://review.mlplatform.org/568 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2018-11-09  COMPMID-1451 - Removed unused OpenCL kernel from gemmlowp.cl  Gian Marco Iodice
Removed the gemmlowp_mm_bifrost_transposed_dot8 kernel as it is not used. Change-Id: I43cf463a3a4c0cdb2808621c534ffd5c9fd47ca1
2018-11-08  COMPMID-1451: Removed output_depth3d from CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat  Gian Marco Iodice
Since we perform an element-wise operation, it is not necessary to pass the output_depth3d. Change-Id: Ibfa07a0706e902acf59b444aa61e18a348162ea9
2018-11-02  COMPMID-1413 - Improve the performance of GEMMLowp with 8 bit dot product on OpenCL  Gian Marco Iodice
COMPMID-1424 - Add dot product support for CLDepthwise QASYMM8 3x3 NHWC non-unit stride
With this patch we are able to improve the performance of MobileNet v1-qasymm8 by 37%. Tried to use the dot product instruction in CLDepthwise QASYMM8 3x3 NHWC non-unit stride but I have not seen any benefit (maybe because we have few arithmetic operations and we do not have more load instructions). However, Depthwise convolution has been improved by 30%.
Change-Id: Id768a99c2e53a04276707e427af5d0ec93419ada Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155082 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02  COMPMID-1451: Perform CLOutputStage using floats.  Georgios Pinitas
Change-Id: Ic8312a5b6790aa7cd4468d42f08d557ad40e9441 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154570 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02  COMPMID-1607 - (Nightly) CLGEMMLowpMatrixMultiplyCore errors and mismatches  Isabella Gottardi
COMPMID-1608 - (Nightly) CLGEMMConvolutionLayer QASYMM8 errors and mismatches
COMPMID-1609 - (Nightly) CLFullyConnectedLayer QASYMM8 mismatches
Change-Id: I84c0d4f468be892f437f9f38b964dc7dfb66663a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/150869 Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02  COMPMID-1519: Add support for 3D input/output in CLGEMMLowpOutputStage  Georgios Pinitas
Change-Id: I637add70310d2da4d82b236a6352af9d33be17a1 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/149706 Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02  COMPMID-1518: Add support for GEMM3D in CLGEMMLowpMatrixMultiplyCore  Georgios Pinitas
Change-Id: Ib14ac821ee5d4aff80bd602cd3e76e7018abb5e6 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/150268 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
2018-11-02  COMPMID-1433: Use Arm macro to check whether we support dot product instructions  Georgios Pinitas
Change-Id: I70c0ee5adfac81dccae26b6756f424f4200ba584 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/145990 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2018-11-02  COMPMID-1431 Use either arm_dot or arm_dot_acc for CLGEMMLowp based on what is supported  Giorgio Arena
Change-Id: I4c5121e0f000d5ee94a8c8c5326272806f643e35 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/141520 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02  COMPMID-1288 Optimizing CLGEMMLowp using 8 bit dot product instruction  Giorgio Arena
Change-Id: I536174b9381660a94578d6aa1892a6289a820391 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/139109 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02  COMPMID-799 - Use new OpenCL 8-bit dot product instruction  Michalis Spyrou
Change-Id: I03d6c6db13bcb565f117725bdab2b68c89a49e21 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/122185 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02  COMPMID-882 - Optimizing GEMMLowp on OpenCL reshaping matrices  Gian Marco
This new optimization achieves 36.3% MAC utilisation on Mate 9 @ 1GHz. The performance has been reported here: https://confluence.arm.com/display/MLENG/GEMMLowp+performance%3A+ACL+18.02
Change-Id: I71b6a217068763dfdc11bbf3574ee0eb94f93679 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118531 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02  COMPMID-816 - Optimizing CLGEMMLowpMatrixMultiplyCore - Part1  Gian Marco
The performance improvements have been reported at the following confluence page: https://confluence.arm.com/display/MLENG/GEMMLowp+performance%3A+ACL+18.02
Config3 of McVail looks improved by 29x
Change-Id: I8b203c0b75fc368f85cea863b7eed398fab3e79a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115783 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02  COMPMID-661: Convolution quantized (#32)  Chunosov
Change-Id: Id69df4ce98d1d89bdf9c9aa5c4d909659909b30f Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/110456 Tested-by: BSG Visual Compute Jenkins server to access repositories on http://mpd-gerrit.cambridge.arm.com <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02  COMPMID-706 - Add GEMMLowp output stage for scaling by a fixed point number  Gian Marco
DoD:
- Implement NEON kernel for quantizing down the gemmlowp result. The result should be scaled by a fixedpoint number
- Implement OpenCL kernel for quantizing down the gemmlowp result. The result should be scaled by a fixedpoint number
- Add test for validating the result
Required for:
- Integration of GEMMLowp in Android NN
- Convolution quantized
- Fully connected quantized
Change-Id: Ia963d25d695471e963961fb49a5600e78374ac4f Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/110981 Tested-by: BSG Visual Compute Jenkins server to access repositories on http://mpd-gerrit.cambridge.arm.com <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02  COMPMID-661: QASYMM8 support for fully connected layer.  Georgios Pinitas
Change-Id: I70e04d3a175ba366432ada98e9ca893c9f81b260 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/111094 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: BSG Visual Compute Jenkins server to access repositories on http://mpd-gerrit.cambridge.arm.com <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02  COMPMID-697 - Rework GEMMLowp interface on OpenCL  Gian Marco
Reworked the interface of GemmLowp to ease integration into Android NN:
- Added support for different output stage
- Added validation for both matrix multiplication and output stage
- Added bounded relu support in the output stage
- Added int32_t bias support
- Added optimized path for vector by matrix case
This rework is required for:
- Convolution quantized
- Fully connected quantized
Change-Id: I512283d406099cf8c614dd89d0a97ed411143afc Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/110625 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: BSG Visual Compute Jenkins server to access repositories on http://mpd-gerrit.cambridge.arm.com <bsgcomp@arm.com>