Age | Commit message (Collapse) | Author |
|
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: If9d6fa8c900b68c4b6fd373f2fc1f9abb83ea917
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4145
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sang-Hoon Park <sang-hoon.park@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I9354f7f1cb2583677441cc7b1ac857a5e950e42e
Signed-off-by: morgolock <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4100
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I65a738221a6c6fc3527ececda42f7a7e547755c1
Signed-off-by: morgolock <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3896
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I2c5e98aae7698963f106d7423df0e65cd00ee2a9
Signed-off-by: morgolock <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3710
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Assembly kernels do not support quantized GEMM when the multiplier is
greater than 1 (which leads to negative result shift used for
requantization).
Change-Id: I7a766cd0d13f549d217613ca67bc952923f309de
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3538
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
preferred presentation
Change-Id: Ib7dcfcbb24b408999dfae366b9da396485aacf78
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3525
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
We shouldn't perform reduction on matrix b when we
have a configured fused assembly kernel
Change-Id: I1f26c2afb387ee6ebbd54263e7255dab276ea08f
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3261
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
COMPMID-3082: Extend NEQLSTMLayer with enhancements
Change-Id: I88175b7bf69494a4eae510b74176fe8a0d6cd770
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/2969
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sang-Hoon Park <sang-hoon.park@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: Id28062445590d6c06b35f7d7434eb38393ae94a7
Signed-off-by: SiCongLi <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/2875
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
scalar value
Change-Id: If2a242f52aea753591525d30a4cb64c1a766bf8d
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/2881
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sang-Hoon Park <sang-hoon.park@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I9c09d1002043fd2f927493a85924298d54b1ad9c
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/2854
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Change-Id: I8667e75843fdd6ac75bd8272a86a348b830da28d
Reviewed-on: https://review.mlplatform.org/c/2548
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: Ib2b0c9ac88fc2b645f478c9981f71ee28f2c77fd
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/2425
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I8fbbd2e399f48968337a60147098d04f27c2d1c0
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-on: https://review.mlplatform.org/c/2402
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Rearrange the kernels in run to ensure type conversion takes place
before the matrix transformations.
Change-Id: Ibf47788fe71a84fd7549f8667549552e15ca8aab
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-on: https://review.mlplatform.org/c/2251
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: Ic1bf5f0d21ccd525f84213a360f7e199d7f50577
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-on: https://review.mlplatform.org/c/2177
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I7f52112d2d05b1ea3d3f3d4b19b8eafab05d6c44
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-on: https://review.mlplatform.org/c/2141
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez <pablo.tello@arm.com>
|
|
Fix invalid check for performing fused output stage in the intrinsic
fallback path of NEGEMMLowpCore
Change-Id: I9fa5a2d32376500fcb3d74e31dc5753b677c826a
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-on: https://review.mlplatform.org/c/1652
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Perform offset reduction and requantization within the assembly wrapper.
Change-Id: I5d5b3e1f6f9ef4c71805362c57f88ff199c027a3
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-on: https://review.mlplatform.org/c/1541
Comments-Addressed: Pablo Marquez <pablo.tello@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I7859b82b2059e14685f8792424648ac5eacd67f1
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-on: https://review.mlplatform.org/c/1418
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Add support for:
-QSYMM8, 8-bit quantized symmetric
-QSYMM8_PER_CHANNEL, 8-bit quantized symmetric with per channel quantization
Change-Id: I00c4ff98e44af37419470af61419ee95d0de2463
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-on: https://review.mlplatform.org/c/1236
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Remove VirtualCall checks
- Fix some unused variables errors
- Use std::array insted of C style arrays
- Various fixes
Change-Id: Ife6170b7102de42b8f04e298dcf8476bf90779f0
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/1049
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Change-Id: Ie945526bd7845301458039edf3129253c1808505
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-on: https://review.mlplatform.org/c/938
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Sets correctly the quantization info in the auxilary tensors.
Change-Id: I35355d0aa2ab9716df3a91cbbf4609365fb109e4
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-on: https://review.mlplatform.org/c/872
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
NEGEMMLowpMatrixMultiplyCore
Change-Id: Ic1a681e4cc03e1eba3bf8485d9cdb17b3e926047
Signed-off-by: giuros01 <giuseppe.rossini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/561
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Mark auxilary tensors as resizable when cloning.
Change-Id: I582e6d09a7daadbc43cf02f46a53e51c178daacb
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-on: https://review.mlplatform.org/726
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
NEGEMMInterleavedKernel quantization information were unset when using the
respective convolution functions.
Change-Id: I1fd2af33aaadcfe3eda8bf20a0e56afa9b77c9bb
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-on: https://review.mlplatform.org/722
Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
Change-Id: I1d5bc4d24059917f9ddef0873dd3043b1f2320a8
|
|
The issue was related to CLIm2Col when the number of input channels was less than
the number of elements processed by each thread.
The bug has been fixed in the validate_and_configure_window() function setting the correct number of elements accessed
in the output tensor.
Also fixed an issue GEMM3D when we have a single output channel
Change-Id: I094292d0c7662599c4a4c3916ec5f5821df5faef
|
|
OpenCL
COMPMID-1424 - Add dot product support for CLDepthwise QASYMM8 3x3 NHWC non-unit stride
With this patch we are able to improve the performance of MobileNet v1-qasymm8 by 37 %
Tried to use the dot product instruction in CLDepthwise QASYMM8 3x3 NHWC non-unit stride
but I have not seen any benefit (maybe because we have few arithemtic operation and we
do not have more load instructions). However Depthwise convolution has been improved by
30%
Change-Id: Id768a99c2e53a04276707e427af5d0ec93419ada
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155082
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
FP16/QASYMM8
When the GEMM3D check fails, now we fallback to the classic implementation with im2col
and col2im. In this manner the function can work with QASYMM8 and FP16
Change-Id: I359e9da3a63956f33b5acbc9bca4383b14af10e2
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/143372
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
This makes it easier to integrate in GEMMLowpMatrixMultiplyCore
Change-Id: Ibf80803f016a2e6a24d943ffafb50b48f04ec545
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/140868
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Introduced a new IFunction for when we'll fork the arm_gemm functions
Increased encapsulation and abstraction of which method is used
Change-Id: I5fd8b14b5c77e7f8ecb09029b5e2eccd10dbdcf4
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/139108
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
|
|
Change-Id: I5b46764f9c3154ec3e3b9c951cc9e6dfbcb81dfb
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/134255
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
|
|
The default templated merge, and the specialised S8 12x8 merge, were
using alpha and beta the wrong way round. Fixed.
Change-Id: Ie559b665edf1eb012e8cb54ea0bca31612bcc072
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/131309
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I281699ce7270aec1317c47b5a13799954cf6c9e8
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/130010
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I4df63ec2f4eb27a8a6eec2bea27741bf8dec6910
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/126966
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Removed CPUTarget in favor of the CPUModel type.
CPUInfo now holds a vector of N CPUs.
CPUInfo autoinitialise upon construction with 1 GENERIC CPU.
CPPScheduler fills CPUInfo's vector upon construction (runtime).
IScheduler has a single CPUInfo obj and ThreadInfo always gets a pointer to it (avoid copying the vector)
Change-Id: I30f293258c959c87f6bac5eac8b963beb6a4d365
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/124626
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: Id5ded4244f8a2983551308d7d3e7a07d266581ac
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/124680
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I1e2a1a77097d8017c274af3f97eba6964f80f5fa
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/122592
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: Iec82a91ad351cfe8d07d0976a24bd42f4703177a
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116833
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Assembly kernel interfaces were wrongly translating the layout of the
input matrices. Boolean flags transform0 and transform1 do not match the
actual interface of the gemm assembly code which expects transpose0 and
transposed1.
Change-Id: Ia4df65a533834647fa63e78e8c897924793949df
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/113410
Tested-by: BSG Visual Compute Jenkins server to access repositories on http://mpd-gerrit.cambridge.arm.com <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
|
|
Change-Id: I5432b58e944b0bf75372de6d990600f38402009d
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/112558
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: BSG Visual Compute Jenkins server to access repositories on http://mpd-gerrit.cambridge.arm.com <bsgcomp@arm.com>
|
|
Change-Id: Ib57d4f7177cc6179302bda7ad870acb8bd3825f5
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/112115
Tested-by: BSG Visual Compute Jenkins server to access repositories on http://mpd-gerrit.cambridge.arm.com <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: Id69df4ce98d1d89bdf9c9aa5c4d909659909b30f
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/110456
Tested-by: BSG Visual Compute Jenkins server to access repositories on http://mpd-gerrit.cambridge.arm.com <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
This patch includes COMPMID-716 as well
- Added vector-matrix case in NEGEMMLowpMatrixMultiplyKernel
- Added benchmarks for NEON and OpenCL
Change-Id: I715cd25e8668a4d6c8127e9a298a865e7713267f
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/111468
Tested-by: BSG Visual Compute Jenkins server to access repositories on http://mpd-gerrit.cambridge.arm.com <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Reworked the interface of GemmLowp in order to make easy the integration
in Android NN
- Added support for different output stage
- Added validation for both matrix multiplication and output stage
- Added bounded relu support in the output stage
- Added in32_t bias support
- Added optimized path for vector by matrix case
This rework is required for:
- Convolution quantized
- Fully connected quantized
Change-Id: I512283d406099cf8c614dd89d0a97ed411143afc
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/110625
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: BSG Visual Compute Jenkins server to access repositories on http://mpd-gerrit.cambridge.arm.com <bsgcomp@arm.com>
|
|
Change-Id: I8a470cc1351593ad8eeaf4ec92e04865e83d4f3c
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/96147
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I791a08c1e333ce6fc5d537f50ab731fbe066e9c9
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/95737
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
The new interface makes NEGEMMLowp able to work with ASYMM8 data types.
Implemented 2 new functions:
- NEGEMMLowpMatrixMultiplyCore
- NEGEMMLowpOutputStage
These functions should make the integration in android NN doable
For more information about GEMMLowp:
https://github.com/google/gemmlowp/blob/master/doc/low-precision.md
Change-Id: Ie2c775f45234f68ca53dba644b3a912b997fd890
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/95504
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
|