path: root/src
Age | Commit message | Author
2018-11-02 | COMPMID-834 Fix arm_compute_nightly_validation getting killed | Michalis Spyrou
Changed CLReductionOperationKernel: each kernel now computes a 2D slice instead of a 1D one. This reduces the memory footprint for a 4k input image from around 1.6 GB to a few MB. The large footprint was caused by the __local memory and was probably the cause of this bug. Change-Id: I71ac71ff09b041c945a134177600f0f3475e48cf Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117835 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-848 NEPoolingLayerKernel incorrectly reports it supports asymmetric padding | Michalis Spyrou
Add asymmetric padding support for NEPoolingLayer Change-Id: Ia5cc660aeca636c3c45df4916a28974cc2b7f2f4 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117275 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-748 - Integrating optimized SGEMM for bifrost | Gian Marco
This patch introduces a new GEMM that improves MAC utilisation by 10% compared to the GEMM without reshape. However, this implementation is not faster in all cases, because the time spent reshaping the matrices must be taken into account. For this reason, a heuristic that selects the optimal GEMM has been added to the function. More information about the heuristic implementation can be found at COMPMID-852. With this patch, the performance of GoogleNet, MobileNet, VGG16 and SqueezeNet can be improved by 1.5x. More information about the performance uplift can be found here: https://confluence.arm.com/display/MLENG/GEMM+FP32+performance%3A+ACL+18.02 Change-Id: I024563c06b9aed02a211a974e452bae5c233b04c Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117140 Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-860: Neon HGEMM integrated assembly kernel from RSH for Arm Cortex-A55r1. | Pablo Tello
Change-Id: I640ae54dcc4591915c7a539b27728f05b70cf0eb Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117616 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-798 Add instrumentation to NEON kernels | Anthony Barbier
Change-Id: I9dbb090cac731d68bd98a7d1c8ab0e1cb0a5c911 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116746 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-816 - Optimizing CLGEMMLowpMatrixMultiplyCore - Part1 | Gian Marco
The performance improvements have been reported at the following confluence page: https://confluence.arm.com/display/MLENG/GEMMLowp+performance%3A+ACL+18.02 Config3 of McVail looks improved by 29x Change-Id: I8b203c0b75fc368f85cea863b7eed398fab3e79a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115783 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-842: Add NEON QASYMM8 RELU Activation | Michele Di Giorgio
Change-Id: I7197d2ad7ac08112eba1570a257ad011b1ce0b75 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117404 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-858: Assert in ICLKernel on higher window dimensions moved to enqueue | Anthony Barbier
Change-Id: I49d501e82f5c69b6912cb9e5fa684a904c62ed8e Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117409 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-841: Add CL QASYMM8 RELU Activation | Michele Di Giorgio
Change-Id: I8e0b7cad2f977942224d0116e8498bf9b2d6014d Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117229 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-784: Doxygen fixes | Pablo Tello
Change-Id: I35f429fbf08dece7c759242c37e0a68b0851ce49 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117231 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | APPBROWSER-377: GCConvolutionLayer support for FP16 | Stephen Li
Change-Id: I801b5e393a16a9f92c062826e6fcfd5982ca7bb3 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116584 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-815: Updated NEWinogradLayer with the latest code from Research. | Pablo Tello
Change-Id: I86d7f53b5f5d1dbc22078aea5c32b08a25d1f49e Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116634 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-719: NEPermuteKernel refactoring | Pablo Tello
Change-Id: I91b43d9706ac3244ce43684967ace0b022d35bad Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/114988 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-838 Implement CLPermute | Michalis Spyrou
Change-Id: I6d97b649f1ebc289c9e6f8949e67740a6b3cbcb2 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116636 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-674 - Create Google InceptionV3 example | Georgios Pinitas
Change-Id: I389e0d4104b7dde60b7cdd612a83f3328517e44c Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115804 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-791: Adds support of QASYMM8 in NEDepthwiseConvolution3x3 | Georgios Pinitas
Change-Id: I1a9ed6c3420ddf8978aeaad48d9915333b006b49 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116374 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | IVGCVSW-847 Fix {NEON/CL}PoolingLayerKernel config | Diego Lopez Recas
Also, add validation test that hits the discovered failure for CL. Change-Id: I5573e0a3f169b85d5fb7299e7c48d74be7165208 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/112717 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-751 QASYMM8 ActivationLayer optimisation: don't requantize if not necessary | Giorgio Arena
Change-Id: Iea8a21f7c71025bfde6fdf7c7a7c92ba749b189b Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116673 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-751 Processing 8 elements makes computation up to 80us faster on MobileNet QASYMM8 dwc layers | Giorgio Arena
Change-Id: I30eaea3f3625086e311ad201ef73a8f06a01e382 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116521 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02 | COMPMID-752 Creating an example for QASYMM8 MobileNet | Giorgio Arena
Change-Id: Ic76b3b6adaff8c84ba4d2ca5283d9291c69344f0 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/114466 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02 | COMPMID-769 Add asymmetric padding support in NEON kernels. | Michalis Spyrou
- NEDirectConvolutionLayer
- NEDepthwiseConvolutionLayer3x3
Change-Id: Id4d7d17ee334639c059015a290b8fc34712706ee Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115430 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-816 - Enabled CLConvolutionLayer to use CLGEMM function instead of CLGEMMMatrixMultiplyKernel kernel. | Gian Marco
Change-Id: If035fa3d1fb3ff4012442bcd908c370d21aa6657 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115990 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-830 Fix hang in arm_compute_benchmark NEON | Michalis Spyrou
The problem seems to occur when clFinish is called inside the CLScheduler destructor. Removed the destructor; sync() is now called in the benchmarks' main.cpp. Change-Id: Ibb36a0d19aa03349d291407a1fb8266dce3ec75b Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116288 Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-785: Add QASYMM8 support for pooling layer | Georgios Pinitas
Adds generic pooling case for QASYMM8 Change-Id: I37d38a92ca61651e915fbbbb6da88e180390b4ab Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115439 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-765: Fixed unused variable warning | Anthony Barbier
Change-Id: I244954f748169cefcf71409bc9fdbc45de816ba5 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115878 Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-753 Add benchmarks for GEMM/GEMMLowp used in AlexNet | Giorgio Arena
Change-Id: Ie680065fe98c2fcdefad1fd5240f0a951df6e4cf Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115779 Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | IVGCVSW-863 calculate_max_window..() family takes ValidRegion | Diego Lopez Recas
Change-Id: I91e39713ffa580e9d2213988ad3517a8a41bf4e8 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/114013 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | APPBROWSER-376: Workaround for scale validation error. | Frank Lei
Use "vec2 scale" instead of scale_x/scale_y to work around this issue. Change-Id: Ieae55327596fdb853d7b625262fec3a3a84f577c Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115143 Reviewed-by: Joel Liang <joel.liang@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Frank Lei <frank.lei@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-471 Implement Deconvolution on OpenCL | Michalis Spyrou
Change-Id: Ie00c6b08a51d30c5ce2637d40ee3d165b8a68686 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/110311 Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-727 - Implement reference and CL/NEON validation for CustomConvolutionRectangle | Sanghoon Lee
Change-Id: I108a48ad5e6dc3f331fd5ceb38ced8ccdb31d81a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/113130 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-784: Removed no longer needed file winograd_shim_nchw.hpp | Pablo Tello
Change-Id: If72b649fce21d0b8b9c28a1b064c4cf5adb06c15 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115502 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-784: Winograd refactoring | Pablo Tello
Removed the code that created a subtensor and imported memory from the workspace in the function's run() method. The subtensor is no longer needed because the tensors are now reordered with NEPermute. The call to winograd::Winograd2x2_3x3GEMM<TOut, TIn>::reshape_output() transforms the results from the Winograd domain into the spatial domain and stores them in the member _output_nhwc. Change-Id: Iae09d26c7587cd2eed98968c3ce214e20031038e Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115483 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-783: Segfault in OCLGrind | Georgios Pinitas
Ensure clFinish is called when the CLScheduler is destroyed, so that no leftover commands remain in the queue; such leftovers might keep the queue retained and defer its destruction. Change-Id: Ic71933f65cdccd74f4f01a6e2ec1a049995f5b50 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115389 Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | APPBROWSER-298: Remove the old shader common code | Joel Liang
Remove token pasting operator support for GLES shader. Remove cs_shdaers/helpers.h (the old GLES shader common code). Remove class BufferParam: we don't need to pass the buffer_data_type_shift to GLES shader. Change-Id: Ic4fa6b2fb7647b8f69759f6077ae4a5b483cc04d Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115448 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Frank Lei <frank.lei@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-589: Port HOGDescriptor to new validation | John Richardson
Change-Id: I2021612e61de1b82aaeb49249d06929c7fceb15f Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115216 Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-759 - CLGEMM optimization for McVail benchmarks | Gian Marco
This patch introduces an optimization for CLGEMM on Bifrost architectures that can reach 40% FMA utilization on config 3 of McVail. The new CLGEMM does not require any reshape of matrix A or matrix B. This patch also adds the auto-config in CLConvolutionLayer and CLGEMM and extends the interface for NEGEMM and CLGEMM. Change-Id: Ibb354eda45e9ca64b14a99700fb21dff5989dda9 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/113716 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | APPBROWSER-373: Rewrite the convolution_layer.cs with the new common code | zhenglin
Change-Id: I4aa3999159f0448592f5f704ebcd37b26f9b1e51 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115279 Reviewed-by: Joel Liang <joel.liang@arm.com> Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | APPBROWSER-374: Rewrite the dropout.cs with the common code | zhenglin
Change-Id: Ic2be14d626856faa4496c588154ef5cfb66d4e2c Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115282 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Joel Liang <joel.liang@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02 | COMPMID-746 Allow NEDirectConvolution to work without biases for QS. | Michalis Spyrou
Renamed BiasAccumulateKernel to OutputStage. If no bias is provided when the input is quantized, the kernel simply downscales the input. Throw error if no bias is provided and input is floating point. Change-Id: I645a4ee9c6014b0547778fdd92c9ec72ef2f0aab Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/114158 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
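The bias rules described in this commit message can be sketched as a small decision function. This is only an illustration under stated assumptions: `OutputStageAction`, `select_output_stage` and `check_output_stage_rules` are hypothetical names, not the library's API, and the message does not say whether downscaling also follows bias accumulation, so that case is modelled as a single action.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical names for illustration only; not the library's API.
enum class OutputStageAction
{
    AccumulateBias, // a bias tensor was supplied
    DownscaleOnly   // quantized input and no bias: only downscale
};

// Mirrors the three cases in the commit message:
//  - bias present             -> accumulate the bias
//  - no bias, quantized input -> downscale only
//  - no bias, float input     -> error
OutputStageAction select_output_stage(bool has_bias, bool input_is_quantized)
{
    if(has_bias)
    {
        return OutputStageAction::AccumulateBias;
    }
    if(input_is_quantized)
    {
        return OutputStageAction::DownscaleOnly;
    }
    throw std::invalid_argument("a bias is required for floating-point input");
}

// Small self-check of the two non-throwing cases.
void check_output_stage_rules()
{
    assert(select_output_stage(true, false) == OutputStageAction::AccumulateBias);
    assert(select_output_stage(false, true) == OutputStageAction::DownscaleOnly);
}
```

Raising the error at selection time, rather than inside the kernel loop, would let a caller reject an unsupported configuration before any work is scheduled; whether the library does this at configure time is not stated here.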
2018-11-02 | APPBROWSER-375: Rewrite the transpose.cs with the new common code | zhenglin
Change-Id: I373e349ac35ff52ebcc895723d8aa61b754519d4 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115283 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-by: Joel Liang <joel.liang@arm.com>
2018-11-02 | APPBROWSER-372: Rewrite the direct_convolution5x5.cs with the new common code | Joel Liang
Change-Id: Ie2f398d62dea97e9201f77d22c9f0796db297b63 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115280 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Zhenglin Li <zhenglin.li@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | APPBROWSER-370: Rewrite the normalization_layer.cs with the new common code | zhenglin
Change-Id: I717d0ebbae5102da039b9295649aed8056e4cdfd Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/114960 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Joel Liang <joel.liang@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com>
2018-11-02 | COMPMID-773: Add CL/NEON Harris Corners benchmark tests | Alex Gilday
Change-Id: Idf452cfa0428a36f2d718a6d438d6e59897e1e99 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115061 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-579: Port Derivative to new validation | John Richardson
Change-Id: Iecbfa3ebab890c778fb475403466d6fb168e9968 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/113357 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | APPBROWSER-371: Rewrite the direct_convolution3x3.cs with the new common code | Joel Liang
Change-Id: I82a3ec133193433ba9ed3efcb49c51a2b95b16c0 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/114962 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Zhenglin Li <zhenglin.li@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | APPBROWSER-369: Rewrite the gemm.cs with the new common code | zhenglin
Change-Id: I9db00c846fa7fc223a22ab775025dfdea587ade8 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/114957 Reviewed-by: Joel Liang <joel.liang@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-750: Enabled support for U8 and S8 datatypes in NEGEMMLowpAArch64V8P4Kernel | Pablo Tello
Change-Id: If32cbdc65f2e1441595cae5b4824a9b4357c8bf6 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/113467 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02 | APPBROWSER-366: Add DepthwiseConvolutionLayer(fp16 only) support. | Frank Lei
Change-Id: I051b7e56b60bf1a55cdf014539ef71346d3aee26 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/114737 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-719: NEWinogradLayer reordering using NEPermute. | Pablo Tello
Input reordering from NCHW to NHWC. Output reordering from NHWC to NCHW. Weights reordering from [Ofm x Ifm x Height x Width] to [Height x Width x Ifm x Ofm]. Change-Id: I85aabedb1f9c13700bc4919eb3130f4d4bd0b465 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/113631 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
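The reorderings listed in this commit message amount to index permutations. The sketch below shows them on dense, row-major buffers with no padding or strides, which is an assumption made for illustration. The helpers `nchw_to_nhwc` and `oihw_to_hwio` are hypothetical and do not reflect the NEPermute interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a dense NCHW buffer into NHWC order.
std::vector<float> nchw_to_nhwc(const std::vector<float> &src,
                                std::size_t n, std::size_t c, std::size_t h, std::size_t w)
{
    assert(src.size() == n * c * h * w);
    std::vector<float> dst(src.size());
    for(std::size_t in = 0; in < n; ++in)
        for(std::size_t ic = 0; ic < c; ++ic)
            for(std::size_t ih = 0; ih < h; ++ih)
                for(std::size_t iw = 0; iw < w; ++iw)
                {
                    const std::size_t src_idx = ((in * c + ic) * h + ih) * w + iw; // NCHW offset
                    const std::size_t dst_idx = ((in * h + ih) * w + iw) * c + ic; // NHWC offset
                    dst[dst_idx] = src[src_idx];
                }
    return dst;
}

// Hypothetical helper: reorder weights from [Ofm x Ifm x Height x Width]
// to [Height x Width x Ifm x Ofm].
std::vector<float> oihw_to_hwio(const std::vector<float> &src,
                                std::size_t ofm, std::size_t ifm, std::size_t h, std::size_t w)
{
    assert(src.size() == ofm * ifm * h * w);
    std::vector<float> dst(src.size());
    for(std::size_t io = 0; io < ofm; ++io)
        for(std::size_t ii = 0; ii < ifm; ++ii)
            for(std::size_t ih = 0; ih < h; ++ih)
                for(std::size_t iw = 0; iw < w; ++iw)
                {
                    const std::size_t src_idx = ((io * ifm + ii) * h + ih) * w + iw;
                    const std::size_t dst_idx = ((ih * w + iw) * ifm + ii) * ofm + io;
                    dst[dst_idx] = src[src_idx];
                }
    return dst;
}
```

The output reordering from NHWC back to NCHW is the inverse mapping: swap the roles of `src_idx` and `dst_idx` in the first helper.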
2018-11-02 | APPBROWSER-357: Fix Transpose performance issue by tuning lws | steli01
Change-Id: Ia71435f6e5c5610e2b76d6d4eb61a8847ca42305 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/114829 Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Anthony Barbier <anthony.barbier@arm.com>