Diffstat (limited to 'docs/user_guide/release_version_and_change_log.dox')
-rw-r--r-- docs/user_guide/release_version_and_change_log.dox 1717
1 files changed, 1717 insertions, 0 deletions
diff --git a/docs/user_guide/release_version_and_change_log.dox b/docs/user_guide/release_version_and_change_log.dox
new file mode 100644
index 0000000000..ca8092797f
--- /dev/null
+++ b/docs/user_guide/release_version_and_change_log.dox
@@ -0,0 +1,1717 @@
+///
+/// Copyright (c) 2017-2024 Arm Limited.
+///
+/// SPDX-License-Identifier: MIT
+///
+/// Permission is hereby granted, free of charge, to any person obtaining a copy
+/// of this software and associated documentation files (the "Software"), to
+/// deal in the Software without restriction, including without limitation the
+/// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+/// sell copies of the Software, and to permit persons to whom the Software is
+/// furnished to do so, subject to the following conditions:
+///
+/// The above copyright notice and this permission notice shall be included in all
+/// copies or substantial portions of the Software.
+///
+/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+/// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+/// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+/// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+/// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+/// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+/// SOFTWARE.
+///
+namespace arm_compute
+{
+/** @page versions_changelogs Release Versions and Changelog
+
+@tableofcontents
+
+@section S2_1_versions Release versions
+
+All releases are numbered vYY.MM, where YY is the last two digits of the year and MM is the month number.
+If there is more than one release in a month then an extra sequential number is appended at the end:
+
+ v17.03 (First release of March 2017)
+ v17.03.1 (Second release of March 2017)
+ v17.04 (First release of April 2017)
+
+@note We aim to publish one major public release with new features per quarter. All releases in between will only contain bug fixes.
+@note Starting from release 22.05, the 'master' branch is no longer used; it has been replaced by 'main'. Please update your clone jobs accordingly.
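The vYY.MM[.N] numbering scheme described above can be expressed as a small formatting routine. The following C++ sketch is illustrative only: `release_tag` is a hypothetical helper, not part of Compute Library, and it assumes the releases within a month are indexed from 0 (the first release carries no suffix).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Compute Library) that builds a tag
// following the vYY.MM[.N] scheme.
//   year:  full year, e.g. 2017
//   month: month number, 1..12
//   index: 0 for the first release of the month, 1 for the second, ...
std::string release_tag(int year, int month, int index)
{
    assert(year >= 2000 && month >= 1 && month <= 12 && index >= 0);
    char buf[32];
    // Last two digits of the year, then the zero-padded month number.
    std::snprintf(buf, sizeof(buf), "v%02d.%02d", year % 100, month);
    std::string tag(buf);
    // Only the second and later releases of a month get a sequential suffix.
    if (index > 0)
    {
        tag += "." + std::to_string(index);
    }
    return tag;
}
```

For example, `release_tag(2017, 3, 1)` yields `v17.03.1`, the second release of March 2017.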
+
+@section S2_2_changelog Changelog
+
+v24.05 Public major release
+ - Add @ref CLScatter operator for FP32/16, S32/16/8, U32/16/8 data types
+
+v24.04 Public major release
+ - Add Bfloat16 data type support for @ref NEMatMul.
+ - Add support for SoftMax in SME2 for FP32 and FP16.
+ - Add support for in place accumulation to CPU GEMM kernels.
+ - Add low-precision Int8 * Int8 -> FP32 CPU GEMM which dequantizes after multiplication
+ - Add is_dynamic flag to QuantizationInfo to signal to operators that it may change after configuration
+ - Performance optimizations:
+ - Optimize start-up time of @ref NEConvolutionLayer for some input configurations where GeMM is selected as the convolution algorithm
+ - Optimize @ref NEConvolutionLayer for input tensor size > 1e7 bytes and weight tensor height > 7
+ - Optimize @ref NESoftmaxLayer for axis != 0 by natively supporting higher axes up to axis 3.
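The v24.04 low-precision GEMM entry above dequantizes after multiplication. A minimal reference sketch of that idea is shown below, assuming symmetric per-tensor quantization (zero offsets) and row-major storage; `gemm_s8s8_f32` is a hypothetical name and this is not the library's optimized kernel. The int8 products are accumulated exactly in int32, and the combined scale `scale_a * scale_b` is applied once per output element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference routine: C = A * B, dequantized to FP32.
// A is M x K, B is K x N, both row-major int8 with symmetric (zero-offset)
// per-tensor scales scale_a and scale_b.
std::vector<float> gemm_s8s8_f32(const std::vector<std::int8_t> &a, const std::vector<std::int8_t> &b,
                                 std::size_t m, std::size_t n, std::size_t k,
                                 float scale_a, float scale_b)
{
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n);
    const float scale = scale_a * scale_b; // combined dequantization factor
    for (std::size_t i = 0; i < m; ++i)
    {
        for (std::size_t j = 0; j < n; ++j)
        {
            // Accumulate the integer products exactly in 32 bits...
            std::int32_t acc = 0;
            for (std::size_t p = 0; p < k; ++p)
            {
                acc += static_cast<std::int32_t>(a[i * k + p]) * static_cast<std::int32_t>(b[p * n + j]);
            }
            // ...and dequantize only once, after the multiplication.
            c[i * n + j] = static_cast<float>(acc) * scale;
        }
    }
    return c;
}
```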
+
+v24.02.1 Public patch release
+ - Fix performance regression in fixed-format kernels
+ - Fix compile and runtime errors in arm_compute_validation for Windows on Arm (WoA)
+
+v24.02 Public major release
+ - Replace template writer with compute kernel writer in dynamic fusion.
+ - Performance optimizations:
+ - Parallelize @ref NEDepthwiseConvolutionLayer over batches if there is only 1 row
+
+v24.01 Public major release
+ - Remove the legacy 'libarm_compute_core' library. This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose.
+ You should link only to the main `libarm_compute` library for core functionality.
+ - Expand GPUTarget list with Mali™ G720 and G620.
+ - Optimize CPU activation functions using LUT-based implementation:
+ - Sigmoid function for FP16.
+ - New features
+ - Add support for FP16 in all multi_isa builds.
+ - Performance optimizations:
+ - Optimize @ref NESoftmaxLayer
+ - Optimize @ref NEDepthToSpaceLayer.
+
+v23.11 Public major release
+ - New features
+ - Add support for input data type U64/S64 in CLCast and NECast.
+ - Add support for output data type S64 in NEArgMinMaxLayer and CLArgMinMaxLayer
+ - Port the following kernels in the experimental Dynamic Fusion interface to use the new Compute Kernel Writer interface:
+ - @ref experimental::dynamic_fusion::GpuCkwResize
+ - @ref experimental::dynamic_fusion::GpuCkwPool2d
+ - @ref experimental::dynamic_fusion::GpuCkwDepthwiseConv2d
+ - @ref experimental::dynamic_fusion::GpuCkwMatMul
+ - Add support for OpenCL™ command buffers with the mutable dispatch extension.
+ - Add support for Arm® Cortex®-A520 and Arm® Cortex®-R82.
+ - Add support for negative axis values and inverted axis values in @ref arm_compute::NEReverse and @ref arm_compute::CLReverse.
+ - Add new OpenCL™ kernels:
+ - @ref opencl::kernels::ClMatMulLowpNativeMMULKernel support for QASYMM8 and QASYMM8_SIGNED, with batch support
+ - Performance optimizations:
+ - Optimize @ref cpu::CpuReshape
+ - Optimize @ref opencl::ClTranspose
+ - Optimize @ref NEStackLayer
+ - Optimize @ref CLReductionOperation.
+ - Optimize @ref CLSoftmaxLayer.
+ - Optimize start-up time of @ref NEConvolutionLayer for some input configurations where GeMM is selected as the convolution algorithm
+ - Reduce CPU overhead through optimal flushing of CL kernels.
+ - Deprecate support for Bfloat16 in @ref cpu::CpuCast.
+ - Support for U32 axis in @ref arm_compute::NEReverse and @ref arm_compute::CLReverse will be deprecated in 24.02.
+ - Remove legacy PostOps interface. PostOps was the experimental interface for kernel fusion and is replaced by the new Dynamic Fusion interface.
+ - Update OpenCL™ API headers to v2023.04.17
+
+v23.08 Public major release
+ - Deprecate the legacy 'libarm_compute_core' library. This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose.
+ Users must no longer link their applications to this library and instead link only to the main `libarm_compute` library for core functionality.
+ - New features
+ - Rewrite CLArgMinMaxLayer for axis 0 and enable S64 output.
+ - Add multi-sketch support for dynamic fusion.
+ - Partially break up arm_compute/core/Types.h and utils/Utils.h to reduce the amount of unused code pulled in by each inclusion of these headers.
+ - Add Fused Activation to CLMatMul.
+ - Implement FP32/FP16 @ref opencl::kernels::ClMatMulNativeMMULKernel using the MMUL extension.
+ - Use MatMul in fully connected layer with dynamic weights when supported.
+ - Optimize CPU depthwise convolution with channel multiplier.
+ - Add support in CpuCastKernel for conversion of S64/U64 to F32.
+ - Add new OpenCL™ kernels:
+ - @ref opencl::kernels::ClMatMulNativeMMULKernel support for FP32 and FP16, with batch support
+ - Enable transposed convolution with non-square kernels on CPU and GPU.
+ - Add support for input data type U64/S64 in CLCast.
+ - Add new Compute Kernel Writer (CKW) subproject that offers a C++ interface to generate tile-based OpenCL code in just-in-time fashion.
+ - Port the following kernels in the experimental Dynamic Fusion interface to use the new Compute Kernel Writer interface with support for FP16/FP32 only:
+ - @ref experimental::dynamic_fusion::GpuCkwActivation
+ - @ref experimental::dynamic_fusion::GpuCkwCast
+ - @ref experimental::dynamic_fusion::GpuCkwDirectConv2d
+ - @ref experimental::dynamic_fusion::GpuCkwElementwiseBinary
+ - @ref experimental::dynamic_fusion::GpuCkwStore
+ - Various optimizations and bug fixes.
+
+v23.05.1 Public patch release
+ - Enable CMake and Bazel option to build multi_isa without FP16 support.
+ - Fix compilation error in NEReorderLayer (aarch64 only).
+ - Disable invalid (false-negative) validation test with CPU scale layer on FP16.
+ - Various bug fixes
+
+v23.05 Public major release
+ - New features:
+ - Add new Arm® Neon™ kernels / functions:
+ - @ref NEMatMul for QASYMM8, QASYMM8_SIGNED, FP32 and FP16, with batch support.
+ - NEReorderLayer (aarch64 only)
+ - Add new OpenCL™ kernels / functions:
+ - @ref CLMatMul support for QASYMM8, QASYMM8_SIGNED, FP32 and FP16, with batch support.
+ - Add support for multiple dimensions in the indices parameter for both the Arm® Neon™ and OpenCL™ implementations of the Gather Layer.
+ - Add support for dynamic weights in @ref CLFullyConnectedLayer and @ref NEFullyConnectedLayer for all data types.
+ - Add support for cropping in the Arm® Neon™ and OpenCL™ implementations of the BatchToSpace Layer for all data types.
+ - Add support for quantized data types for the ElementwiseUnary Operators for Arm® Neon™.
+ - Implement RSQRT for quantized data types on OpenCL™.
+ - Add FP16 depthwise convolution kernels for SME2.
+ - Performance optimizations:
+ - Improve CLTuner exhaustive mode tuning time.
+ - Deprecate dynamic block shape in @ref NEBatchToSpaceLayer and @ref CLBatchToSpaceLayer.
+ - Various optimizations and bug fixes.
+
+v23.02.1 Public patch release
+ - Allow mismatching data layouts between the source tensor and weights for \link cpu::CpuGemmDirectConv2d CpuGemmDirectConv2d \endlink with fixed format kernels.
+ - Fixes for experimental CPU only Bazel and CMake builds.
+
+v23.02 Public major release
+ - New features:
+ - Rework the experimental dynamic fusion interface by identifying auxiliary and intermediate tensors, and specifying an explicit output operator.
+ - Add the following operators to the experimental dynamic fusion API:
+ - GpuAdd, GpuCast, GpuClamp, GpuDepthwiseConv2d, GpuMul, GpuOutput, GpuPool2d, GpuReshape, GpuResize, GpuSoftmax, GpuSub.
+ - Add SME/SME2 kernels for GeMM, Winograd convolution, Depthwise convolution and Pooling.
+ - Add new CPU operator AddMulAdd for float and quantized types.
+ - Add new flag @ref ITensorInfo::lock_paddings() to tensors to prevent extending tensor paddings.
+ - Add experimental support for CPU only Bazel and CMake builds.
+ - Performance optimizations:
+ - Optimize CPU base-e exponential functions for FP32.
+ - Optimize CPU StridedSlice by copying first dimension elements in bulk where possible.
+ - Optimize CPU quantized Subtraction by reusing the quantized Addition kernel.
+ - Optimize CPU ReduceMean by removing quantization steps and performing the operation in integer domain.
+ - Optimize GPU Scale and Dynamic Fusion GpuResize by removing quantization steps and performing the operation in integer domain.
+ - Update the heuristic for CLDepthwiseConvolutionNative kernel.
+ - Add new optimized OpenCL kernel to compute indirect convolution:
+ - \link opencl::kernels::ClIndirectConv2dKernel ClIndirectConv2dKernel \endlink
+ - Add new optimized OpenCL kernel to compute transposed convolution:
+ - \link opencl::kernels::ClTransposedConvolutionKernel ClTransposedConvolutionKernel \endlink
+ - Update recommended/minimum NDK version to r20b.
+ - Various optimizations and bug fixes.
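The v23.02 ReduceMean entry above mentions performing the operation in the integer domain. One way this can work, assuming the output uses the same scale and offset as the input, follows from the affine mapping x = scale * (q - offset): averaging x and requantizing with the same parameters reduces to averaging q. The sketch below is a hypothetical illustration under that assumption, not the library's kernel; `reduce_mean_qasymm8_same_qinfo` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: mean of QASYMM8 values when the output shares the
// input's scale and offset. With x = scale * (q - offset), averaging x and
// requantizing with the same (scale, offset) gives round(mean(q)), so no
// float dequantize/requantize step is needed.
std::uint8_t reduce_mean_qasymm8_same_qinfo(const std::vector<std::uint8_t> &q)
{
    assert(!q.empty());
    // 32-bit accumulator: safe for up to about 16 million elements.
    std::uint32_t sum = 0;
    for (std::uint8_t v : q)
    {
        sum += v;
    }
    const auto n = static_cast<std::uint32_t>(q.size());
    // Integer round-to-nearest, with exact halves rounded up.
    return static_cast<std::uint8_t>((sum + n / 2) / n);
}
```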
+
+v22.11 Public major release
+ - New features:
+ - Add new experimental dynamic fusion API.
+ - Add CPU batch matrix multiplication with adj_x = false and adj_y = false for FP32.
+ - Add CPU MeanStdDevNorm for QASYMM8.
+ - Add CPU and GPU GELU activation function for FP32 and FP16.
+ - Add CPU swish activation function for FP32 and FP16.
+ - Performance optimizations:
+ - Optimize CPU bilinear scale for FP32, FP16, QASYMM8, QASYMM8_SIGNED, U8 and S8.
+ - Optimize CPU activation functions using LUT-based implementation:
+ - Sigmoid function for QASYMM8 and QASYMM8_SIGNED.
+ - Hard swish function for QASYMM8_SIGNED.
+ - Optimize CPU addition for QASYMM8 and QASYMM8_SIGNED using fixed-point arithmetic.
+ - Optimize CPU multiplication, subtraction and activation layers by considering tensors as 1D.
+ - Optimize GPU depthwise convolution kernel and heuristic.
+ - Optimize GPU Conv2d heuristic.
+ - Optimize CPU MeanStdDevNorm for FP16.
+ - Optimize CPU tanh activation function for FP16 using rational approximation.
+ - Improve GPU GeMMLowp start-up time.
+ - Various optimizations and bug fixes.
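The LUT-based activations listed above exploit the fact that an 8-bit quantized input such as QASYMM8 can take only 256 distinct values, so the dequantize, activate and requantize chain can be precomputed once per quantization configuration. A hypothetical sketch for sigmoid follows; `QInfo`, `build_sigmoid_lut` and `sigmoid_lut` are illustrative names, not Compute Library APIs.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-tensor quantization parameters (not the library's QuantizationInfo).
struct QInfo
{
    float        scale;
    std::int32_t offset;
};

// Precomputes the quantized sigmoid for every possible 8-bit input value.
std::array<std::uint8_t, 256> build_sigmoid_lut(QInfo in, QInfo out)
{
    assert(in.scale > 0.f && out.scale > 0.f);
    std::array<std::uint8_t, 256> lut{};
    for (int q = 0; q < 256; ++q)
    {
        const float x = in.scale * static_cast<float>(q - in.offset); // dequantize
        const float y = 1.f / (1.f + std::exp(-x));                   // sigmoid
        const long  r = std::lround(y / out.scale) + out.offset;      // requantize
        lut[q] = static_cast<std::uint8_t>(std::min(255L, std::max(0L, r))); // saturate
    }
    return lut;
}

// At run time the activation is a single table lookup per element.
std::vector<std::uint8_t> sigmoid_lut(const std::vector<std::uint8_t> &src,
                                      const std::array<std::uint8_t, 256> &lut)
{
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
    {
        dst[i] = lut[src[i]];
    }
    return dst;
}
```

The table is rebuilt only when the input or output quantization changes, so the per-element cost no longer depends on how expensive the activation function is.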
+
+v22.08 Public major release
+ - Various bug fixes.
+ - Disable unsafe FP optimizations causing accuracy issues in:
+ - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
+ - \link opencl::kernels::ClDirectConv3dKernel ClDirectConv3dKernel \endlink
+ - @ref CLDepthwiseConvolutionLayerNativeKernel
+ - Add Dynamic Fusion of Elementwise Operators: Div, Floor, Add.
+ - Optimize the gemm_mm_reshaped_only_rhs_nt OpenCL kernel using the arm_matrix_multiply extension available for Arm® Mali™-G715 and Arm® Mali™-G615.
+ - Add support for the arm_matrix_multiply extension in the gemmlowp_mm_reshaped_only_rhs_t OpenCL kernel.
+ - Expand GPUTarget list with missing Mali™ GPU product names: G57, G68, G78AE, G610, G510, G310.
+ - Extend the direct convolution 2d interface to configure the block size.
+ - Update ClConv2D heuristic to use direct convolution.
+ - Use official Khronos® OpenCL extensions:
+ - Add cl_khr_integer_dot_product extension support.
+ - Add support for OpenCL 3.0 non-uniform workgroups.
+ - CPU performance optimizations:
+ - Add LUT-based implementation of Hard Swish and Leaky ReLU activation functions for aarch64 builds.
+ - Optimize Add layer by considering the input tensors as a 1D array.
+ - Add fixed-format BF16, FP16 and FP32 Neon™ GEMM kernels to support variable weights.
+ - Add new winograd convolution kernels implementation and update the ACL \link arm_compute::cpu::CpuWinogradConv2d CpuWinogradConv2d\endlink operator.
+ - Add experimental support for native builds for Windows® on Arm™.
+ - Build flag interpretation change: arch=armv8.6-a now translates to the -march=armv8.6-a CXX flag instead of -march=armv8.2-a plus an explicit selection of feature extensions.
+ - Build flag change: toolchain_prefix, compiler_prefix:
+ - Use empty string "" to suppress any prefixes.
+ - Use "auto" to use default (auto) prefixes chosen by the build script. This is the default behavior when unspecified.
+ - Any other string will be used as custom prefixes to the compiler and the rest of toolchain tools.
+ - The default behavior when the prefix is unspecified does not change, but its signifier has changed from the empty string "" to "auto".
+ - Android builds for armv7a will no longer be tested or maintained.
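The toolchain_prefix / compiler_prefix rules listed above boil down to a three-way choice. The sketch below models them in C++ for illustration only; the actual handling is done by the SCons build scripts, and both `resolve_prefix` and the example prefixes used with it (such as `aarch64-linux-gnu-`) are assumptions.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the toolchain_prefix / compiler_prefix rules; the real
// handling is done by the SCons build scripts.
//   value:       option value as passed to the build ("auto" when unspecified)
//   auto_prefix: the prefix the build script would choose by default
std::string resolve_prefix(const std::string &value, const std::string &auto_prefix)
{
    if (value == "auto")
    {
        return auto_prefix; // default: let the build script decide
    }
    // "" suppresses any prefix; any other string is used verbatim as a custom prefix.
    return value;
}
```

Treating "auto" as the explicit default is what frees the empty string to mean "no prefix at all".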
+
+v22.05 Public major release
+ - Various bug fixes.
+ - Various optimizations.
+ - Add support for NDK r23b.
+ - Inclusive language adjustment. Please refer to @ref S5_0_inc_lang for details.
+ - New Arm® Neon™ kernels / functions:
+ - \link cpu::kernels::CpuPool3dKernel CpuPool3dKernel \endlink
+ - New OpenCL kernels / functions:
+ - \link opencl::kernels::ClPool3dKernel ClPool3dKernel \endlink
+ - Improve the start-up times for the following OpenCL kernels:
+ - \link opencl::kernels::ClWinogradInputTransformKernel ClWinogradInputTransformKernel \endlink
+ - \link opencl::kernels::ClWinogradOutputTransformKernel ClWinogradOutputTransformKernel \endlink
+ - \link opencl::kernels::ClWinogradFilterTransformKernel ClWinogradFilterTransformKernel \endlink
+ - \link opencl::kernels::ClHeightConcatenateKernel ClHeightConcatenateKernel \endlink
+ - Decouple the implementation of the following Cpu kernels into various data types (fp32, fp16, int):
+ - \link cpu::kernels::CpuDirectConv2dKernel CpuDirectConv2dKernel \endlink
+ - \link cpu::kernels::CpuDepthwiseConv2dNativeKernel CpuDepthwiseConv2dNativeKernel \endlink
+ - \link cpu::kernels::CpuGemmMatrixAdditionKernel CpuGemmMatrixAdditionKernel \endlink
+ - \link cpu::kernels::CpuGemmMatrixMultiplyKernel CpuGemmMatrixMultiplyKernel \endlink
+ - @ref NEFuseBatchNormalizationKernel
+ - @ref NEL2NormalizeLayerKernel
+
+v22.02 Public major release
+ - Various bug fixes.
+ - Various optimizations.
+ - Update arm_gemm CPU kernels for Arm® Cortex®-A510.
+ - Inclusive language adjustment. Please refer to @ref S5_0_inc_lang for details.
+ - Improve the start-up time for the following OpenCL kernels:
+ - @ref CLScale
+ - @ref CLGEMM
+ - @ref CLDepthwiseConvolutionLayer
+ - \link opencl::kernels::ClIm2ColKernel ClIm2ColKernel \endlink
+ - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
+ - Remove functions:
+ - CLRemap
+ - NERemap
+ - Remove padding from OpenCL kernels:
+ - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
+ - Remove padding from Cpu kernels:
+ - \link cpu::kernels::CpuDirectConv2dKernel CpuDirectConv2dKernel \endlink
+ - Decouple the implementation of the following Cpu kernels into various data types (fp32, fp16, int):
+ - \link cpu::kernels::CpuActivationKernel CpuActivationKernel \endlink
+ - \link cpu::kernels::CpuAddKernel CpuAddKernel \endlink
+ - \link cpu::kernels::CpuElementwiseKernel CpuElementwiseKernel \endlink
+ - \link cpu::CpuSoftmaxGeneric CpuSoftmaxKernel \endlink
+ - @ref NEBoundingBoxTransformKernel
+ - @ref NECropKernel
+ - @ref NEComputeAllAnchorsKernel
+ - @ref NEInstanceNormalizationLayerKernel
+ - NEMaxUnpoolingLayerKernel
+ - @ref NEMeanStdDevNormalizationKernel
+ - @ref NERangeKernel
+ - @ref NEROIAlignLayerKernel
+ - @ref NESelectKernel
+
+v21.11 Public major release
+ - Various bug fixes.
+ - Various optimizations:
+ - Improve performance of bilinear and nearest neighbor Scale on both CPU and GPU for FP32, FP16, Int8, Uint8 data types
+ - Improve performance of Softmax on GPU for Uint8/Int8
+ - New OpenCL kernels / functions:
+ - @ref CLConv3D
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEConv3D
+ - Support configurable builds with a selected subset of operators
+ - Support MobileBert on Neon™ backend
+ - Improve operator/function logging
+ - Remove padding from OpenCL kernels:
+ - ClPool2dKernel
+ - ClScaleKernel
+ - ClGemmMatrixMultiplyReshapedKernel
+ - Remove padding from Cpu kernels:
+ - CpuPool2dKernel
+ - Remove Y padding from OpenCL kernels:
+ - ClGemmMatrixMultiplyKernel
+ - ClGemmReshapedRHSMatrixKernel
+ - Remove legacy GeMM kernels in gemm_v1.cl
+
+v21.08 Public major release
+ - Various bug fixes.
+ - Various optimizations:
+ - Improve LWS (Local-Workgroup-Size) heuristic in OpenCL for GeMM, Direct Convolution and Winograd Transformations when OpenCL tuner is not used
+ - Improve QASYMM8/QSYMM8 performance on OpenCL for various Arm® Mali™ GPU architectures
+ - Add dynamic weights support in Fully connected layer (CPU/GPU)
+ - Various performance optimizations for floating-point data types (CPU/GPU)
+ - Add a reduced core library build arm_compute_core_v2
+ - Expose Operator API
+ - Support fat binary build for armv8.2-a via the fat_binary build flag
+ - Add CPU discovery capabilities
+ - Add data type f16 support for:
+ - CLRemapKernel
+ - Port the following functions to stateless API:
+ - @ref CLConvolutionLayer
+ - @ref CLFlattenLayer
+ - @ref CLFullyConnectedLayer
+ - @ref CLGEMM
+ - @ref CLGEMMConvolutionLayer
+ - @ref CLGEMMLowpMatrixMultiplyCore
+ - @ref CLWinogradConvolutionLayer
+ - @ref NEConvolutionLayer
+ - @ref NEFlattenLayer
+ - @ref NEFullyConnectedLayer
+ - @ref NEGEMM
+ - @ref NEGEMMConv2d
+ - @ref NEGEMMConvolutionLayer
+ - @ref NEGEMMLowpMatrixMultiplyCore
+ - @ref NEWinogradConvolutionLayer
+ - Remove the following functions:
+ - CLWinogradInputTransform
+ - Remove CLCoreRuntimeContext
+ - Remove ICPPSimpleKernel
+ - Rename file arm_compute/runtime/CL/functions/CLElementWiseUnaryLayer.h to arm_compute/runtime/CL/functions/CLElementwiseUnaryLayer.h
+
+v21.05 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Various documentation updates:
+ - Add the list of supported operators and the corresponding Android NNAPI operators.
+ - Reorganize the documentation into a user guide and a contributor guide.
+ - Add support for a global allocator for OpenCL tensors
+ - Add experimental support for [CLVK](https://github.com/kpet/clvk).
+ - Add data type S32 support for:
+ - @ref opencl::kernels::ClArithmeticKernel
+ - Add data type QASYMM8 support for:
+ - @ref CLROIPoolingLayer
+ - @ref CLROIPoolingLayerKernel
+ - @ref NEROIPoolingLayer
+ - @ref NEROIPoolingLayerKernel
+ - Add per-channel quantization support for:
+ - @ref CLDeconvolutionLayer
+ - @ref CLDirectDeconvolutionLayer
+ - @ref NEConvolutionLayer
+ - @ref NEDeconvolutionLayer
+ - Remove padding from OpenCL kernels:
+ - @ref CLL2NormalizeLayerKernel
+ - CLDepthwiseConvolutionLayer3x3NHWCKernel
+ - @ref CLNormalizationLayerKernel
+ - @ref CLNormalizePlanarYUVLayerKernel
+ - @ref opencl::kernels::ClMulKernel
+ - @ref CLReductionOperationKernel
+ - @ref CLROIPoolingLayerKernel
+ - Remove computer vision support from Arm® Neon™ backend
+ - Remove the following functions:
+ - NEAbsoluteDifference
+ - NEAccumulate
+ - NEBox3x3
+ - NECannyEdge
+ - NEChannelCombine
+ - NEChannelExtract
+ - NEColorConvert
+ - NEConvolution
+ - NEDerivative
+ - NEDilate
+ - NEEqualizeHistogram
+ - NEErode
+ - NEFastCorners
+ - NEGaussian3x3
+ - NEGaussian5x5
+ - NEGaussianPyramid
+ - NEHOGDescriptor
+ - NEHOGDetector
+ - NEHOGGradient
+ - NEHOGMultiDetection
+ - NEHarrisCorners
+ - NEHistogram
+ - NEIntegralImage
+ - NELaplacianPyramid
+ - NELaplacianReconstruct
+ - NEMagnitude
+ - NEMeanStdDev
+ - NEMedian3x3
+ - NEMinMaxLocation
+ - NENonLinearFilter
+ - NEOpticalFlow
+ - NEPhase
+ - NEScharr3x3
+ - NESobel3x3
+ - NESobel5x5
+ - NESobel7x7
+ - NETableLookup
+ - NEThreshold
+ - NEWarpAffine
+ - NEWarpPerspectiveKernel
+ - Remove all GLES kernels / functions / tests / examples
+ - Remove computer vision support from CL backend
+ - Remove the following functions:
+ - CLAbsoluteDifference
+ - CLAccumulate
+ - CLBox3x3
+ - CLCannyEdge
+ - CLChannelCombine
+ - CLChannelExtract
+ - CLColorConvert
+ - CLConvolution
+ - CLDerivative
+ - CLDilate
+ - CLEqualizeHistogram
+ - CLErode
+ - CLFastCorners
+ - CLGaussian3x3
+ - CLGaussian5x5
+ - CLGaussianPyramid
+ - CLHOGDescriptor
+ - CLHOGDetector
+ - CLHOGGradient
+ - CLHOGMultiDetection
+ - CLHarrisCorners
+ - CLHistogram
+ - CLIntegralImage
+ - CLLaplacianPyramid
+ - CLLaplacianReconstruct
+ - CLMagnitude
+ - CLMeanStdDev
+ - CLMedian3x3
+ - CLMinMaxLocation
+ - CLNonLinearFilter
+ - CLOpticalFlow
+ - CLPhase
+ - CLScharr3x3
+ - CLSobel3x3
+ - CLSobel5x5
+ - CLSobel7x7
+ - CLTableLookup
+ - CLThreshold
+ - CLWarpAffine
+ - CLWarpPerspective
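Per-channel quantization, added above for several convolution and deconvolution functions, gives each output channel of the weights its own scale instead of a single per-tensor scale. The sketch below assumes symmetric int8 quantization with scale = max|w| / 127 per channel and a [channels][elements] row-major layout; the function names are hypothetical and the scale choice is one common convention rather than the library's exact behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of symmetric per-channel int8 quantization.
// w is laid out as [channels][per_channel] in row-major order; each channel c
// gets its own scale, chosen here as max|w_c| / 127.
std::vector<std::int8_t> quantize_per_channel(const std::vector<float> &w, std::size_t channels,
                                              std::vector<float> &scales)
{
    assert(channels > 0 && w.size() % channels == 0);
    const std::size_t per_channel = w.size() / channels;
    scales.assign(channels, 1.f);
    std::vector<std::int8_t> q(w.size());
    for (std::size_t c = 0; c < channels; ++c)
    {
        float max_abs = 0.f;
        for (std::size_t i = 0; i < per_channel; ++i)
        {
            max_abs = std::max(max_abs, std::fabs(w[c * per_channel + i]));
        }
        if (max_abs > 0.f)
        {
            scales[c] = max_abs / 127.f;
        }
        for (std::size_t i = 0; i < per_channel; ++i)
        {
            const long r = std::lround(w[c * per_channel + i] / scales[c]);
            q[c * per_channel + i] = static_cast<std::int8_t>(std::min(127L, std::max(-127L, r)));
        }
    }
    return q;
}

// Dequantization uses the scale of the channel each value belongs to.
std::vector<float> dequantize_per_channel(const std::vector<std::int8_t> &q, const std::vector<float> &scales)
{
    assert(!scales.empty() && q.size() % scales.size() == 0);
    const std::size_t per_channel = q.size() / scales.size();
    std::vector<float> w(q.size());
    for (std::size_t i = 0; i < q.size(); ++i)
    {
        w[i] = scales[i / per_channel] * static_cast<float>(q[i]);
    }
    return w;
}
```

With weights {1, -0.25} in channel 0 and {0.02, -0.005} in channel 1, a single per-tensor scale of 1/127 would map channel 1 to just 3 and -1, while per-channel scales keep the full int8 range (127 and -32) for both channels.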
+
+v21.02 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Upgrade C++ standard to C++14
+ - Add macOS support
+ - Add Armv8-R AArch64 architecture support
+ - Add SVE/SVE2 support for:
+ - NEScaleKernel
+ - @ref NEActivationLayer
+ - @ref NEArithmeticAddition
+ - @ref NEBatchNormalizationLayerKernel
+ - cpu::kernels::CpuLogits1DSoftmaxKernel
+ - cpu::kernels::CpuLogits1DMaxKernel
+ - @ref cpu::kernels::CpuElementwiseUnaryKernel
+ - Remove padding from OpenCL kernels:
+ - CLDirectConvolutionLayerKernel
+ - @ref CLArgMinMaxLayerKernel
+ - @ref CLPadLayerKernel
+ - @ref CLROIAlignLayerKernel
+ - @ref CLRangeKernel
+ - CLScaleKernel
+ - @ref CLSelectKernel
+ - @ref CLBitwiseKernel
+ - @ref opencl::kernels::ClFloorKernel
+ - CLTransposeKernel
+ - Deprecate functions in CLTuner:
+ - add_lws_to_table
+ - import_lws_table
+ - lws_table
+ - Remove functions:
+ - NELocallyConnectedLayer / CLLocallyConnectedLayer
+ - NEIm2Col
+ - NECol2Im
+ - NEGEMMInterleave4x4
+ - NEGEMMTranspose1xW
+ - NEComputeAllAnchors / CLComputeAllAnchors
+ - NEGEMMAssemblyDispatch
+ - NEUpsampleLayer / CLUpsampleLayer
+ - Remove kernels:
+ - NEGEMMMatrixVectorMultiplyKernel
+ - NELocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedMatrixMultiplyKernel
+ - NEUpsampleLayerKernel / CLUpsampleLayerKernel
+ - Extend OpenCL tuner with workgroup batch size support
+ - Experimental extension for the OpenCL tuner to tune the batches of work groups distributed to compute units
+ - Add functionality to load the OpenCL GEMM heuristics at runtime
+ - The GEMM heuristic file (MLGO) can be used to update the default GEMM heuristics available for OpenCL
+ - Note: there might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. This is currently under investigation.
+ - Note: data-type decoupling is in progress and experimental. Warnings about unused symbols might be raised.
+
+v20.11 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Performance regressions may be observed when executing Depthwise Convolution on Arm® Neon™ with a depth multiplier > 1 for quantized data types.
+ This is planned to be resolved in the 21.02 release.
+ - Added new data type QASYMM8_SIGNED support for @ref NEROIAlignLayer.
+ - Added new data type S32 support for:
+ - NEArithmeticSubtraction
+ - NEArithmeticSubtractionKernel
+ - @ref NEPixelWiseMultiplication
+ - NEPixelWiseMultiplicationKernel
+ - NEElementwiseDivision
+ - NEDivisionOperationKernel
+ - Interface change
+ - The softmax axis now has the same meaning as in other major frameworks: axis defines the dimension
+ on which Softmax/LogSoftmax is performed. For example, for an input of shape 4x5x6 and axis=1, softmax is applied to 4x6=24 vectors of size 5.
+ The supported value range of axis is [-rank, rank).
+ This change applies to the following functions:
+ - @ref NESoftmaxLayer
+ - @ref NELogSoftmaxLayer
+ - @ref CLSoftmaxLayer
+ - @ref CLLogSoftmaxLayer
+ - GCSoftmaxLayer
+ - New OpenCL kernels / functions:
+ - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
+ - @ref CLLogicalNot
+ - @ref CLLogicalAnd
+ - @ref CLLogicalOr
+ - New Arm® Neon™ kernels / functions:
+ - @ref NELogicalNot
+ - @ref NELogicalAnd
+ - @ref NELogicalOr
+ - Removed padding from Arm® Neon™ kernels:
+ - NEComplexPixelWiseMultiplicationKernel
+ - NENonMaximaSuppression3x3Kernel
+ - NERemapKernel
+ - NEGEMMInterleave4x4Kernel
+ - NEDirectConvolutionLayerKernel
+ - NEScaleKernel
+ - NELocallyConnectedMatrixMultiplyKernel
+ - NEGEMMLowpOffsetContributionKernel
+ - NEGEMMTranspose1xWKernel
+ - NEPoolingLayerKernel
+ - NEConvolutionKernel
+ - NEDepthwiseConvolutionLayerNativeKernel
+ - NEGEMMLowpMatrixMultiplyKernel
+ - NEGEMMMatrixMultiplyKernel
+ - NEDirectConvolutionLayerOutputStageKernel
+ - @ref NEReductionOperationKernel
+ - NEGEMMLowpMatrixAReductionKernel
+ - NEGEMMLowpMatrixBReductionKernel
+ - Removed padding from OpenCL kernels:
+ - CLBatchConcatenateLayerKernel
+ - CLElementwiseOperationKernel
+ - @ref CLBatchNormalizationLayerKernel
+ - CLPoolingLayerKernel
+ - CLWinogradInputTransformKernel
+ - CLGEMMLowpMatrixMultiplyNativeKernel
+ - CLGEMMLowpMatrixAReductionKernel
+ - CLGEMMLowpMatrixBReductionKernel
+ - CLGEMMLowpOffsetContributionOutputStageKernel
+ - CLGEMMLowpOffsetContributionKernel
+ - CLWinogradOutputTransformKernel
+ - CLGEMMLowpMatrixMultiplyReshapedKernel
+ - @ref CLFuseBatchNormalizationKernel
+ - @ref CLDepthwiseConvolutionLayerNativeKernel
+ - CLDepthConvertLayerKernel
+ - CLCopyKernel
+ - CLDepthwiseConvolutionLayer3x3NHWCKernel
+ - CLActivationLayerKernel
+ - CLWinogradFilterTransformKernel
+ - CLWidthConcatenateLayerKernel
+ - CLWidthConcatenate4TensorsKernel
+ - CLWidthConcatenate2TensorsKernel
+ - CLLogits1DMaxShiftExpSumKernel
+ - CLLogits1DNormKernel
+ - CLHeightConcatenateLayerKernel
+ - CLGEMMMatrixMultiplyKernel
+ - CLGEMMLowpQuantizeDownInt32ScaleKernel
+ - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
+ - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
+ - CLDepthConcatenateLayerKernel
+ - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
+ - Removed OpenCL kernels / functions:
+ - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
+ - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel
+ - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel
+ - Deprecated OpenCL kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
+ - CLLocallyConnectedLayer
+ - CLLocallyConnectedMatrixMultiplyKernel
+ - CLAbsoluteDifference
+ - CLAbsoluteDifferenceKernel
+ - CLAccumulate
+ - CLAccumulateKernel
+ - CLAccumulateSquared
+ - CLAccumulateSquaredKernel
+ - CLAccumulateWeighted
+ - CLAccumulateWeightedKernel
+ - CLAccumulateWeightedFP16Kernel
+ - CLBox3x3
+ - CLBox3x3Kernel
+ - CLBox3x3FP16Kernel
+ - CLCannyEdge
+ - CLChannelCombine
+ - CLChannelCombineKernel
+ - CLChannelExtract
+ - CLChannelExtractKernel
+ - CLColorConvert
+ - CLColorConvertKernel
+ - CLConvolution3x3
+ - CLConvolutionRectangle
+ - CLConvolutionRectangleKernel
+ - CLConvolutionSquare
+ - CLConvolutionKernel
+ - CLDerivative
+ - CLDerivativeKernel
+ - CLDilate
+ - CLDilateKernel
+ - CLEqualizeHistogram
+ - CLErode
+ - CLErodeKernel
+ - CLFastCorners
+ - CLFastCornersKernel
+ - CLGaussian3x3
+ - CLGaussian3x3Kernel
+ - CLGaussian5x5
+ - CLGaussian5x5HorKernel
+ - CLGaussian5x5VertKernel
+ - CLGaussianPyramid
+ - CLGaussianPyramidHalf
+ - CLGaussianPyramidOrb
+ - CLHarrisCorners
+ - CLHarrisScoreKernel
+ - CLHarrisScoreFP16Kernel
+ - CLHistogram
+ - CLHistogramKernel
+ - CLHOGOrientationBinningKernel
+ - CLHOGBlockNormalizationKernel
+ - CLHOGDetectorKernel
+ - CLHOGNonMaximaSuppressionKernel
+ - CLHOGDescriptor
+ - CLHOGDetector
+ - CLHOGGradient
+ - CLHOGMultiDetection
+ - CLIntegralImage
+ - CLIntegralImageKernel
+ - CLLaplacianReconstruct
+ - CLLaplacianPyramid
+ - CLMagnitude
+ - CLMagnitudePhaseKernel
+ - CLMedian3x3
+ - CLMedian3x3Kernel
+ - CLMinMaxLocation
+ - CLMinMaxLocationKernel
+ - CLNonLinearFilter
+ - CLNonLinearFilterKernel
+ - CLNonMaximaSuppression3x3
+ - CLNonMaximaSuppression3x3FP16Kernel
+ - CLNonMaximaSuppression3x3Kernel
+ - CLOpticalFlow
+ - CLPhase
+ - CLRemap
+ - CLRemapKernel
+ - CLScharr3x3
+ - CLScharr3x3Kernel
+ - CLSobel3x3
+ - CLSobel3x3Kernel
+ - CLSobel5x5
+ - CLSobel5x5HorKernel
+ - CLSobel5x5VertKernel
+ - CLSobel7x7
+ - CLSobel7x7HorKernel
+ - CLSobel7x7VertKernel
+ - CLThreshold
+ - CLThresholdKernel
+ - CLWarpAffine
+ - CLWarpAffineKernel
+ - CLWarpPerspective
+ - CLWarpPerspectiveKernel
+ - Deprecated Arm® Neon™ kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
+ - NELocallyConnectedLayer
+ - NELocallyConnectedMatrixMultiplyKernel
+ - NEAbsoluteDifference
+ - NEAbsoluteDifferenceKernel
+ - NEAccumulate
+ - NEAccumulateKernel
+ - NEAccumulateSquared
+ - NEAccumulateSquaredKernel
+ - NEAccumulateWeighted
+ - NEAccumulateWeightedKernel
+ - NEAccumulateWeightedFP16Kernel
+ - NEBox3x3
+ - NEBox3x3Kernel
+ - NEBox3x3FP16Kernel
+ - NECannyEdge
+ - NEChannelCombine
+ - NEChannelCombineKernel
+ - NEChannelExtract
+ - NEChannelExtractKernel
+ - NEColorConvert
+ - NEColorConvertKernel
+ - NEConvolution3x3
+ - NEConvolutionRectangle
+ - NEConvolutionRectangleKernel
+ - NEConvolutionSquare
+ - NEConvolutionKernel
+ - NEDerivative
+ - NEDerivativeKernel
+ - NEDilate
+ - NEDilateKernel
+ - NEEqualizeHistogram
+ - NEErode
+ - NEErodeKernel
+ - NEFastCorners
+ - NEFastCornersKernel
+ - NEGaussian3x3
+ - NEGaussian3x3Kernel
+ - NEGaussian5x5
+ - NEGaussian5x5HorKernel
+ - NEGaussian5x5VertKernel
+ - NEGaussianPyramid
+ - NEGaussianPyramidHalf
+ - NEGaussianPyramidOrb
+ - NEHarrisCorners
+ - NEHarrisScoreKernel
+ - NEHarrisScoreFP16Kernel
+ - NEHistogram
+ - NEHistogramKernel
+ - NEHOGOrientationBinningKernel
+ - NEHOGBlockNormalizationKernel
+ - NEHOGDetectorKernel
+ - NEHOGNonMaximaSuppressionKernel
+ - NEHOGDescriptor
+ - NEHOGDetector
+ - NEHOGGradient
+ - NEHOGMultiDetection
+ - NEIntegralImage
+ - NEIntegralImageKernel
+ - NELaplacianReconstruct
+ - NELaplacianPyramid
+ - NEMagnitude
+ - NEMagnitudePhaseKernel
+ - NEMedian3x3
+ - NEMedian3x3Kernel
+ - NEMinMaxLocation
+ - NEMinMaxLocationKernel
+ - NENonLinearFilter
+ - NENonLinearFilterKernel
+ - NENonMaximaSuppression3x3
+ - NENonMaximaSuppression3x3FP16Kernel
+ - NENonMaximaSuppression3x3Kernel
+ - NEOpticalFlow
+ - NEPhase
+ - NERemap
+ - NERemapKernel
+ - NEScharr3x3
+ - NEScharr3x3Kernel
+ - NESobel3x3
+ - NESobel3x3Kernel
+ - NESobel5x5
+ - NESobel5x5HorKernel
+ - NESobel5x5VertKernel
+ - NESobel7x7
+ - NESobel7x7HorKernel
+ - NESobel7x7VertKernel
+ - NEThreshold
+ - NEThresholdKernel
+ - NEWarpAffine
+ - NEWarpAffineKernel
+ - NEWarpPerspective
+ - NEWarpPerspectiveKernel
+ - Deprecated GLES kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
+ - GCAbsoluteDifference
+ - GCActivationLayer
+ - GCArithmeticAddition
+ - GCBatchNormalizationLayer
+ - GCConcatenateLayer
+ - GCConvolutionLayer
+ - GCDepthwiseConvolutionLayer
+ - GCDirectConvolutionLayer
+ - GCDropoutLayer
+ - GCFillBorder
+ - GCFullyConnectedLayer
+ - GCGEMM
+ - GCGEMMInterleave4x4
+ - GCGEMMTranspose1xW
+ - GCNormalizationLayer
+ - GCNormalizePlanarYUVLayer
+ - GCPixelWiseMultiplication
+ - GCPoolingLayer
+ - GCScale
+ - GCSoftmaxLayer
+ - GCTensorShift
+ - GCTranspose
+
+
+v20.08 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Added new data type QASYMM8_SIGNED support for:
+ - @ref CLArgMinMaxLayer
+ - @ref CLArgMinMaxLayerKernel
+ - Added new data type U8 support for:
+ - @ref NECropKernel
+ - CLCropKernel
+ - Added align_corner support for nearest neighbor interpolation in:
+ - NEScaleKernel
+ - CLScaleKernel
+ - New OpenCL kernels / functions:
+ - @ref CLMaxUnpoolingLayerKernel
+ - New Arm® Neon™ kernels / functions:
+ - NEMaxUnpoolingLayerKernel
+ - New graph example:
+ - graph_yolov3_output_detector
+ - GEMMTuner improvements:
+ - Added FP16 support
+ - Output JSON files for easier integration
+ - Enabled tuning for export_to_cl_image_rhs option for RHS tensors
+ - More robust script for running benchmarks
+ - Removed padding from:
+ - NEPixelWiseMultiplicationKernel
+ - NEHeightConcatenateLayerKernel
+ - NEThresholdKernel
+ - NEBatchConcatenateLayerKernel
+ - NETransposeKernel
+ - @ref NEBatchNormalizationLayerKernel
+ - NEArithmeticSubtractionKernel
+ - @ref NEBoundingBoxTransformKernel
+ - NELogits1DMaxKernel
+ - NELogits1DSoftmaxKernel
+ - @ref NEROIPoolingLayerKernel
+ - @ref NEROIAlignLayerKernel
+ - NEYOLOLayerKernel
+ - NEUpsampleLayerKernel
+ - NEFloorKernel
+ - NEWidthConcatenateLayerKernel
+ - NEDepthConcatenateLayerKernel
+ - @ref NENormalizationLayerKernel
+ - @ref NEL2NormalizeLayerKernel
+ - NEFillArrayKernel
+ - NEDepthConvertLayerKernel
+ - @ref NERangeKernel
+ - @ref NEPriorBoxLayer
+ - Removed OpenCL kernels / functions:
+ - CLGEMMLowpQuantizeDownInt32ToUint8Scale
+ - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
+ - Removed Arm® Neon™ kernels / functions:
+ - NEGEMMLowpQuantizeDownInt32ToUint8Scale
+ - NEGEMMMatrixAccumulateBiasesKernel
+ - Deprecated functions / interfaces:
+ - Non-descriptor based interfaces for NEThreshold, CLThreshold
+ - Non-descriptor based interfaces for @ref NEScale, @ref CLScale and GCScale
+ - In @ref NESoftmaxLayer, @ref NELogSoftmaxLayer, @ref CLSoftmaxLayer, @ref CLLogSoftmaxLayer and GCSoftmaxLayer, the default "axis" value has changed from 1 to 0, and only axis 0 is supported.
+ - The support for quantized data types has been removed from @ref CLLogSoftmaxLayer due to implementation complexity.
+ - Removed padding requirement for the input (e.g. LHS of GEMM) and output in CLGEMMMatrixMultiplyNativeKernel, CLGEMMMatrixMultiplyReshapedKernel, CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and CLIm2ColKernel (NHWC only)
+ - This change allows @ref CLGEMMConvolutionLayer to be used without extra padding for the input and output.
+ - Only the weights/bias of @ref CLGEMMConvolutionLayer may require padding for the computation.
+ - On Arm® Mali™ Midgard GPUs only, @ref CLGEMMConvolutionLayer may still require padding, since it calls CLGEMMMatrixMultiplyKernel, which currently requires padding.
+ - Added support for exporting the OpenCL buffer object to the OpenCL image object in CLGEMMMatrixMultiplyReshapedKernel and CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.
+ - This allows the OpenCL buffer used for the reshaped RHS matrix to be exported to an OpenCL image object.
+ - The padding requirement for the OpenCL image object is taken into account in CLGEMMReshapeRHSMatrixKernel.
+ - The reshaped RHS matrix stores the weights when GEMM is used to accelerate CLGEMMConvolutionLayer.
+
+v20.05 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Updated recommended NDK version to r18b.
+ - Updated recommended gcc version to Linaro 6.3.1.
+ - Added Bfloat16 type support
+ - Added Bfloat16 support in:
+ - NEWeightsReshapeKernel
+ - NEConvolutionLayerReshapeWeights
+ - NEIm2ColKernel
+ - NEIm2Col
+ - NEDepthConvertLayerKernel
+ - @ref NEDepthConvertLayer
+ - @ref NEGEMMConvolutionLayer
+ - NEGEMMAssemblyDispatch
+ - Added new data type QASYMM8_SIGNED support for:
+ - @ref CLDirectConvolutionLayer
+ - @ref CLDeconvolutionLayer
+ - @ref CLDirectDeconvolutionLayer
+ - @ref CLGEMMDeconvolutionLayer
+ - CLGEMMLowpMatrixMultiplyReshapedKernel
+ - CLGEMMLowpQuantizeDownInt32ScaleKernel
+ - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
+ - @ref CLReductionOperation
+ - @ref CLReduceMean
+ - @ref NEScale
+ - NEScaleKernel
+ - NEUpsampleLayer
+ - @ref NECast
+ - @ref NEReductionOperation
+ - @ref NEReduceMean
+ - @ref NEArgMinMaxLayer
+ - @ref NEDeconvolutionLayer
+ - NEGEMMLowpQuantizeDownInt32ScaleKernel
+ - @ref CPPBoxWithNonMaximaSuppressionLimit
+ - @ref CPPDetectionPostProcessLayer
+ - @ref CPPPermuteKernel
+ - @ref CPPPermute
+ - @ref CPPTopKVKernel
+ - @ref CPPTopKV
+ - @ref CPPUpsample
+ - @ref CPPUpsampleKernel
+ - New OpenCL kernels / functions:
+ - @ref CLQLSTMLayer
+ - @ref CLQLSTMLayerNormalizationKernel
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEQLSTMLayer
+ - @ref NEQLSTMLayerNormalizationKernel
+ - Added HARD_SWISH support in:
+ - CLActivationLayerKernel
+ - NEActivationLayerKernel
+ - Deprecated OpenCL kernels / functions:
+ - CLGEMMLowpQuantizeDownInt32ToUint8Scale
+ - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
+ - Deprecated Arm® Neon™ kernels / functions:
+ - NEGEMMLowpQuantizeDownInt32ToUint8Scale
+ - Removed CPP kernels / functions:
+ - CPPFlipWeightsKernel
+ - Removed PoolingLayerInfo constructors without Data Layout.
+ - Removed CLDepthwiseConvolutionLayer3x3
+ - Removed NEDepthwiseConvolutionLayerOptimized
+ - Added support for Winograd 3x3 and 4x4 on Arm® Neon™ FP16:
+ - @ref NEWinogradConvolutionLayer
+ - CpuWinogradConv2dTransformInputKernel
+ - CpuWinogradConv2dTransformOutputKernel
+ - CpuWinogradConv2dTransformWeightsKernel
+ - Added CLCompileContext
+ - Added Arm® Neon™ GEMM kernel with 2D window support
+
+v20.02.1 Maintenance release
+ - Added Android-NN build script.
+
+v20.02 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Added new data type QASYMM8_SIGNED support for:
+ - @ref CLDepthwiseConvolutionLayer
+ - CLDepthwiseConvolutionLayer3x3
+ - @ref CLGEMMConvolutionLayer
+ - CLGEMMLowpMatrixMultiplyCore
+ - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
+ - CLGEMMLowpMatrixMultiplyNativeKernel
+ - @ref NEActivationLayer
+ - NEComparisonOperationKernel
+ - @ref NEConvolutionLayer
+ - @ref NEDepthwiseConvolutionLayer
+ - NEDepthwiseConvolutionLayer3x3Kernel
+ - NEDirectConvolutionLayerOutputStageKernel
+ - @ref NEElementwiseComparison
+ - @ref NEElementwiseMax
+ - @ref NEElementwiseMin
+ - @ref NEElementwiseSquaredDiff
+ - @ref NEFullyConnectedLayer
+ - NEGEMMMatrixVectorMultiplyKernel
+ - @ref NEPixelWiseMultiplication
+ - @ref NEPoolingLayer
+ - @ref NEPReluLayer
+ - Added support for QSYMM8_PER_CHANNEL in:
+ - NEDepthwiseConvolutionLayer3x3Kernel
+ - Added support for split sizes in:
+ - @ref CLSplit
+ - @ref NESplit
+ - New OpenCL kernels / functions:
+ - @ref CLFill
+ - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEFill
+ - NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
+ - Deprecated functions / interfaces:
+ - CLDepthwiseConvolutionLayer3x3
+ - NEDepthwiseConvolutionLayerOptimized
+ - PoolingLayerInfo constructors without Data Layout.
+ - Added support for quantization with multiplier greater than 1 on Arm® Neon™ and CL.
+ - Added support for quantized inputs of type QASYMM8_SIGNED and QASYMM8 to @ref CLQuantizationLayer.
+ - Added the ability to build bootcode for bare metal.
+ - Added support for generating synthetic QASYMM8 graphs.
+ - Added support for F16 datatype in VGG16.
+ - Removed pre-built binaries for GLES.
+
+v19.11.1 Public maintenance release
+ - Fix offset calculation in NEReductionOperationKernel.
+ - Fix data layout in NEScaleKernel for NHWC.
+ - Retain configuration step data layout to avoid side-effects.
+ - Perform sqrt in double domain for L2 pooling.
+ - Fix output shape calculation for Reduce Mean
+ - Restrict cases where optimized NEPadLayer runs.
+
+v19.11 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Updated recommended NDK version to r17c.
+ - Deprecated OpenCL kernels / functions:
+ - CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel
+ - CLDepthwiseIm2ColKernel
+ - CLDepthwiseSeparableConvolutionLayer
+ - CLDepthwiseVectorToTensorKernel
+ - CLDirectConvolutionLayerOutputStageKernel
+ - Deprecated Arm® Neon™ kernels / functions:
+ - NEDepthwiseWeightsReshapeKernel
+ - NEDepthwiseIm2ColKernel
+ - NEDepthwiseSeparableConvolutionLayer
+ - NEDepthwiseVectorToTensorKernel
+ - NEDepthwiseConvolutionLayer3x3
+ - New OpenCL kernels / functions:
+ - @ref CLInstanceNormalizationLayerKernel / @ref CLInstanceNormalizationLayer
+ - @ref CLDepthwiseConvolutionLayerNativeKernel to replace the old generic depthwise convolution (see Deprecated OpenCL kernels / functions)
+ - @ref CLLogSoftmaxLayer
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEBoundingBoxTransformKernel / @ref NEBoundingBoxTransform
+ - @ref NEComputeAllAnchorsKernel / NEComputeAllAnchors
+ - @ref NEDetectionPostProcessLayer
+ - @ref NEGenerateProposalsLayer
+ - @ref NEInstanceNormalizationLayerKernel / @ref NEInstanceNormalizationLayer
+ - @ref NELogSoftmaxLayer
+ - @ref NEROIAlignLayerKernel / @ref NEROIAlignLayer
+ - Added QASYMM8 support for:
+ - @ref CLGenerateProposalsLayer
+ - @ref CLROIAlignLayer
+ - @ref CPPBoxWithNonMaximaSuppressionLimit
+ - Added QASYMM16 support for:
+ - @ref CLBoundingBoxTransform
+ - Added FP16 support for:
+ - CLGEMMMatrixMultiplyReshapedKernel
+ - Added new data type QASYMM8_PER_CHANNEL support for:
+ - CLDequantizationLayer
+ - @ref NEDequantizationLayer
+ - Added new data type QSYMM8_PER_CHANNEL support for:
+ - @ref CLConvolutionLayer
+ - @ref NEConvolutionLayer
+ - @ref CLDepthwiseConvolutionLayer
+ - @ref NEDepthwiseConvolutionLayer
+ - Added FP16 mixed-precision support for:
+ - CLGEMMMatrixMultiplyReshapedKernel
+ - CLPoolingLayerKernel
+ - Added FP32 and FP16 ELU activation for:
+ - @ref CLActivationLayer
+ - @ref NEActivationLayer
+ - Added asymmetric padding support for:
+ - @ref CLDirectDeconvolutionLayer
+ - @ref CLGEMMDeconvolutionLayer
+ - @ref NEDeconvolutionLayer
+ - Added SYMMETRIC and REFLECT modes for @ref CLPadLayerKernel / @ref CLPadLayer.
+ - Replaced the calls to NECopyKernel and NEMemsetKernel with @ref NEPadLayer in @ref NEGenerateProposalsLayer.
+ - Replaced the calls to CLCopyKernel and CLMemsetKernel with @ref CLPadLayer in @ref CLGenerateProposalsLayer.
+ - Improved performance for CL Inception V3 - FP16.
+ - Improved accuracy for CL Inception V3 - FP16 by enabling FP32 accumulator (mixed-precision).
+ - Improved Arm® Neon™ performance by enabling the fusion of batch normalization with convolution and depthwise convolution layers.
+ - Improved Arm® Neon™ performance for MobileNet-SSD by improving the output detection performance.
+ - Optimized @ref CLPadLayer.
+ - Optimized CL generic depthwise convolution layer by introducing @ref CLDepthwiseConvolutionLayerNativeKernel.
+ - Reduced memory consumption by implementing weights sharing.
+
+v19.08.1 Public maintenance release
+ - Fix offset calculation in NEReductionOperationKernel.
+ - Fix data layout in NEScaleKernel for NHWC.
+ - Retain configuration step data layout to avoid side-effects.
+ - Perform sqrt in double domain for L2 pooling.
+ - Fix output shape calculation for Reduce Mean
+ - Fix broadcasting in CLPixelWiseMultiplication with 5D tensors
+
+v19.08 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Deprecated Arm® Neon™ functions:
+ - NEDepthConcatenateLayer
+ - NEWidthConcatenateLayer
+ - Deprecated OpenCL kernels / functions:
+ - CLDepthConcatenateLayer
+ - CLGEMMInterleave4x4Kernel / CLGEMMInterleave4x4
+ - CLGEMMTranspose1xWKernel / CLGEMMTranspose1xW
+ - CLWidthConcatenateLayer
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEAbsLayer
+ - @ref NECast
+ - @ref NEElementwisePower
+ - @ref NELogLayer
+ - @ref NELSTMLayerQuantized
+ - @ref NENegLayer
+ - @ref NEPReluLayer
+ - @ref NESinLayer
+ - NEBatchConcatenateLayerKernel
+ - @ref NEDepthToSpaceLayerKernel / @ref NEDepthToSpaceLayer
+ - NEDepthwiseConvolutionLayerNativeKernel
+ - NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
+ - @ref NEMeanStdDevNormalizationKernel / @ref NEMeanStdDevNormalizationLayer
+ - @ref NESpaceToDepthLayerKernel / @ref NESpaceToDepthLayer
+ - New OpenCL kernels / functions:
+ - @ref CLAbsLayer
+ - @ref CLElementwisePower
+ - @ref CLLogLayer
+ - @ref CLLSTMLayerQuantized
+ - @ref CLNegLayer
+ - @ref CLPReluLayer
+ - @ref CLSinLayer
+ - CLBatchConcatenateLayerKernel
+ - @ref CLDepthToSpaceLayerKernel / @ref CLDepthToSpaceLayer
+ - CLGEMMLowpMatrixMultiplyNativeKernel
+ - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
+ - CLGEMMMatrixMultiplyNativeKernel
+ - CLMeanStdDevNormalizationKernel / CLMeanStdDevNormalizationLayer
+ - @ref CLSpaceToDepthLayerKernel / @ref CLSpaceToDepthLayer
+ - New examples:
+ - neon_opticalflow
+ - cl_cache
+ - neon_permute
+ - Added support for FP16 in @ref NEDeconvolutionLayer
+ - Added support for FP16 in @ref CLDeconvolutionLayer
+ - Added support for REDUCE_MIN and REDUCE_MAX in @ref ReductionOperation
+ - Enabled the fusion of batch normalization with convolution and depthwise convolution layers for FP32 in the graph API (OpenCL only)
+ - Added support for fusing activation function and broadcast addition with the matrix multiplication for FP32 (OpenCL only)
+ - Refactored the depthwise convolution layer kernel on Arm® Neon™ for generic cases
+ - Added an optimized depthwise convolution layer kernel for 5x5 filters (Neon™ only)
+ - Added support for enabling the OpenCL kernel cache, and added an example showing how to load prebuilt OpenCL kernels from a binary cache file
+ - Altered @ref QuantizationInfo interface to support per-channel quantization.
+ - CLDepthwiseConvolutionLayer3x3 will be included in @ref CLDepthwiseConvolutionLayer to accommodate future optimizations.
+ - NEDepthwiseConvolutionLayerOptimized will be included in @ref NEDepthwiseConvolutionLayer to accommodate future optimizations.
+ - Removed inner_border_right and inner_border_top parameters from @ref CLDeconvolutionLayer interface
+ - Removed inner_border_right and inner_border_top parameters from @ref NEDeconvolutionLayer interface
+ - Optimized the Arm® Neon™ assembly kernel for GEMMLowp. The new implementation fuses the output stage and quantization with the matrix multiplication kernel
+
+v19.05 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEBatchToSpaceLayerKernel / @ref NEBatchToSpaceLayer
+ - NEComplexPixelWiseMultiplicationKernel / @ref NEComplexPixelWiseMultiplication
+ - @ref NECropKernel / @ref NECropResize
+ - NEDepthwiseConvolutionAssemblyDispatch
+ - @ref NEFFTDigitReverseKernel
+ - @ref NEFFTRadixStageKernel
+ - @ref NEFFTScaleKernel
+ - NEGEMMLowpOffsetContributionOutputStageKernel
+ - NEHeightConcatenateLayerKernel
+ - @ref NESpaceToBatchLayerKernel / @ref NESpaceToBatchLayer
+ - @ref NEFFT1D
+ - @ref NEFFT2D
+ - @ref NEFFTConvolutionLayer
+ - New OpenCL kernels / functions:
+ - CLComplexPixelWiseMultiplicationKernel / @ref CLComplexPixelWiseMultiplication
+ - CLCropKernel / @ref CLCropResize
+ - @ref CLDeconvolutionReshapeOutputKernel
+ - @ref CLFFTDigitReverseKernel
+ - @ref CLFFTRadixStageKernel
+ - @ref CLFFTScaleKernel
+ - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
+ - CLGEMMMatrixMultiplyReshapedOnlyRHSKernel
+ - CLHeightConcatenateLayerKernel
+ - @ref CLDirectDeconvolutionLayer
+ - @ref CLFFT1D
+ - @ref CLFFT2D
+ - @ref CLFFTConvolutionLayer
+ - @ref CLGEMMDeconvolutionLayer
+ - New OpenGLES kernels / functions:
+ - GCConcatenateLayer
+ - Deprecated functions/interfaces
+ - GCDepthConcatenateLayer
+ - NEWidthConcatenateLayer
+ - NEDepthConcatenateLayer
+ - CLWidthConcatenateLayer
+ - CLDepthConcatenateLayer
+ - CLGEMMInterleave4x4
+ - CLGEMMTranspose1xW
+ - Support different quantization info in CLConcatLayer.
+ - Added checks for cases where different input/output quantization info is not supported.
+ - Tensors have different quantization information.
+ - Add FP16 support checks.
+ - Fixed output quantization in CLDepthwiseConvolutionLayer3x3 when activation is fused.
+ - New graph examples:
+ - graph_convolution
+ - graph_fully_connected
+ - graph_depthwise_convolution
+ - Deepspeech v0.4.1
+ - Add support for QASYMM8 in NEArithmeticSubtractionKernel.
+ - Add support for QASYMM8 in NEPixelWiseMultiplicationKernel.
+ - Add support for QASYMM8 in NEDeconvolution.
+ - Add support for DequantizationLayer for Neon/CL.
+ - Add support for dilation in CLDepthwiseConvolution.
+ - Fuse offset contribution with the output stage when using NEGEMMLowpMatrixMultiplyCore.
+ - Optimize CLDeconvolution.
+ - Add StackLayer to the graph API.
+ - Add support for "reflect" padding mode in NEPad.
+ - Winograd 7x7 NHWC on OpenCL.
+ - Rework CL ML layers to run exclusively on CL.
+ - Support different quantization info in PoolingLayer.
+ - Implement and test import memory interfaces.
+ - Added new tests and removed old ones.
+ - Various clang-tidy fixes.
+
+v19.02 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - New Arm® Neon™ kernels / functions:
+ - @ref NETileKernel / @ref NETile
+ - @ref NEFuseBatchNormalizationKernel / @ref NEFuseBatchNormalization
+ - NEElementwiseOperationKernel
+ - @ref NEElementwiseMax
+ - @ref NEElementwiseMin
+ - @ref NEElementwiseSquaredDiff
+ - @ref NESelectKernel / @ref NESelect
+ - @ref NESplit
+ - @ref NESlice
+ - @ref NEUnstack
+ - @ref NEStridedSliceKernel / @ref NEStridedSlice
+ - NEElementwiseUnaryKernel
+ - @ref NERsqrtLayer
+ - @ref NEExpLayer
+ - @ref NEReverseKernel / @ref NEReverse
+ - @ref NEArgMinMaxLayer
+ - @ref NEStackLayerKernel / @ref NEStackLayer
+ - @ref NERangeKernel / @ref NERange
+ - @ref NEPadLayer
+ - NEMemsetKernel
+ - @ref NEGatherKernel / @ref NEGather
+ - @ref NEElementwiseComparison
+ - @ref NEElementwiseComparisonStatic
+ - NEComparisonOperationKernel
+ - @ref NEElementwiseDivision
+ - New OpenCL kernels / functions:
+ - @ref CLSelectKernel / @ref CLSelect
+ - @ref CLTileKernel / @ref CLTile
+ - @ref CLComparisonKernel / @ref CLComparison
+ - @ref CLArgMinMaxLayer
+ - @ref CLElementwiseMax
+ - @ref CLElementwiseMin
+ - @ref CLElementwiseSquaredDiff
+ - @ref CLStackLayerKernel / @ref CLStackLayer
+ - @ref CLReverse / @ref CLReverseKernel
+ - @ref CLRsqrtLayer
+ - @ref CLExpLayer
+ - CLElementWiseUnaryLayerKernel
+ - CLGEMMReshapeLHSMatrixKernel
+ - CLGEMMReshapeRHSMatrixKernel
+ - CLGEMMMatrixMultiplyReshapedKernel
+ - @ref CLRangeKernel / @ref CLRange
+ - @ref CLUnstack
+ - @ref CLGatherKernel / @ref CLGather
+ - CLGEMMLowpMatrixMultiplyReshapedKernel
+ - New CPP kernels / functions:
+ - @ref CPPDetectionOutputLayer
+ - @ref CPPTopKV / @ref CPPTopKVKernel
+ - Added new examples:
+ - graph_ssd_mobilenet.cpp
+ - graph_mobilenet_v2.cpp
+ - graph_resnet12.cpp
+ - graph_srcnn955.cpp
+ - graph_vgg_vdsr.cpp
+ - graph_inception_resnet_v1.cpp
+ - Added 4D tensor support to:
+ - @ref NESoftmaxLayer
+ - Fused activation in @ref CLWinogradConvolutionLayer
+ - Extended @ref NEPermute to support more cases
+ - Added Neon™/SVE GEMM Hybrid kernels
+ - Added u8 and s8 hybrid assembly kernels
+ - Introduced GEMM strategy name in NEGEMMAssemblyWrapper
+ - Improved @ref CLTuner
+ - Fused the bias addition within @ref CLGEMM
+ - Added support for QASYMM8 LOGISTIC activation in @ref NEActivationLayer
+ - Added NHWC data layout support to:
+ - @ref NEScale for F16
+ - @ref CLNormalizationLayer IN_MAP_2D for FP32/FP16
+ - @ref NEL2NormalizeLayer for FP32/FP16
+ - @ref NENormalizationLayer IN_MAP_2D for FP32/FP16
+ - @ref CLROIAlignLayer
+ - @ref CLGenerateProposalsLayer
+ - Added QASYMM8 support to the following kernels:
+ - NEArithmeticAdditionKernel
+ - @ref NEScale
+ - Added new tests and improved validation and benchmarking suites.
+ - Deprecated functions/interfaces
+ - Usage of inner_border_right and inner_border_top has been deprecated in @ref CLDeconvolutionLayer and @ref NEDeconvolutionLayer
+
+v18.11 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEChannelShuffleLayer / @ref NEChannelShuffleLayerKernel
+ - @ref NEReduceMean
+ - @ref NEReorgLayer / @ref NEReorgLayerKernel
+ - @ref NEPriorBoxLayer / @ref NEPriorBoxLayerKernel
+ - NEUpsampleLayer / NEUpsampleLayerKernel
+ - NEYOLOLayer / NEYOLOLayerKernel
+ - New OpenCL kernels / functions:
+ - @ref CLBatchToSpaceLayer / @ref CLBatchToSpaceLayerKernel
+ - @ref CLBoundingBoxTransform / @ref CLBoundingBoxTransformKernel
+ - @ref CLComputeAllAnchorsKernel
+ - @ref CLGenerateProposalsLayer
+ - @ref CLNormalizePlanarYUVLayer / @ref CLNormalizePlanarYUVLayerKernel
+ - @ref CLReorgLayer / @ref CLReorgLayerKernel
+ - @ref CLSpaceToBatchLayer / @ref CLSpaceToBatchLayerKernel
+ - @ref CLPadLayer
+ - @ref CLReduceMean
+ - @ref CLPriorBoxLayer / @ref CLPriorBoxLayerKernel
+ - @ref CLROIAlignLayer / @ref CLROIAlignLayerKernel
+ - @ref CLSlice
+ - @ref CLSplit
+ - @ref CLStridedSlice / @ref CLStridedSliceKernel
+ - CLUpsampleLayer / CLUpsampleLayerKernel
+ - CLYOLOLayer / CLYOLOLayerKernel
+ - New CPP kernels / functions:
+ - @ref CPPBoxWithNonMaximaSuppressionLimit / @ref CPPBoxWithNonMaximaSuppressionLimitKernel
+ - Added the validate method in:
+ - @ref NEDepthConvertLayer
+ - @ref NEFloor / @ref CLFloor
+ - NEGEMMMatrixAdditionKernel
+ - @ref NEReshapeLayer / @ref CLReshapeLayer
+ - @ref CLScale
+ - Added new examples:
+ - graph_shufflenet.cpp
+ - graph_yolov3.cpp
+ - Added documentation on how to add a new function or kernel.
+ - Improved doxygen documentation adding a list of the existing functions.
+ - Added 4D tensor support to:
+ - CLWidthConcatenateLayer
+ - CLFlattenLayer
+ - @ref CLSoftmaxLayer
+ - Added dot product support for non-unit strides in CLDepthwiseConvolutionLayer3x3NHWCKernel
+ - Added SVE support
+ - Fused batch normalization into convolution layer weights in @ref CLFuseBatchNormalization
+ - Fused activation in CLDepthwiseConvolutionLayer3x3NCHWKernel, CLDepthwiseConvolutionLayer3x3NHWCKernel and @ref NEGEMMConvolutionLayer
+ - Added NHWC data layout support to:
+ - @ref CLChannelShuffleLayer
+ - @ref CLDeconvolutionLayer
+ - @ref CLL2NormalizeLayer
+ - Added QASYMM8 support to the following kernels:
+ - CLScaleKernel
+ - NEDepthwiseConvolutionLayer3x3Kernel
+ - CLPixelWiseMultiplicationKernel
+ - Added FP16 support to the following kernels:
+ - CLDepthwiseConvolutionLayer3x3NHWCKernel
+ - NEDepthwiseConvolutionLayer3x3Kernel
+ - @ref CLNormalizePlanarYUVLayerKernel
+ - @ref CLWinogradConvolutionLayer (5x5 kernel)
+ - More tests added to both validation and benchmarking suites.
+
+v18.08 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Updated recommended NDK version to r17b.
+ - Removed support for QS8/QS16 data types.
+ - Added support for grouped convolution in @ref CLConvolutionLayer.
+ - Added NHWC data layout support to:
+ - NEDepthConcatenateLayer / CLDepthConcatenateLayer
+ - @ref NEWinogradConvolutionLayer / @ref CLWinogradConvolutionLayer
+ - @ref CLDepthwiseConvolutionLayer
+ - @ref CLDirectConvolutionLayer
+ - @ref CLConvolutionLayer
+ - @ref CLScale
+ - CLIm2ColKernel
+ - New Arm® Neon™ kernels / functions:
+ - @ref NERNNLayer
+ - New OpenCL kernels / functions:
+ - @ref CLArithmeticDivision
+ - Introduced prepare() stage support in the graph API for GLES.
+ - Added support for memory reuse when trying to allocate smaller CLTensors.
+ - Enabled NHWC execution on graph examples.
+ - Added JPEG accessor for validation purposes.
+ - Added validate methods to some kernels / functions.
+
+v18.05 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Major redesign in the interface for the Neon™ kernels implemented in assembly.
+ - Removed arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore / arm_compute::NEHGEMMAArch64FP16Kernel
+ - Added NEGEMMAssemblyWrapper and AssemblyKernelGlue which are used to execute assembly kernels in Neon™ functions.
+ - Minor changes to the CPUInfo type to make it compatible with the new assembly gemm interface.
+ - Moved Neon™ assembly kernels to the folder src/core/Neon/kernels/arm_gemm.
+ - Improved doxygen documentation.
+ - Improved memory management for layer transitions.
+ - Added support for NHWC data layout in tensors.
+ - Added NHWC data layout support to:
+ - @ref NEGEMMConvolutionLayer
+ - @ref NEDirectConvolutionLayer
+ - @ref NEPoolingLayer / @ref CLPoolingLayer
+ - @ref NEBatchNormalizationLayer / @ref CLBatchNormalizationLayer
+ - @ref NEDepthwiseConvolutionLayer
+ - @ref NEScale
+ - NEIm2Col
+ - Added support for dilated convolutions in @ref NEConvolutionLayer and @ref CLConvolutionLayer.
+ - New OpenCL kernels / functions:
+ - @ref CLChannelShuffleLayer / @ref CLChannelShuffleLayerKernel
+ - CLConvertFullyConnectedWeightsKernel / @ref CLConvertFullyConnectedWeights
+ - @ref CLCopy / CLCopyKernel
+ - @ref CLLSTMLayer
+ - @ref CLRNNLayer
+ - CLWidthConcatenateLayer / CLWidthConcatenateLayerKernel
+ - CLWinogradFilterTransformKernel / @ref CLWinogradConvolutionLayer
+ - CLWinogradInputTransformKernel / CLWinogradInputTransform
+ - New Arm® Neon™ kernels / functions:
+ - NEConvertFullyConnectedWeightsKernel / @ref NEConvertFullyConnectedWeights.
+ - Created the validate method in @ref CLDepthwiseConvolutionLayer.
+ - Beta and gamma are no longer mandatory arguments in @ref NEBatchNormalizationLayer and @ref CLBatchNormalizationLayer.
+ - Added depth multiplier support in @ref NEDepthwiseConvolutionLayer and @ref CLDepthwiseConvolutionLayer.
+ - Added broadcast multiply support in @ref NEPixelWiseMultiplication / NEPixelWiseMultiplicationKernel.
+ - Ported the MobileNet example to NHWC data layout.
+ - Enabled Winograd method in @ref CLConvolutionLayer.
+ - Renamed NEWinogradLayer to @ref NEWinogradConvolutionLayer.
+ - Updated @ref NEWinogradConvolutionLayer to use highly optimised assembly kernels in src/core/Neon/kernels/arm_gemm.
+ - Added memory manager support in GLES functions.
+ - Major refactoring of the graph API.
+ - Added GLES backend in the graph API.
+ - Added support for the memory manager in the graph API.
+ - Enabled Winograd Convolution method in the graph API.
+ - Added support for grouped convolutions in the graph API.
+ - Replaced NEDeconvolutionLayerUpsampleKernel with NEScaleKernel in @ref NEDeconvolutionLayer.
+ - Added fast maths flag in @ref CLConvolutionLayer.
+ - Added new tests and benchmarks in validation and benchmark frameworks
+ - Merged the activation layer with the convolution layer (Neon™, CL, GLES)
+ - Added support for OpenCL 2.0 SVM
+ - Added support for importing memory into OpenCL tensors.
+ - Added the prepare() method to perform any one-off pre-processing before running the function.
+ - Added new examples:
+ - graph_inception_v4.cpp
+ - graph_resnext50.cpp
+ - Added memory measurement instrument for CL.
+
+v18.03 Public maintenance release
+ - Various bug fixes.
+ - Fixed bug in @ref NEActivationLayer
+ - Fix in @ref CLTuner when using batches.
+ - Updated recommended NDK version to r16b (and fixed warnings).
+ - Fixed bug in validation code.
+ - Added Inception v4 graph example.
+ - Renamed NEWinogradLayer.cpp to @ref NEWinogradConvolutionLayer
+
+v18.02 Public major release
+ - Various Arm® Neon™ / OpenCL / GLES optimisations.
+ - Various bug fixes.
+ - Changed default number of threads on big.LITTLE systems.
+ - Refactored examples and added:
+ - graph_mobilenet_qassym8
+ - graph_resnet
+ - graph_squeezenet_v1_1
+ - Renamed @ref CLConvolutionLayer to @ref CLGEMMConvolutionLayer and created a new @ref CLConvolutionLayer to select the fastest convolution method.
+ - Renamed @ref NEConvolutionLayer to @ref NEGEMMConvolutionLayer and created a new @ref NEConvolutionLayer to select the fastest convolution method.
+ - Added in-place support to:
+ - @ref CLActivationLayer
+ - @ref CLBatchNormalizationLayer
+ - Added QASYMM8 support to:
+ - @ref CLActivationLayer
+ - @ref CLDepthwiseConvolutionLayer
+ - @ref NEDepthwiseConvolutionLayer
+ - @ref NESoftmaxLayer
+ - Added FP16 support to:
+ - CLDepthwiseConvolutionLayer3x3
+ - @ref CLDepthwiseConvolutionLayer
+ - Added broadcasting support to NEArithmeticAddition / @ref CLArithmeticAddition / @ref CLPixelWiseMultiplication
+ - Added fused batch normalization and activation to @ref CLBatchNormalizationLayer and @ref NEBatchNormalizationLayer
+ - Added support for non-square pooling to @ref NEPoolingLayer and @ref CLPoolingLayer
+ - New OpenCL kernels / functions:
+ - CLDirectConvolutionLayerOutputStageKernel
+ - New Arm® Neon™ kernels / functions:
+ - Added name() method to all kernels.
+ - Added support for Winograd 5x5.
+ - NEPermuteKernel / @ref NEPermute
+ - CpuWinogradConv2dTransformInputKernel / NEWinogradLayer
+ - CpuWinogradConv2dTransformOutputKernel / NEWinogradLayer
+ - CpuWinogradConv2dTransformWeightsKernel / NEWinogradLayer
+ - Renamed NEWinogradLayerKernel to NEWinogradLayerBatchedGEMMKernel
+ - New GLES kernels / functions:
+ - GCTensorShiftKernel / GCTensorShift
+
+v18.01 Public maintenance release
+ - Various bug fixes
+ - Added some of the missing validate() methods
+ - Added @ref CLDeconvolutionLayerUpsampleKernel / @ref CLDeconvolutionLayer / @ref CLDeconvolutionLayerUpsample
+ - Added CLPermuteKernel / @ref CLPermute
+ - Added method to clean the programs cache in the CL Kernel library.
+ - Added GCArithmeticAdditionKernel / GCArithmeticAddition
+ - Added GCDepthwiseConvolutionLayer3x3Kernel / GCDepthwiseConvolutionLayer3x3
+ - Added GCNormalizePlanarYUVLayerKernel / GCNormalizePlanarYUVLayer
+ - Added GCScaleKernel / GCScale
+ - Added GCWeightsReshapeKernel / GCConvolutionLayer
+ - Added FP16 support to the following GLES compute kernels:
+ - GCCol2ImKernel
+ - GCGEMMInterleave4x4Kernel
+ - GCGEMMTranspose1xWKernel
+ - GCIm2ColKernel
+ - Refactored Arm® Neon™ Winograd (NEWinogradLayerKernel)
+ - Added NEDirectConvolutionLayerOutputStageKernel
+ - Added QASYMM8 support to the following Arm® Neon™ kernels:
+ - NEDepthwiseConvolutionLayer3x3Kernel
+ - @ref NEFillBorderKernel
+ - NEPoolingLayerKernel
+ - Added new examples:
+ - graph_cl_mobilenet_qasymm8.cpp
+ - graph_inception_v3.cpp
+ - gc_dc.cpp
+ - More tests added to both validation and benchmarking suites.
+
+v17.12 Public major release
+ - Most machine learning functions on OpenCL support the new data type QASYMM8
+ - Introduced a logging interface
+ - Introduced an OpenCL timer
+ - Reworked GEMMLowp interface
+ - Added new Arm® Neon™ assembly kernels for GEMMLowp, SGEMM and HGEMM
+ - Added validation method for most Machine Learning kernels / functions
+ - Added new graph examples such as GoogLeNet, MobileNet, SqueezeNet, VGG16 and VGG19
+ - Added SGEMM example for OpenCL
+ - Added absolute difference example for GLES compute
+ - Added new tests and benchmarks in validation and benchmark frameworks
+ - Added new kernels / functions for GLES compute
+
+ - New OpenGL ES kernels / functions
+ - GCAbsoluteDifferenceKernel / GCAbsoluteDifference
+ - GCActivationLayerKernel / GCActivationLayer
+ - GCBatchNormalizationLayerKernel / GCBatchNormalizationLayer
+ - GCCol2ImKernel
+ - GCDepthConcatenateLayerKernel / GCDepthConcatenateLayer
+ - GCDirectConvolutionLayerKernel / GCDirectConvolutionLayer
+ - GCDropoutLayerKernel / GCDropoutLayer
+ - GCFillBorderKernel / GCFillBorder
+ - GCGEMMInterleave4x4Kernel / GCGEMMInterleave4x4
+ - GCGEMMMatrixAccumulateBiasesKernel / GCGEMMMatrixAdditionKernel / GCGEMMMatrixMultiplyKernel / GCGEMM
+ - GCGEMMTranspose1xWKernel / GCGEMMTranspose1xW
+ - GCIm2ColKernel
+ - GCNormalizationLayerKernel / GCNormalizationLayer
+ - GCPixelWiseMultiplicationKernel / GCPixelWiseMultiplication
+ - GCPoolingLayerKernel / GCPoolingLayer
+ - GCLogits1DMaxKernel / GCLogits1DShiftExpSumKernel / GCLogits1DNormKernel / GCSoftmaxLayer
+ - GCTransposeKernel / GCTranspose
+
+ - New Arm® Neon™ kernels / functions
+ - arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore
+ - arm_compute::NEHGEMMAArch64FP16Kernel
+ - NEDepthwiseConvolutionLayer3x3Kernel / NEDepthwiseIm2ColKernel / NEGEMMMatrixVectorMultiplyKernel / NEDepthwiseVectorToTensorKernel / @ref NEDepthwiseConvolutionLayer
+ - NEGEMMLowpOffsetContributionKernel / NEGEMMLowpMatrixAReductionKernel / NEGEMMLowpMatrixBReductionKernel / NEGEMMLowpMatrixMultiplyCore
+ - NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
+ - NEWinogradLayer / NEWinogradLayerKernel
+
+ - New OpenCL kernels / functions
+ - CLGEMMLowpOffsetContributionKernel / CLGEMMLowpMatrixAReductionKernel / CLGEMMLowpMatrixBReductionKernel / CLGEMMLowpMatrixMultiplyCore
+ - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
+
+ - New graph nodes for Arm® Neon™ and OpenCL
+ - graph::BranchLayer
+ - graph::DepthConvertLayer
+ - graph::DepthwiseConvolutionLayer
+ - graph::DequantizationLayer
+ - graph::FlattenLayer
+ - graph::QuantizationLayer
+ - graph::ReshapeLayer
+
+v17.10 Public maintenance release
+ - Bug fixes:
+ - Added a check for the maximum local workgroup size supported by OpenCL devices
+ - Minor documentation updates (fixed instructions to build the examples)
+ - Introduced a graph::GraphContext
+ - Added a few new graph nodes, plus support for branches and grouping.
+ - Automatically enabled cl_printf in debug builds
+ - Fixed bare metal builds for armv7a
+ - Added AlexNet and cartoon effect examples
+ - Fixed library builds: libraries are no longer built as supersets of each other. (This means applications using the runtime part of the library now need to link against both libarm_compute_core and libarm_compute.)
+
+v17.09 Public major release
+ - Experimental Graph support: initial implementation of a simple stream API to easily chain machine learning layers.
+ - Memory Manager (@ref BlobLifetimeManager, @ref BlobMemoryPool, @ref ILifetimeManager, @ref IMemoryGroup, @ref IMemoryManager, @ref IMemoryPool, @ref IPoolManager, @ref MemoryManagerOnDemand, @ref PoolManager)
+ - New validation and benchmark frameworks (Boost and Google frameworks replaced by an in-house framework).
+ - Most machine learning functions support both 8-bit and 16-bit fixed point (QS8, QS16) on both Arm® Neon™ and OpenCL.
+ - New Arm® Neon™ kernels / functions:
+ - arm_compute::NEGEMMAssemblyBaseKernel arm_compute::NEGEMMAArch64Kernel
+ - NEDequantizationLayerKernel / @ref NEDequantizationLayer
+ - NEFloorKernel / @ref NEFloor
+ - @ref NEL2NormalizeLayerKernel / @ref NEL2NormalizeLayer
+ - NEQuantizationLayerKernel NEMinMaxLayerKernel / @ref NEQuantizationLayer
+ - @ref NEROIPoolingLayerKernel / @ref NEROIPoolingLayer
+ - @ref NEReductionOperationKernel / @ref NEReductionOperation
+ - NEReshapeLayerKernel / @ref NEReshapeLayer
+
+ - New OpenCL kernels / functions:
+ - CLDepthwiseConvolutionLayer3x3NCHWKernel CLDepthwiseConvolutionLayer3x3NHWCKernel CLDepthwiseIm2ColKernel CLDepthwiseVectorToTensorKernel CLDepthwiseWeightsReshapeKernel / CLDepthwiseConvolutionLayer3x3 @ref CLDepthwiseConvolutionLayer CLDepthwiseSeparableConvolutionLayer
+ - CLDequantizationLayerKernel / CLDequantizationLayer
+ - CLDirectConvolutionLayerKernel / @ref CLDirectConvolutionLayer
+ - CLFlattenLayer
+ - CLFloorKernel / @ref CLFloor
+ - CLGEMMTranspose1xW
+ - CLGEMMMatrixVectorMultiplyKernel
+ - @ref CLL2NormalizeLayerKernel / @ref CLL2NormalizeLayer
+ - CLQuantizationLayerKernel CLMinMaxLayerKernel / @ref CLQuantizationLayer
+ - @ref CLROIPoolingLayerKernel / @ref CLROIPoolingLayer
+ - @ref CLReductionOperationKernel / @ref CLReductionOperation
+ - CLReshapeLayerKernel / @ref CLReshapeLayer
+
+v17.06 Public major release
+ - Various bug fixes
+ - Added support for 8-bit fixed point (QS8) to various Arm® Neon™ machine learning kernels.
+ - Added unit tests and benchmarks (AlexNet, LeNet)
+ - Added support for sub tensors.
+ - Added infrastructure to provide GPU-specific optimisations for some OpenCL kernels.
+ - Added @ref OMPScheduler (OpenMP) scheduler for Arm® Neon™
+ - Added @ref SingleThreadScheduler scheduler for Arm® Neon™ (for bare metal)
+ - Users can specify their own scheduler by implementing the @ref IScheduler interface.
+ - New OpenCL kernels / functions:
+ - @ref CLBatchNormalizationLayerKernel / @ref CLBatchNormalizationLayer
+ - CLDepthConcatenateLayerKernel / CLDepthConcatenateLayer
+ - CLHOGOrientationBinningKernel, CLHOGBlockNormalizationKernel, CLHOGDetectorKernel / CLHOGDescriptor, CLHOGDetector, CLHOGGradient, CLHOGMultiDetection
+ - CLLocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedLayer
+ - CLWeightsReshapeKernel / CLConvolutionLayerReshapeWeights
+ - New C++ kernels:
+ - CPPDetectionWindowNonMaximaSuppressionKernel
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEBatchNormalizationLayerKernel / @ref NEBatchNormalizationLayer
+ - NEDepthConcatenateLayerKernel / NEDepthConcatenateLayer
+ - NEDirectConvolutionLayerKernel / @ref NEDirectConvolutionLayer
+ - NELocallyConnectedMatrixMultiplyKernel / NELocallyConnectedLayer
+ - NEWeightsReshapeKernel / NEConvolutionLayerReshapeWeights
+
+v17.05 Public bug fixes release
+ - Various bug fixes
+ - The remaining functions were ported to use accurate padding.
+ - The library no longer links against OpenCL (it instead uses dlopen / dlsym at runtime to determine whether OpenCL is available).
+ - Added a "free" method to the allocator.
+ - Minimum version of g++ required for armv7 Linux changed from 4.8 to 4.9
+
+v17.04 Public bug fixes release
+
+ The following functions have been ported to use the new accurate padding:
+ - CLColorConvertKernel
+ - CLEdgeNonMaxSuppressionKernel
+ - CLEdgeTraceKernel
+ - CLGaussianPyramidHorKernel
+ - CLGaussianPyramidVertKernel
+ - CLGradientKernel
+ - NEChannelCombineKernel
+ - NEFillArrayKernel
+ - NEGaussianPyramidHorKernel
+ - NEGaussianPyramidVertKernel
+ - NEHarrisScoreFP16Kernel
+ - NEHarrisScoreKernel
+ - NEHOGDetectorKernel
+ - NELogits1DMaxKernel
+ - NELogits1DShiftExpSumKernel
+ - NELogits1DNormKernel
+ - NENonMaximaSuppression3x3FP16Kernel
+ - NENonMaximaSuppression3x3Kernel
+
+v17.03.1 First major public release of the sources
+ - Renamed the library to arm_compute
+ - New CPP target introduced for C++ kernels shared between Arm® Neon™ and CL functions.
+ - Introduced a new padding calculation interface and ported most kernels / functions to use it.
+ - New OpenCL kernels / functions:
+ - CLGEMMLowpMatrixMultiplyKernel / CLGEMMLowp
+ - New Arm® Neon™ kernels / functions:
+ - @ref NENormalizationLayerKernel / @ref NENormalizationLayer
+ - NETransposeKernel / @ref NETranspose
+ - NELogits1DMaxKernel, NELogits1DShiftExpSumKernel, NELogits1DNormKernel / @ref NESoftmaxLayer
+ - NEIm2ColKernel, NECol2ImKernel, NEConvolutionLayerWeightsReshapeKernel / @ref NEConvolutionLayer
+ - NEGEMMMatrixAccumulateBiasesKernel / @ref NEFullyConnectedLayer
+ - NEGEMMLowpMatrixMultiplyKernel / NEGEMMLowp
+
+v17.03 Sources preview
+ - New OpenCL kernels / functions:
+ - CLGradientKernel, CLEdgeNonMaxSuppressionKernel, CLEdgeTraceKernel / CLCannyEdge
+ - GEMM refactoring + FP16 support: CLGEMMInterleave4x4Kernel, CLGEMMTranspose1xWKernel, CLGEMMMatrixMultiplyKernel, CLGEMMMatrixAdditionKernel / @ref CLGEMM
+ - CLGEMMMatrixAccumulateBiasesKernel / @ref CLFullyConnectedLayer
+ - CLTransposeKernel / @ref CLTranspose
+ - CLLKTrackerInitKernel, CLLKTrackerStage0Kernel, CLLKTrackerStage1Kernel, CLLKTrackerFinalizeKernel / CLOpticalFlow
+ - @ref CLNormalizationLayerKernel / @ref CLNormalizationLayer
+ - CLLaplacianPyramid, CLLaplacianReconstruct
+ - New Arm® Neon™ kernels / functions:
+ - NEActivationLayerKernel / @ref NEActivationLayer
+ - GEMM refactoring + FP16 support (requires an Armv8.2 CPU): NEGEMMInterleave4x4Kernel, NEGEMMTranspose1xWKernel, NEGEMMMatrixMultiplyKernel, NEGEMMMatrixAdditionKernel / @ref NEGEMM
+ - NEPoolingLayerKernel / @ref NEPoolingLayer
+
+v17.02.1 Sources preview
+ - New OpenCL kernels / functions:
+ - CLLogits1DMaxKernel, CLLogits1DShiftExpSumKernel, CLLogits1DNormKernel / @ref CLSoftmaxLayer
+ - CLPoolingLayerKernel / @ref CLPoolingLayer
+ - CLIm2ColKernel, CLCol2ImKernel, CLConvolutionLayerWeightsReshapeKernel / CLConvolutionLayer
+ - CLRemapKernel / CLRemap
+ - CLGaussianPyramidHorKernel, CLGaussianPyramidVertKernel / CLGaussianPyramid, CLGaussianPyramidHalf, CLGaussianPyramidOrb
+ - CLMinMaxKernel, CLMinMaxLocationKernel / CLMinMaxLocation
+ - CLNonLinearFilterKernel / CLNonLinearFilter
+ - New Arm® Neon™ FP16 kernels (requires an Armv8.2 CPU)
+ - NEAccumulateWeightedFP16Kernel
+ - NEBox3x3FP16Kernel
+ - NENonMaximaSuppression3x3FP16Kernel
+
+v17.02 Sources preview
+ - New OpenCL kernels / functions:
+ - CLActivationLayerKernel / @ref CLActivationLayer
+ - CLChannelCombineKernel / CLChannelCombine
+ - CLDerivativeKernel / CLChannelExtract
+ - CLFastCornersKernel / CLFastCorners
+ - CLMeanStdDevKernel / CLMeanStdDev
+ - New Arm® Neon™ kernels / functions:
+ - HOG / SVM: NEHOGOrientationBinningKernel, NEHOGBlockNormalizationKernel, NEHOGDetectorKernel, NEHOGNonMaximaSuppressionKernel / NEHOGDescriptor, NEHOGDetector, NEHOGGradient, NEHOGMultiDetection
+ - NENonLinearFilterKernel / NENonLinearFilter
+ - Introduced a CLScheduler to manage the default context and command queue used by the runtime library and create synchronisation events.
+ - Switched all the kernels / functions to use tensors instead of images.
+ - Updated documentation to include instructions to build the library from sources.
+
+v16.12 Binary preview release
+ - Original release
+
+ */
+} // namespace arm_compute