Diffstat (limited to 'docs/user_guide/release_version_and_change_log.dox')
-rw-r--r-- docs/user_guide/release_version_and_change_log.dox 1389
1 files changed, 1389 insertions, 0 deletions
diff --git a/docs/user_guide/release_version_and_change_log.dox b/docs/user_guide/release_version_and_change_log.dox
new file mode 100644
index 0000000000..b9e3b37263
--- /dev/null
+++ b/docs/user_guide/release_version_and_change_log.dox
@@ -0,0 +1,1389 @@
+///
+/// Copyright (c) 2017-2021 Arm Limited.
+///
+/// SPDX-License-Identifier: MIT
+///
+/// Permission is hereby granted, free of charge, to any person obtaining a copy
+/// of this software and associated documentation files (the "Software"), to
+/// deal in the Software without restriction, including without limitation the
+/// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+/// sell copies of the Software, and to permit persons to whom the Software is
+/// furnished to do so, subject to the following conditions:
+///
+/// The above copyright notice and this permission notice shall be included in all
+/// copies or substantial portions of the Software.
+///
+/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+/// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+/// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+/// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+/// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+/// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+/// SOFTWARE.
+///
+namespace arm_compute
+{
+/** @page versions_changelogs Release Versions and Changelog
+
+@tableofcontents
+
+@section S2_1_versions Release versions
+
+All releases are numbered vYY.MM, where YY is the last two digits of the year and MM is the month number.
+If there is more than one release in a month then an extra sequential number is appended at the end:
+
+ v17.03 (First release of March 2017)
+ v17.03.1 (Second release of March 2017)
+ v17.04 (First release of April 2017)
+
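As a rough illustration of the numbering scheme above, the sketch below parses a tag of the form vYY.MM or vYY.MM.N and orders two tags chronologically. `ReleaseVersion`, `parse_release` and `is_older` are hypothetical names chosen for this example; they are not part of the library.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <tuple>

// Hypothetical helper type for this example; not part of the library API.
struct ReleaseVersion
{
    int year     = 0; // YY: last two digits of the year
    int month    = 0; // MM: month number, 1..12
    int sequence = 0; // 0 for the first release of the month, 1 for the second, ...
};

// Parses "vYY.MM" or "vYY.MM.N". Returns false and leaves `out` untouched on malformed input.
bool parse_release(const std::string &tag, ReleaseVersion &out)
{
    auto is_digit = [&tag](std::size_t i)
    {
        return std::isdigit(static_cast<unsigned char>(tag[i])) != 0;
    };

    if(tag.size() < 6 || tag[0] != 'v' || tag[3] != '.')
    {
        return false;
    }
    if(!is_digit(1) || !is_digit(2) || !is_digit(4) || !is_digit(5))
    {
        return false;
    }

    ReleaseVersion v;
    v.year  = (tag[1] - '0') * 10 + (tag[2] - '0');
    v.month = (tag[4] - '0') * 10 + (tag[5] - '0');
    if(v.month < 1 || v.month > 12)
    {
        return false;
    }

    // Optional ".N" suffix for additional releases within the same month.
    if(tag.size() > 6)
    {
        if(tag[6] != '.' || tag.size() == 7)
        {
            return false;
        }
        for(std::size_t i = 7; i < tag.size(); ++i)
        {
            if(!is_digit(i))
            {
                return false;
            }
            v.sequence = v.sequence * 10 + (tag[i] - '0');
        }
    }

    out = v;
    return true;
}

// True if release `a` was published before release `b`.
bool is_older(const ReleaseVersion &a, const ReleaseVersion &b)
{
    return std::tie(a.year, a.month, a.sequence) < std::tie(b.year, b.month, b.sequence);
}
```

With these helpers, v17.03 sorts before v17.03.1, which in turn sorts before v17.04, matching the three examples listed above.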
+@note We aim to publish one major public release with new features per quarter. Releases in between contain only bug fixes.
+
+@section S2_2_changelog Changelog
+
+v21.05 Public major release
+ - Removed computer vision support from Arm® Neon™ backend
+ - Removed the following functions:
+ - NEAbsoluteDifference
+ - NEAccumulate
+ - NEBox3x3
+ - NECannyEdge
+ - NEChannelCombine
+ - NEChannelExtract
+ - NEColorConvert
+ - NEConvolution
+ - NEDerivative
+ - NEDilate
+ - NEEqualizeHistogram
+ - NEErode
+ - NEFastCorners
+ - NEGaussian3x3
+ - NEGaussian5x5
+ - NEGaussianPyramid
+ - NEHOGDescriptor
+ - NEHOGDetector
+ - NEHOGGradient
+ - NEHOGMultiDetection
+ - NEHarrisCorners
+ - NEHistogram
+ - NEIntegralImage
+ - NELaplacianPyramid
+ - NELaplacianReconstruct
+ - NEMagnitude
+ - NEMeanStdDev
+ - NEMedian3x3
+ - NEMinMaxLocation
+ - NENonLinearFilter
+ - NEOpticalFlow
+ - NEPhase
+ - NEScharr3x3
+ - NESobel3x3
+ - NESobel5x5
+ - NESobel7x7
+ - NETableLookup
+ - NEThreshold
+ - NEWarpAffine
+ - NEWarpPerspective
+
+ - Removed all GLES kernels / functions / tests / examples
+ - Removed computer vision support from CL backend
+ - Removed the following functions:
+ - CLAbsoluteDifference
+ - CLAccumulate
+ - CLBox3x3
+ - CLCannyEdge
+ - CLChannelCombine
+ - CLChannelExtract
+ - CLColorConvert
+ - CLConvolution
+ - CLDerivative
+ - CLDilate
+ - CLEqualizeHistogram
+ - CLErode
+ - CLFastCorners
+ - CLGaussian3x3
+ - CLGaussian5x5
+ - CLGaussianPyramid
+ - CLHOGDescriptor
+ - CLHOGDetector
+ - CLHOGGradient
+ - CLHOGMultiDetection
+ - CLHarrisCorners
+ - CLHistogram
+ - CLIntegralImage
+ - CLLaplacianPyramid
+ - CLLaplacianReconstruct
+ - CLMagnitude
+ - CLMeanStdDev
+ - CLMedian3x3
+ - CLMinMaxLocation
+ - CLNonLinearFilter
+ - CLOpticalFlow
+ - CLPhase
+ - CLScharr3x3
+ - CLSobel3x3
+ - CLSobel5x5
+ - CLSobel7x7
+ - CLTableLookup
+ - CLThreshold
+ - CLWarpAffine
+ - CLWarpPerspective
+
+v21.02 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Upgrade C++ standard to C++14
+ - Add macOS support
+ - Add Armv8-R AArch64 architecture support
+ - Add SVE/SVE2 support for:
+ - NEScaleKernel
+ - @ref NEActivationLayer
+ - @ref NEArithmeticAddition
+ - @ref NEBatchNormalizationLayerKernel
+ - @ref cpu::kernels::CpuLogits1DSoftmaxKernel
+ - @ref cpu::kernels::CpuLogits1DMaxKernel
+ - @ref cpu::kernels::CpuElementwiseUnaryKernel
+ - Remove padding from OpenCL kernels:
+ - CLDirectConvolutionLayerKernel
+ - @ref CLArgMinMaxLayerKernel
+ - @ref CLPadLayerKernel
+ - @ref CLROIAlignLayerKernel
+ - @ref CLRangeKernel
+ - CLScaleKernel
+ - @ref CLSelectKernel
+ - @ref CLBitwiseKernel
+ - @ref opencl::kernels::ClFloorKernel
+ - CLTransposeKernel
+ - Deprecate functions in CLTuner:
+ - add_lws_to_table
+ - import_lws_table
+ - lws_table
+ - Remove functions:
+ - NELocallyConnectedLayer / CLLocallyConnectedLayer
+ - NEIm2Col
+ - NECol2Im
+ - NEGEMMInterleave4x4
+ - NEGEMMTranspose1xW
+ - NEComputeAllAnchors / CLComputeAllAnchors
+ - NEGEMMAssemblyDispatch
+ - NEUpsampleLayer / CLUpsampleLayer
+ - Remove kernels:
+ - NEGEMMMatrixVectorMultiplyKernel
+ - NELocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedMatrixMultiplyKernel
+ - NEUpsampleLayerKernel / CLUpsampleLayerKernel
+ - Extend OpenCL tuner with workgroup batch size support
+ - Experimental extension for the OpenCL tuner to tune the batches of work groups distributed to compute units
+ - Add functionality to load the OpenCL GEMM heuristics at runtime
+ - The GEMM heuristic file (MLGO) can be used to update the default GEMM heuristics available for OpenCL
+ - Note: there might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. This is currently under investigation.
+ - Note: data-type decoupling is in progress and experimental. Warnings about unused symbols might be raised.
+
+v20.11 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Performance regressions can be observed when executing Depthwise Convolution on Arm® Neon™ with a depth multiplier > 1 for quantized data types.
+ This is planned to be resolved in the v21.02 release.
+ - Added new data type QASYMM8_SIGNED support for @ref NEROIAlignLayer.
+ - Added new data type S32 support for:
+ - NEArithmeticSubtraction
+ - NEArithmeticSubtractionKernel
+ - @ref NEPixelWiseMultiplication
+ - NEPixelWiseMultiplicationKernel
+ - NEElementwiseDivision
+ - NEDivisionOperationKernel
+ - Interface change
+ - The softmax axis now has the same meaning as in other major frameworks: it defines the dimension
+ on which Softmax/Logsoftmax is performed. E.g. for an input of shape 4x5x6 and axis=1, softmax is applied to 4x6=24 vectors of size 5.
+ The supported value range of axis is [-rank, rank).
+ This change applies to the following functions:
+ - @ref NESoftmaxLayer
+ - @ref NELogSoftmaxLayer
+ - @ref CLSoftmaxLayer
+ - @ref CLLogSoftmaxLayer
+ - GCSoftmaxLayer
+ - New OpenCL kernels / functions:
+ - @ref CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
+ - @ref CLLogicalNot
+ - @ref CLLogicalAnd
+ - @ref CLLogicalOr
+ - New Arm® Neon™ kernels / functions:
+ - @ref NELogicalNot
+ - @ref NELogicalAnd
+ - @ref NELogicalOr
+ - Removed padding from Arm® Neon™ kernels:
+ - NEComplexPixelWiseMultiplicationKernel
+ - NENonMaximaSuppression3x3Kernel
+ - @ref NERemapKernel
+ - @ref NEGEMMInterleave4x4Kernel
+ - NEDirectConvolutionLayerKernel
+ - NEScaleKernel
+ - NELocallyConnectedMatrixMultiplyKernel
+ - @ref NEGEMMLowpOffsetContributionKernel
+ - @ref NEGEMMTranspose1xWKernel
+ - NEPoolingLayerKernel
+ - NEConvolutionKernel
+ - NEDepthwiseConvolutionLayerNativeKernel
+ - @ref NEGEMMLowpMatrixMultiplyKernel
+ - @ref NEGEMMMatrixMultiplyKernel
+ - NEDirectConvolutionLayerOutputStageKernel
+ - @ref NEReductionOperationKernel
+ - @ref NEGEMMLowpMatrixAReductionKernel
+ - @ref NEGEMMLowpMatrixBReductionKernel
+ - Removed padding from OpenCL kernels:
+ - CLBatchConcatenateLayerKernel
+ - CLElementwiseOperationKernel
+ - @ref CLBatchNormalizationLayerKernel
+ - CLPoolingLayerKernel
+ - @ref CLWinogradInputTransformKernel
+ - @ref CLGEMMLowpMatrixMultiplyNativeKernel
+ - @ref CLGEMMLowpMatrixAReductionKernel
+ - @ref CLGEMMLowpMatrixBReductionKernel
+ - @ref CLGEMMLowpOffsetContributionOutputStageKernel
+ - @ref CLGEMMLowpOffsetContributionKernel
+ - @ref CLWinogradOutputTransformKernel
+ - @ref CLGEMMLowpMatrixMultiplyReshapedKernel
+ - @ref CLFuseBatchNormalizationKernel
+ - @ref CLDepthwiseConvolutionLayerNativeKernel
+ - @ref CLDepthConvertLayerKernel
+ - CLCopyKernel
+ - @ref CLDepthwiseConvolutionLayer3x3NHWCKernel
+ - CLActivationLayerKernel
+ - @ref CLWinogradFilterTransformKernel
+ - CLWidthConcatenateLayerKernel
+ - CLWidthConcatenate4TensorsKernel
+ - CLWidthConcatenate2TensorsKernel
+ - CLLogits1DMaxShiftExpSumKernel
+ - CLLogits1DNormKernel
+ - CLHeightConcatenateLayerKernel
+ - @ref CLGEMMMatrixMultiplyKernel
+ - @ref CLGEMMLowpQuantizeDownInt32ScaleKernel
+ - @ref CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
+ - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
+ - CLDepthConcatenateLayerKernel
+ - @ref CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
+ - Removed OpenCL kernels / functions:
+ - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
+ - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel
+ - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel
+ - Deprecated OpenCL kernels / functions (if a kernel is used only by a function that is being deprecated, the kernel is deprecated along with it):
+ - CLLocallyConnectedLayer
+ - CLLocallyConnectedMatrixMultiplyKernel
+ - CLAbsoluteDifference
+ - CLAbsoluteDifferenceKernel
+ - CLAccumulate
+ - CLAccumulateKernel
+ - CLAccumulateSquared
+ - CLAccumulateSquaredKernel
+ - CLAccumulateWeighted
+ - CLAccumulateWeightedKernel
+ - CLAccumulateWeightedFP16Kernel
+ - CLBox3x3
+ - CLBox3x3Kernel
+ - CLBox3x3FP16Kernel
+ - CLCannyEdge
+ - CLChannelCombine
+ - CLChannelCombineKernel
+ - CLChannelExtract
+ - CLChannelExtractKernel
+ - CLColorConvert
+ - CLColorConvertKernel
+ - CLConvolution3x3
+ - CLConvolutionRectangle
+ - CLConvolutionRectangleKernel
+ - CLConvolutionSquare
+ - CLConvolutionKernel
+ - CLDerivative
+ - CLDerivativeKernel
+ - CLDilate
+ - CLDilateKernel
+ - CLEqualizeHistogram
+ - CLErode
+ - CLErodeKernel
+ - CLFastCorners
+ - CLFastCornersKernel
+ - CLGaussian3x3
+ - CLGaussian3x3Kernel
+ - CLGaussian5x5
+ - CLGaussian5x5HorKernel
+ - CLGaussian5x5VertKernel
+ - CLGaussianPyramid
+ - CLGaussianPyramidHalf
+ - CLGaussianPyramidOrb
+ - CLHarrisCorners
+ - CLHarrisScoreKernel
+ - CLHarrisScoreFP16Kernel
+ - CLHistogram
+ - CLHistogramKernel
+ - CLHOGOrientationBinningKernel
+ - CLHOGBlockNormalizationKernel
+ - CLHOGDetectorKernel
+ - CLHOGNonMaximaSuppressionKernel
+ - CLHOGDescriptor
+ - CLHOGDetector
+ - CLHOGGradient
+ - CLHOGMultiDetection
+ - CLIntegralImage
+ - CLIntegralImageKernel
+ - CLLaplacianReconstruct
+ - CLLaplacianPyramid
+ - CLMagnitude
+ - CLMagnitudePhaseKernel
+ - CLMedian3x3
+ - CLMedian3x3Kernel
+ - CLMinMaxLocation
+ - CLMinMaxLocationKernel
+ - CLNonLinearFilter
+ - CLNonLinearFilterKernel
+ - CLNonMaximaSuppression3x3
+ - CLNonMaximaSuppression3x3FP16Kernel
+ - CLNonMaximaSuppression3x3Kernel
+ - CLOpticalFlow
+ - CLPhase
+ - CLRemap
+ - CLRemapKernel
+ - CLScharr3x3
+ - CLScharr3x3Kernel
+ - CLSobel3x3
+ - CLSobel3x3Kernel
+ - CLSobel5x5
+ - CLSobel5x5HorKernel
+ - CLSobel5x5VertKernel
+ - CLSobel7x7
+ - CLSobel7x7HorKernel
+ - CLSobel7x7VertKernel
+ - CLThreshold
+ - CLThresholdKernel
+ - CLWarpAffine
+ - CLWarpAffineKernel
+ - CLWarpPerspective
+ - CLWarpPerspectiveKernel
+ - Deprecated Arm® Neon™ kernels / functions (if a kernel is used only by a function that is being deprecated, the kernel is deprecated along with it):
+ - NELocallyConnectedLayer
+ - NELocallyConnectedMatrixMultiplyKernel
+ - NEAbsoluteDifference
+ - NEAbsoluteDifferenceKernel
+ - NEAccumulate
+ - NEAccumulateKernel
+ - NEAccumulateSquared
+ - NEAccumulateSquaredKernel
+ - NEAccumulateWeighted
+ - NEAccumulateWeightedKernel
+ - NEAccumulateWeightedFP16Kernel
+ - NEBox3x3
+ - NEBox3x3Kernel
+ - NEBox3x3FP16Kernel
+ - NECannyEdge
+ - NEChannelCombine
+ - NEChannelCombineKernel
+ - NEChannelExtract
+ - NEChannelExtractKernel
+ - NEColorConvert
+ - NEColorConvertKernel
+ - NEConvolution3x3
+ - NEConvolutionRectangle
+ - NEConvolutionRectangleKernel
+ - NEConvolutionSquare
+ - NEConvolutionKernel
+ - NEDerivative
+ - NEDerivativeKernel
+ - NEDilate
+ - NEDilateKernel
+ - NEEqualizeHistogram
+ - NEErode
+ - NEErodeKernel
+ - NEFastCorners
+ - NEFastCornersKernel
+ - NEGaussian3x3
+ - NEGaussian3x3Kernel
+ - NEGaussian5x5
+ - NEGaussian5x5HorKernel
+ - NEGaussian5x5VertKernel
+ - NEGaussianPyramid
+ - NEGaussianPyramidHalf
+ - NEGaussianPyramidOrb
+ - NEHarrisCorners
+ - NEHarrisScoreKernel
+ - NEHarrisScoreFP16Kernel
+ - NEHistogram
+ - NEHistogramKernel
+ - NEHOGOrientationBinningKernel
+ - NEHOGBlockNormalizationKernel
+ - NEHOGDetectorKernel
+ - NEHOGNonMaximaSuppressionKernel
+ - NEHOGDescriptor
+ - NEHOGDetector
+ - NEHOGGradient
+ - NEHOGMultiDetection
+ - NEIntegralImage
+ - NEIntegralImageKernel
+ - NELaplacianReconstruct
+ - NELaplacianPyramid
+ - NEMagnitude
+ - NEMagnitudePhaseKernel
+ - NEMedian3x3
+ - NEMedian3x3Kernel
+ - NEMinMaxLocation
+ - NEMinMaxLocationKernel
+ - NENonLinearFilter
+ - NENonLinearFilterKernel
+ - NENonMaximaSuppression3x3
+ - NENonMaximaSuppression3x3FP16Kernel
+ - NENonMaximaSuppression3x3Kernel
+ - NEOpticalFlow
+ - NEPhase
+ - NERemap
+ - NERemapKernel
+ - NEScharr3x3
+ - NEScharr3x3Kernel
+ - NESobel3x3
+ - NESobel3x3Kernel
+ - NESobel5x5
+ - NESobel5x5HorKernel
+ - NESobel5x5VertKernel
+ - NESobel7x7
+ - NESobel7x7HorKernel
+ - NESobel7x7VertKernel
+ - NEThreshold
+ - NEThresholdKernel
+ - NEWarpAffine
+ - NEWarpAffineKernel
+ - NEWarpPerspective
+ - NEWarpPerspectiveKernel
+ - Deprecated GLES kernels / functions (if a kernel is used only by a function that is being deprecated, the kernel is deprecated along with it):
+ - GCAbsoluteDifference
+ - GCActivationLayer
+ - GCArithmeticAddition
+ - GCBatchNormalizationLayer
+ - GCConcatenateLayer
+ - GCConvolutionLayer
+ - GCDepthwiseConvolutionLayer
+ - GCDirectConvolutionLayer
+ - GCDropoutLayer
+ - GCFillBorder
+ - GCFullyConnectedLayer
+ - GCGEMM
+ - GCGEMMInterleave4x4
+ - GCGEMMTranspose1xW
+ - GCNormalizationLayer
+ - GCNormalizePlanarYUVLayer
+ - GCPixelWiseMultiplication
+ - GCPoolingLayer
+ - GCScale
+ - GCSoftmaxLayer
+ - GCTensorShift
+ - GCTranspose
+
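The softmax axis semantics described under "Interface change" in v20.11 can be sketched with a small reference routine. This is not the library's implementation: `softmax_along_axis` is a hypothetical helper, the tensor is assumed to be stored densely with dimension 0 varying fastest, and negative axes are assumed to wrap around so that -1 refers to the last dimension.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Reference softmax over one axis of a dense tensor (illustrative only).
// Assumption: shape[0] is the fastest-varying dimension in memory.
std::vector<float> softmax_along_axis(const std::vector<float>       &in,
                                      const std::vector<std::size_t> &shape,
                                      int                             axis)
{
    const int rank = static_cast<int>(shape.size());
    if(axis < -rank || axis >= rank)
    {
        throw std::out_of_range("axis must lie in [-rank, rank)");
    }
    if(axis < 0)
    {
        axis += rank; // assumed wrap-around: -1 refers to the last dimension
    }

    // Dimensions below the axis give the stride between consecutive elements
    // of one softmax vector; dimensions above it give the number of outer blocks.
    std::size_t inner = 1;
    std::size_t outer = 1;
    for(int d = 0; d < axis; ++d)
    {
        inner *= shape[d];
    }
    for(int d = axis + 1; d < rank; ++d)
    {
        outer *= shape[d];
    }
    const std::size_t len = shape[axis];
    if(in.size() != inner * len * outer)
    {
        throw std::invalid_argument("input size does not match shape");
    }

    std::vector<float> out(in.size());
    if(in.empty())
    {
        return out;
    }

    // One softmax per (outer, inner) pair: inner * outer vectors of size len.
    for(std::size_t o = 0; o < outer; ++o)
    {
        for(std::size_t i = 0; i < inner; ++i)
        {
            const std::size_t base = o * len * inner + i;

            // Subtract the maximum before exponentiating for numerical stability.
            float max_value = in[base];
            for(std::size_t k = 1; k < len; ++k)
            {
                max_value = std::max(max_value, in[base + k * inner]);
            }

            float sum = 0.f;
            for(std::size_t k = 0; k < len; ++k)
            {
                const float e       = std::exp(in[base + k * inner] - max_value);
                out[base + k * inner] = e;
                sum += e;
            }
            for(std::size_t k = 0; k < len; ++k)
            {
                out[base + k * inner] /= sum;
            }
        }
    }
    return out;
}
```

For a shape of {4, 5, 6} and axis=1 the loops visit 4x6=24 vectors of 5 elements each, as in the changelog example; under the wrap-around assumption, axis=-2 selects the same dimension.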
+
+v20.08 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Added new data type QASYMM8_SIGNED support for:
+ - @ref CLArgMinMaxLayer
+ - @ref CLArgMinMaxLayerKernel
+ - Added new data type U8 support for:
+ - @ref NECropKernel
+ - CLCropKernel
+ - Added align_corners support for nearest neighbor interpolation in:
+ - NEScaleKernel
+ - CLScaleKernel
+ - New OpenCL kernels / functions:
+ - @ref CLMaxUnpoolingLayerKernel
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEMaxUnpoolingLayerKernel
+ - New graph example:
+ - graph_yolov3_output_detector
+ - GEMMTuner improvements:
+ - Added fp16 support
+ - Output json files for easier integration
+ - Enabled tuning for export_to_cl_image_rhs option for RHS tensors
+ - More robust script for running benchmarks
+ - Removed padding from:
+ - NEPixelWiseMultiplicationKernel
+ - NEHeightConcatenateLayerKernel
+ - NEThresholdKernel
+ - NEBatchConcatenateLayerKernel
+ - NETransposeKernel
+ - @ref NEBatchNormalizationLayerKernel
+ - NEArithmeticSubtractionKernel
+ - @ref NEBoundingBoxTransformKernel
+ - NELogits1DMaxKernel
+ - NELogits1DSoftmaxKernel
+ - @ref NEROIPoolingLayerKernel
+ - @ref NEROIAlignLayerKernel
+ - NEYOLOLayerKernel
+ - NEUpsampleLayerKernel
+ - NEFloorKernel
+ - NEWidthConcatenateLayerKernel
+ - NEDepthConcatenateLayerKernel
+ - @ref NENormalizationLayerKernel
+ - @ref NEL2NormalizeLayerKernel
+ - NEFillArrayKernel
+ - @ref NEDepthConvertLayerKernel
+ - @ref NERangeKernel
+ - @ref NEPriorBoxLayer
+ - Removed OpenCL kernels / functions:
+ - CLGEMMLowpQuantizeDownInt32ToUint8Scale
+ - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
+ - Removed Arm® Neon™ kernels / functions:
+ - NEGEMMLowpQuantizeDownInt32ToUint8Scale
+ - NEGEMMMatrixAccumulateBiasesKernel
+ - Deprecated functions / interfaces:
+ - Non-descriptor based interfaces for NEThreshold, CLThreshold
+ - Non-descriptor based interfaces for @ref NEScale, @ref CLScale and GCScale
+ - In @ref NESoftmaxLayer, @ref NELogSoftmaxLayer, @ref CLSoftmaxLayer, @ref CLLogSoftmaxLayer and GCSoftmaxLayer:
+ The default "axis" value for @ref CLSoftmaxLayer, @ref CLLogSoftmaxLayer and GCSoftmaxLayer has changed from 1 to 0.
+ Only axis 0 is supported.
+ The default "axis" value for @ref NESoftmaxLayer and @ref NELogSoftmaxLayer has changed from 1 to 0.
+ Only axis 0 is supported.
+ - The support for quantized data types has been removed from @ref CLLogSoftmaxLayer due to implementation complexity.
+ - Removed padding requirement for the input (e.g. LHS of GEMM) and output in @ref CLGEMMMatrixMultiplyNativeKernel, @ref CLGEMMMatrixMultiplyReshapedKernel, @ref CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and @ref CLIm2ColKernel (NHWC only)
+ - This change allows @ref CLGEMMConvolutionLayer to be used without extra padding for the input and output.
+ - Only the weights/bias of @ref CLGEMMConvolutionLayer could require padding for the computation.
+ - Only on Arm® Mali™ Midgard GPUs, @ref CLGEMMConvolutionLayer could require padding since @ref CLGEMMMatrixMultiplyKernel is called and currently requires padding.
+ - Added support for exporting the OpenCL buffer object to the OpenCL image object in @ref CLGEMMMatrixMultiplyReshapedKernel and @ref CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.
+ - This allows the OpenCL buffer used for the reshaped RHS matrix to be exported to an OpenCL image object.
+ - The padding requirement for the OpenCL image object is taken into account in @ref CLGEMMReshapeRHSMatrixKernel.
+ - The reshaped RHS matrix stores the weights when GEMM is used to accelerate @ref CLGEMMConvolutionLayer.
+
+v20.05 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Updated recommended NDK version to r18b.
+ - Updated recommended gcc version to Linaro 6.3.1.
+ - Added Bfloat16 type support
+ - Added Bfloat16 support in:
+ - @ref NEWeightsReshapeKernel
+ - @ref NEConvolutionLayerReshapeWeights
+ - @ref NEIm2ColKernel
+ - NEIm2Col
+ - @ref NEDepthConvertLayerKernel
+ - @ref NEDepthConvertLayer
+ - @ref NEGEMMConvolutionLayer
+ - NEGEMMAssemblyDispatch
+ - Added new data type QASYMM8_SIGNED support for:
+ - @ref CLDirectConvolutionLayer
+ - @ref CLDeconvolutionLayer
+ - @ref CLDirectDeconvolutionLayer
+ - @ref CLGEMMDeconvolutionLayer
+ - @ref CLGEMMLowpMatrixMultiplyReshapedKernel
+ - @ref CLGEMMLowpQuantizeDownInt32ScaleKernel
+ - @ref CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
+ - @ref CLReductionOperation
+ - @ref CLReduceMean
+ - @ref NEScale
+ - NEScaleKernel
+ - NEUpsampleLayer
+ - @ref NECast
+ - @ref NEReductionOperation
+ - @ref NEReduceMean
+ - @ref NEArgMinMaxLayer
+ - @ref NEDeconvolutionLayer
+ - @ref NEGEMMLowpQuantizeDownInt32ScaleKernel
+ - @ref CPPBoxWithNonMaximaSuppressionLimit
+ - @ref CPPDetectionPostProcessLayer
+ - @ref CPPPermuteKernel
+ - @ref CPPPermute
+ - @ref CPPTopKVKernel
+ - @ref CPPTopKV
+ - @ref CPPUpsample
+ - @ref CPPUpsampleKernel
+ - New OpenCL kernels / functions:
+ - @ref CLQLSTMLayer
+ - @ref CLQLSTMLayerNormalizationKernel
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEQLSTMLayer
+ - @ref NEQLSTMLayerNormalizationKernel
+ - Added HARD_SWISH support in:
+ - CLActivationLayerKernel
+ - NEActivationLayerKernel
+ - Deprecated OpenCL kernels / functions:
+ - CLGEMMLowpQuantizeDownInt32ToUint8Scale
+ - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
+ - Deprecated Arm® Neon™ kernels / functions:
+ - NEGEMMLowpQuantizeDownInt32ToUint8Scale
+ - Removed CPP kernels / functions:
+ - CPPFlipWeightsKernel
+ - Removed PoolingLayerInfo constructors without Data Layout.
+ - Removed CLDepthwiseConvolutionLayer3x3
+ - Removed NEDepthwiseConvolutionLayerOptimized
+ - Added support for Winograd 3x3,4x4 on Arm® Neon™ FP16:
+ - @ref NEWinogradConvolutionLayer
+ - @ref NEWinogradLayerTransformInputKernel
+ - @ref NEWinogradLayerTransformOutputKernel
+ - @ref NEWinogradLayerTransformWeightsKernel
+ - Added CLCompileContext
+ - Added Arm® Neon™ GEMM kernel with 2D window support
+
+v20.02.1 Maintenance release
+ - Added Android-NN build script.
+
+v20.02 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Added new data type QASYMM8_SIGNED support for:
+ - @ref CLDepthwiseConvolutionLayer
+ - CLDepthwiseConvolutionLayer3x3
+ - @ref CLGEMMConvolutionLayer
+ - @ref CLGEMMLowpMatrixMultiplyCore
+ - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
+ - @ref CLGEMMLowpMatrixMultiplyNativeKernel
+ - @ref NEActivationLayer
+ - NEComparisonOperationKernel
+ - @ref NEConvolutionLayer
+ - @ref NEDepthwiseConvolutionLayer
+ - NEDepthwiseConvolutionLayer3x3Kernel
+ - NEDirectConvolutionLayerOutputStageKernel
+ - @ref NEElementwiseComparison
+ - @ref NEElementwiseMax
+ - @ref NEElementwiseMin
+ - @ref NEElementwiseSquaredDiff
+ - @ref NEFullyConnectedLayer
+ - NEGEMMMatrixVectorMultiplyKernel
+ - @ref NEPixelWiseMultiplication
+ - @ref NEPoolingLayer
+ - @ref NEPReluLayer
+ - Added support for QSYMM8_PER_CHANNEL in:
+ - NEDepthwiseConvolutionLayer3x3Kernel
+ - Added support for split sizes in:
+ - @ref CLSplit
+ - @ref NESplit
+ - New OpenCL kernels / functions:
+ - @ref CLFill
+ - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEFill
+ - @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
+ - Deprecated functions / interfaces:
+ - CLDepthwiseConvolutionLayer3x3
+ - NEDepthwiseConvolutionLayerOptimized
+ - PoolingLayerInfo constructors without Data Layout.
+ - Added support for quantization with multiplier greater than 1 on Arm® Neon™ and CL.
+ - Added support for quantized inputs of type QASYMM8_SIGNED and QASYMM8 to @ref CLQuantizationLayer.
+ - Added the ability to build bootcode for bare metal.
+ - Added support for generating synthetic QASYMM8 graphs.
+ - Added support for F16 datatype in VGG16.
+ - Removed pre-built binaries for GLES.
+
+v19.11.1 Public maintenance release
+ - Fix offset calculation in NEReductionOperationKernel.
+ - Fix data layout in NEScaleKernel for nhwc.
+ - Retain configuration step data layout to avoid side-effects.
+ - Perform sqrt in double domain for L2 pooling.
+ - Fix output shape calculation for Reduce Mean
+ - Restrict cases where optimized NEPadLayer runs.
+
+v19.11 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Updated recommended NDK version to r17c.
+ - Deprecated OpenCL kernels / functions:
+ - CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel
+ - CLDepthwiseIm2ColKernel
+ - CLDepthwiseSeparableConvolutionLayer
+ - CLDepthwiseVectorToTensorKernel
+ - CLDirectConvolutionLayerOutputStageKernel
+ - Deprecated Arm® Neon™ kernels / functions:
+ - NEDepthwiseWeightsReshapeKernel
+ - NEDepthwiseIm2ColKernel
+ - NEDepthwiseSeparableConvolutionLayer
+ - NEDepthwiseVectorToTensorKernel
+ - NEDepthwiseConvolutionLayer3x3
+ - New OpenCL kernels / functions:
+ - @ref CLInstanceNormalizationLayerKernel / @ref CLInstanceNormalizationLayer
+ - @ref CLDepthwiseConvolutionLayerNativeKernel to replace the old generic depthwise convolution (see Deprecated
+ OpenCL kernels / functions)
+ - @ref CLLogSoftmaxLayer
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEBoundingBoxTransformKernel / @ref NEBoundingBoxTransform
+ - @ref NEComputeAllAnchorsKernel / NEComputeAllAnchors
+ - @ref NEDetectionPostProcessLayer
+ - @ref NEGenerateProposalsLayer
+ - @ref NEInstanceNormalizationLayerKernel / @ref NEInstanceNormalizationLayer
+ - @ref NELogSoftmaxLayer
+ - @ref NEROIAlignLayerKernel / @ref NEROIAlignLayer
+ - Added QASYMM8 support for:
+ - @ref CLGenerateProposalsLayer
+ - @ref CLROIAlignLayer
+ - @ref CPPBoxWithNonMaximaSuppressionLimit
+ - Added QASYMM16 support for:
+ - @ref CLBoundingBoxTransform
+ - Added FP16 support for:
+ - @ref CLGEMMMatrixMultiplyReshapedKernel
+ - Added new data type QASYMM8_PER_CHANNEL support for:
+ - CLDequantizationLayer
+ - @ref NEDequantizationLayer
+ - Added new data type QSYMM8_PER_CHANNEL support for:
+ - @ref CLConvolutionLayer
+ - @ref NEConvolutionLayer
+ - @ref CLDepthwiseConvolutionLayer
+ - @ref NEDepthwiseConvolutionLayer
+ - Added FP16 mixed-precision support for:
+ - @ref CLGEMMMatrixMultiplyReshapedKernel
+ - CLPoolingLayerKernel
+ - Added FP32 and FP16 ELU activation for:
+ - @ref CLActivationLayer
+ - @ref NEActivationLayer
+ - Added asymmetric padding support for:
+ - @ref CLDirectDeconvolutionLayer
+ - @ref CLGEMMDeconvolutionLayer
+ - @ref NEDeconvolutionLayer
+ - Added SYMMETRIC and REFLECT modes for @ref CLPadLayerKernel / @ref CLPadLayer.
+ - Replaced the calls to NECopyKernel and NEMemsetKernel with @ref NEPadLayer in @ref NEGenerateProposalsLayer.
+ - Replaced the calls to CLCopyKernel and CLMemsetKernel with @ref CLPadLayer in @ref CLGenerateProposalsLayer.
+ - Improved performance for CL Inception V3 - FP16.
+ - Improved accuracy for CL Inception V3 - FP16 by enabling FP32 accumulator (mixed-precision).
+ - Improved Arm® Neon™ performance by enabling fusing batch normalization with convolution and depth-wise convolution layer.
+ - Improved Arm® Neon™ performance for MobileNet-SSD by improving the output detection performance.
+ - Optimized @ref CLPadLayer.
+ - Optimized CL generic depthwise convolution layer by introducing @ref CLDepthwiseConvolutionLayerNativeKernel.
+ - Reduced memory consumption by implementing weights sharing.
+
+v19.08.1 Public maintenance release
+ - Fix offset calculation in NEReductionOperationKernel.
+ - Fix data layout in NEScaleKernel for nhwc.
+ - Retain configuration step data layout to avoid side-effects.
+ - Perform sqrt in double domain for L2 pooling.
+ - Fix output shape calculation for Reduce Mean
+ - Fix broadcast CLPixelwiseMultiplication with 5D tensors
+
+v19.08 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Deprecated Arm® Neon™ functions
+ - NEDepthConcatenateLayer
+ - NEWidthConcatenateLayer
+ - Deprecated OpenCL kernels / functions
+ - CLDepthConcatenateLayer
+ - CLGEMMInterleave4x4Kernel / CLGEMMInterleave4x4
+ - CLGEMMTranspose1xWKernel / CLGEMMTranspose1xW
+ - CLWidthConcatenateLayer
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEAbsLayer
+ - @ref NECast
+ - @ref NEElementwisePower
+ - @ref NELogLayer
+ - @ref NELSTMLayerQuantized
+ - @ref NENegLayer
+ - @ref NEPReluLayer
+ - @ref NESinLayer
+ - NEBatchConcatenateLayerKernel
+ - @ref NEDepthToSpaceLayerKernel / @ref NEDepthToSpaceLayer
+ - NEDepthwiseConvolutionLayerNativeKernel
+ - @ref NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
+ - @ref NEMeanStdDevNormalizationKernel / @ref NEMeanStdDevNormalizationLayer
+ - @ref NESpaceToDepthLayerKernel / @ref NESpaceToDepthLayer
+ - New OpenCL kernels / functions:
+ - @ref CLAbsLayer
+ - @ref CLElementwisePower
+ - @ref CLLogLayer
+ - @ref CLLSTMLayerQuantized
+ - @ref CLNegLayer
+ - @ref CLPReluLayer
+ - @ref CLSinLayer
+ - CLBatchConcatenateLayerKernel
+ - @ref CLDepthToSpaceLayerKernel / @ref CLDepthToSpaceLayer
+ - @ref CLGEMMLowpMatrixMultiplyNativeKernel
+ - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
+ - @ref CLGEMMMatrixMultiplyNativeKernel
+ - CLMeanStdDevNormalizationKernel / CLMeanStdDevNormalizationLayer
+ - @ref CLSpaceToDepthLayerKernel / @ref CLSpaceToDepthLayer
+ - New examples:
+ - neon_opticalflow
+ - cl_cache
+ - neon_permute
+ - Added support for FP16 in @ref NEDeconvolutionLayer
+ - Added support for FP16 in @ref CLDeconvolutionLayer
+ - Added support for REDUCE_MIN and REDUCE_MAX in @ref ReductionOperation
+ - Enable the fusion of batch normalization with convolution and depthwise convolution layer for FP32 in the graph API (OpenCL only)
+ - Added support for fusing activation function and broadcast addition with the matrix multiplication for FP32 (OpenCL only)
+ - Re-factored the depthwise convolution layer kernel on Arm® Neon™ for generic cases
+ - Added an optimized depthwise convolution layer kernel for 5x5 filters (Neon only)
+ - Added support to enable OpenCL kernel cache. Added example showing how to load the prebuilt OpenCL kernels from a binary cache file
+ - Altered @ref QuantizationInfo interface to support per-channel quantization.
+ - CLDepthwiseConvolutionLayer3x3 will be incorporated into @ref CLDepthwiseConvolutionLayer to accommodate future optimizations.
+ - NEDepthwiseConvolutionLayerOptimized will be incorporated into @ref NEDepthwiseConvolutionLayer to accommodate future optimizations.
+ - Removed inner_border_right and inner_border_top parameters from @ref CLDeconvolutionLayer interface
+ - Removed inner_border_right and inner_border_top parameters from @ref NEDeconvolutionLayer interface
+ - Optimized the Arm® Neon™ assembly kernel for GEMMLowp. The new implementation fuses the output stage and quantization with the matrix multiplication kernel
+
+v19.05 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEBatchToSpaceLayerKernel / @ref NEBatchToSpaceLayer
+ - NEComplexPixelWiseMultiplicationKernel / @ref NEComplexPixelWiseMultiplication
+ - @ref NECropKernel / @ref NECropResize
+ - NEDepthwiseConvolutionAssemblyDispatch
+ - @ref NEFFTDigitReverseKernel
+ - @ref NEFFTRadixStageKernel
+ - @ref NEFFTScaleKernel
+ - @ref NEGEMMLowpOffsetContributionOutputStageKernel
+ - NEHeightConcatenateLayerKernel
+ - @ref NESpaceToBatchLayerKernel / @ref NESpaceToBatchLayer
+ - @ref NEFFT1D
+ - @ref NEFFT2D
+ - @ref NEFFTConvolutionLayer
+ - New OpenCL kernels / functions:
+ - CLComplexPixelWiseMultiplicationKernel / @ref CLComplexPixelWiseMultiplication
+ - CLCropKernel / @ref CLCropResize
+ - @ref CLDeconvolutionReshapeOutputKernel
+ - @ref CLFFTDigitReverseKernel
+ - @ref CLFFTRadixStageKernel
+ - @ref CLFFTScaleKernel
+ - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
+ - @ref CLGEMMMatrixMultiplyReshapedOnlyRHSKernel
+ - CLHeightConcatenateLayerKernel
+ - @ref CLDirectDeconvolutionLayer
+ - @ref CLFFT1D
+ - @ref CLFFT2D
+ - @ref CLFFTConvolutionLayer
+ - @ref CLGEMMDeconvolutionLayer
+ - New OpenGLES kernels / functions:
+ - GCConcatenateLayer
+ - Deprecated functions/interfaces
+ - GCDepthConcatenateLayer
+ - NEWidthConcatenateLayer
+ - NEDepthConcatenateLayer
+ - CLWidthConcatenateLayer
+ - CLDepthConcatenateLayer
+ - CLGEMMInterleave4x4
+ - CLGEMMTranspose1xW
+ - Support different quantization info in CLConcatLayer.
+ - Add checks for cases where different input/output quantization info was not supported.
+ - Tensors have different quantization information.
+ - Add FP16 support checks.
+ - Fix output quantization in CLDepthwiseConv3x3 when activation is fused.
+ - New graph examples:
+ - graph_convolution
+ - graph_fully_connected
+ - graph_depthwise_convolution
+ - Deepspeech v0.4.1
+ - Add support for QASYMM8 in NEArithmeticSubtractionKernel.
+ - Add support for QASYMM8 in NEPixelWiseMultiplicationKernel.
+ - Add support for QASYMM8 in NEDeconvolution.
+ - Add support for DequantizationLayer for Neon/CL.
+ - Add support for dilation in CLDepthwiseConvolution.
+ - Fuse offset contribution with the output stage when we use NEGEMMLowpMatrixMultiplyCore.
+ - Optimize CLDeconvolution.
+ - Add StackLayer to the graph API.
+ - Add support for "reflect" padding mode in NEPad.
+ - Winograd 7x7 NHWC on OpenCL.
+ - Rework CL ML layers to run exclusively on CL.
+ - Support different quantization info in PoolingLayer.
+ - Implement and test import memory interfaces.
+ - Added new tests and removed old ones.
+ - Various clang-tidy fixes.
+
+v19.02 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - New Arm® Neon™ kernels / functions:
+ - @ref NETileKernel / @ref NETile
+ - @ref NEFuseBatchNormalizationKernel / @ref NEFuseBatchNormalization
+ - NEElementwiseOperationKernel
+ - @ref NEElementwiseMax
+ - @ref NEElementwiseMin
+ - @ref NEElementwiseSquaredDiff
+ - @ref NESelectKernel / @ref NESelect
+ - @ref NESplit
+ - @ref NESlice
+ - @ref NEUnstack
+ - @ref NEStridedSliceKernel / @ref NEStridedSlice
+ - NEElementwiseUnaryKernel
+ - @ref NERsqrtLayer
+ - @ref NEExpLayer
+ - @ref NEReverseKernel / @ref NEReverse
+ - @ref NEArgMinMaxLayer
+ - @ref NEStackLayerKernel / @ref NEStackLayer
+ - @ref NERangeKernel / @ref NERange
+ - @ref NEPadLayer
+ - NEMemsetKernel
+ - @ref NEGatherKernel / @ref NEGather
+ - @ref NEElementwiseComparison
+ - @ref NEElementwiseComparisonStatic
+ - NEComparisonOperationKernel
+ - @ref NEElementwiseDivision
+ - New OpenCL kernels / functions:
+ - @ref CLSelectKernel / @ref CLSelect
+ - @ref CLTileKernel / @ref CLTile
+ - @ref CLComparisonKernel / @ref CLComparison
+ - @ref CLArgMinMaxLayer
+ - @ref CLElementwiseMax
+ - @ref CLElementwiseMin
+ - @ref CLElementwiseSquaredDiff
+ - @ref CLStackLayerKernel / @ref CLStackLayer
+ - @ref CLReverse / @ref CLReverseKernel
+ - @ref CLRsqrtLayer
+ - @ref CLExpLayer
+ - CLElementWiseUnaryLayerKernel
+ - @ref CLGEMMReshapeLHSMatrixKernel
+ - @ref CLGEMMReshapeRHSMatrixKernel
+ - @ref CLGEMMMatrixMultiplyReshapedKernel
+ - @ref CLRangeKernel / @ref CLRange
+ - @ref CLUnstack
+ - @ref CLGatherKernel / @ref CLGather
+ - @ref CLGEMMLowpMatrixMultiplyReshapedKernel
+ - New CPP kernels / functions:
+ - @ref CPPDetectionOutputLayer
+ - @ref CPPTopKV / @ref CPPTopKVKernel
+ - Added new examples:
+ - graph_ssd_mobilenet.cpp
+ - graph_mobilenet_v2.cpp
+ - graph_resnet12.cpp
+ - graph_srcnn955.cpp
+ - graph_vgg_vdsr.cpp
+ - graph_inception_resnet_v1.cpp
+ - Add 4D tensors support to
+ - @ref NESoftmaxLayer
+ - Fused activation in @ref CLWinogradConvolutionLayer
+ - Extended @ref NEPermute to support more cases
+ - Added Neon/SVE GEMM Hybrid kernels
+ - Added u8 and s8 hybrid assembly kernels
+ - Introduced GEMM strategy name in NEGEMMAssemblyWrapper
+ - Improved @ref CLTuner
+ - Fused the bias addition within @ref CLGEMM
+ - Added support for QASYMM8 LOGISTIC activation in @ref NEActivationLayer
+ - Added NHWC data layout support to:
+ - @ref NEScale for F16
+ - @ref CLNormalizationLayer IN_MAP_2D for FP32/FP16
+ - @ref NEL2NormalizeLayer for FP32/FP16
+ - @ref NENormalizationLayer IN_MAP_2D for FP32/FP16
+ - @ref CLROIAlignLayer
+ - @ref CLGenerateProposalsLayer
+ - Added QASYMM8 support to the following kernels:
+ - NEArithmeticAdditionKernel
+ - @ref NEScale
+ - Added new tests and improved validation and benchmarking suites.
+ - Deprecated functions/interfaces
+ - Usage of inner_border_right and inner_border_top has been deprecated in @ref CLDeconvolutionLayer and @ref NEDeconvolutionLayer
+
+v18.11 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEChannelShuffleLayer / @ref NEChannelShuffleLayerKernel
+ - @ref NEReduceMean
+ - @ref NEReorgLayer / @ref NEReorgLayerKernel
+ - @ref NEPriorBoxLayer / @ref NEPriorBoxLayerKernel
+ - NEUpsampleLayer / NEUpsampleLayerKernel
+ - NEYOLOLayer / NEYOLOLayerKernel
+ - New OpenCL kernels / functions:
+ - @ref CLBatchToSpaceLayer / @ref CLBatchToSpaceLayerKernel
+ - @ref CLBoundingBoxTransform / @ref CLBoundingBoxTransformKernel
+ - @ref CLComputeAllAnchorsKernel
+ - @ref CLGenerateProposalsLayer
+ - @ref CLNormalizePlanarYUVLayer / @ref CLNormalizePlanarYUVLayerKernel
+ - @ref CLReorgLayer / @ref CLReorgLayerKernel
+ - @ref CLSpaceToBatchLayer / @ref CLSpaceToBatchLayerKernel
+ - @ref CLPadLayer
+ - @ref CLReduceMean
+ - @ref CLPriorBoxLayer / @ref CLPriorBoxLayerKernel
+ - @ref CLROIAlignLayer / @ref CLROIAlignLayerKernel
+ - @ref CLSlice
+ - @ref CLSplit
+ - @ref CLStridedSlice / @ref CLStridedSliceKernel
+ - CLUpsampleLayer / CLUpsampleLayerKernel
+ - CLYOLOLayer / CLYOLOLayerKernel
+ - New CPP kernels / functions:
+ - @ref CPPBoxWithNonMaximaSuppressionLimit / @ref CPPBoxWithNonMaximaSuppressionLimitKernel
+ - Added the validate method in:
+ - @ref NEDepthConvertLayer
+ - @ref NEFloor / @ref CLFloor
+ - @ref NEGEMMMatrixAdditionKernel
+ - @ref NEReshapeLayer / @ref CLReshapeLayer
+ - @ref CLScale
+ - Added new examples:
+ - graph_shufflenet.cpp
+ - graph_yolov3.cpp
+ - Added documentation on how to add a new function or kernel.
+ - Improved doxygen documentation by adding a list of the existing functions.
+ - Add 4D tensors support to
+ - CLWidthConcatenateLayer
+ - CLFlattenLayer
+ - @ref CLSoftmaxLayer
+ - Add dot product support for @ref CLDepthwiseConvolutionLayer3x3NHWCKernel non-unit stride
+ - Add SVE support
+ - Fused batch normalization into convolution layer weights in @ref CLFuseBatchNormalization
+ - Fused activation in @ref CLDepthwiseConvolutionLayer3x3NCHWKernel, @ref CLDepthwiseConvolutionLayer3x3NHWCKernel and @ref NEGEMMConvolutionLayer
+ - Added NHWC data layout support to:
+ - @ref CLChannelShuffleLayer
+ - @ref CLDeconvolutionLayer
+ - @ref CLL2NormalizeLayer
+ - Added QASYMM8 support to the following kernels:
+ - CLScaleKernel
+ - NEDepthwiseConvolutionLayer3x3Kernel
+ - CLPixelWiseMultiplicationKernel
+ - Added FP16 support to the following kernels:
+ - @ref CLDepthwiseConvolutionLayer3x3NHWCKernel
+ - NEDepthwiseConvolutionLayer3x3Kernel
+ - @ref CLNormalizePlanarYUVLayerKernel
+ - @ref CLWinogradConvolutionLayer (5x5 kernel)
+ - More tests added to both validation and benchmarking suites.
+
+v18.08 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Updated recommended NDK version to r17b.
+ - Removed support for QS8/QS16 data types.
+ - Added support for grouped convolution in @ref CLConvolutionLayer.
+ - Added NHWC data layout support to:
+ - NEDepthConcatenateLayer / CLDepthConcatenateLayer
+ - @ref NEWinogradConvolutionLayer / @ref CLWinogradConvolutionLayer
+ - @ref CLDepthwiseConvolutionLayer
+ - @ref CLDirectConvolutionLayer
+ - @ref CLConvolutionLayer
+ - @ref CLScale
+ - @ref CLIm2ColKernel
+ - New Arm® Neon™ kernels / functions:
+ - @ref NERNNLayer
+ - New OpenCL kernels / functions:
+ - @ref CLArithmeticDivision
+ - Introduced prepare() stage support in the graph API for GLES.
+ - Added support for memory reuse when trying to allocate smaller CLTensors.
+ - Enabled NHWC execution on graph examples.
+ - Added JPEG accessor for validation purposes.
+ - Added validate methods to some kernels / functions.
+
+v18.05 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Major redesign in the interface for the Neon kernels implemented in assembly.
+ - Removed arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore / arm_compute::NEHGEMMAArch64FP16Kernel
+ - Added NEGEMMAssemblyWrapper and AssemblyKernelGlue which are used to execute assembly kernels in Neon functions.
+ - Minor changes to the CPUInfo type to make it compatible with the new assembly gemm interface.
+ - Moved Neon assembly kernels to the folder src/core/Neon/kernels/arm_gemm.
+ - Improved doxygen documentation.
+ - Improved memory management for layer transitions.
+ - Added support for NHWC data layout in tensors.
+ - Added NHWC data layout support to:
+ - @ref NEGEMMConvolutionLayer
+ - @ref NEDirectConvolutionLayer
+ - @ref NEPoolingLayer / @ref CLPoolingLayer
+ - @ref NEBatchNormalizationLayer / @ref CLBatchNormalizationLayer
+ - @ref NEDepthwiseConvolutionLayer
+ - @ref NEScale
+ - NEIm2Col
+ - Added support for dilated convolutions in @ref NEConvolutionLayer and @ref CLConvolutionLayer.
+ - New OpenCL kernels / functions:
+ - @ref CLChannelShuffleLayer / @ref CLChannelShuffleLayerKernel
+ - CLConvertFullyConnectedWeightsKernel / @ref CLConvertFullyConnectedWeights
+ - @ref CLCopy / CLCopyKernel
+ - @ref CLLSTMLayer
+ - @ref CLRNNLayer
+ - CLWidthConcatenateLayer / CLWidthConcatenateLayerKernel
+ - @ref CLWinogradFilterTransformKernel / @ref CLWinogradInputTransformKernel / @ref CLWinogradConvolutionLayer
+ - @ref CLWinogradInputTransformKernel / @ref CLWinogradInputTransform
+ - New Arm® Neon™ kernels / functions:
+ - NEConvertFullyConnectedWeightsKernel / @ref NEConvertFullyConnectedWeights.
+ - Created the validate method in @ref CLDepthwiseConvolutionLayer.
+ - Beta and gamma are no longer mandatory arguments in @ref NEBatchNormalizationLayer and @ref CLBatchNormalizationLayer.
+ - Added depth multiplier support in @ref NEDepthwiseConvolutionLayer and @ref CLDepthwiseConvolutionLayer.
+ - Added broadcast multiply support in @ref NEPixelWiseMultiplication / NEPixelWiseMultiplicationKernel.
+ - Port mobilenet example to NHWC data layout.
+ - Enabled Winograd method in @ref CLConvolutionLayer.
+ - Renamed NEWinogradLayer to @ref NEWinogradConvolutionLayer.
+ - Updated @ref NEWinogradConvolutionLayer to use highly optimised assembly kernels in src/core/Neon/kernels/arm_gemm.
+ - Added memory manager support in GLES functions.
+ - Major refactoring of the graph API.
+ - Added GLES backend in the graph API.
+ - Added support for the memory manager in the graph API.
+ - Enabled Winograd Convolution method in the graph API.
+ - Added support for grouped convolutions in the graph API.
+ - Replaced NEDeconvolutionLayerUpsampleKernel with NEScaleKernel in @ref NEDeconvolutionLayer.
+ - Added fast maths flag in @ref CLConvolutionLayer.
+ - Added new tests and benchmarks in validation and benchmark frameworks
+ - Merge Activation layer with Convolution Layer (Neon, CL, GLES)
+ - Added support for OpenCL 2.0 SVM
+ - Added support to import memory in OpenCL tensors.
+ - Added the prepare() method to perform any one-off pre-processing before running the function.
+ - Added new examples:
+ - graph_inception_v4.cpp
+ - graph_resnext50.cpp
+ - Added memory measurement instrument for CL.
+
+v18.03 Public maintenance release
+ - Various bug fixes.
+ - Fixed bug in @ref NEActivationLayer
+ - Fix in @ref CLTuner when using batches.
+ - Updated recommended NDK version to r16b (and fixed warnings).
+ - Fixed bug in validation code.
+ - Added Inception v4 graph example.
+ - Renamed NEWinogradLayer.cpp to @ref NEWinogradConvolutionLayer
+
+v18.02 Public major release
+ - Various Arm® Neon™ / OpenCL / GLES optimisations.
+ - Various bug fixes.
+ - Changed default number of threads on big.LITTLE systems.
+ - Refactored examples and added:
+ - graph_mobilenet_qasymm8
+ - graph_resnet
+ - graph_squeezenet_v1_1
+ - Renamed @ref CLConvolutionLayer into @ref CLGEMMConvolutionLayer and created a new @ref CLConvolutionLayer to select the fastest convolution method.
+ - Renamed @ref NEConvolutionLayer into @ref NEGEMMConvolutionLayer and created a new @ref NEConvolutionLayer to select the fastest convolution method.
+ - Added in place support to:
+ - @ref CLActivationLayer
+ - @ref CLBatchNormalizationLayer
+ - Added QASYMM8 support to:
+ - @ref CLActivationLayer
+ - @ref CLDepthwiseConvolutionLayer
+ - @ref NEDepthwiseConvolutionLayer
+ - @ref NESoftmaxLayer
+ - Added FP16 support to:
+ - CLDepthwiseConvolutionLayer3x3
+ - @ref CLDepthwiseConvolutionLayer
+ - Added broadcasting support to NEArithmeticAddition / @ref CLArithmeticAddition / @ref CLPixelWiseMultiplication
+ - Added fused batch normalization and activation to @ref CLBatchNormalizationLayer and @ref NEBatchNormalizationLayer
+ - Added support for non-square pooling to @ref NEPoolingLayer and @ref CLPoolingLayer
+ - New OpenCL kernels / functions:
+ - CLDirectConvolutionLayerOutputStageKernel
+ - New Arm® Neon™ kernels / functions
+ - Added name() method to all kernels.
+ - Added support for Winograd 5x5.
+ - NEPermuteKernel / @ref NEPermute
+ - @ref NEWinogradLayerTransformInputKernel / NEWinogradLayer
+ - @ref NEWinogradLayerTransformOutputKernel / NEWinogradLayer
+ - @ref NEWinogradLayerTransformWeightsKernel / NEWinogradLayer
+ - Renamed NEWinogradLayerKernel into NEWinogradLayerBatchedGEMMKernel
+ - New GLES kernels / functions:
+ - GCTensorShiftKernel / GCTensorShift
+
+v18.01 Public maintenance release
+ - Various bug fixes
+ - Added some of the missing validate() methods
+ - Added @ref CLDeconvolutionLayerUpsampleKernel / @ref CLDeconvolutionLayer @ref CLDeconvolutionLayerUpsample
+ - Added CLPermuteKernel / @ref CLPermute
+ - Added method to clean the programs cache in the CL Kernel library.
+ - Added GCArithmeticAdditionKernel / GCArithmeticAddition
+ - Added GCDepthwiseConvolutionLayer3x3Kernel / GCDepthwiseConvolutionLayer3x3
+ - Added GCNormalizePlanarYUVLayerKernel / GCNormalizePlanarYUVLayer
+ - Added GCScaleKernel / GCScale
+ - Added GCWeightsReshapeKernel / GCConvolutionLayer
+ - Added FP16 support to the following GLES compute kernels:
+ - GCCol2ImKernel
+ - GCGEMMInterleave4x4Kernel
+ - GCGEMMTranspose1xWKernel
+ - GCIm2ColKernel
+ - Refactored Arm® Neon™ Winograd (NEWinogradLayerKernel)
+ - Added NEDirectConvolutionLayerOutputStageKernel
+ - Added QASYMM8 support to the following Arm® Neon™ kernels:
+ - NEDepthwiseConvolutionLayer3x3Kernel
+ - @ref NEFillBorderKernel
+ - NEPoolingLayerKernel
+ - Added new examples:
+ - graph_cl_mobilenet_qasymm8.cpp
+ - graph_inception_v3.cpp
+ - gc_dc.cpp
+ - More tests added to both validation and benchmarking suites.
+
+v17.12 Public major release
+ - Most machine learning functions on OpenCL support the new data type QASYMM8
+ - Introduced logging interface
+ - Introduced OpenCL timer
+ - Reworked GEMMLowp interface
+ - Added new Arm® Neon™ assembly kernels for GEMMLowp, SGEMM and HGEMM
+ - Added validation method for most Machine Learning kernels / functions
+ - Added new graph examples such as googlenet, mobilenet, squeezenet, vgg16 and vgg19
+ - Added sgemm example for OpenCL
+ - Added absolute difference example for GLES compute
+ - Added new tests and benchmarks in validation and benchmark frameworks
+ - Added new kernels / functions for GLES compute
+
+ - New OpenGL ES kernels / functions
+ - GCAbsoluteDifferenceKernel / GCAbsoluteDifference
+ - GCActivationLayerKernel / GCActivationLayer
+ - GCBatchNormalizationLayerKernel / GCBatchNormalizationLayer
+ - GCCol2ImKernel
+ - GCDepthConcatenateLayerKernel / GCDepthConcatenateLayer
+ - GCDirectConvolutionLayerKernel / GCDirectConvolutionLayer
+ - GCDropoutLayerKernel / GCDropoutLayer
+ - GCFillBorderKernel / GCFillBorder
+ - GCGEMMInterleave4x4Kernel / GCGEMMInterleave4x4
+ - GCGEMMMatrixAccumulateBiasesKernel / GCGEMMMatrixAdditionKernel / GCGEMMMatrixMultiplyKernel / GCGEMM
+ - GCGEMMTranspose1xWKernel / GCGEMMTranspose1xW
+ - GCIm2ColKernel
+ - GCNormalizationLayerKernel / GCNormalizationLayer
+ - GCPixelWiseMultiplicationKernel / GCPixelWiseMultiplication
+ - GCPoolingLayerKernel / GCPoolingLayer
+ - GCLogits1DMaxKernel / GCLogits1DShiftExpSumKernel / GCLogits1DNormKernel / GCSoftmaxLayer
+ - GCTransposeKernel / GCTranspose
+
+ - New Arm® Neon™ kernels / functions
+ - arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore
+ - arm_compute::NEHGEMMAArch64FP16Kernel
+ - NEDepthwiseConvolutionLayer3x3Kernel / NEDepthwiseIm2ColKernel / NEGEMMMatrixVectorMultiplyKernel / NEDepthwiseVectorToTensorKernel / @ref NEDepthwiseConvolutionLayer
+ - @ref NEGEMMLowpOffsetContributionKernel / @ref NEGEMMLowpMatrixAReductionKernel / @ref NEGEMMLowpMatrixBReductionKernel / @ref NEGEMMLowpMatrixMultiplyCore
+ - @ref NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / @ref NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
+ - NEWinogradLayer / NEWinogradLayerKernel
+
+ - New OpenCL kernels / functions
+ - @ref CLGEMMLowpOffsetContributionKernel / @ref CLGEMMLowpMatrixAReductionKernel / @ref CLGEMMLowpMatrixBReductionKernel / @ref CLGEMMLowpMatrixMultiplyCore
+ - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
+
+ - New graph nodes for Arm® Neon™ and OpenCL
+ - graph::BranchLayer
+ - graph::DepthConvertLayer
+ - graph::DepthwiseConvolutionLayer
+ - graph::DequantizationLayer
+ - graph::FlattenLayer
+ - graph::QuantizationLayer
+ - graph::ReshapeLayer
+
+v17.10 Public maintenance release
+ - Bug fixes:
+ - Check the maximum local workgroup size supported by OpenCL devices
+ - Minor documentation updates (Fixed instructions to build the examples)
+ - Introduced a graph::GraphContext
+ - Added a few new Graph nodes, support for branches and grouping.
+ - Automatically enable cl_printf in debug builds
+ - Fixed bare metal builds for armv7a
+ - Added AlexNet and cartoon effect examples
+ - Fixed library builds: libraries are no longer built as supersets of each other. (This means applications using the Runtime part of the library now need to link against both libarm_compute_core and libarm_compute.)
+
+v17.09 Public major release
+ - Experimental Graph support: initial implementation of a simple stream API to easily chain machine learning layers.
+ - Memory Manager (@ref BlobLifetimeManager, @ref BlobMemoryPool, @ref ILifetimeManager, @ref IMemoryGroup, @ref IMemoryManager, @ref IMemoryPool, @ref IPoolManager, @ref MemoryManagerOnDemand, @ref PoolManager)
+ - New validation and benchmark frameworks (Boost and Google frameworks replaced by homemade framework).
+ - Most machine learning functions support both fixed point 8 and 16 bit (QS8, QS16) for both Arm® Neon™ and OpenCL.
+ - New Arm® Neon™ kernels / functions:
+ - arm_compute::NEGEMMAssemblyBaseKernel arm_compute::NEGEMMAArch64Kernel
+ - NEDequantizationLayerKernel / @ref NEDequantizationLayer
+ - NEFloorKernel / @ref NEFloor
+ - @ref NEL2NormalizeLayerKernel / @ref NEL2NormalizeLayer
+ - NEQuantizationLayerKernel @ref NEMinMaxLayerKernel / @ref NEQuantizationLayer
+ - @ref NEROIPoolingLayerKernel / @ref NEROIPoolingLayer
+ - @ref NEReductionOperationKernel / @ref NEReductionOperation
+ - NEReshapeLayerKernel / @ref NEReshapeLayer
+
+ - New OpenCL kernels / functions:
+ - @ref CLDepthwiseConvolutionLayer3x3NCHWKernel, @ref CLDepthwiseConvolutionLayer3x3NHWCKernel, CLDepthwiseIm2ColKernel, CLDepthwiseVectorToTensorKernel, CLDepthwiseWeightsReshapeKernel / CLDepthwiseConvolutionLayer3x3, @ref CLDepthwiseConvolutionLayer, CLDepthwiseSeparableConvolutionLayer
+ - CLDequantizationLayerKernel / CLDequantizationLayer
+ - CLDirectConvolutionLayerKernel / @ref CLDirectConvolutionLayer
+ - CLFlattenLayer
+ - CLFloorKernel / @ref CLFloor
+ - CLGEMMTranspose1xW
+ - CLGEMMMatrixVectorMultiplyKernel
+ - @ref CLL2NormalizeLayerKernel / @ref CLL2NormalizeLayer
+ - CLQuantizationLayerKernel @ref CLMinMaxLayerKernel / @ref CLQuantizationLayer
+ - @ref CLROIPoolingLayerKernel / @ref CLROIPoolingLayer
+ - @ref CLReductionOperationKernel / @ref CLReductionOperation
+ - CLReshapeLayerKernel / @ref CLReshapeLayer
+
+v17.06 Public major release
+ - Various bug fixes
+ - Added support for fixed point 8 bit (QS8) to the various Arm® Neon™ machine learning kernels.
+ - Added unit tests and benchmarks (AlexNet, LeNet)
+ - Added support for sub tensors.
+ - Added infrastructure to provide GPU specific optimisation for some OpenCL kernels.
+ - Added @ref OMPScheduler (OpenMP) scheduler for Neon
+ - Added @ref SingleThreadScheduler scheduler for Arm® Neon™ (For bare metal)
+ - Users can specify their own scheduler by implementing the @ref IScheduler interface.
+ - New OpenCL kernels / functions:
+ - @ref CLBatchNormalizationLayerKernel / @ref CLBatchNormalizationLayer
+ - CLDepthConcatenateLayerKernel / CLDepthConcatenateLayer
+ - CLHOGOrientationBinningKernel, CLHOGBlockNormalizationKernel, CLHOGDetectorKernel / CLHOGDescriptor, CLHOGDetector, CLHOGGradient, CLHOGMultiDetection
+ - CLLocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedLayer
+ - @ref CLWeightsReshapeKernel / @ref CLConvolutionLayerReshapeWeights
+ - New C++ kernels:
+ - CPPDetectionWindowNonMaximaSuppressionKernel
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEBatchNormalizationLayerKernel / @ref NEBatchNormalizationLayer
+ - NEDepthConcatenateLayerKernel / NEDepthConcatenateLayer
+ - NEDirectConvolutionLayerKernel / @ref NEDirectConvolutionLayer
+ - NELocallyConnectedMatrixMultiplyKernel / NELocallyConnectedLayer
+ - @ref NEWeightsReshapeKernel / @ref NEConvolutionLayerReshapeWeights
+
+v17.05 Public bug fixes release
+ - Various bug fixes
+ - Remaining functions ported to use accurate padding.
+ - Library does not link against OpenCL anymore (it uses dlopen / dlsym at runtime instead to determine whether or not OpenCL is available).
+ - Added "free" method to allocator.
+ - Minimum version of g++ required for armv7 Linux changed from 4.8 to 4.9
+
+v17.04 Public bug fixes release
+
+ The following functions have been ported to use the new accurate padding:
+ - CLColorConvertKernel
+ - CLEdgeNonMaxSuppressionKernel
+ - CLEdgeTraceKernel
+ - CLGaussianPyramidHorKernel
+ - CLGaussianPyramidVertKernel
+ - CLGradientKernel
+ - NEChannelCombineKernel
+ - NEFillArrayKernel
+ - NEGaussianPyramidHorKernel
+ - NEGaussianPyramidVertKernel
+ - NEHarrisScoreFP16Kernel
+ - NEHarrisScoreKernel
+ - NEHOGDetectorKernel
+ - NELogits1DMaxKernel
+ - NELogits1DShiftExpSumKernel
+ - NELogits1DNormKernel
+ - NENonMaximaSuppression3x3FP16Kernel
+ - NENonMaximaSuppression3x3Kernel
+
+v17.03.1 First Major public release of the sources
+ - Renamed the library to arm_compute
+ - New CPP target introduced for C++ kernels shared between Arm® Neon™ and CL functions.
+ - New padding calculation interface introduced and ported most kernels / functions to use it.
+ - New OpenCL kernels / functions:
+ - CLGEMMLowpMatrixMultiplyKernel / CLGEMMLowp
+ - New Arm® Neon™ kernels / functions:
+ - @ref NENormalizationLayerKernel / @ref NENormalizationLayer
+ - NETransposeKernel / @ref NETranspose
+ - NELogits1DMaxKernel, NELogits1DShiftExpSumKernel, NELogits1DNormKernel / @ref NESoftmaxLayer
+ - @ref NEIm2ColKernel, @ref NECol2ImKernel, NEConvolutionLayerWeightsReshapeKernel / @ref NEConvolutionLayer
+ - NEGEMMMatrixAccumulateBiasesKernel / @ref NEFullyConnectedLayer
+ - @ref NEGEMMLowpMatrixMultiplyKernel / NEGEMMLowp
+
+v17.03 Sources preview
+ - New OpenCL kernels / functions:
+ - CLGradientKernel, CLEdgeNonMaxSuppressionKernel, CLEdgeTraceKernel / CLCannyEdge
+ - GEMM refactoring + FP16 support: CLGEMMInterleave4x4Kernel, CLGEMMTranspose1xWKernel, @ref CLGEMMMatrixMultiplyKernel, CLGEMMMatrixAdditionKernel / @ref CLGEMM
+ - CLGEMMMatrixAccumulateBiasesKernel / @ref CLFullyConnectedLayer
+ - CLTransposeKernel / @ref CLTranspose
+ - CLLKTrackerInitKernel, CLLKTrackerStage0Kernel, CLLKTrackerStage1Kernel, CLLKTrackerFinalizeKernel / CLOpticalFlow
+ - @ref CLNormalizationLayerKernel / @ref CLNormalizationLayer
+ - CLLaplacianPyramid, CLLaplacianReconstruct
+ - New Arm® Neon™ kernels / functions:
+ - NEActivationLayerKernel / @ref NEActivationLayer
+ - GEMM refactoring + FP16 support (Requires armv8.2 CPU): @ref NEGEMMInterleave4x4Kernel, @ref NEGEMMTranspose1xWKernel, @ref NEGEMMMatrixMultiplyKernel, @ref NEGEMMMatrixAdditionKernel / @ref NEGEMM
+ - NEPoolingLayerKernel / @ref NEPoolingLayer
+
+v17.02.1 Sources preview
+ - New OpenCL kernels / functions:
+ - CLLogits1DMaxKernel, CLLogits1DShiftExpSumKernel, CLLogits1DNormKernel / @ref CLSoftmaxLayer
+ - CLPoolingLayerKernel / @ref CLPoolingLayer
+ - @ref CLIm2ColKernel, @ref CLCol2ImKernel, CLConvolutionLayerWeightsReshapeKernel / CLConvolutionLayer
+ - @ref CLRemapKernel / @ref CLRemap
+ - CLGaussianPyramidHorKernel, CLGaussianPyramidVertKernel / CLGaussianPyramid, CLGaussianPyramidHalf, CLGaussianPyramidOrb
+ - CLMinMaxKernel, CLMinMaxLocationKernel / CLMinMaxLocation
+ - CLNonLinearFilterKernel / CLNonLinearFilter
+ - New Arm® Neon™ FP16 kernels (Requires armv8.2 CPU)
+ - NEAccumulateWeightedFP16Kernel
+ - NEBox3x3FP16Kernel
+ - NENonMaximaSuppression3x3FP16Kernel
+
+v17.02 Sources preview
+ - New OpenCL kernels / functions:
+ - CLActivationLayerKernel / @ref CLActivationLayer
+ - CLChannelCombineKernel / CLChannelCombine
+ - CLDerivativeKernel / CLChannelExtract
+ - CLFastCornersKernel / CLFastCorners
+ - CLMeanStdDevKernel / CLMeanStdDev
+ - New Arm® Neon™ kernels / functions:
+ - HOG / SVM: NEHOGOrientationBinningKernel, NEHOGBlockNormalizationKernel, NEHOGDetectorKernel, NEHOGNonMaximaSuppressionKernel / NEHOGDescriptor, NEHOGDetector, NEHOGGradient, NEHOGMultiDetection
+ - NENonLinearFilterKernel / NENonLinearFilter
+ - Introduced a CLScheduler to manage the default context and command queue used by the runtime library and create synchronisation events.
+ - Switched all the kernels / functions to use tensors instead of images.
+ - Updated documentation to include instructions to build the library from sources.
+
+v16.12 Binary preview release
+ - Original release
+
+ */
+} // namespace arm_compute
\ No newline at end of file