Diffstat (limited to 'docs/user_guide/release_version_and_change_log.dox')
-rw-r--r-- | docs/user_guide/release_version_and_change_log.dox | 1389 |
1 files changed, 1389 insertions, 0 deletions
diff --git a/docs/user_guide/release_version_and_change_log.dox b/docs/user_guide/release_version_and_change_log.dox new file mode 100644 index 0000000000..b9e3b37263 --- /dev/null +++ b/docs/user_guide/release_version_and_change_log.dox @@ -0,0 +1,1389 @@ +/// +/// Copyright (c) 2017-2021 Arm Limited. +/// +/// SPDX-License-Identifier: MIT +/// +/// Permission is hereby granted, free of charge, to any person obtaining a copy +/// of this software and associated documentation files (the "Software"), to +/// deal in the Software without restriction, including without limitation the +/// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or +/// sell copies of the Software, and to permit persons to whom the Software is +/// furnished to do so, subject to the following conditions: +/// +/// The above copyright notice and this permission notice shall be included in all +/// copies or substantial portions of the Software. +/// +/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +/// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +/// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +/// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +/// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +/// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +/// SOFTWARE. +/// +namespace arm_compute +{ +/** @page versions_changelogs Release Versions and Changelog + +@tableofcontents + +@section S2_1_versions Release versions + +All releases are numbered vYY.MM Where YY are the last two digits of the year, and MM the month number. 
+If there is more than one release in a month then an extra sequential number is appended at the end: + + v17.03 (First release of March 2017) + v17.03.1 (Second release of March 2017) + v17.04 (First release of April 2017) + +@note We're aiming at releasing one major public release with new features per quarter. All releases in between will only contain bug fixes. + +@section S2_2_changelog Changelog + +v21.05 Public major release + - Removed computer vision support from Arm® Neon™ backend + - Removed the following functions: + - NEAbsoluteDifference + - NEAccumulate + - NEBox3x3 + - NECannyEdge + - NEChannelCombine + - NEChannelExtract + - NEColorConvert + - NEConvolution + - NEDerivative + - NEDilate + - NEEqualizeHistogram + - NEErode + - NEFastCorners + - NEGaussian3x3 + - NEGaussian5x5 + - NEGaussianPyramid + - NEHOGDescriptor + - NEHOGDetector + - NEHOGGradient + - NEHOGMultiDetection + - NEHarrisCorners + - NEHistogram + - NEIntegralImage + - NELaplacianPyramid + - NELaplacianReconstruct + - NEMagnitude + - NEMeanStdDev + - NEMedian3x3 + - NEMinMaxLocation + - NENonLinearFilter + - NEOpticalFlow + - NEPhase + - NEScharr3x3 + - NESobel3x3 + - NESobel5x5 + - NESobel7x7 + - NETableLookup + - NEThreshold + - NEWarpAffine + - NEWarpPerspectiveKernel + + - Remove all GLES kernels / functions / tests / examples + - Removed computer vision support from CL backend + - Removed the following functions: + - CLAbsoluteDifference + - CLAccumulate + - CLBox3x3 + - CLCannyEdge + - CLChannelCombine + - CLChannelExtract + - CLColorConvert + - CLConvolution + - CLDerivative + - CLDilate + - CLEqualizeHistogram + - CLErode + - CLFastCorners + - CLGaussian3x3 + - CLGaussian5x5 + - CLGaussianPyramid + - CLHOGDescriptor + - CLHOGDetector + - CLHOGGradient + - CLHOGMultiDetection + - CLHarrisCorners + - CLHistogram + - CLIntegralImage + - CLLaplacianPyramid + - CLLaplacianReconstruct + - CLMagnitude + - CLMeanStdDev + - CLMedian3x3 + - CLMinMaxLocation + - CLNonLinearFilter + - 
CLOpticalFlow + - CLPhase + - CLScharr3x3 + - CLSobel3x3 + - CLSobel5x5 + - CLSobel7x7 + - CLTableLookup + - CLThreshold + - CLWarpAffine + - CLWarpPerspective + +v21.02 Public major release + - Various bug fixes. + - Various optimisations. + - Upgrade C++ standard to C++14 + - Add macOS support + - Add Armv8-R AArch64 architecture support + - Add SVE/SVE2 support for: + - NEScaleKernel + - @ref NEActivationLayer + - @ref NEArithmeticAddition + - @ref NEBatchNormalizationLayerKernel + - @ref cpu::kernels::CpuLogits1DSoftmaxKernel + - @ref cpu::kernels::CpuLogits1DMaxKernel + - @ref cpu::kernels::CpuElementwiseUnaryKernel + - Remove padding from OpenCL kernels: + - CLDirectConvolutionLayerKernel + - @ref CLArgMinMaxLayerKernel + - @ref CLPadLayerKernel + - @ref CLROIAlignLayerKernel + - @ref CLRangeKernel + - CLScaleKernel + - @ref CLSelectKernel + - @ref CLBitwiseKernel + - @ref opencl::kernels::ClFloorKernel + - CLTransposeKernel + - Deprecate functions in CLTuner: + - add_lws_to_table + - import_lws_table + - lws_table + - Remove functions: + - NELocallyConnectedLayer / CLLocallyConnectedLayer + - NEIm2Col + - NECol2Im + - NEGEMMInterleave4x4 + - NEGEMMTranspose1xW + - NEComputeAllAnchors / CLComputeAllAnchors + - NEGEMMAssemblyDispatch + - NEUpsampleLayer / CLUpsampleLayer + - Remove kernels: + - NEGEMMMatrixVectorMultiplyKernel + - NELocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedMatrixMultiplyKernel + - NEUpsampleLayerKernel / CLUpsampleLayerKernel + - Extend OpenCL tuner with workgroup batch size support + - Experimental extension for the OpenCL tuner to tune the batches of work groups distributed to compute units + - Add functionality to load the OpenCL GEMM heuristics at runtime + - The GEMM heuristic file (MLGO) can be used to update the default GEMM heuristics available for OpenCL + - Note: there might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. 
Currently under investigation + - Note: data-type decoupling is in progress and experimental. Warnings about unused symbols might be raised + +v20.11 Public major release + - Various bug fixes. + - Various optimisations. + - Performance regressions can be noted when executing Depthwise Convolution on Arm® Neon™ with a depth multiplier > 1 for quantized data type. + This is planned to be resolved in 21.02 release. + - Added new data type QASYMM8_SIGNED support for @ref NEROIAlignLayer. + - Added new data type S32 support for: + - NEArithmeticSubtraction + - NEArithmeticSubtractionKernel + - @ref NEPixelWiseMultiplication + - NEPixelWiseMultiplicationKernel + - NEElementwiseDivision + - NEDivisionOperationKernel + - Interface change + - Properly support softmax axis to have the same meaning as other major frameworks. That is, axis now defines the dimension + on which Softmax/Logsoftmax is performed. E.g. for input of shape 4x5x6 and axis=1, softmax will be applied to 4x6=24 vectors of size 5. + The supported value range of axis is [-rank, rank). 
+ This change applies to the following functions: + - @ref NESoftmaxLayer + - @ref NELogSoftmaxLayer + - @ref CLSoftmaxLayer + - @ref CLLogSoftmaxLayer + - GCSoftmaxLayer + - New OpenCL kernels / functions: + - @ref CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel + - @ref CLLogicalNot + - @ref CLLogicalAnd + - @ref CLLogicalOr + - New Arm® Neon™ kernels / functions: + - @ref NELogicalNot + - @ref NELogicalAnd + - @ref NELogicalOr + - Removed padding from Arm® Neon™ kernels: + - NEComplexPixelWiseMultiplicationKernel + - NENonMaximaSuppression3x3Kernel + - @ref NERemapKernel + - @ref NEGEMMInterleave4x4Kernel + - NEDirectConvolutionLayerKernel + - NEScaleKernel + - NELocallyConnectedMatrixMultiplyKernel + - @ref NEGEMMLowpOffsetContributionKernel + - @ref NEGEMMTranspose1xWKernel + - NEPoolingLayerKernel + - NEConvolutionKernel + - NEDepthwiseConvolutionLayerNativeKernel + - @ref NEGEMMLowpMatrixMultiplyKernel + - @ref NEGEMMMatrixMultiplyKernel + - NEDirectConvolutionLayerOutputStageKernel + - @ref NEReductionOperationKernel + - @ref NEGEMMLowpMatrixAReductionKernel + - @ref NEGEMMLowpMatrixBReductionKernel + - Removed padding from OpenCL kernels: + - CLBatchConcatenateLayerKernel + - CLElementwiseOperationKernel + - @ref CLBatchNormalizationLayerKernel + - CLPoolingLayerKernel + - @ref CLWinogradInputTransformKernel + - @ref CLGEMMLowpMatrixMultiplyNativeKernel + - @ref CLGEMMLowpMatrixAReductionKernel + - @ref CLGEMMLowpMatrixBReductionKernel + - @ref CLGEMMLowpOffsetContributionOutputStageKernel + - @ref CLGEMMLowpOffsetContributionKernel + - @ref CLWinogradOutputTransformKernel + - @ref CLGEMMLowpMatrixMultiplyReshapedKernel + - @ref CLFuseBatchNormalizationKernel + - @ref CLDepthwiseConvolutionLayerNativeKernel + - @ref CLDepthConvertLayerKernel + - CLCopyKernel + - @ref CLDepthwiseConvolutionLayer3x3NHWCKernel + - CLActivationLayerKernel + - @ref CLWinogradFilterTransformKernel + - CLWidthConcatenateLayerKernel + - CLWidthConcatenate4TensorsKernel + - 
CLWidthConcatenate2TensorsKernel + - CLLogits1DMaxShiftExpSumKernel + - CLLogits1DNormKernel + - CLHeightConcatenateLayerKernel + - @ref CLGEMMMatrixMultiplyKernel + - @ref CLGEMMLowpQuantizeDownInt32ScaleKernel + - @ref CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel + - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel + - CLDepthConcatenateLayerKernel + - @ref CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel + - Removed OpenCL kernels / functions: + - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel + - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel + - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel + - Deprecated OpenCL kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together): + - CLLocallyConnectedLayer + - CLLocallyConnectedMatrixMultiplyKernel + - CLAbsoluteDifference + - CLAbsoluteDifferenceKernel + - CLAccumulate + - CLAccumulateKernel + - CLAccumulateSquared + - CLAccumulateSquaredKernel + - CLAccumulateWeighted + - CLAccumulateWeightedKernel + - CLAccumulateWeightedFP16Kernel + - CLBox3x3 + - CLBox3x3Kernel + - CLBox3x3FP16Kernel + - CLCannyEdge + - CLChannelCombine + - CLChannelCombineKernel + - CLChannelExtract + - CLChannelExtractKernel + - CLColorConvert + - CLColorConvertKernel + - CLConvolution3x3 + - CLConvolutionRectangle + - CLConvolutionRectangleKernel + - CLConvolutionSquare + - CLConvolutionKernel + - CLDerivative + - CLDerivativeKernel + - CLDilate + - CLDilateKernel + - CLEqualizeHistogram + - CLErode + - CLErodeKernel + - CLFastCorners + - CLFastCornersKernel + - CLGaussian3x3 + - CLGaussian3x3Kernel + - CLGaussian5x5 + - CLGaussian5x5HorKernel + - CLGaussian5x5VertKernel + - CLGaussianPyramid + - CLGaussianPyramidHalf + - CLGaussianPyramidOrb + - CLHarrisCorners + - CLHarrisScoreKernel + - CLHarrisScoreFP16Kernel + - CLHistogram + - CLHistogramKernel + - CLHOGOrientationBinningKernel + - CLHOGBlockNormalizationKernel + - CLHOGDetectorKernel + 
- CLHOGNonMaximaSuppressionKernel + - CLHOGDescriptor + - CLHOGDetector + - CLHOGGradient + - CLHOGMultiDetection + - CLHOGOrientationBinningKernel + - CLHOGBlockNormalizationKernel + - CLHOGDetectorKernel + - CLIntegralImage + - CLIntegralImageKernel + - CLLaplacianReconstruct + - CLLaplacianPyramid + - CLMagnitude + - CLMagnitudePhaseKernel + - CLMedian3x3 + - CLMedian3x3Kernel + - CLMinMaxLocation + - CLMinMaxLocationKernel + - CLNonLinearFilter + - CLNonLinearFilterKernel + - CLNonMaximaSuppression3x3 + - CLNonMaximaSuppression3x3FP16Kernel + - CLNonMaximaSuppression3x3Kernel + - CLOpticalFlow + - CLPhase + - CLRemap + - CLRemapKernel + - CLScharr3x3 + - CLScharr3x3Kernel + - CLSobel3x3 + - CLSobel3x3Kernel + - CLSobel5x5 + - CLSobel5x5HorKernel + - CLSobel5x5VertKernel + - CLSobel7x7 + - CLSobel7x7HorKernel + - CLSobel7x7VertKernel + - CLThreshold + - CLThresholdKernel + - CLWarpAffine + - CLWarpAffineKernel + - CLWarpPerspective + - CLWarpPerspectiveKernel + - Deprecated Arm® Neon™ kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together): + - NELocallyConnectedLayer + - NELocallyConnectedMatrixMultiplyKernel + - NEAbsoluteDifference + - NEAbsoluteDifferenceKernel + - NEAccumulate + - NEAccumulateKernel + - NEAccumulateSquared + - NEAccumulateSquaredKernel + - NEAccumulateWeighted + - NEAccumulateWeightedKernel + - NEAccumulateWeightedFP16Kernel + - NEBox3x3 + - NEBox3x3Kernel + - NEBox3x3FP16Kernel + - NECannyEdge + - NEChannelCombine + - NEChannelCombineKernel + - NEChannelExtract + - NEChannelExtractKernel + - NEColorConvert + - NEColorConvertKernel + - NEConvolution3x3 + - NEConvolutionRectangle + - NEConvolutionRectangleKernel + - NEConvolutionSquare + - NEConvolutionKernel + - NEDerivative + - NEDerivativeKernel + - NEDilate + - NEDilateKernel + - NEEqualizeHistogram + - NEErode + - NEErodeKernel + - NEFastCorners + - NEFastCornersKernel + - NEGaussian3x3 + - NEGaussian3x3Kernel + - 
NEGaussian5x5 + - NEGaussian5x5HorKernel + - NEGaussian5x5VertKernel + - NEGaussianPyramid + - NEGaussianPyramidHalf + - NEGaussianPyramidOrb + - NEHarrisCorners + - NEHarrisScoreKernel + - NEHarrisScoreFP16Kernel + - NEHistogram + - NEHistogramKernel + - NEHOGOrientationBinningKernel + - NEHOGBlockNormalizationKernel + - NEHOGDetectorKernel + - NEHOGNonMaximaSuppressionKernel + - NEHOGDescriptor + - NEHOGDetector + - NEHOGGradient + - NEHOGMultiDetection + - NEHOGOrientationBinningKernel + - NEHOGBlockNormalizationKernel + - NEHOGDetectorKernel + - NEIntegralImage + - NEIntegralImageKernel + - NELaplacianReconstruct + - NELaplacianPyramid + - NEMagnitude + - NEMagnitudePhaseKernel + - NEMedian3x3 + - NEMedian3x3Kernel + - NEMinMaxLocation + - NEMinMaxLocationKernel + - NENonLinearFilter + - NENonLinearFilterKernel + - NENonMaximaSuppression3x3 + - NENonMaximaSuppression3x3FP16Kernel + - NENonMaximaSuppression3x3Kernel + - NEOpticalFlow + - NEPhase + - NERemap + - NERemapKernel + - NEScharr3x3 + - NEScharr3x3Kernel + - NESobel3x3 + - NESobel3x3Kernel + - NESobel5x5 + - NESobel5x5HorKernel + - NESobel5x5VertKernel + - NESobel7x7 + - NESobel7x7HorKernel + - NESobel7x7VertKernel + - NEThreshold + - NEThresholdKernel + - NEWarpAffine + - NEWarpAffineKernel + - NEWarpPerspective + - NEWarpPerspectiveKernel + - Deprecated GLES kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together): + - GCAbsoluteDifference + - GCActivationLayer + - GCArithmeticAddition + - GCBatchNormalizationLayer + - GCConcatenateLayer + - GCConvolutionLayer + - GCDepthwiseConvolutionLayer + - GCDirectConvolutionLayer + - GCDropoutLayer + - GCFillBorder + - GCFullyConnectedLayer + - GCGEMM + - GCGEMMInterleave4x4 + - GCGEMMTranspose1xW + - GCNormalizationLayer + - GCNormalizePlanarYUVLayer + - GCPixelWiseMultiplication + - GCPoolingLayer + - GCScale + - GCSoftmaxLayer + - GCTensorShift + - GCTranspose + + +v20.08 Public major release + 
- Various bug fixes. + - Various optimisations. + - Added new data type QASYMM8_SIGNED support for: + - @ref CLArgMinMaxLayer + - @ref CLArgMinMaxLayerKernel + - Added new data type U8 support for: + - @ref NECropKernel + - CLCropKernel + - Added align_corner support for nearest neighbor interpolation in: + - NEScaleKernel + - CLScaleKernel + - New OpenCL kernels / functions: + - @ref CLMaxUnpoolingLayerKernel + - New Arm® Neon™ kernels / functions: + - @ref NEMaxUnpoolingLayerKernel + - New graph example: + - graph_yolov3_output_detector + - GEMMTuner improvements: + - Added fp16 support + - Output json files for easier integration + - Enabled tuning for export_to_cl_image_rhs option for RHS tensors + - More robust script for running benchmarks + - Removed padding from: + - NEPixelWiseMultiplicationKernel + - NEHeightConcatenateLayerKernel + - NEThresholdKernel + - NEBatchConcatenateLayerKernel + - NETransposeKernel + - @ref NEBatchNormalizationLayerKernel + - NEArithmeticSubtractionKernel + - @ref NEBoundingBoxTransformKernel + - NELogits1DMaxKernel + - NELogits1DSoftmaxKernel + - @ref NEROIPoolingLayerKernel + - @ref NEROIAlignLayerKernel + - NEYOLOLayerKernel + - NEUpsampleLayerKernel + - NEFloorKernel + - NEWidthConcatenateLayerKernel + - NEDepthConcatenateLayerKernel + - @ref NENormalizationLayerKernel + - @ref NEL2NormalizeLayerKernel + - NEFillArrayKernel + - @ref NEDepthConvertLayerKernel + - @ref NERangeKernel + - @ref NEPriorBoxLayer + - Removed OpenCL kernels / functions: + - CLGEMMLowpQuantizeDownInt32ToUint8Scale + - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat + - Removed Arm® Neon™ kernels / functions: + - NEGEMMLowpQuantizeDownInt32ToUint8Scale + - NEGEMMMatrixAccumulateBiasesKernel + - Deprecated functions / interfaces: + - Non-descriptor based interfaces for NEThreshold, CLThreshold + - Non-descriptor based interfaces for @ref NEScale, @ref CLScale and GCScale + - In @ref NESoftmaxLayer, @ref NELogSoftmaxLayer, @ref CLSoftmaxLayer, @ref 
CLLogSoftmaxLayer and GCSoftmaxLayer : + The default "axis" value for @ref CLSoftmaxLayer, @ref CLLogSoftmaxLayer and GCSoftmaxLayer is changed from 1 to 0. + Only axis 0 is supported. + The default "axis" value for @ref NESoftmaxLayer, @ref NELogSoftmaxLayer is changed from 1 to 0. + Only axis 0 is supported. + - The support for quantized data types has been removed from @ref CLLogSoftmaxLayer due to implementation complexity. + - Removed padding requirement for the input (e.g. LHS of GEMM) and output in @ref CLGEMMMatrixMultiplyNativeKernel, @ref CLGEMMMatrixMultiplyReshapedKernel, @ref CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and @ref CLIm2ColKernel (NHWC only) + - This change allows to use @ref CLGEMMConvolutionLayer without extra padding for the input and output. + - Only the weights/bias of @ref CLGEMMConvolutionLayer could require padding for the computation. + - Only on Arm® Mali™ Midgard GPUs, @ref CLGEMMConvolutionLayer could require padding since @ref CLGEMMMatrixMultiplyKernel is called and currently requires padding. + - Added support for exporting the OpenCL buffer object to the OpenCL image object in @ref CLGEMMMatrixMultiplyReshapedKernel and @ref CLGEMMMatrixMultiplyReshapedOnlyRHSKernel. + - This support allows to export the OpenCL buffer used for the reshaped RHS matrix to the OpenCL image object. + - The padding requirement for the OpenCL image object is considered into the @ref CLGEMMReshapeRHSMatrixKernel. + - The reshaped RHS matrix stores the weights when GEMM is used to accelerate @ref CLGEMMConvolutionLayer. + +v20.05 Public major release + - Various bug fixes. + - Various optimisations. + - Updated recommended NDK version to r18b. + - Updated recommended gcc version to Linaro 6.3.1. 
+ - Added Bfloat16 type support + - Added Bfloat16 support in: + - @ref NEWeightsReshapeKernel + - @ref NEConvolutionLayerReshapeWeights + - @ref NEIm2ColKernel + - NEIm2Col + - @ref NEDepthConvertLayerKernel + - @ref NEDepthConvertLayer + - @ref NEGEMMConvolutionLayer + - NEGEMMAssemblyDispatch + - Added new data type QASYMM8_SIGNED support for: + - @ref CLDirectConvolutionLayer + - @ref CLDeconvolutionLayer + - @ref CLDirectDeconvolutionLayer + - @ref CLGEMMDeconvolutionLayer + - @ref CLGEMMLowpMatrixMultiplyReshapedKernel + - @ref CLGEMMLowpQuantizeDownInt32ScaleKernel + - @ref CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel + - @ref CLReductionOperation + - @ref CLReduceMean + - @ref NEScale + - NEScaleKernel + - NEUpsampleLayer + - @ref NECast + - @ref NEReductionOperation + - @ref NEReduceMean + - @ref NEArgMinMaxLayer + - @ref NEDeconvolutionLayer + - @ref NEGEMMLowpQuantizeDownInt32ScaleKernel + - @ref CPPBoxWithNonMaximaSuppressionLimit + - @ref CPPDetectionPostProcessLayer + - @ref CPPPermuteKernel + - @ref CPPPermute + - @ref CPPTopKVKernel + - @ref CPPTopKV + - @ref CPPUpsample + - @ref CPPUpsampleKernel + - New OpenCL kernels / functions: + - @ref CLQLSTMLayer + - @ref CLQLSTMLayerNormalizationKernel + - New Arm® Neon™ kernels / functions: + - @ref NEQLSTMLayer + - @ref NEQLSTMLayerNormalizationKernel + - Added HARD_SWISH support in: + - CLActivationLayerKernel + - NEActivationLayerKernel + - Deprecated OpenCL kernels / functions: + - CLGEMMLowpQuantizeDownInt32ToUint8Scale + - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat + - Deprecated Arm® Neon™ kernels / functions: + - NEGEMMLowpQuantizeDownInt32ToUint8Scale + - Removed CPP kernels / functions: + - CPPFlipWeightsKernel + - Removed PoolingLayerInfo constructors without Data Layout. 
+ - Removed CLDepthwiseConvolutionLayer3x3 + - Removed NEDepthwiseConvolutionLayerOptimized + - Added support for Winograd 3x3,4x4 on Arm® Neon™ FP16: + - @ref NEWinogradConvolutionLayer + - @ref NEWinogradLayerTransformInputKernel + - @ref NEWinogradLayerTransformOutputKernel + - @ref NEWinogradLayerTransformWeightsKernel + - Added CLCompileContext + - Added Arm® Neon™ GEMM kernel with 2D window support + +v20.02.1 Maintenance release + - Added Android-NN build script. + +v20.02 Public major release + - Various bug fixes. + - Various optimisations. + - Added new data type QASYMM8_SIGNED support for: + - @ref CLDepthwiseConvolutionLayer + - CLDepthwiseConvolutionLayer3x3 + - @ref CLGEMMConvolutionLayer + - @ref CLGEMMLowpMatrixMultiplyCore + - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel + - @ref CLGEMMLowpMatrixMultiplyNativeKernel + - @ref NEActivationLayer + - NEComparisonOperationKernel + - @ref NEConvolutionLayer + - @ref NEDepthwiseConvolutionLayer + - NEDepthwiseConvolutionLayer3x3Kernel + - NEDirectConvolutionLayerOutputStageKernel + - @ref NEElementwiseComparison + - @ref NEElementwiseMax + - @ref NEElementwiseMin + - @ref NEElementwiseSquaredDiff + - @ref NEFullyConnectedLayer + - NEGEMMMatrixVectorMultiplyKernel + - @ref NEPixelWiseMultiplication + - @ref NEPoolingLayer + - @ref NEPReluLayer + - Added support for QSYMM8_PER_CHANNEL in: + - NEDepthwiseConvolutionLayer3x3Kernel + - Added support for split sizes in: + - @ref CLSplit + - @ref NESplit + - New OpenCL kernels / functions: + - @ref CLFill + - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint + - New Arm® Neon™ kernels / functions: + - @ref NEFill + - @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint + - Deprecated Arm® Neon™ functions / interfaces: + - CLDepthwiseConvolutionLayer3x3 + - NEDepthwiseConvolutionLayerOptimized + - PoolingLayerInfo constructors 
without Data Layout. + - Added support for quantization with multiplier greater than 1 on Arm® Neon™ and CL. + - Added support for quantized inputs of type QASYMM8_SIGNED and QASYMM8 to @ref CLQuantizationLayer. + - Added the ability to build bootcode for bare metal. + - Added support for generating synthetic QASYMM8 graphs. + - Added support for F16 datatype in VGG16. + - Removed pre-built binaries for GLES. + +v19.11.1 Public maintenance release + - Fix offset calculation in NEReductionOperationKernel. + - Fix data layout in NEScaleKernel for nhwc. + - Retain configuration step data layout to avoid side-effects. + - Perform sqrt in double domain for L2 pooling. + - Fix output shape calculation for Reduce Mean + - Restrict cases where optimized NEPadLayer runs. + +v19.11 Public major release + - Various bug fixes. + - Various optimisations. + - Updated recommended NDK version to r17c. + - Deprecated OpenCL kernels / functions: + - CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel + - CLDepthwiseIm2ColKernel + - CLDepthwiseSeparableConvolutionLayer + - CLDepthwiseVectorToTensorKernel + - CLDirectConvolutionLayerOutputStageKernel + - Deprecated Arm® Neon™ kernels / functions: + - NEDepthwiseWeightsReshapeKernel + - NEDepthwiseIm2ColKernel + - NEDepthwiseSeparableConvolutionLayer + - NEDepthwiseVectorToTensorKernel + - NEDepthwiseConvolutionLayer3x3 + - New OpenCL kernels / functions: + - @ref CLInstanceNormalizationLayerKernel / @ref CLInstanceNormalizationLayer + - @ref CLDepthwiseConvolutionLayerNativeKernel to replace the old generic depthwise convolution (see Deprecated + OpenCL kernels / functions) + - @ref CLLogSoftmaxLayer + - New Arm® Neon™ kernels / functions: + - @ref NEBoundingBoxTransformKernel / @ref NEBoundingBoxTransform + - @ref NEComputeAllAnchorsKernel / NEComputeAllAnchors + - @ref NEDetectionPostProcessLayer + - @ref NEGenerateProposalsLayer + - @ref NEInstanceNormalizationLayerKernel / @ref NEInstanceNormalizationLayer + - @ref 
NELogSoftmaxLayer + - @ref NEROIAlignLayerKernel / @ref NEROIAlignLayer + - Added QASYMM8 support for: + - @ref CLGenerateProposalsLayer + - @ref CLROIAlignLayer + - @ref CPPBoxWithNonMaximaSuppressionLimit + - Added QASYMM16 support for: + - @ref CLBoundingBoxTransform + - Added FP16 support for: + - @ref CLGEMMMatrixMultiplyReshapedKernel + - Added new data type QASYMM8_PER_CHANNEL support for: + - CLDequantizationLayer + - @ref NEDequantizationLayer + - Added new data type QSYMM8_PER_CHANNEL support for: + - @ref CLConvolutionLayer + - @ref NEConvolutionLayer + - @ref CLDepthwiseConvolutionLayer + - @ref NEDepthwiseConvolutionLayer + - Added FP16 mixed-precision support for: + - @ref CLGEMMMatrixMultiplyReshapedKernel + - CLPoolingLayerKernel + - Added FP32 and FP16 ELU activation for: + - @ref CLActivationLayer + - @ref NEActivationLayer + - Added asymmetric padding support for: + - @ref CLDirectDeconvolutionLayer + - @ref CLGEMMDeconvolutionLayer + - @ref NEDeconvolutionLayer + - Added SYMMETRIC and REFLECT modes for @ref CLPadLayerKernel / @ref CLPadLayer. + - Replaced the calls to NECopyKernel and NEMemsetKernel with @ref NEPadLayer in @ref NEGenerateProposalsLayer. + - Replaced the calls to CLCopyKernel and CLMemsetKernel with @ref CLPadLayer in @ref CLGenerateProposalsLayer. + - Improved performance for CL Inception V3 - FP16. + - Improved accuracy for CL Inception V3 - FP16 by enabling FP32 accumulator (mixed-precision). + - Improved Arm® Neon™ performance by enabling fusing batch normalization with convolution and depth-wise convolution layer. + - Improved Arm® Neon™ performance for MobileNet-SSD by improving the output detection performance. + - Optimized @ref CLPadLayer. + - Optimized CL generic depthwise convolution layer by introducing @ref CLDepthwiseConvolutionLayerNativeKernel. + - Reduced memory consumption by implementing weights sharing. + +v19.08.1 Public maintenance release + - Fix offset calculation in NEReductionOperationKernel. 
+ - Fix data layout in NEScaleKernel for nhwc. + - Retain configuration step data layout to avoid side-effects. + - Perform sqrt in double domain for L2 pooling. + - Fix output shape calculation for Reduce Mean + - Fix broadcast CLPixelwiseMultiplication with 5D tensors + +v19.08 Public major release + - Various bug fixes. + - Various optimisations. + - Deprecated Arm® Neon™ functions + - NEDepthConcatenateLayer + - NEWidthConcatenateLayer + - Deprecated OpenCL kernels / functions + - CLDepthConcatenateLayer + - CLGEMMInterleave4x4Kernel / CLGEMMInterleave4x4 + - CLGEMMTranspose1xWKernel / CLGEMMTranspose1xW + - CLWidthConcatenateLayer + - New Arm® Neon™ kernels / functions: + - @ref NEAbsLayer + - @ref NECast + - @ref NEElementwisePower + - @ref NELogLayer + - @ref NELSTMLayerQuantized + - @ref NENegLayer + - @ref NEPReluLayer + - @ref NESinLayer + - NEBatchConcatenateLayerKernel + - @ref NEDepthToSpaceLayerKernel / @ref NEDepthToSpaceLayer + - NEDepthwiseConvolutionLayerNativeKernel + - @ref NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel + - @ref NEMeanStdDevNormalizationKernel / @ref NEMeanStdDevNormalizationLayer + - @ref NESpaceToDepthLayerKernel / @ref NESpaceToDepthLayer + - New OpenCL kernels / functions: + - @ref CLAbsLayer + - @ref CLElementwisePower + - @ref CLLogLayer + - @ref CLLSTMLayerQuantized + - @ref CLNegLayer + - @ref CLPReluLayer + - @ref CLSinLayer + - CLBatchConcatenateLayerKernel + - @ref CLDepthToSpaceLayerKernel / @ref CLDepthToSpaceLayer + - @ref CLGEMMLowpMatrixMultiplyNativeKernel + - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel + - @ref CLGEMMMatrixMultiplyNativeKernel + - CLMeanStdDevNormalizationKernel /CLMeanStdDevNormalizationLayer + - @ref CLSpaceToDepthLayerKernel / @ref CLSpaceToDepthLayer + - New examples: + - neon_opticalflow + - cl_cache + - neon_permute + - Added support for FP16 in @ref NEDeconvolutionLayer + - Added support for FP16 in @ref CLDeconvolutionLayer + - Added support for REDUCE_MIN and 
REDUCE_MAX in @ref ReductionOperation + - Enable the fusion of batch normalization with convolution and depthwise convolution layer for FP32 in the graph API (OpenCL only) + - Added support for fusing activation function and broadcast addition with the matrix multiplication for FP32 (OpenCL only) + - Re-factored the depthwise convolution layer kernel on Arm® Neon™ for generic cases + - Added an optimized depthwise convolution layer kernel for 5x5 filters (Neon only) + - Added support to enable OpenCL kernel cache. Added example showing how to load the prebuilt OpenCL kernels from a binary cache file + - Altered @ref QuantizationInfo interface to support per-channel quantization. + - The CLDepthwiseConvolutionLayer3x3 will be included by @ref CLDepthwiseConvolutionLayer to accommodate for future optimizations. + - The NEDepthwiseConvolutionLayerOptimized will be included by @ref NEDepthwiseConvolutionLayer to accommodate for future optimizations. + - Removed inner_border_right and inner_border_top parameters from @ref CLDeconvolutionLayer interface + - Removed inner_border_right and inner_border_top parameters from @ref NEDeconvolutionLayer interface + - Optimized the Arm® Neon™ assembly kernel for GEMMLowp. The new implementation fuses the output stage and quantization with the matrix multiplication kernel + +v19.05 Public major release + - Various bug fixes. + - Various optimisations. 
+ - New Arm® Neon™ kernels / functions: + - @ref NEBatchToSpaceLayerKernel / @ref NEBatchToSpaceLayer + - NEComplexPixelWiseMultiplicationKernel / @ref NEComplexPixelWiseMultiplication + - @ref NECropKernel / @ref NECropResize + - NEDepthwiseConvolutionAssemblyDispatch + - @ref NEFFTDigitReverseKernel + - @ref NEFFTRadixStageKernel + - @ref NEFFTScaleKernel + - @ref NEGEMMLowpOffsetContributionOutputStageKernel + - NEHeightConcatenateLayerKernel + - @ref NESpaceToBatchLayerKernel / @ref NESpaceToBatchLayer + - @ref NEFFT1D + - @ref NEFFT2D + - @ref NEFFTConvolutionLayer + - New OpenCL kernels / functions: + - CLComplexPixelWiseMultiplicationKernel / @ref CLComplexPixelWiseMultiplication + - CLCropKernel / @ref CLCropResize + - @ref CLDeconvolutionReshapeOutputKernel + - @ref CLFFTDigitReverseKernel + - @ref CLFFTRadixStageKernel + - @ref CLFFTScaleKernel + - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel + - @ref CLGEMMMatrixMultiplyReshapedOnlyRHSKernel + - CLHeightConcatenateLayerKernel + - @ref CLDirectDeconvolutionLayer + - @ref CLFFT1D + - @ref CLFFT2D + - @ref CLFFTConvolutionLayer + - @ref CLGEMMDeconvolutionLayer + - New OpenGLES kernels / functions: + - GCConcatenateLayer + - Deprecated functions/interfaces + - GCDepthConcatenateLayer + - NEWidthConcatenateLayer + - NEDepthConcatenateLayer + - CLWidthConcatenateLayer + - CLDepthConcatenateLayer + - CLGEMMInterleave4x4 + - CLGEMMTranspose1xW + - Support different quantization info in CLConcatLayer. + - Add checks on different input/output quantization info were not supported. + - Tensors have different quantization information. + - Add FP16 support checks. + - Fix output quantization CLDeptwiseConv3x3 when activation is fused. + - New graph examples: + - graph_convolution + - graph_fully_connected + - graph_depthwise_convolution + - Deepspeech v0.4.1 + - Add support for QASYMM8 in NEArithmeticSubtractionKernel. + - Add support for QASYMM8 in NEPixelWiseMultiplicationKernel. 
+ - Add support for QASYMM8 NEDeconvolution.
+ - Add support for DequantizationLayer for Neon/CL.
+ - Add support for dilation in CLDepthwiseConvolution.
+ - Fuse offset contribution with the output stage when we use NEGEMMLowpMatrixMultiplyCore.
+ - Optimize CLDeconvolution.
+ - Add StackLayer to the graph API.
+ - Add support for "reflect" padding mode in NEPad.
+ - Winograd 7x7 NHWC on OpenCL.
+ - Rework CL ML layers to run exclusively on CL.
+ - Support different quantization info in PoolingLayer.
+ - Implement and test import memory interfaces.
+ - Added new tests and removed old ones.
+ - Various clang-tidy fixes.
+
+v19.02 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - New Arm® Neon™ kernels / functions:
+   - @ref NETileKernel / @ref NETile
+   - @ref NEFuseBatchNormalizationKernel / @ref NEFuseBatchNormalization
+   - NEElementwiseOperationKernel
+   - @ref NEElementwiseMax
+   - @ref NEElementwiseMin
+   - @ref NEElementwiseSquaredDiff
+   - @ref NESelectKernel / @ref NESelect
+   - @ref NESplit
+   - @ref NESlice
+   - @ref NEUnstack
+   - @ref NEStridedSliceKernel / @ref NEStridedSlice
+   - NEElementwiseUnaryKernel
+   - @ref NERsqrtLayer
+   - @ref NEExpLayer
+   - @ref NEReverseKernel / @ref NEReverse
+   - @ref NEArgMinMaxLayer
+   - @ref NEStackLayerKernel / @ref NEStackLayer
+   - @ref NERangeKernel / @ref NERange
+   - @ref NEPadLayer
+   - NEMemsetKernel
+   - @ref NEGatherKernel / @ref NEGather
+   - @ref NEElementwiseComparison
+   - @ref NEElementwiseComparisonStatic
+   - NEComparisonOperationKernel
+   - @ref NEElementwiseDivision
+ - New OpenCL kernels / functions:
+   - @ref CLSelectKernel / @ref CLSelect
+   - @ref CLTileKernel / @ref CLTile
+   - @ref CLComparisonKernel / @ref CLComparison
+   - @ref CLArgMinMaxLayer
+   - @ref CLElementwiseMax
+   - @ref CLElementwiseMin
+   - @ref CLElementwiseSquaredDiff
+   - @ref CLStackLayerKernel / @ref CLStackLayer
+   - @ref CLReverse / @ref CLReverseKernel
+   - @ref CLRsqrtLayer
+   - @ref CLExpLayer
+   - CLElementWiseUnaryLayerKernel
+   - @ref CLGEMMReshapeLHSMatrixKernel
+   - @ref CLGEMMReshapeRHSMatrixKernel
+   - @ref CLGEMMMatrixMultiplyReshapedKernel
+   - @ref CLRangeKernel / @ref CLRange
+   - @ref CLUnstack
+   - @ref CLGatherKernel / @ref CLGather
+   - @ref CLGEMMLowpMatrixMultiplyReshapedKernel
+ - New CPP kernels / functions:
+   - @ref CPPDetectionOutputLayer
+   - @ref CPPTopKV / @ref CPPTopKVKernel
+ - Added new examples:
+   - graph_ssd_mobilenet.cpp
+   - graph_mobilenet_v2.cpp
+   - graph_resnet12.cpp
+   - graph_srcnn955.cpp
+   - graph_vgg_vdsr.cpp
+   - graph_inception_resnet_v1.cpp
+ - Add 4D tensors support to
+   - @ref NESoftmaxLayer
+ - Fused activation in @ref CLWinogradConvolutionLayer
+ - Extended @ref NEPermute to support more cases
+ - Added Neon/SVE GEMM Hybrid kernels
+ - Added u8 and s8 hybrid assembly kernels
+ - Introduced GEMM strategy name in NEGEMMAssemblyWrapper
+ - Improved @ref CLTuner
+ - Fused the bias addition within @ref CLGEMM
+ - Added support for QASYMM8 LOGISTIC activation in @ref NEActivationLayer
+ - Added NHWC data layout support to:
+   - @ref NEScale for F16
+   - @ref CLNormalizationLayer IN_MAP_2D for FP32/FP16
+   - @ref NEL2NormalizeLayer for FP32/FP16
+   - @ref NENormalizationLayer IN_MAP_2D for FP32/FP16
+   - @ref CLROIAlignLayer
+   - @ref CLGenerateProposalsLayer
+ - Added QASYMM8 support to the following kernels:
+   - NEArithmeticAdditionKernel
+   - @ref NEScale
+ - Added new tests and improved validation and benchmarking suites.
+ - Deprecated functions/interfaces
+   - Usage of inner_border_right and inner_border_top has been deprecated in @ref CLDeconvolutionLayer and @ref NEDeconvolutionLayer
+
+v18.11 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - New Arm® Neon™ kernels / functions:
+   - @ref NEChannelShuffleLayer / @ref NEChannelShuffleLayerKernel
+   - @ref NEReduceMean
+   - @ref NEReorgLayer / @ref NEReorgLayerKernel
+   - @ref NEPriorBoxLayer / @ref NEPriorBoxLayerKernel
+   - NEUpsampleLayer / NEUpsampleLayerKernel
+   - NEYOLOLayer / NEYOLOLayerKernel
+ - New OpenCL kernels / functions:
+   - @ref CLBatchToSpaceLayer / @ref CLBatchToSpaceLayerKernel
+   - @ref CLBoundingBoxTransform / @ref CLBoundingBoxTransformKernel
+   - @ref CLComputeAllAnchorsKernel
+   - @ref CLGenerateProposalsLayer
+   - @ref CLNormalizePlanarYUVLayer / @ref CLNormalizePlanarYUVLayerKernel
+   - @ref CLReorgLayer / @ref CLReorgLayerKernel
+   - @ref CLSpaceToBatchLayer / @ref CLSpaceToBatchLayerKernel
+   - @ref CLPadLayer
+   - @ref CLReduceMean
+   - @ref CLPriorBoxLayer / @ref CLPriorBoxLayerKernel
+   - @ref CLROIAlignLayer / @ref CLROIAlignLayerKernel
+   - @ref CLSlice
+   - @ref CLSplit
+   - @ref CLStridedSlice / @ref CLStridedSliceKernel
+   - CLUpsampleLayer / CLUpsampleLayerKernel
+   - CLYOLOLayer / CLYOLOLayerKernel
+ - New CPP kernels / functions:
+   - @ref CPPBoxWithNonMaximaSuppressionLimit / @ref CPPBoxWithNonMaximaSuppressionLimitKernel
+ - Added the validate method in:
+   - @ref NEDepthConvertLayer
+   - @ref NEFloor / @ref CLFloor
+   - @ref NEGEMMMatrixAdditionKernel
+   - @ref NEReshapeLayer / @ref CLReshapeLayer
+   - @ref CLScale
+ - Added new examples:
+   - graph_shufflenet.cpp
+   - graph_yolov3.cpp
+ - Added documentation for adding a new function or kernel.
+ - Improved doxygen documentation adding a list of the existing functions.
+ - Add 4D tensors support to
+   - CLWidthConcatenateLayer
+   - CLFlattenLayer
+   - @ref CLSoftmaxLayer
+ - Add dot product support for @ref CLDepthwiseConvolutionLayer3x3NHWCKernel non-unit stride
+ - Add SVE support
+ - Fused batch normalization into convolution layer weights in @ref CLFuseBatchNormalization
+ - Fuses activation in @ref CLDepthwiseConvolutionLayer3x3NCHWKernel, @ref CLDepthwiseConvolutionLayer3x3NHWCKernel and @ref NEGEMMConvolutionLayer
+ - Added NHWC data layout support to:
+   - @ref CLChannelShuffleLayer
+   - @ref CLDeconvolutionLayer
+   - @ref CLL2NormalizeLayer
+ - Added QASYMM8 support to the following kernels:
+   - CLScaleKernel
+   - NEDepthwiseConvolutionLayer3x3Kernel
+   - CLPixelWiseMultiplicationKernel
+ - Added FP16 support to the following kernels:
+   - @ref CLDepthwiseConvolutionLayer3x3NHWCKernel
+   - NEDepthwiseConvolutionLayer3x3Kernel
+   - @ref CLNormalizePlanarYUVLayerKernel
+   - @ref CLWinogradConvolutionLayer (5x5 kernel)
+ - More tests added to both validation and benchmarking suites.
+
+v18.08 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Updated recommended NDK version to r17b.
+ - Removed support for QS8/QS16 data types.
+ - Added support for grouped convolution in @ref CLConvolutionLayer.
+ - Added NHWC data layout support to:
+   - NEDepthConcatenateLayer / CLDepthConcatenateLayer
+   - @ref NEWinogradConvolutionLayer / @ref CLWinogradConvolutionLayer
+   - @ref CLDepthwiseConvolutionLayer
+   - @ref CLDirectConvolutionLayer
+   - @ref CLConvolutionLayer
+   - @ref CLScale
+   - @ref CLIm2ColKernel
+ - New Arm® Neon™ kernels / functions:
+   - @ref NERNNLayer
+ - New OpenCL kernels / functions:
+   - @ref CLArithmeticDivision
+ - Introduced prepare() stage support in the graph API for GLES.
+ - Added support for memory reuse when trying to allocate smaller CLTensors.
+ - Enabled NHWC execution on graph examples.
+ - Added JPEG accessor for validation purposes.
+ - Added validate methods to some kernels / functions.
+
+v18.05 Public major release
+ - Various bug fixes.
+ - Various optimisations.
+ - Major redesign in the interface for the neon kernels implemented in assembly.
+ - Removed arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore / arm_compute::NEHGEMMAArch64FP16Kernel
+ - Added NEGEMMAssemblyWrapper and AssemblyKernelGlue which are used to execute assembly kernels in neon functions.
+ - Minor changes to the CPUInfo type to make it compatible with the new assembly gemm interface.
+ - Moved neon assembly kernels to the folder src/core/Neon/kernels/arm_gemm.
+ - Improved doxygen documentation.
+ - Improved memory management for layer's transitions.
+ - Added support for NHWC data layout in tensors.
+ - Added NHWC data layout support to:
+   - @ref NEGEMMConvolutionLayer
+   - @ref NEDirectConvolutionLayer
+   - @ref NEPoolingLayer / @ref CLPoolingLayer
+   - @ref NEBatchNormalizationLayer / @ref CLBatchNormalizationLayer
+   - @ref NEDepthwiseConvolutionLayer
+   - @ref NEScale
+   - NEIm2Col
+ - Added support for dilated convolutions in @ref NEConvolutionLayer and @ref CLConvolutionLayer.
+ - New OpenCL kernels / functions:
+   - @ref CLChannelShuffleLayer / @ref CLChannelShuffleLayerKernel
+   - CLConvertFullyConnectedWeightsKernel / @ref CLConvertFullyConnectedWeights
+   - @ref CLCopy / CLCopyKernel
+   - @ref CLLSTMLayer
+   - @ref CLRNNLayer
+   - CLWidthConcatenateLayer / CLWidthConcatenateLayerKernel
+   - @ref CLWinogradFilterTransformKernel / @ref CLWinogradInputTransformKernel / @ref CLWinogradConvolutionLayer
+   - @ref CLWinogradInputTransformKernel / @ref CLWinogradInputTransform
+ - New Arm® Neon™ kernels / functions:
+   - NEConvertFullyConnectedWeightsKernel / @ref NEConvertFullyConnectedWeights.
+ - Created the validate method in @ref CLDepthwiseConvolutionLayer.
+ - Beta and gamma are no longer mandatory arguments in @ref NEBatchNormalizationLayer and @ref CLBatchNormalizationLayer.
+ - Added depth multiplier support in @ref NEDepthwiseConvolutionLayer and @ref CLDepthwiseConvolutionLayer.
+ - Added broadcast multiply support in @ref NEPixelWiseMultiplication / NEPixelWiseMultiplicationKernel.
+ - Port mobilenet example to NHWC data layout.
+ - Enabled Winograd method in @ref CLConvolutionLayer.
+ - Renamed NEWinogradLayer to @ref NEWinogradConvolutionLayer.
+ - Updated @ref NEWinogradConvolutionLayer to use highly optimised assembly kernels in src/core/Neon/kernels/arm_gemm.
+ - Added memory manager support in GLES functions.
+ - Major refactoring of the graph API.
+ - Added GLES backend in the graph API.
+ - Added support for the memory manager in the graph API.
+ - Enabled Winograd Convolution method in the graph API.
+ - Added support for grouped convolutions in the graph API.
+ - Replaced NEDeconvolutionLayerUpsampleKernel with NEScaleKernel in @ref NEDeconvolutionLayer.
+ - Added fast maths flag in @ref CLConvolutionLayer.
+ - Added new tests and benchmarks in validation and benchmark frameworks
+ - Merge Activation layer with Convolution Layer (Neon, CL, GLES)
+ - Added support for OpenCL 2.0 SVM
+ - Added support to import memory in OpenCL tensors.
+ - Added the prepare() method to perform any one-off pre-processing before running the function.
+ - Added new examples:
+   - graph_inception_v4.cpp
+   - graph_resnext50.cpp
+ - Added memory measurement instrument for CL.
+
+v18.03 Public maintenance release
+ - Various bug fixes.
+ - Fixed bug in @ref NEActivationLayer
+ - Fix in @ref CLTuner when using batches.
+ - Updated recommended NDK version to r16b (And fixed warnings).
+ - Fixed bug in validation code.
+ - Added Inception v4 graph example.
+ - Renamed NEWinogradLayer.cpp to @ref NEWinogradConvolutionLayer
+
+v18.02 Public major release
+ - Various Arm® Neon™ / OpenCL / GLES optimisations.
+ - Various bug fixes.
+ - Changed default number of threads on big.LITTLE systems.
+ - Refactored examples and added:
+   - graph_mobilenet_qassym8
+   - graph_resnet
+   - graph_squeezenet_v1_1
+ - Renamed @ref CLConvolutionLayer into @ref CLGEMMConvolutionLayer and created a new @ref CLConvolutionLayer to select the fastest convolution method.
+ - Renamed @ref NEConvolutionLayer into @ref NEGEMMConvolutionLayer and created a new @ref NEConvolutionLayer to select the fastest convolution method.
+ - Added in place support to:
+   - @ref CLActivationLayer
+   - @ref CLBatchNormalizationLayer
+ - Added QASYMM8 support to:
+   - @ref CLActivationLayer
+   - @ref CLDepthwiseConvolutionLayer
+   - @ref NEDepthwiseConvolutionLayer
+   - @ref NESoftmaxLayer
+ - Added FP16 support to:
+   - CLDepthwiseConvolutionLayer3x3
+   - @ref CLDepthwiseConvolutionLayer
+ - Added broadcasting support to NEArithmeticAddition / @ref CLArithmeticAddition / @ref CLPixelWiseMultiplication
+ - Added fused batch normalization and activation to @ref CLBatchNormalizationLayer and @ref NEBatchNormalizationLayer
+ - Added support for non-square pooling to @ref NEPoolingLayer and @ref CLPoolingLayer
+ - New OpenCL kernels / functions:
+   - CLDirectConvolutionLayerOutputStageKernel
+ - New Arm® Neon™ kernels / functions
+   - Added name() method to all kernels.
+   - Added support for Winograd 5x5.
+   - NEPermuteKernel / @ref NEPermute
+   - @ref NEWinogradLayerTransformInputKernel / NEWinogradLayer
+   - @ref NEWinogradLayerTransformOutputKernel / NEWinogradLayer
+   - @ref NEWinogradLayerTransformWeightsKernel / NEWinogradLayer
+   - Renamed NEWinogradLayerKernel into NEWinogradLayerBatchedGEMMKernel
+ - New GLES kernels / functions:
+   - GCTensorShiftKernel / GCTensorShift
+
+v18.01 Public maintenance release
+ - Various bug fixes
+ - Added some of the missing validate() methods
+ - Added @ref CLDeconvolutionLayerUpsampleKernel / @ref CLDeconvolutionLayer @ref CLDeconvolutionLayerUpsample
+ - Added CLPermuteKernel / @ref CLPermute
+ - Added method to clean the programs cache in the CL Kernel library.
+ - Added GCArithmeticAdditionKernel / GCArithmeticAddition
+ - Added GCDepthwiseConvolutionLayer3x3Kernel / GCDepthwiseConvolutionLayer3x3
+ - Added GCNormalizePlanarYUVLayerKernel / GCNormalizePlanarYUVLayer
+ - Added GCScaleKernel / GCScale
+ - Added GCWeightsReshapeKernel / GCConvolutionLayer
+ - Added FP16 support to the following GLES compute kernels:
+   - GCCol2ImKernel
+   - GCGEMMInterleave4x4Kernel
+   - GCGEMMTranspose1xWKernel
+   - GCIm2ColKernel
+ - Refactored Arm® Neon™ Winograd (NEWinogradLayerKernel)
+ - Added NEDirectConvolutionLayerOutputStageKernel
+ - Added QASYMM8 support to the following Arm® Neon™ kernels:
+   - NEDepthwiseConvolutionLayer3x3Kernel
+   - @ref NEFillBorderKernel
+   - NEPoolingLayerKernel
+ - Added new examples:
+   - graph_cl_mobilenet_qasymm8.cpp
+   - graph_inception_v3.cpp
+   - gc_dc.cpp
+ - More tests added to both validation and benchmarking suites.
+
+v17.12 Public major release
+ - Most machine learning functions on OpenCL support the new data type QASYMM8
+ - Introduced logging interface
+ - Introduced OpenCL timer
+ - Reworked GEMMLowp interface
+ - Added new Arm® Neon™ assembly kernels for GEMMLowp, SGEMM and HGEMM
+ - Added validation method for most Machine Learning kernels / functions
+ - Added new graph examples such as googlenet, mobilenet, squeezenet, vgg16 and vgg19
+ - Added sgemm example for OpenCL
+ - Added absolute difference example for GLES compute
+ - Added new tests and benchmarks in validation and benchmark frameworks
+ - Added new kernels / functions for GLES compute
+
+ - New OpenGL ES kernels / functions
+   - GCAbsoluteDifferenceKernel / GCAbsoluteDifference
+   - GCActivationLayerKernel / GCActivationLayer
+   - GCBatchNormalizationLayerKernel / GCBatchNormalizationLayer
+   - GCCol2ImKernel
+   - GCDepthConcatenateLayerKernel / GCDepthConcatenateLayer
+   - GCDirectConvolutionLayerKernel / GCDirectConvolutionLayer
+   - GCDropoutLayerKernel / GCDropoutLayer
+   - GCFillBorderKernel / GCFillBorder
+   - GCGEMMInterleave4x4Kernel / GCGEMMInterleave4x4
+   - GCGEMMMatrixAccumulateBiasesKernel / GCGEMMMatrixAdditionKernel / GCGEMMMatrixMultiplyKernel / GCGEMM
+   - GCGEMMTranspose1xWKernel / GCGEMMTranspose1xW
+   - GCIm2ColKernel
+   - GCNormalizationLayerKernel / GCNormalizationLayer
+   - GCPixelWiseMultiplicationKernel / GCPixelWiseMultiplication
+   - GCPoolingLayerKernel / GCPoolingLayer
+   - GCLogits1DMaxKernel / GCLogits1DShiftExpSumKernel / GCLogits1DNormKernel / GCSoftmaxLayer
+   - GCTransposeKernel / GCTranspose
+
+ - New Arm® Neon™ kernels / functions
+   - arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore
+   - arm_compute::NEHGEMMAArch64FP16Kernel
+   - NEDepthwiseConvolutionLayer3x3Kernel / NEDepthwiseIm2ColKernel / NEGEMMMatrixVectorMultiplyKernel / NEDepthwiseVectorToTensorKernel / @ref NEDepthwiseConvolutionLayer
+   - @ref NEGEMMLowpOffsetContributionKernel / @ref NEGEMMLowpMatrixAReductionKernel / @ref NEGEMMLowpMatrixBReductionKernel / @ref NEGEMMLowpMatrixMultiplyCore
+   - @ref NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / @ref NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
+   - NEWinogradLayer / NEWinogradLayerKernel
+
+ - New OpenCL kernels / functions
+   - @ref CLGEMMLowpOffsetContributionKernel / @ref CLGEMMLowpMatrixAReductionKernel / @ref CLGEMMLowpMatrixBReductionKernel / @ref CLGEMMLowpMatrixMultiplyCore
+   - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
+
+ - New graph nodes for Arm® Neon™ and OpenCL
+   - graph::BranchLayer
+   - graph::DepthConvertLayer
+   - graph::DepthwiseConvolutionLayer
+   - graph::DequantizationLayer
+   - graph::FlattenLayer
+   - graph::QuantizationLayer
+   - graph::ReshapeLayer
+
+v17.10 Public maintenance release
+ - Bug fixes:
+   - Check the maximum local workgroup size supported by OpenCL devices
+   - Minor documentation updates (Fixed instructions to build the examples)
+   - Introduced a graph::GraphContext
+   - Added a few new Graph nodes, support for branches and grouping.
+   - Automatically enable cl_printf in debug builds
+   - Fixed bare metal builds for armv7a
+   - Added AlexNet and cartoon effect examples
+   - Fixed library builds: libraries are no longer built as supersets of each other. (This means applications using the Runtime part of the library now need to link against both libarm_compute_core and libarm_compute.)
+
+v17.09 Public major release
+ - Experimental Graph support: initial implementation of a simple stream API to easily chain machine learning layers.
+ - Memory Manager (@ref BlobLifetimeManager, @ref BlobMemoryPool, @ref ILifetimeManager, @ref IMemoryGroup, @ref IMemoryManager, @ref IMemoryPool, @ref IPoolManager, @ref MemoryManagerOnDemand, @ref PoolManager)
+ - New validation and benchmark frameworks (Boost and Google frameworks replaced by homemade framework).
+ - Most machine learning functions support both fixed point 8 and 16 bit (QS8, QS16) for both Arm® Neon™ and OpenCL.
+ - New Arm® Neon™ kernels / functions:
+   - arm_compute::NEGEMMAssemblyBaseKernel arm_compute::NEGEMMAArch64Kernel
+   - NEDequantizationLayerKernel / @ref NEDequantizationLayer
+   - NEFloorKernel / @ref NEFloor
+   - @ref NEL2NormalizeLayerKernel / @ref NEL2NormalizeLayer
+   - NEQuantizationLayerKernel @ref NEMinMaxLayerKernel / @ref NEQuantizationLayer
+   - @ref NEROIPoolingLayerKernel / @ref NEROIPoolingLayer
+   - @ref NEReductionOperationKernel / @ref NEReductionOperation
+   - NEReshapeLayerKernel / @ref NEReshapeLayer
+
+ - New OpenCL kernels / functions:
+   - @ref CLDepthwiseConvolutionLayer3x3NCHWKernel @ref CLDepthwiseConvolutionLayer3x3NHWCKernel CLDepthwiseIm2ColKernel CLDepthwiseVectorToTensorKernel CLDepthwiseWeightsReshapeKernel / CLDepthwiseConvolutionLayer3x3 @ref CLDepthwiseConvolutionLayer CLDepthwiseSeparableConvolutionLayer
+   - CLDequantizationLayerKernel / CLDequantizationLayer
+   - CLDirectConvolutionLayerKernel / @ref CLDirectConvolutionLayer
+   - CLFlattenLayer
+   - CLFloorKernel / @ref CLFloor
+   - CLGEMMTranspose1xW
+   - CLGEMMMatrixVectorMultiplyKernel
+   - @ref CLL2NormalizeLayerKernel / @ref CLL2NormalizeLayer
+   - CLQuantizationLayerKernel @ref CLMinMaxLayerKernel / @ref CLQuantizationLayer
+   - @ref CLROIPoolingLayerKernel / @ref CLROIPoolingLayer
+   - @ref CLReductionOperationKernel / @ref CLReductionOperation
+   - CLReshapeLayerKernel / @ref CLReshapeLayer
+
+v17.06 Public major release
+ - Various bug fixes
+ - Added support for fixed point 8 bit (QS8) to the various Arm® Neon™ machine learning kernels.
+ - Added unit tests and benchmarks (AlexNet, LeNet)
+ - Added support for sub tensors.
+ - Added infrastructure to provide GPU specific optimisation for some OpenCL kernels.
+ - Added @ref OMPScheduler (OpenMP) scheduler for Neon
+ - Added @ref SingleThreadScheduler scheduler for Arm® Neon™ (For bare metal)
+ - Users can specify their own scheduler by implementing the @ref IScheduler interface.
+ - New OpenCL kernels / functions:
+   - @ref CLBatchNormalizationLayerKernel / @ref CLBatchNormalizationLayer
+   - CLDepthConcatenateLayerKernel / CLDepthConcatenateLayer
+   - CLHOGOrientationBinningKernel CLHOGBlockNormalizationKernel, CLHOGDetectorKernel / CLHOGDescriptor CLHOGDetector CLHOGGradient CLHOGMultiDetection
+   - CLLocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedLayer
+   - @ref CLWeightsReshapeKernel / @ref CLConvolutionLayerReshapeWeights
+ - New C++ kernels:
+   - CPPDetectionWindowNonMaximaSuppressionKernel
+ - New Arm® Neon™ kernels / functions:
+   - @ref NEBatchNormalizationLayerKernel / @ref NEBatchNormalizationLayer
+   - NEDepthConcatenateLayerKernel / NEDepthConcatenateLayer
+   - NEDirectConvolutionLayerKernel / @ref NEDirectConvolutionLayer
+   - NELocallyConnectedMatrixMultiplyKernel / NELocallyConnectedLayer
+   - @ref NEWeightsReshapeKernel / @ref NEConvolutionLayerReshapeWeights
+
+v17.05 Public bug fixes release
+ - Various bug fixes
+ - Remaining functions ported to use accurate padding.
+ - Library does not link against OpenCL anymore (It uses dlopen / dlsym at runtime instead to determine whether or not OpenCL is available).
+ - Added "free" method to allocator.
+ - Minimum version of g++ required for armv7 Linux changed from 4.8 to 4.9
+
+v17.04 Public bug fixes release
+
+ The following functions have been ported to use the new accurate padding:
+ - CLColorConvertKernel
+ - CLEdgeNonMaxSuppressionKernel
+ - CLEdgeTraceKernel
+ - CLGaussianPyramidHorKernel
+ - CLGaussianPyramidVertKernel
+ - CLGradientKernel
+ - NEChannelCombineKernel
+ - NEFillArrayKernel
+ - NEGaussianPyramidHorKernel
+ - NEGaussianPyramidVertKernel
+ - NEHarrisScoreFP16Kernel
+ - NEHarrisScoreKernel
+ - NEHOGDetectorKernel
+ - NELogits1DMaxKernel
+ - NELogits1DShiftExpSumKernel
+ - NELogits1DNormKernel
+ - NENonMaximaSuppression3x3FP16Kernel
+ - NENonMaximaSuppression3x3Kernel
+
+v17.03.1 First Major public release of the sources
+ - Renamed the library to arm_compute
+ - New CPP target introduced for C++ kernels shared between Arm® Neon™ and CL functions.
+ - New padding calculation interface introduced and ported most kernels / functions to use it.
+ - New OpenCL kernels / functions:
+   - CLGEMMLowpMatrixMultiplyKernel / CLGEMMLowp
+ - New Arm® Neon™ kernels / functions:
+   - @ref NENormalizationLayerKernel / @ref NENormalizationLayer
+   - NETransposeKernel / @ref NETranspose
+   - NELogits1DMaxKernel, NELogits1DShiftExpSumKernel, NELogits1DNormKernel / @ref NESoftmaxLayer
+   - @ref NEIm2ColKernel, @ref NECol2ImKernel, NEConvolutionLayerWeightsReshapeKernel / @ref NEConvolutionLayer
+   - NEGEMMMatrixAccumulateBiasesKernel / @ref NEFullyConnectedLayer
+   - @ref NEGEMMLowpMatrixMultiplyKernel / NEGEMMLowp
+
+v17.03 Sources preview
+ - New OpenCL kernels / functions:
+   - CLGradientKernel, CLEdgeNonMaxSuppressionKernel, CLEdgeTraceKernel / CLCannyEdge
+   - GEMM refactoring + FP16 support: CLGEMMInterleave4x4Kernel, CLGEMMTranspose1xWKernel, @ref CLGEMMMatrixMultiplyKernel, CLGEMMMatrixAdditionKernel / @ref CLGEMM
+   - CLGEMMMatrixAccumulateBiasesKernel / @ref CLFullyConnectedLayer
+   - CLTransposeKernel / @ref CLTranspose
+   - CLLKTrackerInitKernel, CLLKTrackerStage0Kernel, CLLKTrackerStage1Kernel, CLLKTrackerFinalizeKernel / CLOpticalFlow
+   - @ref CLNormalizationLayerKernel / @ref CLNormalizationLayer
+   - CLLaplacianPyramid, CLLaplacianReconstruct
+ - New Arm® Neon™ kernels / functions:
+   - NEActivationLayerKernel / @ref NEActivationLayer
+   - GEMM refactoring + FP16 support (Requires armv8.2 CPU): @ref NEGEMMInterleave4x4Kernel, @ref NEGEMMTranspose1xWKernel, @ref NEGEMMMatrixMultiplyKernel, @ref NEGEMMMatrixAdditionKernel / @ref NEGEMM
+   - NEPoolingLayerKernel / @ref NEPoolingLayer
+
+v17.02.1 Sources preview
+ - New OpenCL kernels / functions:
+   - CLLogits1DMaxKernel, CLLogits1DShiftExpSumKernel, CLLogits1DNormKernel / @ref CLSoftmaxLayer
+   - CLPoolingLayerKernel / @ref CLPoolingLayer
+   - @ref CLIm2ColKernel, @ref CLCol2ImKernel, CLConvolutionLayerWeightsReshapeKernel / CLConvolutionLayer
+   - @ref CLRemapKernel / @ref CLRemap
+   - CLGaussianPyramidHorKernel, CLGaussianPyramidVertKernel / CLGaussianPyramid, CLGaussianPyramidHalf, CLGaussianPyramidOrb
+   - CLMinMaxKernel, CLMinMaxLocationKernel / CLMinMaxLocation
+   - CLNonLinearFilterKernel / CLNonLinearFilter
+ - New Arm® Neon™ FP16 kernels (Requires armv8.2 CPU)
+   - NEAccumulateWeightedFP16Kernel
+   - NEBox3x3FP16Kernel
+   - NENonMaximaSuppression3x3FP16Kernel
+
+v17.02 Sources preview
+ - New OpenCL kernels / functions:
+   - CLActivationLayerKernel / @ref CLActivationLayer
+   - CLChannelCombineKernel / CLChannelCombine
+   - CLDerivativeKernel / CLChannelExtract
+   - CLFastCornersKernel / CLFastCorners
+   - CLMeanStdDevKernel / CLMeanStdDev
+ - New Arm® Neon™ kernels / functions:
+   - HOG / SVM: NEHOGOrientationBinningKernel, NEHOGBlockNormalizationKernel, NEHOGDetectorKernel, NEHOGNonMaximaSuppressionKernel / NEHOGDescriptor, NEHOGDetector, NEHOGGradient, NEHOGMultiDetection
+   - NENonLinearFilterKernel / NENonLinearFilter
+ - Introduced a CLScheduler to manage the default context and command queue used by the runtime library and create synchronisation events.
+ - Switched all the kernels / functions to use tensors instead of images.
+ - Updated documentation to include instructions to build the library from sources.
+
+v16.12 Binary preview release
+ - Original release
+
+ */
+} // namespace arm_compute
\ No newline at end of file