author Michele Di Giorgio <michele.digiorgio@arm.com> 2021-03-09 14:09:08 +0000
committer Michele Di Giorgio <michele.digiorgio@arm.com> 2021-03-31 17:08:51 +0000
commit 33f41fabd30fb444aaa0cf3e65b61794d498d151
tree a381cff3096a3b05198b0cd311fee28e40fd5a4f /docs
parent 5f91b5d7063462854b62d342f9d4e04ae647e9a6
Fix trademarks throughout the codebase

Resolves: COMPMID-4299

Change-Id: Ie6a52c1371b9a2a7b5bb4f019ecd5e70a2008567
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5338
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Diffstat (limited to 'docs')
-rw-r--r-- docs/00_introduction.dox 126
-rw-r--r-- docs/01_library.dox 20
-rw-r--r-- docs/02_tests.dox 4
-rw-r--r-- docs/04_adding_operator.dox 12
-rw-r--r-- docs/06_functions_list.dox 2
-rw-r--r-- docs/07_errata.dox 4
-rw-r--r-- docs/ComputeLibrary.dir 30
7 files changed, 99 insertions, 99 deletions
diff --git a/docs/00_introduction.dox b/docs/00_introduction.dox
index e199ee9d6f..112254e82a 100644
--- a/docs/00_introduction.dox
+++ b/docs/00_introduction.dox
@@ -32,7 +32,7 @@ The Compute Library is a collection of low-level machine learning functions opti
Several builds of the library are available using various configurations:
- OS: Linux, Android, macOS or bare metal.
- Architecture: armv7a (32bit) or arm64-v8a (64bit).
- - Technology: Neon / OpenCL / Neon and OpenCL.
+ - Technology: Arm® Neon™ / OpenCL / Arm® Neon™ and OpenCL.
- Debug / Asserts / Release: Use a build with asserts enabled to debug your application and enable extra validation. Once you are sure your application works as expected you can switch to a release build of the library for maximum performance.
@section S0_1_contact Contact / Support
@@ -86,7 +86,7 @@ If there is more than one release in a month then an extra sequential number is
@subsection S2_2_changelog Changelog
v21.05 Public major release
- - Removed computer vision support from Neon backend
+ - Removed computer vision support from Arm® Neon™ backend
- Removed the following functions:
- NEAbsoluteDifference
- NEAccumulate
@@ -225,7 +225,7 @@ v21.02 Public major release
v20.11 Public major release
- Various bug fixes.
- Various optimisations.
- - Performance regressions can be noted when executing Depthwise Convolution on Neon with a depth multiplier > 1 for quantized data type.
+ - Performance regressions can be noted when executing Depthwise Convolution on Arm® Neon™ with a depth multiplier > 1 for quantized data type.
This is planned to be resolved in 21.02 release.
- Added new data type QASYMM8_SIGNED support for @ref NEROIAlignLayer.
- Added new data type S32 support for:
@@ -250,11 +250,11 @@ v20.11 Public major release
- @ref CLLogicalNot
- @ref CLLogicalAnd
- @ref CLLogicalOr
- - New Neon kernels / functions:
+ - New Arm® Neon™ kernels / functions:
- @ref NELogicalNot
- @ref NELogicalAnd
- @ref NELogicalOr
- - Removed padding from Neon kernels:
+ - Removed padding from Arm® Neon™ kernels:
- NEComplexPixelWiseMultiplicationKernel
- NENonMaximaSuppression3x3Kernel
- @ref NERemapKernel
@@ -404,7 +404,7 @@ v20.11 Public major release
- CLWarpAffineKernel
- CLWarpPerspective
- CLWarpPerspectiveKernel
- - Deprecated Neon kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
+ - Deprecated Arm® Neon™ kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
- NELocallyConnectedLayer
- NELocallyConnectedMatrixMultiplyKernel
- NEAbsoluteDifference
@@ -538,7 +538,7 @@ v20.08 Public major release
- CLScaleKernel
- New OpenCL kernels / functions:
- @ref CLMaxUnpoolingLayerKernel
- - New Neon kernels / functions:
+ - New Arm® Neon™ kernels / functions:
- @ref NEMaxUnpoolingLayerKernel
- New graph example:
- graph_yolov3_output_detector
@@ -574,7 +574,7 @@ v20.08 Public major release
- Removed OpenCL kernels / functions:
- CLGEMMLowpQuantizeDownInt32ToUint8Scale
- CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
- - Removed Neon kernels / functions:
+ - Removed Arm® Neon™ kernels / functions:
- NEGEMMLowpQuantizeDownInt32ToUint8Scale
- NEGEMMMatrixAccumulateBiasesKernel
- Deprecated functions / interfaces:
@@ -589,7 +589,7 @@ v20.08 Public major release
- Removed padding requirement for the input (e.g. LHS of GEMM) and output in @ref CLGEMMMatrixMultiplyNativeKernel, @ref CLGEMMMatrixMultiplyReshapedKernel, @ref CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and @ref CLIm2ColKernel (NHWC only)
- This change allows to use @ref CLGEMMConvolutionLayer without extra padding for the input and output.
- Only the weights/bias of @ref CLGEMMConvolutionLayer could require padding for the computation.
- - Only on Arm Mali Midgard GPUs, @ref CLGEMMConvolutionLayer could require padding since @ref CLGEMMMatrixMultiplyKernel is called and currently requires padding.
+ - Only on Arm® Mali™ Midgard GPUs, @ref CLGEMMConvolutionLayer could require padding since @ref CLGEMMMatrixMultiplyKernel is called and currently requires padding.
- Added support for exporting the OpenCL buffer object to the OpenCL image object in @ref CLGEMMMatrixMultiplyReshapedKernel and @ref CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.
- This support allows to export the OpenCL buffer used for the reshaped RHS matrix to the OpenCL image object.
- The padding requirement for the OpenCL image object is considered into the @ref CLGEMMReshapeRHSMatrixKernel.
@@ -640,7 +640,7 @@ v20.05 Public major release
- New OpenCL kernels / functions:
- @ref CLQLSTMLayer
- @ref CLQLSTMLayerNormalizationKernel
- - New Neon kernels / functions:
+ - New Arm® Neon™ kernels / functions:
- @ref NEQLSTMLayer
- @ref NEQLSTMLayerNormalizationKernel
- Added HARD_SWISH support in:
@@ -649,20 +649,20 @@ v20.05 Public major release
- Deprecated OpenCL kernels / functions:
- CLGEMMLowpQuantizeDownInt32ToUint8Scale
- CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
- - Deprecated Neon kernels / functions:
+ - Deprecated Arm® Neon™ kernels / functions:
- NEGEMMLowpQuantizeDownInt32ToUint8Scale
- Removed CPP kernels / functions:
- CPPFlipWeightsKernel
- Removed PoolingLayerInfo constructors without Data Layout.
- Removed CLDepthwiseConvolutionLayer3x3
- Removed NEDepthwiseConvolutionLayerOptimized
- - Added support for Winograd 3x3,4x4 on Neon FP16:
+ - Added support for Winograd 3x3,4x4 on Arm® Neon™ FP16:
- @ref NEWinogradConvolutionLayer
- @ref NEWinogradLayerTransformInputKernel
- @ref NEWinogradLayerTransformOutputKernel
- @ref NEWinogradLayerTransformWeightsKernel
- Added CLCompileContext
- - Added Neon GEMM kernel with 2D window support
+ - Added Arm® Neon™ GEMM kernel with 2D window support
v20.02.1 Maintenance release
- Added Android-NN build script.
@@ -700,14 +700,14 @@ v20.02 Public major release
- New OpenCL kernels / functions:
- @ref CLFill
- CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
- - New Neon kernels / functions:
+ - New Arm® Neon™ kernels / functions:
- @ref NEFill
- @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
- - Deprecated Neon functions / interfaces:
+ - Deprecated Arm® Neon™ functions / interfaces:
- CLDepthwiseConvolutionLayer3x3
- NEDepthwiseConvolutionLayerOptimized
- PoolingLayerInfo constructors without Data Layout.
- - Added support for quantization with multiplier greater than 1 on Neon and CL.
+ - Added support for quantization with multiplier greater than 1 on Arm® Neon™ and CL.
- Added support for quantized inputs of type QASYMM8_SIGNED and QASYMM8 to @ref CLQuantizationLayer.
- Added the ability to build bootcode for bare metal.
- Added support for generating synthetic QASYMM8 graphs.
@@ -732,7 +732,7 @@ v19.11 Public major release
- CLDepthwiseSeparableConvolutionLayer
- CLDepthwiseVectorToTensorKernel
- CLDirectConvolutionLayerOutputStageKernel
- - Deprecated Neon kernels / functions:
+ - Deprecated Arm® Neon™ kernels / functions:
- NEDepthwiseWeightsReshapeKernel
- NEDepthwiseIm2ColKernel
- NEDepthwiseSeparableConvolutionLayer
@@ -743,7 +743,7 @@ v19.11 Public major release
- @ref CLDepthwiseConvolutionLayerNativeKernel to replace the old generic depthwise convolution (see Deprecated
OpenCL kernels / functions)
- @ref CLLogSoftmaxLayer
- - New Neon kernels / functions:
+ - New Arm® Neon™ kernels / functions:
- @ref NEBoundingBoxTransformKernel / @ref NEBoundingBoxTransform
- @ref NEComputeAllAnchorsKernel / NEComputeAllAnchors
- @ref NEDetectionPostProcessLayer
@@ -782,8 +782,8 @@ v19.11 Public major release
- Replaced the calls to CLCopyKernel and CLMemsetKernel with @ref CLPadLayer in @ref CLGenerateProposalsLayer.
- Improved performance for CL Inception V3 - FP16.
- Improved accuracy for CL Inception V3 - FP16 by enabling FP32 accumulator (mixed-precision).
- - Improved Neon performance by enabling fusing batch normalization with convolution and depth-wise convolution layer.
- - Improved Neon performance for MobileNet-SSD by improving the output detection performance.
+ - Improved Arm® Neon™ performance by enabling fusing batch normalization with convolution and depth-wise convolution layer.
+ - Improved Arm® Neon™ performance for MobileNet-SSD by improving the output detection performance.
- Optimized @ref CLPadLayer.
- Optimized CL generic depthwise convolution layer by introducing @ref CLDepthwiseConvolutionLayerNativeKernel.
- Reduced memory consumption by implementing weights sharing.
@@ -799,7 +799,7 @@ v19.08.1 Public maintenance release
v19.08 Public major release
- Various bug fixes.
- Various optimisations.
- - Deprecated Neon functions
+ - Deprecated Arm® Neon™ functions
- NEDepthConcatenateLayer
- NEWidthConcatenateLayer
- Deprecated OpenCL kernels / functions
@@ -807,7 +807,7 @@ v19.08 Public major release
- CLGEMMInterleave4x4Kernel / CLGEMMInterleave4x4
- CLGEMMTranspose1xWKernel / CLGEMMTranspose1xW
- CLWidthConcatenateLayer
- - New Neon kernels / functions:
+ - New Arm® Neon™ kernels / functions:
- @ref NEAbsLayer
- @ref NECast
- @ref NEElementwisePower
@@ -846,7 +846,7 @@ v19.08 Public major release
- Added support for REDUCE_MIN and REDUCE_MAX in @ref ReductionOperation
- Enable the fusion of batch normalization with convolution and depthwise convolution layer for FP32 in the graph API (OpenCL only)
- Added support for fusing activation function and broadcast addition with the matrix multiplication for FP32 (OpenCL only)
- - Re-factored the depthwise convolution layer kernel on Neon for generic cases
+ - Re-factored the depthwise convolution layer kernel on Arm® Neon™ for generic cases
- Added an optimized depthwise convolution layer kernel for 5x5 filters (Neon only)
- Added support to enable OpenCL kernel cache. Added example showing how to load the prebuilt OpenCL kernels from a binary cache file
- Altered @ref QuantizationInfo interface to support per-channel quantization.
@@ -854,12 +854,12 @@ v19.08 Public major release
- The NEDepthwiseConvolutionLayerOptimized will be included by @ref NEDepthwiseConvolutionLayer to accommodate for future optimizations.
- Removed inner_border_right and inner_border_top parameters from @ref CLDeconvolutionLayer interface
- Removed inner_border_right and inner_border_top parameters from @ref NEDeconvolutionLayer interface
- - Optimized the Neon assembly kernel for GEMMLowp. The new implementation fuses the output stage and quantization with the matrix multiplication kernel
+ - Optimized the Arm® Neon™ assembly kernel for GEMMLowp. The new implementation fuses the output stage and quantization with the matrix multiplication kernel
v19.05 Public major release
- Various bug fixes.
- Various optimisations.
- - New Neon kernels / functions:
+ - New Arm® Neon™ kernels / functions:
- @ref NEBatchToSpaceLayerKernel / @ref NEBatchToSpaceLayer
- NEComplexPixelWiseMultiplicationKernel / @ref NEComplexPixelWiseMultiplication
- @ref NECropKernel / @ref NECropResize
@@ -927,7 +927,7 @@ v19.05 Public major release
v19.02 Public major release
- Various bug fixes.
- Various optimisations.
- - New Neon kernels / functions:
+ - New Arm® Neon™ kernels / functions:
- @ref NETileKernel / @ref NETile
- @ref NEFuseBatchNormalizationKernel / @ref NEFuseBatchNormalization
- NEElementwiseOperationKernel
@@ -1010,7 +1010,7 @@ v19.02 Public major release
v18.11 Public major release
- Various bug fixes.
- Various optimisations.
- - New Neon kernels / functions:
+ - New Arm® Neon™ kernels / functions:
- @ref NEChannelShuffleLayer / @ref NEChannelShuffleLayerKernel
- @ref NEReduceMean
- @ref NEReorgLayer / @ref NEReorgLayerKernel
@@ -1084,7 +1084,7 @@ v18.08 Public major release
- @ref CLConvolutionLayer
- @ref CLScale
- @ref CLIm2ColKernel
- - New Neon kernels / functions:
+ - New Arm® Neon™ kernels / functions:
- @ref NERNNLayer
- New OpenCL kernels / functions:
- @ref CLArithmeticDivision
@@ -1123,7 +1123,7 @@ v18.05 Public major release
- CLWidthConcatenateLayer / CLWidthConcatenateLayerKernel
- @ref CLWinogradFilterTransformKernel / @ref CLWinogradInputTransformKernel / @ref CLWinogradConvolutionLayer
- @ref CLWinogradInputTransformKernel / @ref CLWinogradInputTransform
- - New Neon kernels / functions:
+ - New Arm® Neon™ kernels / functions:
- @ref NEConvertFullyConnectedWeightsKernel / @ref NEConvertFullyConnectedWeights.
- Created the validate method in @ref CLDepthwiseConvolutionLayer.
- Beta and gamma are no longer mandatory arguments in @ref NEBatchNormalizationLayer and @ref CLBatchNormalizationLayer.
@@ -1161,7 +1161,7 @@ v18.03 Public maintenance release
- Renamed NEWinogradLayer.cpp to @ref NEWinogradConvolutionLayer
v18.02 Public major release
- - Various Neon / OpenCL / GLES optimisations.
+ - Various Arm® Neon™ / OpenCL / GLES optimisations.
- Various bug fixes.
- Changed default number of threads on big LITTLE systems.
- Refactored examples and added:
@@ -1186,7 +1186,7 @@ v18.02 Public major release
- Added support for non-square pooling to @ref NEPoolingLayer and @ref CLPoolingLayer
- New OpenCL kernels / functions:
- CLDirectConvolutionLayerOutputStageKernel
- - New Neon kernels / functions
+ - New Arm® Neon™ kernels / functions
- Added name() method to all kernels.
- Added support for Winograd 5x5.
- NEPermuteKernel / @ref NEPermute
@@ -1213,9 +1213,9 @@ v18.01 Public maintenance release
- GCGEMMInterleave4x4Kernel
- GCGEMMTranspose1xWKernel
- GCIm2ColKernel
- - Refactored Neon Winograd (NEWinogradLayerKernel)
+ - Refactored Arm® Neon™ Winograd (NEWinogradLayerKernel)
- Added @ref NEDirectConvolutionLayerOutputStageKernel
- - Added QASYMM8 support to the following Neon kernels:
+ - Added QASYMM8 support to the following Arm® Neon™ kernels:
- NEDepthwiseConvolutionLayer3x3Kernel
- @ref NEFillBorderKernel
- NEPoolingLayerKernel
@@ -1230,7 +1230,7 @@ v17.12 Public major release
- Introduced logging interface
- Introduced opencl timer
- Reworked GEMMLowp interface
- - Added new Neon assembly kernels for GEMMLowp, SGEMM and HGEMM
+ - Added new Arm® Neon™ assembly kernels for GEMMLowp, SGEMM and HGEMM
- Added validation method for most Machine Learning kernels / functions
- Added new graph examples such as googlenet, mobilenet, squeezenet, vgg16 and vgg19
- Added sgemm example for OpenCL
@@ -1257,7 +1257,7 @@ v17.12 Public major release
- GCLogits1DMaxKernel / GCLogits1DShiftExpSumKernel / GCLogits1DNormKernel / GCSoftmaxLayer
- GCTransposeKernel / GCTranspose
- - New Neon kernels / functions
+ - New Arm® Neon™ kernels / functions
- arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore
- arm_compute::NEHGEMMAArch64FP16Kernel
- NEDepthwiseConvolutionLayer3x3Kernel / NEDepthwiseIm2ColKernel / NEGEMMMatrixVectorMultiplyKernel / NEDepthwiseVectorToTensorKernel / @ref NEDepthwiseConvolutionLayer
@@ -1269,7 +1269,7 @@ v17.12 Public major release
- @ref CLGEMMLowpOffsetContributionKernel / @ref CLGEMMLowpMatrixAReductionKernel / @ref CLGEMMLowpMatrixBReductionKernel / @ref CLGEMMLowpMatrixMultiplyCore
- CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
- - New graph nodes for Neon and OpenCL
+ - New graph nodes for Arm® Neon™ and OpenCL
- graph::BranchLayer
- graph::DepthConvertLayer
- graph::DepthwiseConvolutionLayer
@@ -1293,8 +1293,8 @@ v17.09 Public major release
- Experimental Graph support: initial implementation of a simple stream API to easily chain machine learning layers.
- Memory Manager (@ref BlobLifetimeManager, @ref BlobMemoryPool, @ref ILifetimeManager, @ref IMemoryGroup, @ref IMemoryManager, @ref IMemoryPool, @ref IPoolManager, @ref MemoryManagerOnDemand, @ref PoolManager)
- New validation and benchmark frameworks (Boost and Google frameworks replaced by homemade framework).
- - Most machine learning functions support both fixed point 8 and 16 bit (QS8, QS16) for both Neon and OpenCL.
- - New Neon kernels / functions:
+ - Most machine learning functions support both fixed point 8 and 16 bit (QS8, QS16) for both Arm® Neon™ and OpenCL.
+ - New Arm® Neon™ kernels / functions:
- arm_compute::NEGEMMAssemblyBaseKernel arm_compute::NEGEMMAArch64Kernel
- NEDequantizationLayerKernel / @ref NEDequantizationLayer
- NEFloorKernel / @ref NEFloor
@@ -1320,12 +1320,12 @@ v17.09 Public major release
v17.06 Public major release
- Various bug fixes
- - Added support for fixed point 8 bit (QS8) to the various Neon machine learning kernels.
+ - Added support for fixed point 8 bit (QS8) to the various Arm® Neon™ machine learning kernels.
- Added unit tests and benchmarks (AlexNet, LeNet)
- Added support for sub tensors.
- Added infrastructure to provide GPU specific optimisation for some OpenCL kernels.
- Added @ref OMPScheduler (OpenMP) scheduler for Neon
- - Added @ref SingleThreadScheduler scheduler for Neon (For bare metal)
+ - Added @ref SingleThreadScheduler scheduler for Arm® Neon™ (For bare metal)
- User can specify his own scheduler by implementing the @ref IScheduler interface.
- New OpenCL kernels / functions:
- @ref CLBatchNormalizationLayerKernel / @ref CLBatchNormalizationLayer
@@ -1335,7 +1335,7 @@ v17.06 Public major release
- @ref CLWeightsReshapeKernel / @ref CLConvolutionLayerReshapeWeights
- New C++ kernels:
- CPPDetectionWindowNonMaximaSuppressionKernel
- - New Neon kernels / functions:
+ - New Arm® Neon™ kernels / functions:
- @ref NEBatchNormalizationLayerKernel / @ref NEBatchNormalizationLayer
- NEDepthConcatenateLayerKernel / NEDepthConcatenateLayer
- @ref NEDirectConvolutionLayerKernel / @ref NEDirectConvolutionLayer
@@ -1373,11 +1373,11 @@ v17.04 Public bug fixes release
v17.03.1 First Major public release of the sources
- Renamed the library to arm_compute
- - New CPP target introduced for C++ kernels shared between Neon and CL functions.
+ - New CPP target introduced for C++ kernels shared between Arm® Neon™ and CL functions.
- New padding calculation interface introduced and ported most kernels / functions to use it.
- New OpenCL kernels / functions:
- CLGEMMLowpMatrixMultiplyKernel / CLGEMMLowp
- - New Neon kernels / functions:
+ - New Arm® Neon™ kernels / functions:
- @ref NENormalizationLayerKernel / @ref NENormalizationLayer
- NETransposeKernel / @ref NETranspose
- NELogits1DMaxKernel, NELogits1DShiftExpSumKernel, NELogits1DNormKernel / @ref NESoftmaxLayer
@@ -1394,7 +1394,7 @@ v17.03 Sources preview
- CLLKTrackerInitKernel, CLLKTrackerStage0Kernel, CLLKTrackerStage1Kernel, CLLKTrackerFinalizeKernel / CLOpticalFlow
- @ref CLNormalizationLayerKernel / @ref CLNormalizationLayer
- CLLaplacianPyramid, CLLaplacianReconstruct
- - New Neon kernels / functions:
+ - New Arm® Neon™ kernels / functions:
- NEActivationLayerKernel / @ref NEActivationLayer
- GEMM refactoring + FP16 support (Requires armv8.2 CPU): @ref NEGEMMInterleave4x4Kernel, @ref NEGEMMTranspose1xWKernel, @ref NEGEMMMatrixMultiplyKernel, @ref NEGEMMMatrixAdditionKernel / @ref NEGEMM
- NEPoolingLayerKernel / @ref NEPoolingLayer
@@ -1408,7 +1408,7 @@ v17.02.1 Sources preview
- CLGaussianPyramidHorKernel, CLGaussianPyramidVertKernel / CLGaussianPyramid, CLGaussianPyramidHalf, CLGaussianPyramidOrb
- CLMinMaxKernel, CLMinMaxLocationKernel / CLMinMaxLocation
- CLNonLinearFilterKernel / CLNonLinearFilter
- - New Neon FP16 kernels (Requires armv8.2 CPU)
+ - New Arm® Neon™ FP16 kernels (Requires armv8.2 CPU)
- NEAccumulateWeightedFP16Kernel
- NEBox3x3FP16Kernel
- NENonMaximaSuppression3x3FP16Kernel
@@ -1420,7 +1420,7 @@ v17.02 Sources preview
- CLDerivativeKernel / CLChannelExtract
- CLFastCornersKernel / CLFastCorners
- CLMeanStdDevKernel / CLMeanStdDev
- - New Neon kernels / functions:
+ - New Arm® Neon™ kernels / functions:
- HOG / SVM: NEHOGOrientationBinningKernel, NEHOGBlockNormalizationKernel, NEHOGDetectorKernel, NEHOGNonMaximaSuppressionKernel / NEHOGDescriptor, NEHOGDetector, NEHOGGradient, NEHOGMultiDetection
- NENonLinearFilterKernel / NENonLinearFilter
- Introduced a CLScheduler to manage the default context and command queue used by the runtime library and create synchronisation events.
@@ -1473,7 +1473,7 @@ To see the build options available simply run ```scons -h```:
opencl: Enable OpenCL support (yes|no)
default: True
- neon: Enable Neon support (yes|no)
+ neon: Enable Arm® Neon™ support (yes|no)
default: False
embed_kernels: Embed OpenCL kernels in library binary (yes|no)
@@ -1555,7 +1555,7 @@ To see the build options available simply run ```scons -h```:
pmu: Enable PMU counters (yes|no)
default: False
- mali: Enable Mali hardware counters (yes|no)
+ mali: Enable Arm® Mali™ hardware counters (yes|no)
default: False
external_tests_dir: Add examples, benchmarks and tests to the tests suite from an external path ( /path/to/external_tests_dir )
@@ -1569,7 +1569,7 @@ To see the build options available simply run ```scons -h```:
@b arch: The x86_32 and x86_64 targets can only be used with neon=0 and opencl=1.
@b os: Choose the operating system you are targeting: Linux, Android or bare metal.
-@note bare metal can only be used for Neon (not OpenCL), only static libraries get built and Neon's multi-threading support is disabled.
+@note bare metal can only be used for Arm® Neon™ (not OpenCL), only static libraries get built and Neon's multi-threading support is disabled.
@b build: you can either build directly on your device (native) or cross compile from your desktop machine (cross-compile). In both cases make sure the compiler is available in your path.
@@ -1581,7 +1581,7 @@ In addittion the option 'compress_kernels' will compress the embedded OpenCL ker
@b Werror: If you are compiling using the same toolchains as the ones used in this guide then there shouldn't be any warning and therefore you should be able to keep Werror=1. If with a different compiler version the library fails to build because of warnings interpreted as errors then, if you are sure the warnings are not important, you might want to try to build with Werror=0 (But please do report the issue on Github).
-@b opencl / @b neon: Choose which SIMD technology you want to target. (Neon for Arm Cortex-A CPUs or OpenCL for Arm Mali GPUs)
+@b opencl / @b neon: Choose which SIMD technology you want to target. (Neon for Arm Cortex-A CPUs or OpenCL for Arm® Mali™ GPUs)
@b embed_kernels: For OpenCL only: set embed_kernels=1 if you want the OpenCL kernels to be built in the library's binaries instead of being read from separate ".cl" / ".cs" files. If embed_kernels is set to 0 then the application can set the path to the folder containing the OpenCL kernel files by calling CLKernelLibrary::init(). By default the path is set to "./cl_kernels".
@@ -1607,7 +1607,7 @@ Example:
@b pmu: Enable the PMU cycle counter to measure execution time in benchmark tests. (Your device needs to support it)
-@b mali: Enable the collection of Mali hardware counters to measure execution time in benchmark tests. (Your device needs to have a Mali driver that supports it)
+@b mali: Enable the collection of Arm® Mali™ hardware counters to measure execution time in benchmark tests. (Your device needs to have a Arm® Mali™ driver that supports it)
@b openmp Build in the OpenMP scheduler for Neon.
@@ -1645,7 +1645,7 @@ For Linux, the library was successfully built and tested using the following Lin
- gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf
- gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu
-To cross-compile the library in debug mode, with Neon only support, for Linux 32bit:
+To cross-compile the library in debug mode, with Arm® Neon™ only support, for Linux 32bit:
scons Werror=1 -j8 debug=1 neon=1 opencl=0 os=linux arch=armv7a
@@ -1678,11 +1678,11 @@ The examples get automatically built by scons as part of the build process of th
@note The following command lines assume the arm_compute libraries are present in the current directory or in the system library path. If this is not the case you can specify the location of the pre-built libraries with the compiler option -L. When building the OpenCL example the commands below assume that the CL headers are located in the include folder where the command is executed.
-To cross compile a Neon example for Linux 32bit:
+To cross compile a Arm® Neon™ example for Linux 32bit:
arm-linux-gnueabihf-g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -mfpu=neon -L. -larm_compute -larm_compute_core -o neon_convolution
-To cross compile a Neon example for Linux 64bit:
+To cross compile a Arm® Neon™ example for Linux 64bit:
aarch64-linux-gnu-g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -L. -larm_compute -larm_compute_core -o neon_convolution
@@ -1712,11 +1712,11 @@ i.e. to cross compile the "graph_lenet" example for Linux 64bit:
@note If compiling using static libraries, this order must be followed when linking: arm_compute_graph_static, arm_compute, arm_compute_core
-To compile natively (i.e directly on an Arm device) for Neon for Linux 32bit:
+To compile natively (i.e directly on an Arm device) for Arm® Neon™ for Linux 32bit:
g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -mfpu=neon -larm_compute -larm_compute_core -o neon_convolution
-To compile natively (i.e directly on an Arm device) for Neon for Linux 64bit:
+To compile natively (i.e directly on an Arm device) for Arm® Neon™ for Linux 64bit:
g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -larm_compute -larm_compute_core -o neon_convolution
@@ -1798,7 +1798,7 @@ For NDK r19 or newer, you can directly <a href="https://developer.android.com/nd
@subsubsection S3_3_1_library How to build the library ?
-To cross-compile the library in debug mode, with Neon only support, for Android 32bit:
+To cross-compile the library in debug mode, with Arm® Neon™ only support, for Android 32bit:
CXX=clang++ CC=clang scons Werror=1 -j8 debug=1 neon=1 opencl=0 os=android arch=armv7a
@@ -1814,7 +1814,7 @@ The examples get automatically built by scons as part of the build process of th
Once you've got your Android standalone toolchain built and added to your path you can do the following:
-To cross compile a Neon example:
+To cross compile a Arm® Neon™ example:
#32 bit:
arm-linux-androideabi-clang++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -larm_compute-static -larm_compute_core-static -L. -o neon_convolution_arm -static-libstdc++ -pie
@@ -1835,7 +1835,7 @@ To cross compile the examples with the Graph API, such as graph_lenet.cpp, you n
#64 bit:
aarch64-linux-android-clang++ examples/graph_lenet.cpp utils/Utils.cpp utils/GraphUtils.cpp utils/CommonGraphOptions.cpp -I. -Iinclude -std=c++14 -Wl,--whole-archive -larm_compute_graph-static -Wl,--no-whole-archive -larm_compute-static -larm_compute_core-static -L. -o graph_lenet_aarch64 -static-libstdc++ -pie -DARM_COMPUTE_CL
-@note Due to some issues in older versions of the Mali OpenCL DDK (<= r13p0), we recommend to link arm_compute statically on Android.
+@note Due to some issues in older versions of the Arm® Mali™ OpenCL DDK (<= r13p0), we recommend to link arm_compute statically on Android.
@note When linked statically the arm_compute_graph library currently needs the --whole-archive linker flag in order to work properly
Then you need to do is upload the executable and the shared library to the device using ADB:
@@ -1893,7 +1893,7 @@ Download linaro for <a href="https://releases.linaro.org/components/toolchain/bi
@subsubsection S3_5_1_library How to build the library ?
-To cross-compile the library with Neon support for baremetal arm64-v8a:
+To cross-compile the library with Arm® Neon™ support for baremetal arm64-v8a:
scons Werror=1 -j8 debug=0 neon=1 opencl=0 os=bare_metal arch=arm64-v8a build=cross_compile cppthreads=0 openmp=0 standalone=1
@@ -1933,13 +1933,13 @@ can be followed.
@subsubsection S3_7_1_cl_hard_requirements Hard Requirements
-Compute Library requires OpenCL 1.1 and above with support of non uniform workgroup sizes, which is officially supported in the Mali OpenCL DDK r8p0 and above as an extension (respective extension flag is \a -cl-arm-non-uniform-work-group-size).
+Compute Library requires OpenCL 1.1 and above with support of non uniform workgroup sizes, which is officially supported in the Arm® Mali™ OpenCL DDK r8p0 and above as an extension (respective extension flag is \a -cl-arm-non-uniform-work-group-size).
-Enabling 16-bit floating point calculations require \a cl_khr_fp16 extension to be supported. All Mali GPUs with compute capabilities have native support for half precision floating points.
+Enabling 16-bit floating point calculations require \a cl_khr_fp16 extension to be supported. All Arm® Mali™ GPUs with compute capabilities have native support for half precision floating points.
@subsubsection S3_7_2_cl_performance_requirements Performance improvements
-Integer dot product built-in function extensions (and therefore optimized kernels) are available with Mali OpenCL DDK r22p0 and above for the following GPUs : G71, G76. The relevant extensions are \a cl_arm_integer_dot_product_int8, \a cl_arm_integer_dot_product_accumulate_int8 and \a cl_arm_integer_dot_product_accumulate_int16.
+Integer dot product built-in function extensions (and therefore optimized kernels) are available with Arm® Mali™ OpenCL DDK r22p0 and above for the following GPUs : G71, G76. The relevant extensions are \a cl_arm_integer_dot_product_int8, \a cl_arm_integer_dot_product_accumulate_int8 and \a cl_arm_integer_dot_product_accumulate_int16.
OpenCL kernel level debugging can be simplified with the use of printf, this requires the \a cl_arm_printf extension to be supported.
diff --git a/docs/01_library.dox b/docs/01_library.dox
index 5cd33b67a6..6f4b717bfa 100644
--- a/docs/01_library.dox
+++ b/docs/01_library.dox
@@ -38,10 +38,10 @@ The Core library is a low level collection of algorithms implementations, it is
The Runtime library is a very basic wrapper around the Core library which can be used for quick prototyping, it is basic in the sense that:
- It allocates images and tensors by using standard malloc().
-- It multi-threads Neon code in a very basic way using a very simple pool of threads.
+- It multi-threads Arm® Neon™ code in a very basic way using a very simple pool of threads.
- For OpenCL it uses the default CLScheduler command queue for all mapping operations and kernels.
-For maximum performance, it is expected that the users would re-implement an equivalent to the runtime library which suits better their needs (With a more clever multi-threading strategy, load-balancing between Neon and OpenCL, etc.)
+For maximum performance, it is expected that the users would re-implement an equivalent to the runtime library which suits better their needs (With a more clever multi-threading strategy, load-balancing between Arm® Neon™ and OpenCL, etc.)
@section S4_1_2 Data-type and Data-layout support
@@ -62,7 +62,7 @@ where N = batches, C = channels, H = height, W = width
@section S4_1_3 Fast-math support
Compute Library supports different types of convolution methods, fast-math flag is only used for the Winograd algorithm.
-When the fast-math flag is enabled, both Neon and CL convolution layers will try to dispatch the fastest implementation available, which may introduce a drop in accuracy as well. The different scenarios involving the fast-math flag are presented below:
+When the fast-math flag is enabled, both Arm® Neon™ and CL convolution layers will try to dispatch the fastest implementation available, which may introduce a drop in accuracy as well. The different scenarios involving the fast-math flag are presented below:
- For FP32:
- no-fast-math: Only supports Winograd 3x3,3x1,1x3,5x1,1x5,7x1,1x7
- fast-math: Supports Winograd 3x3,3x1,1x3,5x1,1x5,7x1,1x7,5x5,7x7
@@ -131,7 +131,7 @@ kernel.run( max_window ); // Run the kernel on the full window
@subsection S4_2_3 Multi-threading
-The previous section shows how to run a Neon / CPP kernel in the current thread, however if your system has several CPU cores, you will probably want the kernel to use several cores. Here is how this can be done:
+The previous section shows how to run an Arm® Neon™ / CPP kernel in the current thread. However, if your system has several CPU cores, you will probably want the kernel to use several cores. Here is how this can be done:
@code{.cpp}
ThreadInfo info;
@@ -181,7 +181,7 @@ The previous section shows how to run a Neon / CPP kernel in the current thread,
}
@endcode
-This is a very basic implementation which was originally used in the Neon runtime library by all the Neon functions.
+This is a very basic implementation which was originally used in the Arm® Neon™ runtime library by all the Arm® Neon™ functions.
@sa CPPScheduler
@@ -202,7 +202,7 @@ function.configure( input, output, option0, option1);
function.run();
@endcode
-@warning The Compute Library requires Mali OpenCL DDK r8p0 or higher (OpenCL kernels are compiled using the -cl-arm-non-uniform-work-group-size flag)
+@warning The Compute Library requires Arm® Mali™ OpenCL DDK r8p0 or higher (OpenCL kernels are compiled using the -cl-arm-non-uniform-work-group-size flag)
@note All OpenCL functions and objects in the runtime library use the command queue associated with CLScheduler for all operations, a real implementation would be expected to use different queues for mapping operations and kernels in order to reach a better GPU utilization.
@@ -225,9 +225,9 @@ If the library is compiled with embed_kernels=0 the application can set the path
In order to block until all the jobs in the CLScheduler's command queue are done executing the user can call @ref CLScheduler::sync() or create a sync event using @ref CLScheduler::enqueue_sync_event()
-@subsection S4_4_2_cl_neon OpenCL / Neon interoperability
+@subsection S4_4_2_cl_neon OpenCL / Arm® Neon™ interoperability
-You can mix OpenCL and Neon kernels and functions. However it is the user's responsibility to handle the mapping/unmapping of OpenCL objects.
+You can mix OpenCL and Arm® Neon™ kernels and functions. However, it is the user's responsibility to handle the mapping/unmapping of OpenCL objects.
@section S4_5_algorithms Algorithms
@@ -249,7 +249,7 @@ You have 3 types of @ref BorderMode :
- @ref BorderMode::REPLICATE : Neighbor pixels outside of the image are treated as having the same value as the closest valid pixel.
- @ref BorderMode::CONSTANT : Neighbor pixels outside of the image are treated as having the same constant value. (The user can choose what this value should be).
-Moreover both OpenCL and Neon use vector loads and stores instructions to access the data in buffers, so in order to avoid having special cases to handle for the borders all the images and tensors used in this library must be padded.
+Moreover, both OpenCL and Arm® Neon™ use vector load and store instructions to access the data in buffers, so in order to avoid having special cases to handle for the borders, all the images and tensors used in this library must be padded.
@subsubsection padding Padding
@@ -474,7 +474,7 @@ conv2.run();
The implemented @ref TensorAllocator and @ref CLTensorAllocator objects provide an interface capable of importing existing memory to a tensor as backing memory.
-A simple Neon example can be the following:
+A simple Arm® Neon™ example can be the following:
@code{.cpp}
// External backing memory
void* external_ptr = ...;
diff --git a/docs/02_tests.dox b/docs/02_tests.dox
index 0aee8e59d8..70d2f3d67b 100644
--- a/docs/02_tests.dox
+++ b/docs/02_tests.dox
@@ -353,7 +353,7 @@ You can use the `--instruments` option to select one or more instruments to meas
`PMU` will try to read the CPU PMU events from the kernel (They need to be enabled on your platform)
-`MALI` will try to collect Mali hardware performance counters. (You need to have a recent enough Mali driver)
+`MALI` will try to collect Arm® Mali™ hardware performance counters. (You need to have a recent enough Arm® Mali™ driver)
`WALL_CLOCK_TIMER` will measure time using `gettimeofday`: this should work on all platforms.
@@ -371,7 +371,7 @@ To run the OpenCL precommit validation tests:
LD_LIBRARY_PATH=. ./arm_compute_validation --mode=precommit --filter="^CL.*"
-To run the Neon precommit benchmark tests with PMU and Wall Clock timer in miliseconds instruments enabled:
+To run the Arm® Neon™ precommit benchmark tests with PMU and Wall Clock timer in milliseconds instruments enabled:
LD_LIBRARY_PATH=. ./arm_compute_benchmark --mode=precommit --filter="^NEON.*" --instruments="pmu,wall_clock_timer_ms" --iterations=10
diff --git a/docs/04_adding_operator.dox b/docs/04_adding_operator.dox
index 1b4b575964..aef1bb4af0 100644
--- a/docs/04_adding_operator.dox
+++ b/docs/04_adding_operator.dox
@@ -71,7 +71,7 @@ Similarly, all common functions that process shapes, like calculating output sha
@subsection S4_1_2_add_kernel Add a kernel
-As we mentioned at the beginning, the kernel is the implementation of the operator or algorithm partially using a specific programming language related to the backend we want to use. Adding a kernel in the library means implementing the algorithm in a SIMD technology like Neon or OpenCL. All kernels in Compute Library must implement a common interface IKernel or one of the specific subinterfaces.
+As we mentioned at the beginning, the kernel is the implementation of the operator or algorithm partially using a specific programming language related to the backend we want to use. Adding a kernel in the library means implementing the algorithm in a SIMD technology like Arm® Neon™ or OpenCL. All kernels in Compute Library must implement a common interface IKernel or one of the specific subinterfaces.
IKernel is the common interface for all the kernels in the core library, it contains the main methods for configure and run the kernel itself, such as window() that return the maximum window the kernel can be executed on or is_parallelisable() for indicate whether or not the kernel is parallelizable. If the kernel is parallelizable then the window returned by the window() method can be split into sub-windows which can then be run in parallel, in the other case, only the window returned by window() can be passed to the run method.
There are specific interfaces for OpenCL and Neon: @ref ICLKernel, INEKernel (using INEKernel = @ref ICPPKernel).
@@ -120,10 +120,10 @@ For OpenCL:
@snippet src/core/gpu/cl/kernels/ClReshapeKernel.cpp ClReshapeKernel Kernel
The run will call the function defined in the .cl file.
-For the Neon backend case:
+For the Arm® Neon™ backend case:
@snippet src/core/cpu/kernels/CpuReshapeKernel.cpp NEReshapeLayerKernel Kernel
-In the Neon case, there is no need to add an extra file and we implement the kernel in the same NEReshapeLayerKernel.cpp file.
+In the Arm® Neon™ case, there is no need to add an extra file and we implement the kernel in the same NEReshapeLayerKernel.cpp file.
If the tests are already in place, the new kernel can be tested using the existing tests by adding the configure and run of the kernel to the compute_target() in the fixture.
@@ -137,13 +137,13 @@ If the tests are already in place, the new kernel can be tested using the existi
- (sub[n].start() - max[n].start()) % max[n].step() == 0
- (sub[n].end() - sub[n].start()) % max[n].step() == 0
-@ref CPPScheduler::schedule provides a sample implementation that is used for Neon kernels.
-%Memory management is the other aspect that the runtime layer is supposed to handle. %Memory management of the tensors is abstracted using TensorAllocator. Each tensor holds a pointer to a TensorAllocator object, which is used to allocate and free the memory at runtime. The implementation that is currently supported in Compute Library allows memory blocks, required to be fulfilled for a given operator, to be grouped together under a @ref MemoryGroup. Each group can be acquired and released. The underlying implementation of memory groups vary depending on whether Neon or CL is used. The memory group class uses memory pool to provide the required memory. It also uses the memory manager to manage the lifetime and a IPoolManager to manage the memory pools registered with the memory manager.
+@ref CPPScheduler::schedule provides a sample implementation that is used for Arm® Neon™ kernels.
+%Memory management is the other aspect that the runtime layer is supposed to handle. %Memory management of the tensors is abstracted using TensorAllocator. Each tensor holds a pointer to a TensorAllocator object, which is used to allocate and free the memory at runtime. The implementation that is currently supported in Compute Library allows memory blocks, required to be fulfilled for a given operator, to be grouped together under a @ref MemoryGroup. Each group can be acquired and released. The underlying implementation of memory groups varies depending on whether Arm® Neon™ or CL is used. The memory group class uses a memory pool to provide the required memory. It also uses the memory manager to manage the lifetime and an IPoolManager to manage the memory pools registered with the memory manager.
We have seen the various interfaces for a kernel in the core library, the same structure the same file structure design exists in the runtime module. IFunction is the base class for all the functions, it has two child interfaces: ICLSimpleFunction and INESimpleFunction that are used as base class for functions which call a single kernel.
-The new operator has to implement %validate(), configure() and run(), these methods will call the respective function in the kernel considering that the multi-threading is used for the kernels which are parallelizable, by default std::thread::hardware_concurrency() threads are used. For Neon function can be used CPPScheduler::set_num_threads() to manually set the number of threads, whereas for OpenCL kernels all the kernels are enqueued on the queue associated with CLScheduler and the queue is then flushed.
+The new operator has to implement %validate(), configure() and run(). These methods call the respective functions in the kernel; multi-threading is used for the kernels that are parallelizable, and by default std::thread::hardware_concurrency() threads are used. For Arm® Neon™ functions, CPPScheduler::set_num_threads() can be used to manually set the number of threads, whereas for OpenCL all the kernels are enqueued on the queue associated with CLScheduler and the queue is then flushed.
For the runtime functions, there is an extra method implemented: prepare(), this method prepares the function for the run, it does all the heavy operations that are done only once (reshape the weight, release the memory not necessary after the reshape, etc). The prepare method can be called standalone or in the first run, if not called before, after then the function will be marked as prepared.
The files we add are:
diff --git a/docs/06_functions_list.dox b/docs/06_functions_list.dox
index 0c5145cdc8..2cd16d0603 100644
--- a/docs/06_functions_list.dox
+++ b/docs/06_functions_list.dox
@@ -29,7 +29,7 @@ namespace arm_compute
@tableofcontents
-@section S6_1 Neon functions
+@section S6_1 Arm® Neon™ functions
- @ref IFunction
- @ref INESimpleFunction
diff --git a/docs/07_errata.dox b/docs/07_errata.dox
index 6a82ca91c4..0c8d684017 100644
--- a/docs/07_errata.dox
+++ b/docs/07_errata.dox
@@ -53,10 +53,10 @@ namespace arm_compute
- Versions Affected: >= v19.11
- OSs Affected: Linux
- Conditions:
- - Mali DDK r1p0 - r8p0, and
+ - Arm® Mali™ DDK r1p0 - r8p0, and
- Linux kernel >= 4.4
-- On Android with arm64-v8a/arm64-v8.2-a architecture, Neon validation tests can fail when compiled using Android Ndk
+- On Android with arm64-v8a/arm64-v8.2-a architecture, Arm® Neon™ validation tests can fail when compiled using Android Ndk
>= r18b in debug mode (https://github.com/android/ndk/issues/1135).
- Versions Affected: >= v19.11
- OSs Affected: Android
diff --git a/docs/ComputeLibrary.dir b/docs/ComputeLibrary.dir
index de4968c0ab..74ac9d9d23 100644
--- a/docs/ComputeLibrary.dir
+++ b/docs/ComputeLibrary.dir
@@ -44,15 +44,15 @@
*/
/** @dir src/core/NEON
- * @brief Neon backend core: kernels and utilities.
+ * @brief Arm® Neon™ backend core: kernels and utilities.
*/
/** @file src/core/NEON/NEKernels.h
- * @brief Includes all the Neon kernels at once
+ * @brief Includes all the Arm® Neon™ kernels at once
*/
/** @dir src/core/NEON/kernels
- * @brief Folder containing all the Neon kernels
+ * @brief Folder containing all the Arm® Neon™ kernels
*/
/** @dir arm_compute/core/utils
@@ -76,7 +76,7 @@
*/
/** @dir arm_compute/graph/backends/NEON
- * @brief Neon specific operations
+ * @brief Arm® Neon™ specific operations
*/
/** @dir arm_compute/graph/detail
@@ -148,15 +148,15 @@
*/
/** @dir arm_compute/runtime/NEON
- * @brief Neon backend runtime interface.
+ * @brief Arm® Neon™ backend runtime interface.
*/
/** @file arm_compute/runtime/NEON/NEFunctions.h
- * @brief Includes all the Neon functions at once.
+ * @brief Includes all the Arm® Neon™ functions at once.
*/
/** @dir arm_compute/runtime/NEON/functions
- * @brief Folder containing all the Neon functions.
+ * @brief Folder containing all the Arm® Neon™ functions.
*/
/** @dir arm_compute/runtime/OMP
@@ -182,8 +182,8 @@
*
* -# cl_*.cpp --> OpenCL examples
* -# graph_*.cpp --> Graph examples
- * -# neoncl_*.cpp --> Neon / OpenCL interoperability examples
- * -# neon_*.cpp --> Neon examples
+ * -# neoncl_*.cpp --> Arm® Neon™ / OpenCL interoperability examples
+ * -# neon_*.cpp --> Arm® Neon™ examples
*/
/** @dir examples/gemm_tuner
@@ -211,11 +211,11 @@
*/
/** @dir src/core/NEON/wrapper
- * @brief Neon wrapper used to simplify code
+ * @brief Arm® Neon™ wrapper used to simplify code
*/
/** @file src/core/NEON/wrapper/traits.h
- * @brief Traits defined on Neon vectors
+ * @brief Traits defined on Arm® Neon™ vectors
*/
/** @file src/core/NEON/wrapper/wrapper.h
@@ -223,7 +223,7 @@
*/
/** @dir src/core/NEON/wrapper/intrinsics
- * @brief Neon intrinsics wrappers
+ * @brief Arm® Neon™ intrinsics wrappers
*/
/** @dir src/core/NEON/wrapper/scalar
@@ -255,7 +255,7 @@
*/
/** @dir tests/NEON
- * @brief Neon accessors.
+ * @brief Arm® Neon™ accessors.
*/
/** @dir tests/benchmark
@@ -267,7 +267,7 @@
*/
/** @dir tests/benchmark/NEON
- * @brief Neon benchmarking tests.
+ * @brief Arm® Neon™ benchmarking tests.
*/
/** @dir tests/benchmark_examples
@@ -299,7 +299,7 @@
*/
/** @dir tests/validation/NEON
- * @brief Neon validation tests.
+ * @brief Arm® Neon™ validation tests.
*/
/** @dir tests/validation/reference