From 816e48390c65b5487c1e2525b930b935ca5f4293 Mon Sep 17 00:00:00 2001
From: Sang-Hoon Park
Date: Wed, 21 Apr 2021 14:26:49 +0100
Subject: Update API documentation

API documentation is updated to have description for
- Tensor
- Internal architecture regarding operators and kernels
- Supported data type and layout

Partially Resolves: COMPMID-4200

Change-Id: I17011be2890c724014acd3543d688eb5124ff944
Signed-off-by: Sang-Hoon Park
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5501
Comments-Addressed: Arm Jenkins
Reviewed-by: Pablo Marquez Tello
Tested-by: Arm Jenkins
---
 docs/08_api.dox | 50 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/docs/08_api.dox b/docs/08_api.dox
index a73b1bd351..29d31c831d 100644
--- a/docs/08_api.dox
+++ b/docs/08_api.dox
@@ -42,7 +42,7 @@ construction services.
 Compute Library consists of a list of fundamental objects that are responsible for creating and orchestrating operator execution.
 Below we present these objects in more detail.
 
-@subsection api_objects_context @ref AclContext or @ref Context
+@subsection api_objects_context AclContext or Context
 
 AclContext or Context acts as a central creational aggregate service. All other objects are bound to or created from a context.
 It provides, internally, common facilities such as
@@ -52,13 +52,13 @@ It provides, internally, common facilities such as
 
 The followings sections will describe parameters that can be given on the creation of Context.
 
-@subsubsection api_object_context_target @ref AclTarget
+@subsubsection api_object_context_target AclTarget
 Context is initialized with a backend target (AclTarget) as different backends might have a different subset of services.
 Currently the following targets are supported:
 - #AclCpu: a generic CPU target that accelerates primitives through SIMD technologies
 - #AclGpuOcl: a target for GPU acceleration using OpenCL
 
-@subsubsection api_object_context_execution_mode @ref AclExecutionMode
+@subsubsection api_object_context_execution_mode AclExecutionMode
 An execution mode (AclExecutionMode) can be passed as an argument that affects the operator creation.
 At the moment the following execution modes are supported:
 - #AclPreferFastRerun: Provides faster re-run. It can be used when the operators are expected to be executed multiple
@@ -66,7 +66,7 @@ times under the same execution context
 - #AclPreferFastStart: Provides faster single execution. It can be used when the operators will be executed only once,
 thus reducing their latency is important (Currently, it is not implemented)
 
-@subsubsection api_object_context_capabilitys @ref AclTargetCapabilities
+@subsubsection api_object_context_capabilitys AclTargetCapabilities
 Context creation can also have a list of capabilities of hardware as one of its parameters. This is currently available only for the CPU backend.
 A list of architecture capabilities can be passed to influence the selection of the underlying kernels.
 Such capabilities can be for example the enablement of SVE or the dot product
@@ -79,5 +79,47 @@ This user-provided allocator will be used for allocation of any internal backing
 
 @note To enable interoperability with OpenCL, additional entrypoints are provided
 to extract (@ref AclGetClContext) or set (@ref AclSetClContext) the internal OpenCL context.
+
+@subsection api_objects_tensor AclTensor or Tensor
+
+A tensor is a mathematical object that can describe physical properties like matrices.
+It can also be considered a generalization of matrices that can represent arbitrary
+dimensionalities. AclTensor is the abstract interface that represents a tensor.
+
+An AclTensor, in addition to the elements of the physical property it represents,
+also carries metadata such as shape, data type, data layout and strides that not only
+fully describe the characteristics of the physical property but also specify
+how the object stored in memory should be traversed. @ref AclTensorDescriptor is a dedicated
+object to represent such metadata.
+
+@note The allocation of an AclTensor can be deferred until external memory is imported
+as backing memory, to accomplish zero-copy usage.
+
+@note To enable interoperability with OpenCL, additional entrypoints are provided
+to extract (@ref AclGetClMem) the internal OpenCL memory object.
+
+As Tensors can reside in different memory spaces, @ref AclMapTensor and @ref AclUnmapTensor entrypoints
+are provided to map Tensors in and out of the host memory system, respectively.
+
+@section api_internal Internal
+@subsection api_internal_operator_vs_kernels Operators vs Kernels
+
+Internally, Compute Library separates the executable primitives into two categories: kernels and operators,
+which operate in a hierarchical way.
+
+A kernel is the lowest-level computation block whose responsibility is performing a task on a given group of data.
+For design simplicity, a kernel's computation does NOT involve the following:
+
+- Memory allocation: All the memory manipulation should be handled by the caller.
+- Multi-threading: The information on how the workload can be split is provided by kernels,
+so the caller can effectively distribute the workload to multiple threads.
+
+On the other hand, operators combine one or multiple kernels to achieve more complex calculations.
+The responsibilities of the operators can be summarized as follows:
+
+- Defining the scheduling policy and dispatching of the underlying kernels to the hardware backend
+- Providing information to the caller required by the computation (e.g., memory requirements)
+- Allocating any required auxiliary memory if it is not explicitly provided by the caller
+
 */
 } // namespace arm_compute
-- 
cgit v1.2.1
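As a rough illustration of how the objects documented by this patch fit together, the sketch below creates a context for a target, describes and creates a tensor, and uses the map/unmap entrypoints. It is a sketch only: AclTarget (#AclCpu), AclExecutionMode (#AclPreferFastRerun), AclTensorDescriptor, @ref AclMapTensor and @ref AclUnmapTensor are named in the documentation above, but the header name, the creation/destruction entrypoints (AclCreateContext, AclCreateTensor, AclDestroyTensor, AclDestroyContext), the AclContextOptions and descriptor field names, AclFloat32 and AclSuccess are assumptions made for this example, not taken from the patch.

@code{.c}
/* Hedged sketch of the C API described above.
 * Assumed (not confirmed by the patch): the header name, AclCreateContext/AclCreateTensor/
 * AclDestroy* entrypoints, the AclContextOptions and AclTensorDescriptor field names,
 * and the AclFloat32/AclSuccess enumerators. */
#include <stdbool.h>
#include <stdint.h>
#include "arm_compute/Acl.h" /* assumed umbrella header for the C API */

int main(void)
{
    /* 1. Create a context for the CPU backend (#AclCpu is listed in the docs above).
     *    The options struct is assumed to carry the execution-mode hint. */
    AclContextOptions opts = {0};
    opts.mode = AclPreferFastRerun; /* field name assumed; the enum value is from the docs */

    AclContext ctx = NULL; /* opaque handle (assumed) */
    if (AclCreateContext(&ctx, AclCpu, &opts) != AclSuccess) /* entrypoint/status assumed */
        return 1;

    /* 2. Describe a 2x3 float tensor: shape, data type, layout and strides are the
     *    metadata the docs say a tensor descriptor carries. Field names are assumed,
     *    and the shape ordering is purely illustrative. */
    int32_t shape[] = {3, 2};
    AclTensorDescriptor desc = {0};
    desc.ndims     = 2;
    desc.shape     = shape;
    desc.data_type = AclFloat32;

    AclTensor tensor = NULL;
    AclCreateTensor(&tensor, ctx, &desc, true /* allocate backing memory now */);

    /* 3. Map the tensor into host memory, fill it, then unmap
     *    (AclMapTensor/AclUnmapTensor are named in the docs; signatures assumed). */
    void *data = NULL;
    AclMapTensor(tensor, &data);
    float *f = (float *)data;
    for (int i = 0; i < 6; ++i)
        f[i] = (float)i;
    AclUnmapTensor(tensor, data);

    /* 4. Release resources (destroy entrypoints assumed). */
    AclDestroyTensor(tensor);
    AclDestroyContext(ctx);
    return 0;
}
@endcode

If the operators built on such a context were only going to be executed once, the #AclPreferFastStart hint described above would be the more appropriate choice once it is implemented.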
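The interoperability notes mention @ref AclGetClContext, @ref AclSetClContext and @ref AclGetClMem but not their signatures. A possible usage pattern, with assumed signatures and header locations, is to run on the #AclGpuOcl target and hand the underlying OpenCL handles to an existing OpenCL pipeline:

@code{.c}
/* Hedged OpenCL-interop sketch. The entrypoint names come from the documentation above;
 * their exact signatures and the header locations are assumptions. */
#include <CL/cl.h>
#include "arm_compute/Acl.h"          /* assumed */
#include "arm_compute/AclOpenClExt.h" /* assumed location of the CL interop entrypoints */

void share_with_existing_cl_pipeline(AclContext gpu_ctx, AclTensor tensor)
{
    /* Extract the OpenCL context the library uses internally (alternatively, an
     * application-owned context could be installed up front via AclSetClContext). */
    cl_context cl_ctx = NULL;
    AclGetClContext(gpu_ctx, &cl_ctx); /* signature assumed */

    /* Extract the cl_mem backing the tensor so an external kernel can consume it
     * without copying, in the zero-copy spirit of the notes above. */
    cl_mem buffer = NULL;
    AclGetClMem(tensor, &buffer); /* signature assumed */

    /* ... enqueue application-side clSetKernelArg()/clEnqueueNDRangeKernel() work
     *     against `buffer` here ... */
    (void)cl_ctx;
    (void)buffer;
}
@endcode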