-rw-r--r--  docs/08_api.dox | 50
1 file changed, 46 insertions(+), 4 deletions(-)
diff --git a/docs/08_api.dox b/docs/08_api.dox
index a73b1bd351..29d31c831d 100644
--- a/docs/08_api.dox
+++ b/docs/08_api.dox
@@ -42,7 +42,7 @@ construction services.
 Compute Library consists of a list of fundamental objects that are responsible for creating and orchestrating operator execution.
 Below we present these objects in more detail.
 
-@subsection api_objects_context @ref AclContext or @ref Context
+@subsection api_objects_context AclContext or Context
 
 AclContext or Context acts as a central creational aggregate service. All other objects are bound to or created from a context.
 It provides, internally, common facilities such as
@@ -52,13 +52,13 @@ It provides, internally, common facilities such as
 The followings sections will describe parameters that can be given on the creation of Context.
 
-@subsubsection api_object_context_target @ref AclTarget
+@subsubsection api_object_context_target AclTarget
 Context is initialized with a backend target (AclTarget) as different backends might have a different subset of services.
 Currently the following targets are supported:
 - #AclCpu: a generic CPU target that accelerates primitives through SIMD technologies
 - #AclGpuOcl: a target for GPU acceleration using OpenCL
 
-@subsubsection api_object_context_execution_mode @ref AclExecutionMode
+@subsubsection api_object_context_execution_mode AclExecutionMode
 An execution mode (AclExecutionMode) can be passed as an argument that affects the operator creation.
 At the moment the following execution modes are supported:
 - #AclPreferFastRerun: Provides faster re-run. It can be used when the operators are expected to be executed multiple
@@ -66,7 +66,7 @@ times under the same execution context
 - #AclPreferFastStart: Provides faster single execution.
 It can be used when the operators will be executed only once, thus reducing their latency is important (Currently, it is not implemented)
 
-@subsubsection api_object_context_capabilitys @ref AclTargetCapabilities
+@subsubsection api_object_context_capabilitys AclTargetCapabilities
 Context creation can also have a list of capabilities of hardware as one of its parameters.
 This is currently available only for the CPU backend.
 A list of architecture capabilities can be passed to influence the selection of the underlying kernels.
 Such capabilities can be for example the enablement of SVE or the dot product
@@ -79,5 +79,47 @@ This user-provided allocator will be used for allocation of any internal backing
 
 @note To enable interoperability with OpenCL, additional entrypoints are provided
 to extract (@ref AclGetClContext) or set (@ref AclSetClContext) the internal OpenCL context.
+
+@subsection api_objects_tensor AclTensor or Tensor
+
+A tensor is a mathematical object that can describe physical quantities, much like a matrix.
+It can also be considered a generalization of matrices to arbitrary dimensionalities.
+AclTensor is an abstracted interface that represents a tensor.
+
+In addition to the elements it holds, an AclTensor also carries metadata such as shape,
+data type, data layout and strides, which not only fully describe the characteristics
+of the tensor but also specify how the object stored in memory should be traversed.
+@ref AclTensorDescriptor is a dedicated object to represent such metadata.
+
+@note The allocation of an AclTensor can be deferred until external memory is imported
+as backing memory, to accomplish a zero-copy context.
+
+@note To enable interoperability with OpenCL, an additional entrypoint is provided
+to extract (@ref AclGetClMem) the internal OpenCL memory object.
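The role of the stride metadata mentioned above can be illustrated with a short, self-contained sketch. The `dense_strides` helper below is hypothetical and not part of the Compute Library API; it only shows how a dense tensor's byte strides follow from its shape and element size, which is the kind of information an AclTensorDescriptor carries:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (NOT part of the Compute Library API): computes the
// byte strides implied by a dense tensor's shape and element size, with the
// last dimension innermost in memory.
std::vector<std::size_t> dense_strides(const std::vector<std::size_t> &shape,
                                       std::size_t element_size)
{
    std::vector<std::size_t> strides(shape.size());
    std::size_t stride = element_size;
    // Walk from the innermost dimension outwards: each dimension's stride
    // is the product of all inner extents times the element size.
    for (std::size_t i = shape.size(); i-- > 0;)
    {
        strides[i] = stride;
        stride *= shape[i];
    }
    return strides;
}
```

For a shape of {2, 3, 4} with 4-byte elements this yields byte strides {48, 16, 4}; it is this kind of metadata that lets externally imported backing memory be traversed correctly in the zero-copy scenario described above.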
+
+As Tensors can reside in different memory spaces, the @ref AclMapTensor and @ref AclUnmapTensor entrypoints
+are provided to map Tensors into and out of the host memory system, respectively.
+
+@section api_internal Internal
+@subsection api_internal_operator_vs_kernels Operators vs Kernels
+
+Internally, Compute Library separates the executable primitives into two categories: kernels and operators,
+which operate in a hierarchical way.
+
+A kernel is the lowest-level computation block, whose responsibility is to perform a task on a given group of data.
+For design simplicity, kernel computation does NOT involve the following:
+
+- Memory allocation: all memory manipulation should be handled by the caller.
+- Multi-threading: kernels provide the information on how the workload can be split,
+so the caller can effectively distribute the workload across multiple threads.
+
+On the other hand, operators combine one or more kernels to achieve more complex calculations.
+The responsibilities of an operator can be summarized as follows:
+
+- Defining the scheduling policy and dispatching the underlying kernels to the hardware backend
+- Providing the caller with information required by the computation (e.g., memory requirements)
+- Allocating any required auxiliary memory if it is not given explicitly by the caller
+
 */
 } // namespace arm_compute
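The kernel/operator split added above can be sketched with a minimal, self-contained example. The names (`scale_kernel`, `scale_operator`) are hypothetical and not the Compute Library implementation; the sequential dispatch loop merely stands in for the library's scheduler. The point is only the division of responsibilities: the kernel computes over a given window without allocating memory or spawning threads, while the operator decides how the workload is split and dispatches the kernel per slice:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "kernel": pure computation on a given window of the data.
// No memory allocation, no threading -- both are the caller's business.
void scale_kernel(std::vector<float> &data, float factor,
                  std::size_t begin, std::size_t end)
{
    for (std::size_t i = begin; i < end; ++i)
        data[i] *= factor;
}

// Hypothetical "operator": owns the scheduling policy. It splits the
// workload into slices and dispatches the kernel once per slice; in the
// real library each slice would be handed to a worker thread.
void scale_operator(std::vector<float> &data, float factor,
                    std::size_t num_slices)
{
    const std::size_t chunk = (data.size() + num_slices - 1) / num_slices;
    for (std::size_t s = 0; s < num_slices; ++s)
    {
        const std::size_t begin = s * chunk;
        const std::size_t end   = std::min(data.size(), begin + chunk);
        if (begin < end)
            scale_kernel(data, factor, begin, end);
    }
}
```

Because the kernel only reports how its work can be partitioned (here implicitly, via the window arguments), the same kernel code can be reused unchanged whether the operator runs it on one thread or many.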