From d813bab10bb4fe954fa0e962e1402ed1377617da Mon Sep 17 00:00:00 2001
From: Sheri Zhang
Date: Fri, 30 Apr 2021 16:53:41 +0100
Subject: Restructure documentation

The documentation has been restructured for better grouping and readability.

Resolves: COMPMID-4198

Signed-off-by: Sheri Zhang
Change-Id: I8c8bc77f0aab8d63f1659f2235dbab634422a68c
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5568
Tested-by: Georgios Pinitas
Comments-Addressed: Arm Jenkins
---
 docs/user_guide/api.dox | 135 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)
 create mode 100644 docs/user_guide/api.dox

(limited to 'docs/user_guide/api.dox')

diff --git a/docs/user_guide/api.dox b/docs/user_guide/api.dox
new file mode 100644
index 0000000000..39282046a9
--- /dev/null
+++ b/docs/user_guide/api.dox
@@ -0,0 +1,135 @@
+///
+/// Copyright (c) 2021 Arm Limited.
+///
+/// SPDX-License-Identifier: MIT
+///
+/// Permission is hereby granted, free of charge, to any person obtaining a copy
+/// of this software and associated documentation files (the "Software"), to
+/// deal in the Software without restriction, including without limitation the
+/// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+/// sell copies of the Software, and to permit persons to whom the Software is
+/// furnished to do so, subject to the following conditions:
+///
+/// The above copyright notice and this permission notice shall be included in all
+/// copies or substantial portions of the Software.
+///
+/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+/// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+/// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+/// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+/// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+/// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+/// SOFTWARE.
+///
+namespace arm_compute
+{
+/**
+@page api Application Programming Interface
+
+@tableofcontents
+
+@section api_overview Overview
+
+In this section we present Compute Library's application programming interface (API) architecture along with
+a detailed explanation of its components. Compute Library's API consists of multiple high-level operators and
+even more internally distinct computational blocks that can be executed on a command queue.
+Operators can be bound to multiple Tensor objects and executed concurrently or asynchronously if needed.
+All operators and associated objects are encapsulated in a Context-based mechanism, which provides all related
+construction services.
+
+@section api_objects Fundamental objects
+
+Compute Library consists of a list of fundamental objects that are responsible for creating and orchestrating operator execution.
+Below we present these objects in more detail.
+
+@subsection api_objects_context AclContext or Context
+
+AclContext or Context acts as a central creational aggregate service. All other objects are bound to or created from a context.
+It provides, internally, common facilities such as
+- allocators for object creation or backing memory allocation
+- serialization interfaces
+- any other modules that affect the construction of objects (e.g., program cache for OpenCL).
+
+The following sections describe the parameters that can be given when creating a Context.
+
+@subsubsection api_object_context_target AclTarget
+Context is initialized with a backend target (AclTarget) as different backends might have a different subset of services.
+Currently the following targets are supported:
+- #AclCpu: a generic CPU target that accelerates primitives through SIMD technologies
+- #AclGpuOcl: a target for GPU acceleration using OpenCL
+
+@subsubsection api_object_context_execution_mode AclExecutionMode
+An execution mode (AclExecutionMode) can be passed as an argument that affects the operator creation.
+At the moment the following execution modes are supported:
+- #AclPreferFastRerun: Provides faster re-run. It can be used when the operators are expected to be executed multiple
+times under the same execution context
+- #AclPreferFastStart: Provides faster single execution. It can be used when the operators will be executed only once,
+so reducing their latency is important (currently not implemented)
+
+@subsubsection api_object_context_capabilitys AclTargetCapabilities
+Context creation can also take a list of hardware capabilities as one of its parameters. This is currently
+available only for the CPU backend. A list of architecture capabilities can be passed to influence the selection
+of the underlying kernels. Such capabilities can be, for example, the explicit enablement of SVE or the dot product
+instruction.
+@note The underlying hardware should support the given capability list.
+
+@subsubsection api_object_context_allocator Allocator
+An allocator object that implements @ref AclAllocator can be passed to the Context upon its creation.
+This user-provided allocator will be used for allocation of any internal backing memory.
+
+@note To enable interoperability with OpenCL, additional entrypoints are provided
+to extract (@ref AclGetClContext) or set (@ref AclSetClContext) the internal OpenCL context.
+
+@subsection api_objects_tensor AclTensor or Tensor
+
+A tensor is a mathematical object that can describe physical properties like matrices.
+It can also be considered a generalization of matrices that can represent arbitrary
+dimensionalities. AclTensor is an abstracted interface that represents a tensor.
+
+AclTensor, in addition to the elements of the physical properties it represents,
+also contains information such as shape, data type, data layout and strides, which not only
+fully describe the characteristics of the physical properties but also indicate
+how the object stored in memory should be traversed. @ref AclTensorDescriptor is a dedicated
+object to represent such metadata.
+
+@note The allocation of an AclTensor can be deferred until external memory is imported
+as backing memory to accomplish a zero-copy context.
+
+@note To enable interoperability with OpenCL, additional entrypoints are provided
+to extract (@ref AclGetClMem) the internal OpenCL memory object.
+
+As Tensors can reside in different memory spaces, @ref AclMapTensor and @ref AclUnmapTensor entrypoints
+are provided to map Tensors in and out of the host memory system, respectively.
+
+@subsection api_objects_queue AclQueue or Queue
+
+AclQueue acts as a runtime aggregate service. It provides facilities to schedule
+and execute operators using underlying hardware. It also contains services like
+tuning mechanisms (e.g., local workgroup size tuning for OpenCL) that can be specified
+during operator execution.
+
+@note To enable interoperability with OpenCL, additional entrypoints are provided
+to extract (@ref AclGetClQueue) or set (@ref AclSetClQueue) the internal OpenCL queue.
+
+@section api_internal Internal
+@subsection api_internal_operator_vs_kernels Operators vs Kernels
+
+Internally, Compute Library separates the executable primitives into two categories: kernels and operators,
+which operate in a hierarchical way.
+
+A kernel is the lowest-level computation block whose responsibility is performing a task on a given group of data.
+For design simplicity, kernel computation does NOT involve the following:
+
+- Memory allocation: All the memory manipulation should be handled by the caller.
+- Multi-threading: The information on how the workload can be split is provided by kernels,
+so the caller can effectively distribute the workload to multiple threads.
+
+On the other hand, operators combine one or multiple kernels to achieve more complex calculations.
+The responsibilities of the operators can be summarized as follows:
+
+- Defining the scheduling policy and dispatching of the underlying kernels to the hardware backend
+- Providing information to the caller required by the computation (e.g., memory requirements)
+- Allocation of any required auxiliary memory if it isn't given by its caller explicitly
+
+*/
+} // namespace arm_compute
--
cgit v1.2.1