///
/// Copyright (c) 2021 Arm Limited.
///
/// SPDX-License-Identifier: MIT
///
/// Permission is hereby granted, free of charge, to any person obtaining a copy
/// of this software and associated documentation files (the "Software"), to
/// deal in the Software without restriction, including without limitation the
/// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
/// sell copies of the Software, and to permit persons to whom the Software is
/// furnished to do so, subject to the following conditions:
///
/// The above copyright notice and this permission notice shall be included in all
/// copies or substantial portions of the Software.
///
/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
/// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
/// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
/// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
/// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
/// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
/// SOFTWARE.
///
namespace arm_compute
{
/**
@page api Application Programming Interface

@tableofcontents

@section api_overview Overview

In this section we present the architecture of Compute Library's application programming interface (API) along with
a detailed explanation of its components. Compute Library's API consists of multiple high-level operators and
an even larger number of internal, distinct computational blocks that can be executed on a command queue.
Operators can be bound to multiple Tensor objects and executed concurrently or asynchronously if needed.
All operators and associated objects are encapsulated in a Context-based mechanism, which provides all related
construction services.

@section api_objects Fundamental objects

Compute Library is built around a set of fundamental objects that are responsible for creating and orchestrating operator execution.
Below we present these objects in more detail.

@subsection api_objects_context AclContext or Context

AclContext or Context acts as a central creational aggregate service. All other objects are bound to or created from a context.
It provides, internally, common facilities such as
- allocators for object creation or backing memory allocation
- serialization interfaces
- any other modules that affect the construction of objects (e.g., program cache for OpenCL).

The following sections describe the parameters that can be given when creating a Context.

@subsubsection api_object_context_target AclTarget
Context is initialized with a backend target (AclTarget) as different backends might have a different subset of services.
Currently the following targets are supported:
- #AclCpu: a generic CPU target that accelerates primitives through SIMD technologies
- #AclGpuOcl: a target for GPU acceleration using OpenCL
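
As a minimal sketch of the above (assuming the aggregate C header arm_compute/Acl.h, an AclCreateContext entrypoint taking a
target and an optional options pointer, and an AclSuccess status code; check the C API headers for the exact signatures), a CPU
context could be created and released as follows:

@code{.cpp}
#include "arm_compute/Acl.h" // assumed aggregate header of the C API

#include <cstdio>

int main()
{
    // Create a context for the CPU backend; a null options pointer is assumed
    // here to request the default creation options.
    AclContext ctx = nullptr;
    if(AclCreateContext(&ctx, AclCpu, nullptr) != AclSuccess)
    {
        std::printf("Failed to create a CPU context\n");
        return 1;
    }

    // ... create tensors, queues and operators bound to ctx ...

    // Release the context once every object created from it has been destroyed.
    AclDestroyContext(ctx);
    return 0;
}
@endcode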

@subsubsection api_object_context_execution_mode AclExecutionMode
An execution mode (AclExecutionMode) can be passed as an argument to influence how operators are created.
At the moment the following execution modes are supported:
- #AclPreferFastRerun: Provides faster re-run. It can be used when the operators are expected to be executed multiple
times under the same execution context
- #AclPreferFastStart: Provides faster single execution. It can be used when operators are expected to be executed only once
and reducing their latency is therefore important. (Currently not implemented.)

@subsubsection api_object_context_capabilitys AclTargetCapabilities
Context creation can also take a list of hardware capabilities as one of its parameters. This is currently
available only for the CPU backend. The list of architecture capabilities is used to influence the selection
of the underlying kernels; for example, it can explicitly enable the use of SVE or of the dot-product
instruction.
@note The underlying hardware should support the given capability list.
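
The execution mode and capability options above are given to the context at creation time. The sketch below is illustrative
only: it assumes the options are collected in an AclContextOptions structure with mode and capabilities members, and that CPU
capabilities are exposed as AclCpuCapabilities* enumerators; the exact field and enumerator names should be checked against
the C API type definitions.

@code{.cpp}
#include "arm_compute/Acl.h"

// Hypothetical helper: create a CPU context that prefers fast re-runs and
// explicitly enables dot-product kernels. The option fields and capability
// enumerators used here are assumptions and may differ between versions.
AclContext create_tuned_cpu_context()
{
    AclContextOptions options{};
    options.mode         = AclPreferFastRerun;    // operators optimized for repeated execution
    options.capabilities = AclCpuCapabilitiesDot; // must be supported by the underlying hardware

    AclContext ctx = nullptr;
    if(AclCreateContext(&ctx, AclCpu, &options) != AclSuccess)
    {
        return nullptr;
    }
    return ctx;
}
@endcode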

@subsubsection api_object_context_allocator Allocator
An allocator object that implements @ref AclAllocator can be passed to the Context upon its creation.
This user-provided allocator will be used to allocate any internal backing memory.

@note To enable interoperability with OpenCL, additional entrypoints are provided
to extract (@ref AclGetClContext) or set (@ref AclSetClContext) the internal OpenCL context.
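
For example, when the OpenCL backend is used, the internal cl_context can be shared with an existing OpenCL code base.
The sketch below assumes that AclGetClContext takes the context handle and an output cl_context pointer, and that the
interoperability entrypoints live in arm_compute/AclOpenClExt.h; the exact header and signature should be verified.

@code{.cpp}
#include "arm_compute/Acl.h"
#include "arm_compute/AclOpenClExt.h" // assumed location of the OpenCL interoperability entrypoints

#include <CL/cl.h>

// Retrieve the OpenCL context backing an AclContext so that existing OpenCL
// resources (buffers, programs, queues) can be shared with Compute Library.
cl_context get_shared_cl_context(AclContext ctx)
{
    cl_context opencl_ctx = nullptr;
    if(AclGetClContext(ctx, &opencl_ctx) != AclSuccess) // assumed signature
    {
        return nullptr;
    }
    return opencl_ctx;
}
@endcode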

@subsection api_objects_tensor AclTensor or Tensor

A tensor is a mathematical object that, like a matrix, can describe physical properties. It can also be
considered a generalization of matrices that can represent arbitrary dimensionalities. AclTensor is an
abstract interface that represents a tensor.

In addition to the elements of the data it represents, an AclTensor also carries metadata such as shape,
data type, data layout and strides, which not only fully describe the characteristics of the data but also
specify how the object stored in memory should be traversed. @ref AclTensorDescriptor is a dedicated
object that represents such metadata.
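
A rough sketch of tensor creation follows. It assumes a descriptor with ndims, shape and data_type fields and an
AclCreateTensor entrypoint whose last argument requests immediate allocation of backing memory; these names are
illustrative and should be checked against the C API headers.

@code{.cpp}
#include "arm_compute/Acl.h"

#include <cstdint>

// Create a 2D float tensor of shape 32x16 bound to the given context.
// Descriptor fields and the AclCreateTensor signature are assumptions.
AclTensor create_matrix(AclContext ctx)
{
    int32_t shape[] = { 32, 16 };

    AclTensorDescriptor desc{};
    desc.ndims     = 2;
    desc.shape     = shape;
    desc.data_type = AclFloat32;

    AclTensor tensor = nullptr;
    // The last argument is assumed to request immediate backing-memory allocation.
    if(AclCreateTensor(&tensor, ctx, &desc, true) != AclSuccess)
    {
        return nullptr;
    }
    return tensor;
}
@endcode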

@note The allocation of an AclTensor can be deferred until external memory is imported
as backing memory, enabling zero-copy use cases.

@note To enable interoperability with OpenCL, additional entrypoints are provided
to extract (@ref AclGetClMem) the internal OpenCL memory object.

As Tensors can reside in different memory spaces, @ref AclMapTensor and @ref AclUnmapTensor entrypoints
are provided to map Tensors in and out of the host memory system, respectively.
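
A minimal sketch of host access via mapping, assuming that AclMapTensor returns a host pointer through an output parameter
and that AclUnmapTensor releases it again:

@code{.cpp}
#include "arm_compute/Acl.h"

#include <cstddef>
#include <cstring>

// Copy host data into a tensor by mapping it into the host address space,
// writing through the returned pointer and unmapping it afterwards.
bool upload(AclTensor tensor, const void *src, size_t size)
{
    void *mapped = nullptr;
    if(AclMapTensor(tensor, &mapped) != AclSuccess) // assumed signature
    {
        return false;
    }

    std::memcpy(mapped, src, size);

    return AclUnmapTensor(tensor, mapped) == AclSuccess; // assumed signature
}
@endcode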

@subsection api_objects_queue AclQueue or Queue

AclQueue acts as a runtime aggregate service. It provides facilities to schedule
and execute operators on the underlying hardware. It also contains services such as
tuning mechanisms (e.g., local work-group size tuning for OpenCL) that can be specified
during operator execution.
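
A minimal sketch of queue creation, assuming an AclCreateQueue entrypoint that takes the parent context and an optional
options pointer (a null pointer standing for the default scheduling and tuning behaviour):

@code{.cpp}
#include "arm_compute/Acl.h"

// Create a queue bound to an existing context; operators are later scheduled
// and executed through this queue. The signature used is an assumption.
AclQueue create_default_queue(AclContext ctx)
{
    AclQueue queue = nullptr;
    if(AclCreateQueue(&queue, ctx, nullptr) != AclSuccess)
    {
        return nullptr;
    }
    return queue;
}
@endcode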

@note To enable interoperability with OpenCL, additional entrypoints are provided
to extract (@ref AclGetClQueue) or set (@ref AclSetClQueue) the internal OpenCL queue.

@section api_internal Internal
@subsection api_internal_operator_vs_kernels Operators vs Kernels

Internally, Compute Library separates its executable primitives into two categories, kernels and operators,
which are organized hierarchically.

A kernel is the lowest-level computation block, whose responsibility is to perform a task on a given group of data.
For design simplicity, a kernel's computation does NOT involve the following:

- Memory allocation: All the memory manipulation should be handled by the caller.
- Multi-threading: The information on how the workload can be split is provided by kernels,
so the caller can effectively distribute the workload to multiple threads.

On the other hand, operators combine one or multiple kernels to achieve more complex calculations.
The responsibilities of the operators can be summarized as follows:

- Defining the scheduling policy and dispatching of the underlying kernels to the hardware backend
- Providing information to the caller required by the computation (e.g., memory requirements)
- Allocating any required auxiliary memory if it is not explicitly provided by the caller
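
This split can be pictured with the simplified, purely illustrative interfaces below. They are not the library's actual
internal classes; they only show which responsibilities sit at each level.

@code{.cpp}
#include <cstddef>

// Illustrative only: a kernel describes how its workload can be split and runs
// a given slice of it; it never allocates memory or spawns threads itself.
struct IKernelSketch
{
    virtual ~IKernelSketch() = default;
    virtual size_t num_splits() const            = 0; // how the workload can be divided
    virtual void   run_split(size_t split_index) = 0; // compute one slice of the workload
};

// Illustrative only: an operator owns one or more kernels, decides how they are
// dispatched, and reports any auxiliary memory the computation requires.
struct IOperatorSketch
{
    virtual ~IOperatorSketch() = default;
    virtual size_t workspace_size() const = 0; // memory requirements exposed to the caller
    virtual void   run()                  = 0; // schedule and dispatch the underlying kernels
};
@endcode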

*/
} // namespace arm_compute