# Vela External APIs

Vela provides a low-level external API to enable Ethos-U code generation from
other tools.

The external APIs are intended for other tools that require backend compiler
functionality. From here on, such a tool is referred to as "the compiler".
The compiler takes a network model as input and uses the APIs to convert the
model to instructions that can be run on an Ethos-U NPU.

This document contains an overview of the available APIs and the steps that are
needed to use them.

## Overview

All data types and functions to facilitate code generation are located in module
`ethosu.vela.api`. All API function prototypes are fully documented in the
module using docstrings.

### Data types

Class `NpuOperation` is the base class for all operations. It contains a
low-level abstraction of an operation that can be run on an Ethos-U NPU. It has
the following sub-classes:

* `NpuDmaOperation`, to perform memory-to-memory DMA operations, e.g. for moving
  a chunk of memory from DRAM to SRAM
* `NpuConv2DOperation`, for convolution operations like 2-D convolutions,
  transpose convolutions, and also for fully connected operations
* `NpuConvDepthWiseOperation`, for depthwise convolutions
* `NpuPoolingOperation`, for max pooling/average pooling operations
* `NpuElementWiseOperation`, for unary and binary elementwise operations like
  add, subtract, abs, etc.

Class `NpuActivation` represents activation functions that are fused with the
NPU operation, for instance ReLU or sigmoid.

It is up to the compiler to convert operations of the input model to a list of
these basic NPU operations. Note that the compiler is responsible for all
address planning, i.e. it needs to supply addresses of all input and output
tensors, weights, and biases.
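
Because address planning is left to the compiler, the following is a minimal sketch of one possible approach: a simple bump allocator that places tensors one after another. It is not part of the Vela API. The function name `plan_addresses`, the tensor names, and the 16-byte alignment are assumptions chosen for illustration; the alignment actually required should be taken from the docstrings in `ethosu.vela.api`.

```python
from typing import Dict, List, Tuple


def plan_addresses(
    tensors: List[Tuple[str, int]], base: int = 0, alignment: int = 16
) -> Dict[str, int]:
    """Assign each (name, size_in_bytes) tensor a non-overlapping, aligned address.

    Hypothetical helper: the 16-byte default alignment is an assumption.
    """
    addresses: Dict[str, int] = {}
    next_free = base
    for name, size in tensors:
        # Round the next free address up to the alignment boundary.
        aligned = (next_free + alignment - 1) // alignment * alignment
        addresses[name] = aligned
        next_free = aligned + size
    return addresses


# Illustrative tensor names and sizes only.
layout = plan_addresses([("ifm", 100), ("weights", 40), ("ofm", 64)])
print(layout)
```

A real compiler would typically also reuse memory between tensors whose lifetimes do not overlap; this sketch keeps every tensor live for the whole network.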

### Finding block configs

For all NPU operations, a block config must be set, which is the unit of work in
which the NPU generates the output. There are restrictions on the size of block
configs. Function `npu_find_block_configs` can be used to find valid block
configs for an operation.
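
Once a set of valid block configs is known, the compiler still has to choose one. The sketch below shows one simple heuristic, picking the candidate with the largest volume. It is an illustration, not Vela's own selection logic: `BlockShape` is a hypothetical stand-in for whatever type the API returns, and the candidate values are made up.

```python
from typing import List, NamedTuple


class BlockShape(NamedTuple):
    """Hypothetical stand-in for a block config: height, width, depth."""

    height: int
    width: int
    depth: int


def pick_largest_block(candidates: List[BlockShape]) -> BlockShape:
    """Return the candidate that covers the most output elements per block."""
    if not candidates:
        raise ValueError("no valid block configs for this operation")
    return max(candidates, key=lambda b: b.height * b.width * b.depth)


# Made-up candidates; in practice these would come from the API.
candidates = [BlockShape(4, 4, 16), BlockShape(8, 8, 16), BlockShape(2, 16, 16)]
best = pick_largest_block(candidates)
print(best)
```

The largest block is not necessarily the fastest one, so a compiler may prefer a cost model over this heuristic.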

### Encoding of weights and biases

All weights that are used in the NPU operations must be encoded using
function `npu_encode_weights`, and all biases using function `npu_encode_bias`.

### Generating a register command stream

The instructions that are executed by Ethos-U NPUs are called *register
commands*. When the compiler has encoded all weights and biases, converted
all network operations to NPU operations, and allocated all addresses, the
register command stream can be generated using function
`register_command_stream_generator`. This returns a list of 32-bit integers.

In addition to transforming NPU operations to register commands, Vela also:

* selects a suitable block configuration for each instruction (optional)
* adds kernel/DMA wait commands if necessary
* selects the most efficient "block dependency" that controls the NPU pipeline
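
Since the generated stream is a list of 32-bit integers, a tool that wants to store or inspect it may need to turn it into raw bytes. The following is a minimal sketch of that step using only the standard library. The helper name `words_to_bytes`, the little-endian byte order, and the sample words are assumptions for illustration. The result is not a Driver Payload, which has its own header and is produced by the API.

```python
import struct
from typing import List


def words_to_bytes(command_stream: List[int]) -> bytes:
    """Pack a list of unsigned 32-bit words into little-endian bytes.

    Hypothetical helper: little-endian order is an assumption.
    """
    for word in command_stream:
        if not 0 <= word <= 0xFFFFFFFF:
            raise ValueError(f"value {word:#x} does not fit in 32 bits")
    # "<" selects little-endian, "I" is an unsigned 32-bit integer.
    return struct.pack(f"<{len(command_stream)}I", *command_stream)


# Placeholder words, not real register commands.
blob = words_to_bytes([0x00000001, 0xDEADBEEF])
print(blob.hex())
```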

### Creating a Driver Payload for the Ethos-U driver

If an Ethos-U driver is used to trigger the execution of the register command
stream, a Driver Payload byte array must be provided to the driver. It
contains:

* a header with driver actions
* the register command stream

This byte array can be generated using function `npu_create_driver_payload`.

### API version

Function `npu_get_api_version` returns the version of the Vela External APIs,
which is maintained separately from Vela's overall version.
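
A tool may want to check the API version before relying on a particular feature. The sketch below assumes the version is returned as a single integer with the major number in the upper 16 bits and the minor number in the lower 16 bits; that layout is an assumption here and should be confirmed against the docstring of `npu_get_api_version`. The helper `split_api_version` is hypothetical.

```python
from typing import Tuple


def split_api_version(packed: int) -> Tuple[int, int]:
    """Split a packed version integer into (major, minor).

    Assumes major is in the upper 16 bits and minor in the lower 16 bits.
    """
    return (packed >> 16) & 0xFFFF, packed & 0xFFFF


# Example with a made-up packed value for version 1.2.
major, minor = split_api_version((1 << 16) | 2)
print(f"API version {major}.{minor}")
```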

## Unit tests

For examples of how to use these APIs, please see the unit tests that are
bundled with Vela's source code, in module `ethosu.vela.test.extapi`.