aboutsummaryrefslogtreecommitdiff
path: root/README.md
blob: 03ad7feca8572056e3e23be9cb4ddbf2b816c0c0 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
# Vela
This tool is used to compile a [TensorFlow Lite for Microcontrollers](https://www.tensorflow.org/lite/microcontrollers) neural network model into an optimised version that can run on an embedded system containing an [Ethos-U55 NPU](https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-u55).

The optimised model will contain TensorFlow Lite Custom operators for those parts of the model that can be accelerated by the Ethos-U55. Parts of the model that cannot be accelerated are left unchanged and will instead run on the Cortex-M series CPU using an appropriate kernel (such as the [Arm](https://www.arm.com) optimised [CMSIS-NN](https://github.com/ARM-software/CMSIS_5/tree/develop/CMSIS/NN) kernels).

After compilation the optimised model can only be run on an Ethos-U55 NPU embedded system.

The tool will also generate performance estimates (EXPERIMENTAL) for the compiled model.

## Environment
Vela runs on the Linux operating system.

## Prerequisites
The following should be installed prior to the installation of Vela:
 - Python >= 3.6
 - GNU toolchain (GCC, Binutils and libraries) or alternative C compiler/linker toolchain

## Installation
Before running, the Vela package must be installed along with all its dependencies. To do this, first change to the directory that contains this README.md file. Then use the command:
```
pip3 install -U setuptools>=40.1.0
pip3 install .
```

Or, if you use the `pipenv` virtual environment tool:
```
pipenv install .
```

## Running
Vela is run with an input `.tflite` file passed on the command line. This file contains the neural network to be compiled. The tool then outputs an optimised version with a `_vela.tflite` file prefix, along with the performance estimate (EXPERIMENTAL) CSV files, all to the output directory.

If you use the `pipenv` virtual environment tool then first start by spawning a shell in the virtual environment.:
```
pipenv shell
```
After which running Vela is the same regardless of whether you are in a virtual environment or not.

Example usage:
1) Compile the network `my_model.tflite`. The optimised version will be output to `./output/my_network_vela.tflite`.
```
vela my_model.tflite
```
2) Compile the network `/path/to/my_model.tflite` and specify the output to go in the directory `./results_dir/`.
```
vela --output-dir ./results_dir /path/to/my_model.tflite
```
3) To get a list of all available options:
```
vela --help
```
4) To specifiy information about the embedded system's configuration use Vela's system configuration file. The following command selects the `MySysConfig` settings that are described in the `sys_cfg_vela.ini` system configuration file. More details can be found in the next section.
```
vela --config sys_cfg_vela.ini --system-config MySysConfig my_model.tflite
```

### Vela's System Configuration file
This is used to describe various properties of the embedded system that the network will run in.

Example of a Vela system configuration file.
```
; File: sys_cfg_vela.ini
; The file contains two parts; a system config part and a CPU operator
; performance part.

; System config
; Specifies properties such as the core clock speed, the size and speed of the
; four potential memory areas, and for various types of data which memory area
; is used to store them. The cpu property is used to link with the CPU operator
; performance.
; The four potential memory areas are: Sram, Dram, OnChipFlash, OffChipFlash.

[SysConfig.MySysConfig]
npu_freq=500e6
cpu=MyCpu
Sram_clock_scale=1
Sram_port_width=64
Dram_clock_scale=1
Dram_port_width=64
OnChipFlash_clock_scale=1
OnChipFlash_port_width=64
OffChipFlash_clock_scale=0.25
OffChipFlash_port_width=32
permanent_storage_mem_area=OffChipFlash
feature_map_storage_mem_area=Sram
fast_storage_mem_area=Sram

; CPU operator performance
; Specifies properties that are used by a linear model to estimate the
; performance for any operations that will be run on the CPU (such as those not
; supported by the NPU). Setting the intercept and slope to 0 will result in
; the operator being excluded from the performance estimation. This is the same
; as not specifying the operator. If an explicit cpu is specified rather than
; using the default then the cpu name must match the cpu specified in the
; SysConfig.<system config name> section.

[CpuPerformance.MyCpuOperator]
default.intercept=0.0
default.slope=1.0

MyCpu.intercept=0.0
MyCpu.slope=1.0
```

## Contribution Guidlines and Pull Requests
Contributions are accepted under [Apache License 2.0](LICENSE.txt). Only submit contributions where you have authored all of the code.

## Resources
* [Ethos-U55](https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-u55)

## License
Vela is licensed under [Apache License 2.0](LICENSE.txt)