# Building the Code Samples application from sources

## Contents

- [Building the Code Samples application from sources](#building-the-code-samples-application-from-sources)
  - [Contents](#contents)
  - [Build prerequisites](#build-prerequisites)
  - [Build options](#build-options)
  - [Build process](#build-process)
    - [Preparing build environment](#preparing-build-environment)
    - [Create a build directory](#create-a-build-directory)
    - [Configuring the build for `MPS3: SSE-300`](#configuring-the-build-for-mps3-sse-300)
    - [Configuring the build for `MPS3: SSE-200`](#configuring-the-build-for-mps3-sse-200)
    - [Configuring the build native unit-test](#configuring-the-build-native-unit-test)
    - [Configuring the build for `simple_platform`](#configuring-the-build-for-simple_platform)
    - [Building the configured project](#building-the-configured-project)
  - [Building timing adapter with custom options](#building-timing-adapter-with-custom-options)
  - [Add custom inputs](#add-custom-inputs)
  - [Add custom model](#add-custom-model)
  - [Optimize custom model with Vela compiler](#optimize-custom-model-with-vela-compiler)
  - [Memory constraints](#memory-constraints)
  - [Automatic file generation](#automatic-file-generation)

This section assumes the use of an **x86 Linux** build machine.

## Build prerequisites

Before proceeding, please make sure that the following prerequisites
are fulfilled:

- Arm Compiler version 6.14 or above is installed and available on the
    path.

    Test the compiler by running:

    ```commandline
    armclang -v
    ```

    ```log
    Product: ARM Compiler 6.14 Professional
    Component: ARM Compiler 6.14
    ```

    > **Note:** Add compiler to the path, if needed:
    >
    > `export PATH=/path/to/armclang/bin:$PATH`

- Compiler license is configured correctly

- CMake version 3.15 or above is installed and available on the path.
    Test CMake by running:

    ```commandline
    cmake --version
    ```

    ```log
    cmake version 3.16.2
    ```

    > **Note:** Add cmake to the path, if needed:
    >
    > `export PATH=/path/to/cmake/bin:$PATH`

- Python 3.6 or above is installed. Test the Python version by running:

    ```commandline
    python3 --version
    ```

    ```log
    Python 3.6.8
    ```

- The build system will create a Python virtual environment during the build
    process. Please make sure that the Python virtual environment module is
    installed:

    ```commandline
    python3 -m venv
    ```
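
    A more direct check (a sketch; `scratch_venv` is just an example directory name) is to create a disposable environment:

    ```commandline
    python3 -m venv scratch_venv
    ```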

- Make (or MinGW make for Windows) is installed and available on the path:

    ```commandline
    make --version
    ```

    ```log
    GNU Make 4.1

    ...
    ```

    > **Note:** Add it to the path environment variable, if needed.

- Access to the Internet to download the third party dependencies, specifically: TensorFlow Lite Micro, Arm Ethos-U55
driver and CMSIS. Instructions for downloading these are listed under [preparing build environment](#preparing-build-environment).

## Build options

The project build system allows the user to specify a custom NN
model (in `.tflite` format) or input images and to compile the application
binary from sources.

The build system uses pre-built TensorFlow Lite for Microcontrollers
library and Arm® Ethos™-U55 driver libraries from the delivery package.

The build script is parameterized to support different options. Default
values for build parameters will build the executable compatible with
the Ethos-U55 Fast Model.

The build parameters are:

- `TARGET_PLATFORM`: Target platform to execute application:
  - `mps3`
  - `native`
  - `simple_platform`

- `TARGET_SUBSYSTEM`: Platform target subsystem; this specifies the
    design implementation for the deployment target. For both the MPS3
    FVP and the MPS3 FPGA, this should be left at the default value of
    `sse-300`:
  - `sse-300` (default - [Arm® Corstone™-300](https://developer.arm.com/ip-products/subsystem/corstone/corstone-300))
  - `sse-200`

- `TENSORFLOW_SRC_PATH`: Path to the root of the TensorFlow directory.
    The default value points to the TensorFlow submodule in the
    [ethos-u](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) `dependencies` folder.

- `ETHOS_U55_DRIVER_SRC_PATH`: Path to the Ethos-U55 core driver sources.
    The default value points to the core_driver submodule in the
    [ethos-u](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) `dependencies` folder.

- `CMSIS_SRC_PATH`: Path to the CMSIS sources to be used to build TensorFlow
    Lite Micro library. This parameter is optional and valid only for
    Arm® Cortex®-M CPU targeted configurations. The default value points to the CMSIS submodule in the
    [ethos-u](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) `dependencies` folder.

- `ETHOS_U55_ENABLED`: Sets whether the use of Ethos-U55 is available for
    the deployment target. By default, this is set, and therefore the
    application is built with Ethos-U55 support.

- `CPU_PROFILE_ENABLED`: Sets whether profiling information for the CPU
    core should be displayed. By default, this is set to false, but can
    be turned on for FPGA targets. For the FVP, the CPU core's cycle
    counts are not meaningful and should not be used.

- `LOG_LEVEL`: Sets the verbosity level for the application's output
    over UART/stdout. Valid values are `LOG_LEVEL_TRACE`, `LOG_LEVEL_DEBUG`,
    `LOG_LEVEL_INFO`, `LOG_LEVEL_WARN` and `LOG_LEVEL_ERROR`. By default, it
    is set to `LOG_LEVEL_INFO`.

- `<use_case>_MODEL_TFLITE_PATH`: Path to the model file that will be
    processed and included into the application axf file. The default
    value points to one of the delivered set of models. Make sure the
    model chosen is aligned with the `ETHOS_U55_ENABLED` setting.

  - When using the Ethos-U55 backend, the NN model is assumed to have
    been optimized by the Vela compiler.
    However, even if it has not, it will fall back on the CPU and execute,
    if supported by TensorFlow Lite Micro.

  - When use of the Ethos-U55 is disabled and a Vela optimized model
    is provided, the application will report a failure at runtime.

- `USE_CASE_BUILD`: Specifies the list of applications to build. By
    default, the build system scans sources to identify available ML
    applications and produces executables for all detected use-cases.
    This parameter can accept a single value, for example,
    `USE_CASE_BUILD=img_class` or multiple values, for example,
    `USE_CASE_BUILD="img_class;kws"`.

- `ETHOS_U55_TIMING_ADAPTER_SRC_PATH`: Path to timing adapter sources.
    The default value points to the `timing_adapter` dependencies folder.

- `TA_CONFIG_FILE`: Path to the CMake configuration file containing the
    timing adapter parameters. Used only if the timing adapter build is
    enabled.

- `TENSORFLOW_LITE_MICRO_CLEAN_BUILD`: Optional parameter to enable/disable
    "cleaning" prior to building for the TensorFlow Lite Micro library.
    It is enabled by default.

- `TENSORFLOW_LITE_MICRO_CLEAN_DOWNLOADS`: Optional parameter to enable wiping
    out TPIP downloads from TensorFlow source tree prior to each build.
    It is disabled by default.

- `ARMCLANG_DEBUG_DWARF_LEVEL`: When the CMake build type is specified as `Debug`
    and when armclang toolchain is being used to build for a Cortex-M CPU target,
    this optional argument can be set to specify the DWARF format.
    By default, this is set to 4 and is synonymous with passing `-g`
    flag to the compiler. This is compatible with Arm-DS and other tools
    which can interpret the latest DWARF format. To allow debugging using
    the Model Debugger from Arm FastModel Tools Suite, this argument can be used
    to pass DWARF format version as "3". Note: this option is only available
    when CMake project is configured with `-DCMAKE_BUILD_TYPE=Debug` argument.
    Also, the same DWARF format is used when building the TensorFlow Lite Micro library.
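
As an illustration, several of the options above can be combined in a single configuration command. This is a sketch only: the target and toolchain arguments match the `MPS3: SSE-300` configuration described later in this section.

```commandline
cmake \
    -DTARGET_PLATFORM=mps3 \
    -DTARGET_SUBSYSTEM=sse-300 \
    -DUSE_CASE_BUILD="img_class;kws" \
    -DLOG_LEVEL=LOG_LEVEL_DEBUG \
    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
```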

> **Note:** For details on the specific use case build options, follow the
> instructions in the use-case specific documentation.
> Also, when setting any of the CMake configuration parameters that expect a directory or file path, it is advised
> to **use absolute paths instead of relative paths**.

## Build process

The build process can be summarized in three major steps:

- Prepare the build environment by downloading third party sources required, see
[Preparing build environment](#preparing-build-environment).

- Configure the build for the platform chosen.
This stage includes:
  - CMake options configuration
  - When the `<use_case>_MODEL_TFLITE_PATH` build option is not provided, default neural network models are downloaded
from [Arm ML-Zoo](https://github.com/ARM-software/ML-zoo/). For a native build, the network's input and output data
for tests are also downloaded.
  - Some files such as neural network models, network's inputs and output labels are automatically converted
    into C/C++ arrays, see [Automatic file generation](#automatic-file-generation).

- Build the application.\
During this stage, the application and third party libraries are built; see [Building the configured project](#building-the-configured-project).

### Preparing build environment

Certain third party sources must be present on the development machine for the example sources in this
repository to link against.

1. [TensorFlow Lite Micro repository](https://github.com/tensorflow/tensorflow)
2. [Ethos-U55 core driver repository](https://review.mlplatform.org/admin/repos/ml/ethos-u/ethos-u-core-driver)
3. [CMSIS-5](https://github.com/ARM-software/CMSIS_5.git)

These are part of the [ethos-u repository](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) and set as
submodules of this project.

To pull the submodules:

```sh
git submodule update --init
```

This will download all the required components and place them in a tree like:

```tree
dependencies
 └── ethos-u
     ├── cmsis
     ├── core_driver
     ├── tensorflow
     └── ...
```

> **NOTE**: The default source paths for the TPIP sources assume the above directory structure, but all of the relevant
> paths can be overridden by the CMake configuration arguments `TENSORFLOW_SRC_PATH`, `ETHOS_U55_DRIVER_SRC_PATH`,
> and `CMSIS_SRC_PATH`.

### Create a build directory

Create a build directory in the root of the project and navigate inside:

```commandline
mkdir build && cd build
```

### Configuring the build for `MPS3: SSE-300`

On Linux, execute the following command to build the application to run
on the Ethos-U55 when providing only the mandatory arguments for CMake configuration:

```commandline
cmake \
    -DTARGET_PLATFORM=mps3 \
    -DTARGET_SUBSYSTEM=sse-300 \
    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
```

For Windows, add `-G "MinGW Makefiles"`:

```commandline
cmake \
    -G "MinGW Makefiles" \
    -DTARGET_PLATFORM=mps3 \
    -DTARGET_SUBSYSTEM=sse-300 \
    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
```

Toolchain option `CMAKE_TOOLCHAIN_FILE` points to the toolchain specific
file to set the compiler and platform specific parameters.

To configure a build that can be debugged using Arm-DS, we can just specify
the build type as `Debug`:

```commandline
cmake \
    -DTARGET_PLATFORM=mps3 \
    -DTARGET_SUBSYSTEM=sse-300 \
    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
    -DCMAKE_BUILD_TYPE=Debug ..
```

To configure a build that can be debugged using a tool that only supports
DWARF format 3 (the Model Debugger, for example), we can use:

```commandline
cmake \
    -DTARGET_PLATFORM=mps3 \
    -DTARGET_SUBSYSTEM=sse-300 \
    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
    -DCMAKE_BUILD_TYPE=Debug \
    -DARMCLANG_DEBUG_DWARF_LEVEL=3 ..
```

If the TensorFlow source tree is not in its default expected location,
set the path using `TENSORFLOW_SRC_PATH`.
Similarly, if the Ethos-U55 driver and CMSIS are not in the default location,
`ETHOS_U55_DRIVER_SRC_PATH` and `CMSIS_SRC_PATH` can be used to configure their location. For example:

```commandline
cmake \
    -DTARGET_PLATFORM=mps3 \
    -DTARGET_SUBSYSTEM=sse-300 \
    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
    -DTENSORFLOW_SRC_PATH=/my/custom/location/tensorflow \
    -DETHOS_U55_DRIVER_SRC_PATH=/my/custom/location/core_driver \
    -DCMSIS_SRC_PATH=/my/custom/location/cmsis ..
```

> **Note:** If re-building with changed parameter values, it is
> highly advised to clean the build directory and re-run the CMake command.
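
A simple way to do this is to delete and recreate the build directory. For example, from inside `build`:

```commandline
cd .. && rm -rf build && mkdir build && cd build
```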

### Configuring the build for `MPS3: SSE-200`

```commandline
cmake \
    -DTARGET_PLATFORM=mps3 \
    -DTARGET_SUBSYSTEM=sse-200 \
    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
```

For Windows, add `-G "MinGW Makefiles"`:

```commandline
cmake \
    -DTARGET_PLATFORM=mps3 \
    -DTARGET_SUBSYSTEM=sse-200 \
    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
    -G "MinGW Makefiles" ..
```

### Configuring the build native unit-test

```commandline
cmake \
    -DTARGET_PLATFORM=native \
    -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/native-toolchain.cmake ..
```

For Windows add `-G "MinGW Makefiles"`:

```commandline
cmake \
    -DTARGET_PLATFORM=native \
    -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/native-toolchain.cmake \
    -G "MinGW Makefiles" ..
```

Results of the build will be placed under the `build/bin/` folder:

```tree
 bin
  |- dev_ethosu_eval-tests
  |_ ethos-u
```
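
Once the project has been built (see [Building the configured project](#building-the-configured-project)), the native tests can be executed directly on the build machine, for example:

```commandline
./bin/dev_ethosu_eval-tests
```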

### Configuring the build for `simple_platform`

```commandline
cmake \
    -DTARGET_PLATFORM=simple_platform \
    -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/bare-metal-toolchain.cmake ..
```

For Windows add `-G "MinGW Makefiles"`:

```commandline
cmake \
    -DTARGET_PLATFORM=simple_platform \
    -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/bare-metal-toolchain.cmake \
    -G "MinGW Makefiles" ..
```

### Building the configured project

If the CMake command succeeds, build the application as follows:

```commandline
make -j4
```

or for Windows:

```commandline
mingw32-make -j4
```

Add `VERBOSE=1` to see compilation and link details.
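
For example, a verbose parallel build can be invoked as:

```commandline
make -j4 VERBOSE=1
```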

Results of the build will be placed under the `build/bin` folder. For
example:

```tree
bin
 ├── ethos-u-<use_case_name>.axf
 ├── ethos-u-<use_case_name>.htm
 ├── ethos-u-<use_case_name>.map
 ├── images-<use_case_name>.txt
 └── sectors
        └── <use_case>
                ├── dram.bin
                └── itcm.bin
```

For each use-case implemented under the `source/use-case` directory,
the following build artefacts will be created:

- `ethos-u-<use case name>.axf`: The built application binary for a ML
    use case.

- `ethos-u-<use case name>.map`: Information from building the
    application (e.g. libraries used, what was optimized, location of
    objects).

- `ethos-u-<use case name>.htm`: Human readable file containing the
    call graph of application functions.

- `sectors/`: Folder containing the built application, split into files
    for loading into different FPGA memory regions.

- `images-<use case name>.txt`: Tells the FPGA which memory regions to
    use for loading the binaries in the `sectors/` folder.

> **Note:** For the specific use case commands, see the relevant section
> in the use case documentation.

## Building timing adapter with custom options

The sources also contain the configuration for a timing adapter utility
for the Ethos-U55 driver. The timing adapter allows the platform to simulate user
provided memory bandwidth and latency constraints.

The timing adapter driver aims to control the behavior of two AXI buses
used by Ethos-U55. One is for SRAM memory region and the other is for
flash or DRAM. The SRAM is where intermediate buffers are expected to be
allocated and therefore, this region can serve frequent R/W traffic
generated by computation operations while executing a neural network
inference. The flash or DDR is where we expect to store the model
weights and therefore, this bus would typically be used only for R/O
traffic.

It is used for the MPS3 FPGA as well as for the Fast Model environment.

The CMake build framework allows control of the behavior of each bus
with the following parameters:

- `MAXR`: Maximum number of pending read operations allowed. 0 is
    inferred as infinite, and the default value is 4.

- `MAXW`: Maximum number of pending write operations allowed. 0 is
    inferred as infinite, and the default value is 4.

- `MAXRW`: Maximum number of pending read+write operations allowed. 0 is
    inferred as infinite, and the default value is 8.

- `RLATENCY`: Minimum latency, in cycle counts, for a read operation.
    This is the duration between ARVALID and RVALID signals. The default
    value is 50.

- `WLATENCY`: Minimum latency, in cycle counts, for a write operation.
    This is the duration between WVALID + WLAST and BVALID being
    de-asserted. The default value is 50.

- `PULSE_ON`: Number of cycles during which addresses are let through.
    The default value is 5100.

- `PULSE_OFF`: Number of cycles during which addresses are blocked. The
    default value is 5100.

- `BWCAP`: Maximum number of 64-bit words transferred per pulse cycle. A
    pulse cycle is PULSE_ON + PULSE_OFF. 0 is inferred as infinite, and
    the default value is 625.

- `MODE`: Timing adapter operation mode. The default value is 0:

  - Bit 0: 0=simple; 1=latency-deadline QoS throttling of read vs.
        write

  - Bit 1: 1=enable random AR reordering (default: 0)

  - Bit 2: 1=enable random R reordering (default: 0)

  - Bit 3: 1=enable random B reordering (default: 0)

For the timing adapter's CMake build configuration, the SRAM AXI bus is
assigned index 0 and the flash/DRAM AXI bus is assigned index 1. To
change a bus parameter for the build, a `TA<index>_` prefix should be
added to the parameter names above. For example, `TA0_MAXR=10` will set
the SRAM AXI bus's maximum pending reads to 10.

As an example, if we have the following parameters for flash/DRAM
region:

- `TA1_MAXR` = "2"

- `TA1_MAXW` = "0"

- `TA1_MAXRW` = "0"

- `TA1_RLATENCY` = "64"

- `TA1_WLATENCY` = "32"

- `TA1_PULSE_ON` = "320"

- `TA1_PULSE_OFF` = "80"

- `TA1_BWCAP` = "50"

For a clock rate of 500 MHz, this would translate to:

- The maximum duty cycle for any operation is:\
![Maximum duty cycle formula](../media/F1.png)

- Maximum bit rate for this bus (64-bit wide) is:\
![Maximum bit rate formula](../media/F2.png)

- With a read latency of 64 cycles, and maximum pending reads as 2,
    each read could be a maximum of 64 or 128 bytes, as defined by the
    Ethos-U55 AXI bus attributes.

    The bandwidth is calculated solely by the read parameters:\
    ![Bandwidth formula](../media/F3.png)

    This is higher than the overall bandwidth dictated by the bus
    parameters of:\
    ![Overall bandwidth formula](../media/F4.png)

This suggests that the read operation is limited only by the overall bus
bandwidth.
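
Expressed in the `ta_config.cmake` format described below, the flash/DRAM parameters from this example would read:

```cmake
# Timing adapter settings for AXI1 (flash/DRAM)
set(TA1_MAXR "2")
set(TA1_MAXW "0")
set(TA1_MAXRW "0")
set(TA1_RLATENCY "64")
set(TA1_WLATENCY "32")
set(TA1_PULSE_ON "320")
set(TA1_PULSE_OFF "80")
set(TA1_BWCAP "50")
```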

The timing adapter requires recompilation to change parameters. The
default timing adapter configuration file, pointed to by the
`TA_CONFIG_FILE` build parameter, is located in the `scripts/cmake`
folder and contains all the options for AXI0 and AXI1 described above.

An example of `scripts/cmake/ta_config.cmake`:

```cmake
# Timing adapter options
set(TA_INTERACTIVE OFF)

# Timing adapter settings for AXI0
set(TA0_MAXR "8")
set(TA0_MAXW "8")
set(TA0_MAXRW "0")
set(TA0_RLATENCY "32")
set(TA0_WLATENCY "32")
set(TA0_PULSE_ON "3999")
set(TA0_PULSE_OFF "1")
set(TA0_BWCAP "4000")
...
```

An example of the build with custom timing adapter configuration:

```commandline
cmake \
    -DTARGET_PLATFORM=mps3 \
    -DTARGET_SUBSYSTEM=sse-300 \
    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
    -DTA_CONFIG_FILE=scripts/cmake/my_ta_config.cmake ..
```

## Add custom inputs

The application performs inference on input data found in the folder set
by the CMake parameters; for more information, see section 3.3 in the
specific use case documentation.

## Add custom model

The application performs inference using the model pointed to by the
CMake parameter `<use_case>_MODEL_TFLITE_PATH`.

> **Note:** If you want to run the model using Ethos-U55, ensure your custom
model has been run through the Vela compiler successfully before continuing.

To run the application with a custom model, you will need to provide a
`labels_<model_name>.txt` file of labels associated with the model.
Each line of the file should correspond to one of the outputs in your
model. See the provided `labels_mobilenet_v2_1.0_224.txt` file in the
`img_class` use case for an example.

Then, you must set `<use_case>_MODEL_TFLITE_PATH` to the location of
the Vela processed model file and `<use_case>_LABELS_TXT_FILE` to the
location of the associated labels file:

```commandline
cmake \
    -D<use_case>_MODEL_TFLITE_PATH=<path/to/custom_model_after_vela.tflite> \
    -D<use_case>_LABELS_TXT_FILE=<path/to/labels_custom_model.txt> \
    -DTARGET_PLATFORM=mps3 \
    -DTARGET_SUBSYSTEM=sse-300 \
    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
```

> **Note:** For the specific use case command, see the relevant section in the use case documentation.

For Windows, add `-G "MinGW Makefiles"` to the CMake command.

> **Note:** Clean the build directory before re-running the CMake command.

The TensorFlow Lite for Microcontrollers model pointed to by `<use_case>_MODEL_TFLITE_PATH` and
the labels text file pointed to by `<use_case>_LABELS_TXT_FILE` will be
converted to C++ files during the CMake configuration stage and then
compiled into the application to perform inference with.

The log from the configuration stage should tell you what model path and
labels file have been used:

```log
-- User option TARGET_PLATFORM is set to mps3
-- User option <use_case>_MODEL_TFLITE_PATH is set to
<path/to/custom_model_after_vela.tflite>
...
-- User option <use_case>_LABELS_TXT_FILE is set to
<path/to/labels_custom_model.txt>
...
-- Using <path/to/custom_model_after_vela.tflite>
++ Converting custom_model_after_vela.tflite to custom_model_after_vela.tflite.cc
-- Generating labels file from <path/to/labels_custom_model.txt>
-- writing to <path/to/build>/generated/include/Labels.hpp and <path/to/build>/generated/src/Labels.cc
...
```

After compiling, your custom model will have replaced the default
one in the application.

## Optimize custom model with Vela compiler

> **Note:** This tool is not available within this project.
> It is a Python tool available from <https://pypi.org/project/ethos-u-vela/>.
> The source code is hosted on <https://git.mlplatform.org/ml/ethos-u/ethos-u-vela.git/>.
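
If you have access to PyPI, Vela can typically be installed with pip (an assumption about your environment; check the project page above for supported versions):

```commandline
pip install ethos-u-vela
```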

The Vela compiler is a tool that can optimize a neural network model
into a version that can run on an embedded system containing Ethos-U55.

The optimized model will contain custom operators for sub-graphs of the
model that can be accelerated by Ethos-U55. The remaining layers that
cannot be accelerated are left unchanged and will run on the CPU using
optimized (CMSIS-NN) or reference kernels provided by the inference
engine.

After the compilation, the optimized model can only be executed on a
system with Ethos-U55.

> **Note:** The NN model provided during the build and compiled into the application
> executable binary defines whether the CPU or NPU is used to execute workloads.
> If an unoptimized model is used, then inference will run on the Cortex-M CPU.

The Vela compiler accepts parameters to influence the model optimization. The
model provided within this project has been optimized with
the following parameters:

```commandline
vela \
    --accelerator-config=ethos-u55-128 \
    --block-config-limit=0 \
    --config my_vela_cfg.ini \
    --memory-mode Shared_Sram \
    --system-config Ethos_U55_High_End_Embedded \
    <model>.tflite
```

Where:

- `--accelerator-config`: Specify the accelerator configuration to use,
    one of `ethos-u55-256`, `ethos-u55-128`, `ethos-u55-64` or `ethos-u55-32`.
- `--block-config-limit`: Limit block config search space, use zero for
    unlimited.
- `--config`: Specifies the path to the Vela configuration file. The format of the file is a Python ConfigParser `.ini` file.
    An example can be found in the `dependencies` folder [vela.ini](../../scripts/vela/vela.ini).
- `--memory-mode`: Selects the memory mode to use as specified in the Vela configuration file.
- `--system-config`: Selects the system configuration to use as specified in the Vela configuration file.

The Vela compiler accepts a `.tflite` file as input and saves the optimized
network model as a `.tflite` file.

Using `--show-cpu-operations` and `--show-subgraph-io-summary` will show
all the operations that fall back to the CPU and a summary of all the
subgraphs and their inputs and outputs.
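
For example, a sketch of such an invocation (re-using the accelerator configuration from above; `<model>.tflite` is a placeholder for your model file):

```commandline
vela \
    --accelerator-config=ethos-u55-128 \
    --show-cpu-operations \
    --show-subgraph-io-summary \
    <model>.tflite
```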

To see the Vela help for all the parameters, use `vela --help`.

Please get in touch with your Arm representative to request access to
the Vela compiler documentation for more details.

> **Note:** By default, use of the Ethos-U55 is enabled in the CMake configuration.
> This can be changed by passing `-DETHOS_U55_ENABLED=0`.

## Memory constraints

Both the MPS3 Fixed Virtual Platform and the MPS3 FPGA platform share
the linker script (scatter file) for the SSE-300 design. The design is set
by the CMake configuration parameter `TARGET_SUBSYSTEM`, as described in
the previous section.

The memory map exposed by this design is presented in Appendix 1. This
can be used as a reference when editing the scatter file, especially to
make sure that region boundaries are respected. The snippet from MPS3's
scatter file is presented below:

```
;---------------------------------------------------------
; First load region
;---------------------------------------------------------
LOAD_REGION_0 0x00000000 0x00080000
{
    ;-----------------------------------------------------
    ; First part of code mem -- 512kiB
    ;-----------------------------------------------------
    itcm.bin 0x00000000 0x00080000
    {
        *.o (RESET, +First)
        * (InRoot$$Sections)
        .ANY (+RO)
    }

    ;-----------------------------------------------------
    ; 128kiB of 512kiB bank is used for any other RW or ZI
    ; data. Note: this region is internal to the Cortex-M CPU
    ;-----------------------------------------------------
    dtcm.bin 0x20000000 0x00020000
    {
        .ANY(+RW +ZI)
    }

    ;-----------------------------------------------------
    ; 128kiB of stack space within the DTCM region
    ;-----------------------------------------------------
    ARM_LIB_STACK 0x20020000 EMPTY ALIGN 8 0x00020000
    {}

    ;-----------------------------------------------------
    ; 256kiB of heap space within the DTCM region
    ;-----------------------------------------------------

    ARM_LIB_HEAP 0x20040000 EMPTY ALIGN 8 0x00040000
    {}

    ;-----------------------------------------------------
    ; SSE-300's internal SRAM
    ;-----------------------------------------------------
    isram.bin 0x21000000 UNINIT ALIGN 16 0x00080000
    {
        ; activation buffers a.k.a tensor arena
        *.o (.bss.NoInit.activation_buf)
    }
}

;---------------------------------------------------------
; Second load region
;---------------------------------------------------------
LOAD_REGION_1 0x60000000 0x02000000
{
    ;-----------------------------------------------------
    ; 32 MiB of DRAM space for nn model and input vectors
    ;-----------------------------------------------------
    dram.bin 0x60000000 ALIGN 16 0x02000000
    {
        ; nn model's baked in input matrices
        *.o (ifm)

        ; nn model
        *.o (nn_model)

        ; if the activation buffer (tensor arena) doesn't
        ; fit in the SRAM region, we accommodate it here
        *.o (activation_buf)
    }
}
```

It is worth noting that in the bitfile implementation, only the BRAM,
internal SRAM and DDR memory regions are accessible to the Ethos-U55
block. In the snippet above, the internal SRAM region is used for the
activation buffers, with a limit of 512kiB. If used, this region is
written to frequently by the Ethos-U55 block. A larger region of memory
for storing the model is placed in the DDR region, under LOAD_REGION_1.
The two load regions are necessary because the MPS3 motherboard's
configuration controller limits the load size at address 0x00000000 to
512kiB. This has implications on how the application **is deployed** on
MPS3, as explained under section 3.8.3.
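The DTCM carve-up in the scatter file above follows from simple base-plus-size arithmetic. A quick illustrative check (not part of the build) that the three DTCM regions tile the bank exactly:

```python
# Sanity-check that the DTCM regions in the scatter file tile the
# 512 kiB bank starting at 0x20000000 without gaps or overlaps.
regions = [
    ("dtcm.bin (RW+ZI)", 0x20000000, 0x00020000),  # 128 kiB
    ("ARM_LIB_STACK",    0x20020000, 0x00020000),  # 128 kiB
    ("ARM_LIB_HEAP",     0x20040000, 0x00040000),  # 256 kiB
]

end = 0x20000000
for name, base, size in regions:
    # Each region must start exactly where the previous one ends.
    assert base == end, f"{name} does not start at 0x{end:08x}"
    end = base + size

# 128 + 128 + 256 kiB adds up to the full 512 kiB DTCM bank.
assert end - 0x20000000 == 512 * 1024
print("DTCM regions tile the 512 kiB bank exactly")
```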

## Automatic file generation

As mentioned in the previous sections, some files, such as the neural
network models, the network's input data, and the output labels, are
automatically converted into C/C++ arrays during the CMake project
configuration stage. Additionally, some code is generated to provide
access to these arrays.

An example:

```log
-- Building use-cases: img_class.
-- Found sources for use-case img_class
-- User option img_class_FILE_PATH is set to /tmp/samples
-- User option img_class_IMAGE_SIZE is set to 224
-- User option img_class_LABELS_TXT_FILE is set to /tmp/labels/labels_model.txt
-- Generating image files from /tmp/samples
++ Converting cat.bmp to cat.cc
++ Converting dog.bmp to dog.cc
-- Skipping file /tmp/samples/files.md due to unsupported image format.
++ Converting kimono.bmp to kimono.cc
++ Converting tiger.bmp to tiger.cc
++ Generating /tmp/build/generated/img_class/include/InputFiles.hpp
-- Generating labels file from /tmp/labels/labels_model.txt
-- writing to /tmp/build/generated/img_class/include/Labels.hpp and /tmp/build/generated/img_class/src/Labels.cc
-- User option img_class_ACTIVATION_BUF_SZ is set to 0x00200000
-- User option img_class_MODEL_TFLITE_PATH is set to /tmp/models/model.tflite
-- Using /tmp/models/model.tflite
++ Converting model.tflite to model.tflite.cc
...
```

In particular, the build options pointing to the input files (`<use_case>_FILE_PATH`),
the model (`<use_case>_MODEL_TFLITE_PATH`) and the labels text file (`<use_case>_LABELS_TXT_FILE`)
are used by Python scripts to generate not only the converted array files,
but also headers with utility functions.
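As a rough illustration of what such a conversion does, the sketch below turns a binary blob (for example, a `.tflite` model) into a C++ source file containing a byte array. This is a minimal, hypothetical sketch; the project's actual scripts work from the templates described later and add license headers, namespaces and more:

```python
def bin_to_cc(data: bytes, array_name: str) -> str:
    """Render a binary blob as a C++ byte-array definition."""
    # Emit 12 bytes per line, in hex, to keep the generated file readable.
    body = ",\n    ".join(
        ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
        for i in range(0, len(data), 12)
    )
    return (
        "#include <cstddef>\n"
        "#include <cstdint>\n\n"
        f"const uint8_t {array_name}[] = {{\n    {body}\n}};\n"
        f"const size_t {array_name}_len = {len(data)};\n"
    )

# Example: convert four bytes and print the generated source.
print(bin_to_cc(b"\x1c\x00\x00\x00", "nn_model"))
```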

For example, the generated utility functions for image classification are:

- `build/generated/include/InputFiles.hpp`

```c++
#ifndef GENERATED_IMAGES_H
#define GENERATED_IMAGES_H

#include <cstdint>

#define NUMBER_OF_FILES  (2U)
#define IMAGE_DATA_SIZE  (150528U)

extern const uint8_t im0[IMAGE_DATA_SIZE];
extern const uint8_t im1[IMAGE_DATA_SIZE];

const char* get_filename(const uint32_t idx);
const uint8_t* get_img_array(const uint32_t idx);

#endif /* GENERATED_IMAGES_H */
```

- `build/generated/src/InputFiles.cc`

```c++
#include "InputFiles.hpp"

static const char *img_filenames[] = {
    "img1.bmp",
    "img2.bmp",
};

static const uint8_t *img_arrays[] = {
    im0,
    im1
};

const char* get_filename(const uint32_t idx)
{
    if (idx < NUMBER_OF_FILES) {
        return img_filenames[idx];
    }
    return nullptr;
}

const uint8_t* get_img_array(const uint32_t idx)
{
    if (idx < NUMBER_OF_FILES) {
        return img_arrays[idx];
    }
    return nullptr;
}
```

These headers are generated from Python templates, located in `scripts/py/templates/*.template`:

```tree
scripts/
├── cmake
│   ├── ...
│   ├── subsystem-profiles
│   │   ├── corstone-sse-200.cmake
│   │   └── corstone-sse-300.cmake
│   ├── templates
│   │   ├── mem_regions.h.template
│   │   ├── peripheral_irqs.h.template
│   │   └── peripheral_memmap.h.template
│   └── ...
└── py
    ├── <generation scripts>
    ├── requirements.txt
    └── templates
        ├── audio.cc.template
        ├── AudioClips.cc.template
        ├── AudioClips.hpp.template
        ├── default.hpp.template
        ├── header_template.txt
        ├── image.cc.template
        ├── Images.cc.template
        ├── Images.hpp.template
        ├── Labels.cc.template
        ├── Labels.hpp.template
        ├── testdata.cc.template
        ├── TestData.cc.template
        ├── TestData.hpp.template
        └── tflite.cc.template
```
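The generation scripts fill these templates with the values gathered at configure time. A minimal sketch of the idea, using Python's built-in `string.Template` and a hypothetical fragment in the style of `Images.hpp.template` (the real scripts may use a different templating engine):

```python
from string import Template

# Hypothetical template fragment; real templates live in
# scripts/py/templates/*.template.
template = Template(
    "#define NUMBER_OF_FILES  (${img_count}U)\n"
    "#define IMAGE_DATA_SIZE  (${img_size}U)\n"
)

# Values that CMake would pass to the generation script:
# two input images, each 224 x 224 RGB.
header = template.substitute(img_count=2, img_size=224 * 224 * 3)
print(header)
```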

Depending on the use-case type, the appropriate conversion is invoked from the use-case CMake file
(audio for voice use cases, image for vision use cases).
For example, the generation calls for image classification (`source/use_case/img_class/usecase.cmake`) are:

```cmake
# Generate input files
generate_images_code("${${use_case}_FILE_PATH}"
                     ${SRC_GEN_DIR}
                     ${INC_GEN_DIR}
                     "${${use_case}_IMAGE_SIZE}")

# Generate labels file
set(${use_case}_LABELS_CPP_FILE Labels)
generate_labels_code(
    INPUT           "${${use_case}_LABELS_TXT_FILE}"
    DESTINATION_SRC ${SRC_GEN_DIR}
    DESTINATION_HDR ${INC_GEN_DIR}
    OUTPUT_FILENAME "${${use_case}_LABELS_CPP_FILE}"
)

...

# Generate model file
generate_tflite_code(
    MODEL_PATH ${${use_case}_MODEL_TFLITE_PATH}
    DESTINATION ${SRC_GEN_DIR}
)
```

> **Note:** When required, the model and label conversions accept extra parameters,
> such as namespaces or extra code to put into the generated `<model>.cc` file.
>
> ```cmake
> set(${use_case}_LABELS_CPP_FILE Labels)
> generate_labels_code(
>     INPUT           "${${use_case}_LABELS_TXT_FILE}"
>     DESTINATION_SRC ${SRC_GEN_DIR}
>     DESTINATION_HDR ${INC_GEN_DIR}
>     OUTPUT_FILENAME "${${use_case}_LABELS_CPP_FILE}"
>     NAMESPACE       "namespace1" "namespace2"
> )
>
> ...
>
> set(EXTRA_MODEL_CODE
>     "/* Model parameters for ${use_case} */"
>     "extern const int   g_myvariable1     = value1"
>     "extern const int   g_myvariable2     = value2"
> )
>
> generate_tflite_code(
>     MODEL_PATH ${${use_case}_MODEL_TFLITE_PATH}
>     DESTINATION ${SRC_GEN_DIR}
>     EXPRESSIONS ${EXTRA_MODEL_CODE}
>     NAMESPACE   "namespace1" "namespace2"
> )
> ```

In addition to the input file conversions, the correct platform/system profile
(from `scripts/cmake/subsystem-profiles/*.cmake`) is selected based on the `TARGET_SUBSYSTEM`
build option. The variables it sets describe memory region sizes, base addresses and IRQ numbers,
which are used to generate the `mem_regions.h`, `peripheral_irqs.h` and `peripheral_memmap.h` headers.
Templates from `scripts/cmake/templates/*.template` are used to generate these header files.

After the build, the files generated in the build folder are:

```tree
build/generated/
├── bsp
│   ├── mem_regions.h
│   ├── peripheral_irqs.h
│   └── peripheral_memmap.h
├── <use_case_name1>
│   ├── include
│   │   ├── InputFiles.hpp
│   │   └── Labels.hpp
│   └── src
│       ├── <uc1_input_file1>.cc
│       ├── <uc1_input_file2>.cc
│       ├── InputFiles.cc
│       ├── Labels.cc
│       └── <uc1_model_name>.tflite.cc
└── <use_case_name2>
    ├── include
    │   ├── InputFiles.hpp
    │   └── Labels.hpp
    └── src
        ├── <uc2_input_file1>.cc
        ├── <uc2_input_file2>.cc
        ├── InputFiles.cc
        ├── Labels.cc
        └── <uc2_model_name>.tflite.cc
```

Next section of the documentation: [Deployment](../documentation.md#Deployment).