Diffstat (limited to 'docs/sections')
 docs/sections/appendix.md             |   20
 docs/sections/building.md             | 1023
 docs/sections/coding_guidelines.md    |  323
 docs/sections/customizing.md          |  731
 docs/sections/deployment.md           |  281
 docs/sections/run.md                  |   42
 docs/sections/testing_benchmarking.md |   87
 docs/sections/troubleshooting.md      |   27
 8 files changed, 2534 insertions(+), 0 deletions(-)
diff --git a/docs/sections/appendix.md b/docs/sections/appendix.md
new file mode 100644
index 0000000..7b56faa
--- /dev/null
+++ b/docs/sections/appendix.md
@@ -0,0 +1,20 @@
+# Appendix
+
+## Arm® Cortex®-M55 Memory map overview for Corstone™-300 reference design
+
+The table below shows the memory mapping information specific to the Arm® Cortex®-M55.
+
+| Name | Base address | Limit address | Size | IDAU | Remarks |
+|-------|--------------|---------------|-----------|------|-----------------------------------------------------------|
+| ITCM | 0x0000_0000 | 0x0007_FFFF | 512 kiB | NS | ITCM code region |
+| BRAM  | 0x0100_0000  | 0x011F_FFFF   | 2 MiB     | NS   | FPGA data SRAM region                                      |
+| DTCM  | 0x2000_0000  | 0x2007_FFFF   | 512 kiB   | NS   | 4 banks of 128 kiB each                                    |
+| SRAM | 0x2100_0000 | 0x213F_FFFF | 4 MiB | NS | 2 banks of 2 MiB each as SSE-300 internal SRAM region |
+| DDR | 0x6000_0000 | 0x6FFF_FFFF | 256 MiB | NS | DDR memory region |
+| ITCM | 0x1000_0000 | 0x1007_FFFF | 512 kiB | S | ITCM code region |
+| BRAM  | 0x1100_0000  | 0x111F_FFFF   | 2 MiB     | S    | FPGA data SRAM region                                      |
+| DTCM  | 0x3000_0000  | 0x3007_FFFF   | 512 kiB   | S    | 4 banks of 128 kiB each                                    |
+| SRAM | 0x3100_0000 | 0x313F_FFFF | 4 MiB | S | 2 banks of 2 MiB each as SSE-300 internal SRAM region |
+| DDR | 0x7000_0000 | 0x7FFF_FFFF | 256 MiB | S | DDR memory region |
+
+The default memory map can be found here: https://developer.arm.com/documentation/101051/0002/Memory-model/Memory-map
\ No newline at end of file
diff --git a/docs/sections/building.md b/docs/sections/building.md
new file mode 100644
index 0000000..56771b8
--- /dev/null
+++ b/docs/sections/building.md
@@ -0,0 +1,1023 @@
+# Building the Code Samples application from sources
+
+## Contents
+
+- [Building the Code Samples application from sources](#building-the-code-samples-application-from-sources)
+ - [Contents](#contents)
+ - [Build prerequisites](#build-prerequisites)
+ - [Build options](#build-options)
+ - [Build process](#build-process)
+ - [Preparing build environment](#preparing-build-environment)
+ - [Create a build directory](#create-a-build-directory)
+ - [Configuring the build for `MPS3: SSE-300`](#configuring-the-build-for-mps3-sse-300)
+ - [Configuring the build for `MPS3: SSE-200`](#configuring-the-build-for-mps3-sse-200)
+ - [Configuring the build native unit-test](#configuring-the-build-native-unit-test)
+ - [Configuring the build for `simple_platform`](#configuring-the-build-for-simple_platform)
+ - [Building the configured project](#building-the-configured-project)
+ - [Building timing adapter with custom options](#building-timing-adapter-with-custom-options)
+ - [Add custom inputs](#add-custom-inputs)
+ - [Add custom model](#add-custom-model)
+ - [Optimize custom model with Vela compiler](#optimize-custom-model-with-vela-compiler)
+ - [Memory constraints](#memory-constraints)
+ - [Automatic file generation](#automatic-file-generation)
+
+This section assumes the use of an **x86 Linux** build machine.
+
+## Build prerequisites
+
+Before proceeding, please make sure that the following prerequisites
+are fulfilled:
+
+- Arm Compiler version 6.14 or above is installed and available on the
+ path.
+
+ Test the compiler by running:
+
+ ```commandline
+ armclang -v
+ ```
+
+ ```log
+ Product: ARM Compiler 6.14 Professional
+ Component: ARM Compiler 6.14
+ ```
+
+ > **Note:** Add compiler to the path, if needed:
+ >
+ > `export PATH=/path/to/armclang/bin:$PATH`
+
+- Compiler license is configured correctly
+
+- CMake version 3.15 or above is installed and available on the path.
+ Test CMake by running:
+
+ ```commandline
+ cmake --version
+ ```
+
+ ```log
+ cmake version 3.16.2
+ ```
+
+ > **Note:** Add cmake to the path, if needed:
+ >
+ > `export PATH=/path/to/cmake/bin:$PATH`
+
+- Python 3.6 or above is installed. Test the Python version by running:
+
+ ```commandline
+ python3 --version
+ ```
+
+ ```log
+ Python 3.6.8
+ ```
+
+- The build system will create a Python virtual environment during the
+ build process. Please make sure that the Python virtual environment
+ module is installed:
+
+ ```commandline
+ python3 -m venv
+ ```
+
+- Make (or MinGW make for Windows) is installed and available on the path:
+
+ ```commandline
+ make --version
+ ```
+
+ ```log
+ GNU Make 4.1
+
+ ...
+ ```
+
+ > **Note:** Add it to the path environment variable, if needed.
+
+- Access to the Internet to download the third party dependencies, specifically: TensorFlow Lite Micro, Arm Ethos-U55
+driver and CMSIS. Instructions for downloading these are listed under [preparing build environment](#preparing-build-environment).
+
+## Build options
+
+The project build system allows the user to specify a custom NN
+model (in `.tflite` format) or images and to compile the application
+binary from sources.
+
+The build system uses pre-built TensorFlow Lite for Microcontrollers
+library and Arm® Ethos™-U55 driver libraries from the delivery package.
+
+The build script is parameterized to support different options. Default
+values for build parameters will build the executable compatible with
+the Ethos-U55 Fast Model.
+
+The build parameters are:
+
+- `TARGET_PLATFORM`: Target platform on which to execute the application:
+ - `mps3`
+ - `native`
+ - `simple_platform`
+
+- `TARGET_SUBSYSTEM`: Platform target subsystem; this specifies the
+ design implementation for the deployment target. For both the MPS3
+ FVP and the MPS3 FPGA, this should be left at the default value of
+ SSE-300:
+ - `sse-300` (default - [Arm® Corstone™-300](https://developer.arm.com/ip-products/subsystem/corstone/corstone-300))
+ - `sse-200`
+
+- `TENSORFLOW_SRC_PATH`: Path to the root of the TensorFlow directory.
+ The default value points to the TensorFlow submodule in the
+ [ethos-u](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) `dependencies` folder.
+
+- `ETHOS_U55_DRIVER_SRC_PATH`: Path to the Ethos-U55 core driver sources.
+ The default value points to the core_driver submodule in the
+ [ethos-u](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) `dependencies` folder.
+
+- `CMSIS_SRC_PATH`: Path to the CMSIS sources to be used to build TensorFlow
+ Lite Micro library. This parameter is optional and valid only for
+ Arm® Cortex®-M CPU targeted configurations. The default value points to the CMSIS submodule in the
+ [ethos-u](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) `dependencies` folder.
+
+- `ETHOS_U55_ENABLED`: Sets whether the use of Ethos-U55 is available for
+ the deployment target. By default, this is set, and the application
+ is therefore built with Ethos-U55 support.
+
+- `CPU_PROFILE_ENABLED`: Sets whether profiling information for the CPU
+ core should be displayed. By default, this is set to false, but can
+ be turned on for FPGA targets. For the FVP, the CPU core's cycle
+ counts are not meaningful and should not be used.
+
+- `LOG_LEVEL`: Sets the verbosity level for the application's output
+ over UART/stdout. Valid values are `LOG_LEVEL_TRACE`, `LOG_LEVEL_DEBUG`,
+ `LOG_LEVEL_INFO`, `LOG_LEVEL_WARN` and `LOG_LEVEL_ERROR`. By default, it
+ is set to `LOG_LEVEL_INFO`.
+
+- `<use_case>_MODEL_TFLITE_PATH`: Path to the model file that will be
+ processed and included into the application `.axf` file. The default
+ value points to one of the delivered set of models. Make sure the
+ model chosen is aligned with the `ETHOS_U55_ENABLED` setting.
+
+ - When using the Ethos-U55 backend, the NN model is assumed to have
+ been optimized by the Vela compiler. However, even if it has not
+ been, it will fall back on the CPU and execute, if supported by
+ TensorFlow Lite Micro.
+
+ - When use of the Ethos-U55 is disabled and a Vela-optimized model
+ is provided, the application will report a failure at runtime.
+
+- `USE_CASE_BUILD`: Specifies the list of applications to build. By
+ default, the build system scans sources to identify available ML
+ applications and produces executables for all detected use-cases.
+ This parameter can accept single value, for example,
+ `USE_CASE_BUILD=img_class` or multiple values, for example,
+ `USE_CASE_BUILD="img_class;kws"`.
+
+- `ETHOS_U55_TIMING_ADAPTER_SRC_PATH`: Path to timing adapter sources.
+ The default value points to the `timing_adapter` dependencies folder.
+
+- `TA_CONFIG_FILE`: Path to the CMake configuration file containing the
+ timing adapter parameters. Used only if the timing adapter build is
+ enabled.
+
+- `TENSORFLOW_LITE_MICRO_CLEAN_BUILD`: Optional parameter to enable/disable
+ "cleaning" prior to building for the TensorFlow Lite Micro library.
+ It is enabled by default.
+
+- `TENSORFLOW_LITE_MICRO_CLEAN_DOWNLOADS`: Optional parameter to enable wiping
+ out TPIP downloads from TensorFlow source tree prior to each build.
+ It is disabled by default.
+
+- `ARMCLANG_DEBUG_DWARF_LEVEL`: When the CMake build type is specified as `Debug`
+ and when armclang toolchain is being used to build for a Cortex-M CPU target,
+ this optional argument can be set to specify the DWARF format.
+ By default, this is set to 4 and is synonymous with passing `-g`
+ flag to the compiler. This is compatible with Arm-DS and other tools
+ which can interpret the latest DWARF format. To allow debugging using
+ the Model Debugger from Arm FastModel Tools Suite, this argument can be used
+ to pass DWARF format version as "3". Note: this option is only available
+ when CMake project is configured with `-DCMAKE_BUILD_TYPE=Debug` argument.
+ Also, the same dwarf format is used for building TensorFlow Lite Micro library.
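+
+As an illustration, several of these options can be combined in a single
+configuration command. For example, to restrict the build to the image
+classification and keyword spotting use-cases with more verbose logging
+(the toolchain file path here matches the MPS3 examples later in this
+document):
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD="img_class;kws" \
+ -DLOG_LEVEL=LOG_LEVEL_DEBUG ..
+```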
+
+> **Note:** For details on the specific use case build options, follow the
+> instructions in the use-case specific documentation.
+> Also, when setting any of the CMake configuration parameters that expect a directory/file path, it is advised
+> to **use absolute paths instead of relative paths**.
+
+## Build process
+
+The build process can be summarized in three major steps:
+
+- Prepare the build environment by downloading third party sources required, see
+[Preparing build environment](#preparing-build-environment).
+
+- Configure the build for the platform chosen.
+This stage includes:
+ - CMake options configuration
+ - When the `<use_case>_MODEL_TFLITE_PATH` build option is not provided, default neural network models are downloaded
+from [Arm ML-Zoo](https://github.com/ARM-software/ML-zoo/). For a native build, the network's input and output data
+for tests are downloaded.
+ - Some files such as neural network models, network's inputs and output labels are automatically converted
+ into C/C++ arrays, see [Automatic file generation](#automatic-file-generation).
+
+- Build the application.\
+During this stage, the application and third-party libraries are built; see [Building the configured project](#building-the-configured-project).
+
+### Preparing build environment
+
+Certain third-party sources must be present on the development machine for the example sources in this
+repository to link against.
+
+1. [TensorFlow Lite Micro repository](https://github.com/tensorflow/tensorflow)
+2. [Ethos-U55 core driver repository](https://review.mlplatform.org/admin/repos/ml/ethos-u/ethos-u-core-driver)
+3. [CMSIS-5](https://github.com/ARM-software/CMSIS_5.git)
+
+These are part of the [ethos-u repository](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) and set as
+submodules of this project.
+
+To pull the submodules:
+
+```sh
+git submodule update --init
+```
+
+This will download all the required components and place them in a tree like:
+
+```tree
+dependencies
+└── ethos-u
+    ├── cmsis
+    ├── core_driver
+    ├── tensorflow
+    └── ...
+```
+
+> **NOTE**: The default source paths for the TPIP sources assume the above directory structure, but all of the relevant
+>paths can be overridden by CMake configuration arguments `TENSORFLOW_SRC_PATH`, `ETHOS_U55_DRIVER_SRC_PATH`,
+>and `CMSIS_SRC_PATH`.
+
+### Create a build directory
+
+Create a build directory in the root of the project and navigate inside:
+
+```commandline
+mkdir build && cd build
+```
+
+### Configuring the build for `MPS3: SSE-300`
+
+On Linux, execute the following command to build the application to run
+on the Ethos-U55 when providing only the mandatory arguments for CMake configuration:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+For Windows, add `-G "MinGW Makefiles"`:
+
+```commandline
+cmake \
+ -G "MinGW Makefiles" \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+Toolchain option `CMAKE_TOOLCHAIN_FILE` points to the toolchain specific
+file to set the compiler and platform specific parameters.
+
+To configure a build that can be debugged using Arm-DS, we can just specify
+the build type as `Debug`:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug ..
+```
+
+To configure a build that can be debugged using a tool that only supports
+DWARF format 3 (Model Debugger, for example), we can use:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug \
+ -DARMCLANG_DEBUG_DWARF_LEVEL=3 ..
+```
+
+If the TensorFlow source tree is not in its default expected location,
+set the path using `TENSORFLOW_SRC_PATH`.
+Similarly, if the Ethos-U55 driver and CMSIS are not in the default location,
+`ETHOS_U55_DRIVER_SRC_PATH` and `CMSIS_SRC_PATH` can be used to configure their location. For example:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTENSORFLOW_SRC_PATH=/my/custom/location/tensorflow \
+ -DETHOS_U55_DRIVER_SRC_PATH=/my/custom/location/core_driver \
+ -DCMSIS_SRC_PATH=/my/custom/location/cmsis ..
+```
+
+> **Note:** If re-building with changed parameter values, it is
+> highly advised to clean the build directory and re-run the CMake command.
+
+### Configuring the build for `MPS3: SSE-200`
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-200 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+For Windows, add `-G "MinGW Makefiles"`:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-200 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -G "MinGW Makefiles" ..
+```
+
+### Configuring the build native unit-test
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=native \
+ -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/native-toolchain.cmake ..
+```
+
+For Windows, add `-G "MinGW Makefiles"`:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=native \
+ -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/native-toolchain.cmake \
+ -G "MinGW Makefiles" ..
+```
+
+Results of the build will be placed under the `build/bin/` folder:
+
+```tree
+bin
+├── dev_ethosu_eval-tests
+└── ethos-u
+```
+
+### Configuring the build for `simple_platform`
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=simple_platform \
+ -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+For Windows, add `-G "MinGW Makefiles"`:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=simple_platform \
+ -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/bare-metal-toolchain.cmake \
+ -G "MinGW Makefiles" ..
+```
+
+### Building the configured project
+
+If the CMake command succeeds, build the application as follows:
+
+```commandline
+make -j4
+```
+
+or for Windows:
+
+```commandline
+mingw32-make -j4
+```
+
+Add `VERBOSE=1` to see compilation and link details.
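+
+For example:
+
+```commandline
+make -j4 VERBOSE=1
+```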
+
+Results of the build will be placed under the `build/bin` folder. For
+example:
+
+```tree
+bin
+ ├── ethos-u-<use_case_name>.axf
+ ├── ethos-u-<use_case_name>.htm
+ ├── ethos-u-<use_case_name>.map
+ ├── images-<use_case_name>.txt
+ └── sectors
+ └── <use_case>
+ ├── dram.bin
+ └── itcm.bin
+```
+
+For each implemented use-case under the `source/use-case` directory,
+the following build artefacts will be created:
+
+- `ethos-u-<use case name>.axf`: The built application binary for a ML
+ use case.
+
+- `ethos-u-<use case name>.map`: Information from building the
+ application (e.g. libraries used, what was optimized, location of
+ objects).
+
+- `ethos-u-<use case name>.htm`: Human readable file containing the
+ call graph of application functions.
+
+- `sectors/`: Folder containing the built application, split into files
+ for loading into different FPGA memory regions.
+
+- `images-<use case name>.txt`: Tells the FPGA which memory regions to
+ use for loading the binaries in the `sectors/` folder.
+
+> **Note:** For the specific use-case commands, see the relevant section
+> in the use-case documentation.
+
+## Building timing adapter with custom options
+
+The sources also contain the configuration for a timing adapter utility
+for the Ethos-U55 driver. The timing adapter allows the platform to simulate user
+provided memory bandwidth and latency constraints.
+
+The timing adapter driver aims to control the behavior of two AXI buses
+used by Ethos-U55. One is for SRAM memory region and the other is for
+flash or DRAM. The SRAM is where intermediate buffers are expected to be
+allocated and therefore, this region can serve frequent R/W traffic
+generated by computation operations while executing a neural network
+inference. The flash or DDR is where we expect to store the model
+weights and therefore, this bus would typically be used only for R/O
+traffic.
+
+It is used for the MPS3 FPGA as well as for the Fast Model environment.
+
+The CMake build framework allows control of the behavior of each bus
+with the following parameters:
+
+- `MAXR`: Maximum number of pending read operations allowed. 0 is
+ inferred as infinite, and the default value is 4.
+
+- `MAXW`: Maximum number of pending write operations allowed. 0 is
+ inferred as infinite, and the default value is 4.
+
+- `MAXRW`: Maximum number of pending read+write operations allowed. 0 is
+ inferred as infinite, and the default value is 8.
+
+- `RLATENCY`: Minimum latency, in cycle counts, for a read operation.
+ This is the duration between ARVALID and RVALID signals. The default
+ value is 50.
+
+- `WLATENCY`: Minimum latency, in cycle counts, for a write operation.
+ This is the duration between WVALID + WLAST and BVALID being
+ de-asserted. The default value is 50.
+
+- `PULSE_ON`: Number of cycles during which addresses are let through.
+ The default value is 5100.
+
+- `PULSE_OFF`: Number of cycles during which addresses are blocked. The
+ default value is 5100.
+
+- `BWCAP`: Maximum number of 64-bit words transferred per pulse cycle. A
+ pulse cycle is PULSE_ON + PULSE_OFF. 0 is inferred as infinite, and
+ the default value is 625.
+
+- `MODE`: Timing adapter operation mode. The default value is 0.
+
+ - Bit 0: 0=simple; 1=latency-deadline QoS throttling of read vs.
+ write
+
+ - Bit 1: 1=enable random AR reordering (0=default),
+
+ - Bit 2: 1=enable random R reordering (0=default),
+
+ - Bit 3: 1=enable random B reordering (0=default)
+
+For the timing adapter's CMake build configuration, the SRAM AXI bus is
+assigned index 0 and the flash/DRAM AXI bus index 1. To change a bus
+parameter for the build, a `TA<index>_` prefix should be added to the
+parameter name. For example, `TA0_MAXR=10` will set the SRAM AXI bus's
+maximum pending reads to 10.
+
+As an example, if we have the following parameters for flash/DRAM
+region:
+
+- `TA1_MAXR` = "2"
+
+- `TA1_MAXW` = "0"
+
+- `TA1_MAXRW` = "0"
+
+- `TA1_RLATENCY` = "64"
+
+- `TA1_WLATENCY` = "32"
+
+- `TA1_PULSE_ON` = "320"
+
+- `TA1_PULSE_OFF` = "80"
+
+- `TA1_BWCAP` = "50"
+
+For a clock rate of 500 MHz, this would translate to:
+
+- The maximum duty cycle for any operation is:\
+![Maximum duty cycle formula](../media/F1.png)
+
+- Maximum bit rate for this bus (64-bit wide) is:\
+![Maximum bit rate formula](../media/F2.png)
+
+- With a read latency of 64 cycles and maximum pending reads of 2,
+ each read could be a maximum of 64 or 128 bytes, as defined by the
+ Ethos-U55's AXI bus attributes.
+
+ The bandwidth is calculated solely by read parameters ![Bandwidth formula](
+ ../media/F3.png)
+
+ This is higher than the overall bandwidth dictated by the bus parameters
+ of \
+ ![Overall bandwidth formula](../media/F4.png)
+
+This suggests that the read operation is limited only by the overall bus
+bandwidth.
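+
+Plugging the example numbers into these formulas gives the following
+back-of-envelope sketch (the formulas themselves are shown in the images
+above, and a 128-byte maximum read burst is assumed):
+
+```
+Duty cycle     = PULSE_ON / (PULSE_ON + PULSE_OFF) = 320 / 400      = 80%
+Pulse period   = (320 + 80) cycles / 500 MHz       = 0.8 us
+Bus bandwidth  = BWCAP * 8 bytes / pulse period    = 400 B / 0.8 us = 500 MB/s
+Read bandwidth = MAXR * 128 bytes / (RLATENCY / 500 MHz)
+               = 256 B / 128 ns                    = 2 GB/s
+```
+
+The read-parameter bandwidth (2 GB/s) exceeds the 500 MB/s allowed by the
+bus parameters, matching the conclusion above.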
+
+The timing adapter requires recompilation to change parameters. The default
+timing adapter configuration file, pointed to by the `TA_CONFIG_FILE` build
+parameter, is located in the `scripts/cmake` folder and contains all options
+for AXI0 and AXI1 described above.
+
+An example of `scripts/cmake/ta_config.cmake`:
+
+```cmake
+# Timing adapter options
+set(TA_INTERACTIVE OFF)
+
+# Timing adapter settings for AXI0
+set(TA0_MAXR "8")
+set(TA0_MAXW "8")
+set(TA0_MAXRW "0")
+set(TA0_RLATENCY "32")
+set(TA0_WLATENCY "32")
+set(TA0_PULSE_ON "3999")
+set(TA0_PULSE_OFF "1")
+set(TA0_BWCAP "4000")
+...
+```
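+
+The `...` above elides the AXI1 section. As a sketch, the flash/DRAM
+settings from the worked example earlier in this section would read:
+
+```cmake
+# Timing adapter settings for AXI1 (flash/DRAM)
+set(TA1_MAXR "2")
+set(TA1_MAXW "0")
+set(TA1_MAXRW "0")
+set(TA1_RLATENCY "64")
+set(TA1_WLATENCY "32")
+set(TA1_PULSE_ON "320")
+set(TA1_PULSE_OFF "80")
+set(TA1_BWCAP "50")
+```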
+
+An example of the build with custom timing adapter configuration:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTA_CONFIG_FILE=scripts/cmake/my_ta_config.cmake ..
+```
+
+## Add custom inputs
+
+The application performs inference on input data found in the folder set
+by the CMake parameters; for more information, see section 3.3 in the
+specific use-case documentation.
+
+## Add custom model
+
+The application performs inference using the model pointed to by the
+CMake parameter `MODEL_TFLITE_PATH`.
+
+> **Note:** If you want to run the model using Ethos-U55, ensure your custom
+model has been run through the Vela compiler successfully before continuing.
+
+To run the application with a custom model, you will need to provide a
+`labels_<model_name>.txt` file of labels associated with the model.
+Each line of the file should correspond to one of the outputs in your
+model. See the provided `labels_mobilenet_v2_1.0_224.txt` file in the
+`img_class` use case for an example.
+
+Then, you must set `<use_case>_MODEL_TFLITE_PATH` to the location of
+the Vela processed model file and `<use_case>_LABELS_TXT_FILE` to the
+location of the associated labels file:
+
+```commandline
+cmake \
+ -D<use_case>_MODEL_TFLITE_PATH=<path/to/custom_model_after_vela.tflite> \
+ -D<use_case>_LABELS_TXT_FILE=<path/to/labels_custom_model.txt> \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+> **Note:** For the specific use-case command, see the relevant section in the use-case documentation.
+
+For Windows, add `-G "MinGW Makefiles"` to the CMake command.
+
+> **Note:** Clean the build directory before re-running the CMake command.
+
+The TensorFlow Lite for Microcontrollers model pointed to by `<use_case>_MODEL_TFLITE_PATH` and
+the labels text file pointed to by `<use_case>_LABELS_TXT_FILE` will be
+converted to C++ files during the CMake configuration stage and then
+compiled into the application, which performs inference with them.
+
+The log from the configuration stage should tell you what model path and
+labels file have been used:
+
+```log
+-- User option TARGET_PLATFORM is set to mps3
+-- User option <use_case>_MODEL_TFLITE_PATH is set to
+<path/to/custom_model_after_vela.tflite>
+...
+-- User option <use_case>_LABELS_TXT_FILE is set to
+<path/to/labels_custom_model.txt>
+...
+-- Using <path/to/custom_model_after_vela.tflite>
+++ Converting custom_model_after_vela.tflite to custom_model_after_vela.tflite.cc
+-- Generating labels file from <path/to/labels_custom_model.txt>
+-- writing to <path/to/build>/generated/include/Labels.hpp and <path/to/build>/generated/src/Labels.cc
+...
+```
+
+After compiling, your custom model will have replaced the default
+one in the application.
+
+## Optimize custom model with Vela compiler
+
+> **Note:** This tool is not available within this project.
+> It is a Python tool available from <https://pypi.org/project/ethos-u-vela/>.
+> The source code is hosted on <https://git.mlplatform.org/ml/ethos-u/ethos-u-vela.git/>.
+
+The Vela compiler is a tool that can optimize a neural network model
+into a version that can run on an embedded system containing Ethos-U55.
+
+The optimized model will contain custom operators for sub-graphs of the
+model that can be accelerated by Ethos-U55; the remaining layers that
+cannot be accelerated are left unchanged and will run on the CPU using
+optimized (CMSIS-NN) or reference kernels provided by the inference
+engine.
+
+After the compilation, the optimized model can only be executed on a
+system with Ethos-U55.
+
+> **Note:** The NN model provided during the build and compiled into the application
+executable binary defines whether CPU or NPU is used to execute workloads.
+If an unoptimized model is used, the inference will run on the Cortex-M CPU.
+
+The Vela compiler accepts parameters that influence the model optimization. The
+model provided within this project has been optimized with
+the following parameters:
+
+```commandline
+vela \
+ --accelerator-config=ethos-u55-128 \
+ --block-config-limit=0 \
+ --config my_vela_cfg.ini \
+ --memory-mode Shared_Sram \
+ --system-config Ethos_U55_High_End_Embedded \
+ <model>.tflite
+```
+
+Where:
+
+- `--accelerator-config`: Specifies the accelerator configuration to use,
+ one of `ethos-u55-256`, `ethos-u55-128`, `ethos-u55-64` and `ethos-u55-32`.
+- `--block-config-limit`: Limit block config search space, use zero for
+ unlimited.
+- `--config`: Specifies the path to the Vela configuration file. The format of the file is a Python ConfigParser .ini file.
+ An example can be found in the `dependencies` folder [vela.ini](../../scripts/vela/vela.ini).
+- `--memory-mode`: Selects the memory mode to use as specified in the Vela configuration file.
+- `--system-config`: Selects the system configuration to use as specified in the Vela configuration file.
+
+The Vela compiler accepts a `.tflite` file as input and saves the optimized
+network model as a `.tflite` file.
+
+Using `--show-cpu-operations` and `--show-subgraph-io-summary` will show
+all the operations that fall back to the CPU and a summary of all the
+subgraphs and their inputs and outputs.
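+
+For example (an illustrative invocation):
+
+```commandline
+vela --show-cpu-operations --show-subgraph-io-summary <model>.tflite
+```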
+
+To see help for all Vela parameters, use `vela --help`.
+
+Please get in touch with your Arm representative to request access to
+Vela Compiler documentation for more details.
+
+> **Note:** By default, use of the Ethos-U55 is enabled in the CMake configuration.
+> This can be changed by passing `-DETHOS_U55_ENABLED=0`.
+
+## Memory constraints
+
+Both the MPS3 Fixed Virtual Platform and the MPS3 FPGA platform share
+the linker script (scatter file) for the SSE-300 design. The design is set
+by the CMake configuration parameter `TARGET_SUBSYSTEM`, as described in
+the previous section.
+
+The memory map exposed by this design is presented in Appendix 1. This
+can be used as a reference when editing the scatter file, especially to
+make sure that region boundaries are respected. The snippet from MPS3's
+scatter file is presented below:
+
+```
+;---------------------------------------------------------
+; First load region
+;---------------------------------------------------------
+LOAD_REGION_0 0x00000000 0x00080000
+{
+ ;-----------------------------------------------------
+ ; First part of code mem -- 512kiB
+ ;-----------------------------------------------------
+ itcm.bin 0x00000000 0x00080000
+ {
+ *.o (RESET, +First)
+ * (InRoot$$Sections)
+ .ANY (+RO)
+ }
+
+ ;-----------------------------------------------------
+ ; 128kiB of 512kiB bank is used for any other RW or ZI
+ ; data. Note: this region is internal to the Cortex-M CPU
+ ;-----------------------------------------------------
+ dtcm.bin 0x20000000 0x00020000
+ {
+ .ANY(+RW +ZI)
+ }
+
+ ;-----------------------------------------------------
+ ; 128kiB of stack space within the DTCM region
+ ;-----------------------------------------------------
+ ARM_LIB_STACK 0x20020000 EMPTY ALIGN 8 0x00020000
+ {}
+
+ ;-----------------------------------------------------
+ ; 256kiB of heap space within the DTCM region
+ ;-----------------------------------------------------
+
+ ARM_LIB_HEAP 0x20040000 EMPTY ALIGN 8 0x00040000
+ {}
+
+ ;-----------------------------------------------------
+ ; SSE-300's internal SRAM
+ ;-----------------------------------------------------
+ isram.bin 0x21000000 UNINIT ALIGN 16 0x00080000
+ {
+ ; activation buffers a.k.a tensor arena
+ *.o (.bss.NoInit.activation_buf)
+ }
+}
+
+;---------------------------------------------------------
+; Second load region
+;---------------------------------------------------------
+LOAD_REGION_1 0x60000000 0x02000000
+{
+ ;-----------------------------------------------------
+ ; 32 MiB of DRAM space for nn model and input vectors
+ ;-----------------------------------------------------
+ dram.bin 0x60000000 ALIGN 16 0x02000000
+ {
+ ; nn model's baked in input matrices
+ *.o (ifm)
+
+ ; nn model
+ *.o (nn_model)
+
+ ; if the activation buffer (tensor arena) doesn't
+ ; fit in the SRAM region, we accommodate it here
+ *.o (activation_buf)
+ }
+}
+```
+
+It is worth noting that in the bitfile implementation, only the BRAM,
+internal SRAM and DDR memory regions are accessible to the Ethos-U55
+block. In the above snippet, the internal SRAM region memory can be seen
+to be utilized by activation buffers with a limit of 512kiB. If used,
+this region will be written to by the Ethos-U55 block frequently. A bigger
+region of memory for storing the model is placed in the DDR region,
+under LOAD_REGION_1. The two load regions are necessary as the MPS3's
+motherboard configuration controller limits the load size at address
+0x00000000 to 512 kiB. This has implications for how the application is
+deployed on MPS3, as explained under section 3.8.3.
+
+## Automatic file generation
+
+As mentioned in the previous sections, some files such as neural network
+models, network's inputs, and output labels are automatically converted
+into C/C++ arrays during the CMake project configuration stage.
+Additionally, some code is generated to allow access to these arrays.
+
+An example:
+
+```log
+-- Building use-cases: img_class.
+-- Found sources for use-case img_class
+-- User option img_class_FILE_PATH is set to /tmp/samples
+-- User option img_class_IMAGE_SIZE is set to 224
+-- User option img_class_LABELS_TXT_FILE is set to /tmp/labels/labels_model.txt
+-- Generating image files from /tmp/samples
+++ Converting cat.bmp to cat.cc
+++ Converting dog.bmp to dog.cc
+-- Skipping file /tmp/samples/files.md due to unsupported image format.
+++ Converting kimono.bmp to kimono.cc
+++ Converting tiger.bmp to tiger.cc
+++ Generating /tmp/build/generated/img_class/include/InputFiles.hpp
+-- Generating labels file from /tmp/labels/labels_model.txt
+-- writing to /tmp/build/generated/img_class/include/Labels.hpp and /tmp/build/generated/img_class/src/Labels.cc
+-- User option img_class_ACTIVATION_BUF_SZ is set to 0x00200000
+-- User option img_class_MODEL_TFLITE_PATH is set to /tmp/models/model.tflite
+-- Using /tmp/models/model.tflite
+++ Converting model.tflite to model.tflite.cc
+...
+```
+
+In particular, the build options pointing to the input files (`<use_case>_FILE_PATH`),
+the model (`<use_case>_MODEL_TFLITE_PATH`) and the labels text file (`<use_case>_LABELS_TXT_FILE`)
+are used by Python scripts to generate not only the converted array files,
+but also headers with utility functions.
+
+For example, the generated utility functions for image classification are:
+
+- `build/generated/include/InputFiles.hpp`
+
+```c++
+#ifndef GENERATED_IMAGES_H
+#define GENERATED_IMAGES_H
+
+#include <cstdint>
+
+#define NUMBER_OF_FILES (2U)
+#define IMAGE_DATA_SIZE (150528U)
+
+extern const uint8_t im0[IMAGE_DATA_SIZE];
+extern const uint8_t im1[IMAGE_DATA_SIZE];
+
+const char* get_filename(const uint32_t idx);
+const uint8_t* get_img_array(const uint32_t idx);
+
+#endif /* GENERATED_IMAGES_H */
+```
+
+- `build/generated/src/InputFiles.cc`
+
+```c++
+#include "InputFiles.hpp"
+
+static const char *img_filenames[] = {
+ "img1.bmp",
+ "img2.bmp",
+};
+
+static const uint8_t *img_arrays[] = {
+ im0,
+ im1
+};
+
+const char* get_filename(const uint32_t idx)
+{
+ if (idx < NUMBER_OF_FILES) {
+ return img_filenames[idx];
+ }
+ return nullptr;
+}
+
+const uint8_t* get_img_array(const uint32_t idx)
+{
+ if (idx < NUMBER_OF_FILES) {
+ return img_arrays[idx];
+ }
+ return nullptr;
+}
+```
+
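+The generated accessor functions are simple bounds-checked lookups. As a
+self-contained sketch (using tiny stand-in arrays instead of the generated
+image data, so the file names and sizes here are illustrative), application
+code can use them as follows:
+
+```c++
+#include <cassert>
+#include <cstdint>
+#include <cstdio>
+#include <cstring>
+
+/* Stand-ins for the generated data; real builds provide im0/im1 with
+ * IMAGE_DATA_SIZE elements each. */
+#define NUMBER_OF_FILES (2U)
+static const uint8_t im0[4] = {1, 2, 3, 4};
+static const uint8_t im1[4] = {5, 6, 7, 8};
+
+static const char* img_filenames[] = {"img1.bmp", "img2.bmp"};
+static const uint8_t* img_arrays[] = {im0, im1};
+
+const char* get_filename(const uint32_t idx)
+{
+    return (idx < NUMBER_OF_FILES) ? img_filenames[idx] : nullptr;
+}
+
+const uint8_t* get_img_array(const uint32_t idx)
+{
+    return (idx < NUMBER_OF_FILES) ? img_arrays[idx] : nullptr;
+}
+
+int main()
+{
+    /* Iterate over all embedded files, as a use-case main loop would. */
+    for (uint32_t i = 0; i < NUMBER_OF_FILES; ++i) {
+        printf("Processing %s\n", get_filename(i));
+        assert(get_img_array(i) != nullptr);
+    }
+    /* Out-of-range indices are rejected. */
+    assert(get_filename(NUMBER_OF_FILES) == nullptr);
+    assert(strcmp(get_filename(0), "img1.bmp") == 0);
+    return 0;
+}
+```
+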
+These headers are generated using Python templates, located in `scripts/py/templates/*.template`:
+
+```tree
+scripts/
+├── cmake
+│ ├── ...
+│ ├── subsystem-profiles
+│ │ ├── corstone-sse-200.cmake
+│ │ └── corstone-sse-300.cmake
+│ ├── templates
+│ │ ├── mem_regions.h.template
+│ │ ├── peripheral_irqs.h.template
+│ │ └── peripheral_memmap.h.template
+│ └── ...
+└── py
+ ├── <generation scripts>
+ ├── requirements.txt
+ └── templates
+ ├── audio.cc.template
+ ├── AudioClips.cc.template
+ ├── AudioClips.hpp.template
+ ├── default.hpp.template
+ ├── header_template.txt
+ ├── image.cc.template
+ ├── Images.cc.template
+ ├── Images.hpp.template
+ ├── Labels.cc.template
+ ├── Labels.hpp.template
+ ├── testdata.cc.template
+ ├── TestData.cc.template
+ ├── TestData.hpp.template
+ └── tflite.cc.template
+```
+
+Based on the type of use-case, the appropriate conversion is invoked in the use-case CMake file
+(audio for voice use-cases, image for vision use-cases).
+For example, the generation calls for image classification (`source/use_case/img_class/usecase.cmake`):
+
+```cmake
+# Generate input files
+generate_images_code("${${use_case}_FILE_PATH}"
+ ${SRC_GEN_DIR}
+ ${INC_GEN_DIR}
+ "${${use_case}_IMAGE_SIZE}")
+
+# Generate labels file
+set(${use_case}_LABELS_CPP_FILE Labels)
+generate_labels_code(
+ INPUT "${${use_case}_LABELS_TXT_FILE}"
+ DESTINATION_SRC ${SRC_GEN_DIR}
+ DESTINATION_HDR ${INC_GEN_DIR}
+ OUTPUT_FILENAME "${${use_case}_LABELS_CPP_FILE}"
+)
+
+...
+
+# Generate model file
+generate_tflite_code(
+ MODEL_PATH ${${use_case}_MODEL_TFLITE_PATH}
+ DESTINATION ${SRC_GEN_DIR}
+)
+```
+
+> **Note:** When required, for models and labels conversion it's possible to add extra parameters such
+> as extra code to put in `<model>.cc` file or namespaces.
+>
+> ```cmake
+> set(${use_case}_LABELS_CPP_FILE Labels)
+> generate_labels_code(
+> INPUT "${${use_case}_LABELS_TXT_FILE}"
+> DESTINATION_SRC ${SRC_GEN_DIR}
+> DESTINATION_HDR ${INC_GEN_DIR}
+> OUTPUT_FILENAME "${${use_case}_LABELS_CPP_FILE}"
+> NAMESPACE "namespace1" "namespace2"
+> )
+>
+> ...
+>
+> set(EXTRA_MODEL_CODE
+> "/* Model parameters for ${use_case} */"
+>     "extern const int g_myvariable1 = value1"
+> "extern const int g_myvariable2 = value2"
+> )
+>
+> generate_tflite_code(
+> MODEL_PATH ${${use_case}_MODEL_TFLITE_PATH}
+> DESTINATION ${SRC_GEN_DIR}
+> EXPRESSIONS ${EXTRA_MODEL_CODE}
+> NAMESPACE "namespace1" "namespace2"
+> )
+> ```
+
+In addition to the input file conversions, the correct platform/system profile is selected
+(from `scripts/cmake/subsystem-profiles/*.cmake`) based on the `TARGET_SUBSYSTEM` build option.
+The variables it sets describe memory region sizes, base addresses and IRQ numbers, which are
+used to generate the mem_regions.h, peripheral_irqs.h and peripheral_memmap.h headers
+from the templates in `scripts/cmake/templates/*.template`.
+
+After the build, the files generated in the build folder are:
+
+```tree
+build/generated/
+├── bsp
+│ ├── mem_regions.h
+│ ├── peripheral_irqs.h
+│ └── peripheral_memmap.h
+├── <use_case_name1>
+│ ├── include
+│ │ ├── InputFiles.hpp
+│ │ └── Labels.hpp
+│ └── src
+│ ├── <uc1_input_file1>.cc
+│ ├── <uc1_input_file2>.cc
+│ ├── InputFiles.cc
+│ ├── Labels.cc
+│ └── <uc1_model_name>.tflite.cc
+└── <use_case_name2>
+ ├── include
+ │ ├── InputFiles.hpp
+ │ └── Labels.hpp
+ └── src
+ ├── <uc2_input_file1>.cc
+ ├── <uc2_input_file2>.cc
+ ├── InputFiles.cc
+ ├── Labels.cc
+ └── <uc2_model_name>.tflite.cc
+```
+
+Next section of the documentation: [Deployment](../documentation.md#Deployment).
diff --git a/docs/sections/coding_guidelines.md b/docs/sections/coding_guidelines.md
new file mode 100644
index 0000000..f1813d3
--- /dev/null
+++ b/docs/sections/coding_guidelines.md
@@ -0,0 +1,323 @@
+# Coding standards and guidelines
+
+## Contents
+
+- [Introduction](#introduction)
+- [Language version](#language-version)
+- [File naming](#file-naming)
+- [File layout](#file-layout)
+- [Block Management](#block-management)
+- [Naming Conventions](#naming-conventions)
+ - [C++ language naming conventions](#c_language-naming-conventions)
+ - [C language naming conventions](#c-language-naming-conventions)
+- [Layout and formatting conventions](#layout-and-formatting-conventions)
+- [Language usage](#language-usage)
+
+## Introduction
+
+This document presents the coding standards and guidelines to follow when contributing to this repository. Most of the
+code is written in C++, but some is written in C as well. There is a clear C/C++ boundary at the Hardware
+Abstraction Layer (HAL). By design, the two languages follow different naming conventions within this repository, to:
+
+- have clearly distinguishable C and C++ sources.
+- make cross-language function calls stand out. Mostly, these will be C++ calls to the HAL functions written in C.
+However, because we also call third-party APIs (and they may not follow these conventions), the
+intended outcome may not be fully realised in all cases.
+
+## Language version
+
+For this project, code written in C++ shall use a subset of the C++11 feature set and software
+may be written using the C++11 language standard. Code written in C should be compatible
+with the C99 standard.
+
+Software components written in C/C++ may use the language features allowed and encouraged by this documentation.
+
+## File naming
+
+- C files should have `.c` extension
+- C++ files should have `.cc` or `.cpp` extension.
+- Header files for functions implemented in C should have `.h` extension.
+- Header files for functions implemented in C++ should have `.hpp` extension.
+
+## File layout
+
+- Standard copyright notice must be included in all files:
+
+ ```copyright
+ /*
+ * Copyright (c) <years additions were made to project> <your name>, Arm Limited. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+ ```
+
+- Source lines must be no longer than 120 characters. Prefer to spread code out vertically rather than horizontally,
+ wherever it makes sense:
+
+ ```C++
+  // This is significantly easier to read
+ enum class SomeEnum1
+ {
+ ENUM_VALUE_1,
+ ENUM_VALUE_2,
+ ENUM_VALUE_3
+ };
+
+  // than this
+ enum class SomeEnum2 { ENUM_VALUE_1, ENUM_VALUE_2, ENUM_VALUE_3 };
+ ```
+
+- Block indentation should use 4 characters, no tabs.
+
+- Each statement must be on a separate line.
+
+ ```C++
+ int a, b; // Error prone
+ int c, *d;
+
+ int e = 0; // GOOD
+ int *p = nullptr; // GOOD
+ ```
+
+- Source must not contain commented out code or unreachable code
+
+## Block Management
+
+- Blocks must use braces and braces location must be consistent.
+ - Each function has its opening brace at the next line on the same indentation level as its header, the code within
+ the braces is indented and the closing brace at the end is on the same level as the opening.
+ For compactness, if the class/function body is empty braces are accepted on the same line.
+
+  - Conditional statements and loops, even if they have just a single-statement body, need to be surrounded by braces.
+The opening brace is on the same line, the closing brace is on the next line at the same indentation level as its
+header; the same rule applies to classes.
+
+ ```C++
+ class Class1 {
+ public:
+ Class1();
+ private:
+ int element;
+ };
+
+ void NotEmptyFunction()
+ {
+ if (condition) {
+ // [...]
+ } else {
+ // [...]
+ }
+ // [...]
+ for(start_cond; end_cond; step_cond) {
+ // [...]
+ }
+ }
+
+ void EmptyFunction() {}
+ ```
+
+  - Cases within a switch are indented and enclosed in braces:
+
+ ```C++
+ switch (option)
+ {
+ case 1:
+ {
+ // handle option 1
+ break;
+ }
+ case 2:
+ {
+ // handle option 2
+ break;
+ }
+ default:
+ {
+ break;
+ }
+ }
+ ```
+
+## Naming Conventions
+
+### C++ language naming conventions
+
+- Type (class, struct, enum) and function names must be `PascalCase`:
+
+ ```C++
+ class SomeClass
+ {
+ // [...]
+ };
+ void SomeFunction()
+ {
+ // [...]
+ }
+ ```
+
+- Variables and parameter names must be `camelCase`:
+
+ ```C++
+ int someVariable;
+
+ void SomeFunction(int someParameter) {}
+ ```
+
+- Macros, pre-processor definitions, and enumeration values should use upper case names:
+
+ ```C++
+ #define SOME_DEFINE
+
+ enum class SomeEnum
+ {
+ ENUM_VALUE_1,
+ ENUM_VALUE_2
+ };
+ ```
+
+- Namespace names must be lower case
+
+ ```C++
+ namespace nspace
+ {
+ void FunctionInNamespace();
+ };
+ ```
+
+- Source code should use Hungarian notation to annotate the name of a variable with information about its meaning.
+
+ | Prefix | Class | Description |
+ | ------ | ----- | ----------- |
+ | p | Type | Pointer to any other type |
+ | k | Qualifier | Constant |
+ | v | Qualifier | Volatile |
+ | m | Scope | Member of a class or struct |
+ | s | Scope | Static |
+ | g | Scope | Used to indicate variable has scope beyond the current function: file-scope or externally visible scope|
+
+The following examples of Hungarian notation are one possible set of uses:
+
+ ```C++
+ int g_GlobalInt=123;
+ char* m_pNameOfMemberPointer=nullptr;
+ const float g_kSomeGlobalConstant = 1.234f;
+ static float ms_MyStaticMember = 4.321f;
+ bool myLocalVariable=true;
+ ```
+
+### C language naming conventions
+
+For C sources, we follow the Linux variant of the K&R style wherever possible.
+
+- For function and variable names we use `snake_case` convention:
+
+ ```C
+ int some_variable;
+
+ void some_function(int some_parameter) {}
+ ```
+
+- Macros, pre-processor definitions, and enumeration values should use upper case names:
+
+ ```C
+ #define SOME_DEFINE
+
+ enum some_enum
+ {
+ ENUM_VALUE_1,
+ ENUM_VALUE_2
+ };
+ ```
+
+## Layout and formatting conventions
+
+- C++ class code layout
+ Public function definitions should be at the top of a class definition, since they are things most likely to be used
+by other people.
+ Private functions and member variables should be last.
+ Class functions and member variables should be laid out logically in blocks of related functionality.
+
+- Class inheritance keywords are not indented.
+
+ ```C++
+ class MyClass
+ {
+ public:
+ int m_PublicMember;
+ protected:
+ int m_ProtectedMember;
+ private:
+ int m_PrivateMember;
+ };
+ ```
+
+- Don't leave trailing spaces at the end of lines.
+
+- Empty lines should have no trailing spaces.
+
+- For pointers and references, the symbols `*` and `&` should be adjacent to the name of the type, not the name
+ of the variable.
+
+ ```C++
+ char* someText = "abc";
+
+ void SomeFunction(const SomeObject& someObject) {}
+ ```
+
+## Language usage
+
+- Header `#include` statements should be minimized.
+ Inclusion of unnecessary headers slows down compilation, and can hide errors where a function calls a
+ subroutine which it should not be using if the unnecessary header defining this subroutine is included.
+
+ Header statements should be included in the following order:
+
+ - Header file corresponding to the current source file (if applicable)
+ - Headers from the same component
+ - Headers from other components
+ - Third-party headers
+ - System headers
+
+  > **Note:** Leave one blank line between each of these groups for readability.
+  > Use quotes for headers from within the same project and angle brackets for third-party and system headers.
+  > Do not use paths relative to the current source file, such as `../Header.hpp`. Instead, configure your include
+  > paths in the project makefiles.
+
+ ```C++
+ #include "ExampleClass.hpp" // Own header
+
+ #include "Header1.hpp" // Header from same component
+  #include "Header2.hpp"      // Header from same component
+
+ #include "other/Header3.hpp" // Header from other component
+
+ #include <ThirdParty.hpp> // Third-party headers
+
+ #include <vector> // System header
+
+ // [...]
+ ```
+
+- C++ casts should use the template-styled cast syntax
+
+ ```C++
+ int a = 100;
+ float b = (float)a; // Not OK
+ float c = static_cast<float>(a); // OK
+ ```
+
+- Use the `const` keyword to declare constants instead of `#define`.
+
+- Use `nullptr` instead of `NULL`.
+  C++11 introduced the `nullptr` type to distinguish null pointer constants from the integer 0.
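+
+A short, self-contained sketch of why this matters for overload resolution (the
+`Overloaded` functions below are hypothetical): the literal `0`, which `NULL` may
+expand to, selects an integer overload, while `nullptr` always selects the
+pointer overload.
+
+```C++
+#include <cassert>
+
+static int Overloaded(int)   { return 1; } /* Chosen for the literal 0. */
+static int Overloaded(char*) { return 2; } /* Chosen for nullptr. */
+
+int main()
+{
+    assert(Overloaded(0) == 1);       /* Exact match on the int overload. */
+    assert(Overloaded(nullptr) == 2); /* nullptr cannot convert to int. */
+    return 0;
+}
+```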
diff --git a/docs/sections/customizing.md b/docs/sections/customizing.md
new file mode 100644
index 0000000..e92c327
--- /dev/null
+++ b/docs/sections/customizing.md
@@ -0,0 +1,731 @@
+# Implementing custom ML application
+
+- [Software project description](#software-project-description)
+- [HAL API](#hal-api)
+- [Main loop function](#main-loop-function)
+- [Application context](#application-context)
+- [Profiler](#profiler)
+- [NN Model API](#nn-model-api)
+- [Adding custom ML use case](#adding-custom-ml-use-case)
+- [Implementing main loop](#implementing-main-loop)
+- [Implementing custom NN model](#implementing-custom-nn-model)
+- [Executing inference](#executing-inference)
+- [Printing to console](#printing-to-console)
+- [Reading user input from console](#reading-user-input-from-console)
+- [Output to MPS3 LCD](#output-to-mps3-lcd)
+- [Building custom use case](#building-custom-use-case)
+
+This section describes how to implement a custom Machine Learning
+application running on Fast Model FVP or on the Arm MPS3 FPGA prototyping board.
+
+The Arm® Ethos™-U55 code sample software project offers a simple way to incorporate
+additional use-case code into the existing infrastructure. It provides a build
+system that automatically picks up added functionality and produces a corresponding
+executable for each use-case. This is achieved by following certain configuration
+and code implementation conventions.
+
+The following sign will indicate the important conventions to apply:
+
+> **Convention:** The code is developed using C++11 and C99 standards.
+This is governed by TensorFlow Lite for Microcontrollers framework.
+
+## Software project description
+
+As mentioned in the [Repository structure](../documentation.md#repository-structure) section, project sources are:
+
+```tree
+├── docs
+│ ├── ...
+│ └── Documentation.md
+├── resources
+│ └── img_class
+│ └── ...
+├── scripts
+│ └── ...
+├── source
+│ ├── application
+│ │ ├── hal
+│ │ ├── main
+│ │ └── tensorflow-lite-micro
+│ └── use_case
+│ └──img_class
+├── CMakeLists.txt
+└── Readme.md
+```
+
+The `source` directory contains the C/C++ sources for the platform and the ML applications.
+Common code related to the Ethos-U55 code samples software
+framework resides in the *application* sub-folder, while ML application-specific logic
+(use-cases) is in the *use_case* sub-folder.
+
+> **Convention**: Separate use-cases must be organized in sub-folders under the use-case folder.
+The name of the directory is used as a name for this use-case and could be provided
+as a `USE_CASE_BUILD` parameter value.
+It is expected by the build system that sources for the use-case are structured as follows:
+headers in an include directory, C/C++ sources in a src directory.
+For example:
+>
+>```tree
+>use_case
+> └──img_class
+> ├── include
+> │ └── *.hpp
+> └── src
+> └── *.cc
+>```
+
+## HAL API
+
+The hardware abstraction layer is represented by the following interfaces.
+To access them, include the hal.h header.
+
+- *hal_platform* structure:\
+  Structure that defines a platform context to be used by the application.
+
+ | Attribute name | Description |
+ |--------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+  | inited             | Initialization flag. Set after the platform_init() function is called. |
+  | plat_name          | Platform name. It is set to "mps3-bare" for the MPS3 build and "FVP" for the Fast Model build. |
+ | data_acq | Pointer to data acquisition module responsible for user interaction and other data collection for the application logic. |
+ | data_psn | Pointer to data presentation module responsible for data output through components available in the selected platform: LCD -- for MPS3, console -- for Fast Model. |
+ | timer | Pointer to platform timer implementation (see platform_timer) |
+ | platform_init | Pointer to platform initialization function. |
+ | platform_release | Pointer to platform release function |
+
+- *hal_init* function:\
+ Initializes the HAL structure based on compile time config. This
+ should be called before any other function in this API.
+
+ | Parameter name | Description|
+ |------------------|-----------------------------------------------------|
+  | platform         | Pointer to a pre-allocated *hal_platform* struct.   |
+ | data_acq | Pointer to a pre-allocated data acquisition module |
+ | data_psn | Pointer to a pre-allocated data presentation module |
+ | timer | Pointer to a pre-allocated timer module |
+ | return | zero if successful, error code otherwise |
+
+- *hal_platform_init* function:\
+ Initializes the HAL platform and all the modules on the platform the
+ application requires to run.
+
+ | Parameter name | Description |
+ | ----------------| ------------------------------------------------------------------- |
+  | platform        | Pointer to a pre-allocated and initialized *hal_platform* struct.   |
+ | return | zero if successful, error code otherwise. |
+
+- *hal_platform_release* function\
+  Releases the HAL platform. This should release any acquired resources.
+
+ | Parameter name | Description |
+ | ----------------| ------------------------------------------------------------------- |
+  | platform        | Pointer to a pre-allocated and initialized *hal_platform* struct.   |
+
+- *data_acq_module* structure:\
+  Structure to encompass the data acquisition module and its
+ methods.
+
+ | Attribute name | Description |
+ |----------------|----------------------------------------------------|
+  | inited         | Initialization flag. Set after the system_init() function is called. |
+  | system_name    | Channel name. It is set to "UART" for both MPS3 and Fast Model builds. |
+  | system_init    | Pointer to the data acquisition module initialization function. The pointer is set according to the platform selected during the build. This function is called by the platform initialization routines. |
+  | get_input      | Pointer to a function reading user input. The pointer is set according to the platform selected during the build. For MPS3 and Fast Model environments, the function reads data from the UART. |
+
+- *data_psn_module* structure:\
+ Structure to encompass the data presentation module and its methods.
+
+ | Attribute name | Description |
+ |--------------------|------------------------------------------------|
+ | inited | Initialization flag. It is set after the system_init () function is called. |
+ | system_name | System component name used to present data. It is set to "lcd" for MPS3 build and to "log_psn" for fastmodel build. In case of fastmodel, all pixel drawing functions are replaced by console output of the data summary. |
+ | system_init | Pointer to data presentation module initialization function. The pointer is set according to the platform selected during the build. This function is called by the platform initialization routines. |
+ | present_data_image | Pointer to a function to draw an image. The pointer is set according to the selected platform during the build. For MPS3, the image will be drawn on the LCD; for fastmodel image summary will be printed in the UART (coordinates, channel info, downsample factor) |
+ | present_data_text | Pointer to a function to print a text. The pointer is set according to the selected platform during the build. For MPS3, the text will be drawn on the LCD; for fastmodel text will be printed in the UART. |
+ | present_box | Pointer to a function to draw a rectangle. The pointer is set according to the selected platform during the build. For MPS3, the image will be drawn on the LCD; for fastmodel image summary will be printed in the UART. |
+ | clear | Pointer to a function to clear the output. The pointer is set according to the selected platform during the build. For MPS3, the function will clear the LCD; for fastmodel will do nothing. |
+ | set_text_color | Pointer to a function to set text color for the next call of present_data_text() function. The pointer is set according to the selected platform during the build. For MPS3, the function will set the color for the text printed on the LCD; for fastmodel -- will do nothing. |
+ | set_led | Pointer to a function controlling an LED (led_num) with on/off |
+
+- *platform_timer* structure:\
+ Structure to hold a platform specific timer implementation.
+
+ | Attribute name | Description |
+ |--------------------|------------------------------------------------|
+ | inited | Initialization flag. It is set after the timer is initialized by the *hal_platform_init* function. |
+ | reset | Pointer to a function to reset a timer. |
+ | get_time_counter | Pointer to a function to get current time counter. |
+ | get_duration_ms | Pointer to a function to calculate duration between two time-counters in milliseconds. |
+ | get_duration_us | Pointer to a function to calculate duration between two time-counters in microseconds |
+ | get_npu_cycle_diff | Pointer to a function to calculate duration between two time-counters in Ethos-U55 cycles. Available only when project is configured with ETHOS_U55_ENABLED set. |
+
+Example of the API initialization in the main function:
+
+```c++
+#include "hal.h"
+
+int main()
+{
+    hal_platform platform;
+    data_acq_module dataAcq;
+    data_psn_module dataPsn;
+    platform_timer timer;
+
+    /* Initialise the HAL and platform. */
+    hal_init(&platform, &dataAcq, &dataPsn, &timer);
+    hal_platform_init(&platform);
+
+    ...
+
+    hal_platform_release(&platform);
+
+    return 0;
+}
+```
+
+## Main loop function
+
+The code samples' application main function delegates the use-case
+logic execution to a main loop function that must be implemented for
+each custom ML scenario.
+
+The main loop function takes a reference to the initialized *hal_platform*
+structure as an argument.
+
+The main loop function has external linkage, and the main executable for the
+use-case references this function, which is defined in the use-case
+code.
+
+```c++
+void main_loop(hal_platform& platform)
+{
+    ...
+}
+```
+
+## Application context
+
+The application context can be used to hold state between main
+loop iterations. Include AppContext.hpp to use the ApplicationContext class.
+
+| Method name | Description |
+|--------------|-----------------------------------------------------------------|
+| Set | Saves given value as a named attribute in the context. |
+| Get | Gets the saved attribute from the context by the given name. |
+| Has | Checks if an attribute with a given name exists in the context. |
+
+For example:
+
+```c++
+#include "hal.h"
+#include "AppContext.hpp"
+
+void main_loop(hal_platform& platform) {
+
+ /* Instantiate application context */
+ arm::app::ApplicationContext caseContext;
+ caseContext.Set<hal_platform&>("platform", platform);
+ caseContext.Set<uint32_t>("counter", 0);
+
+ /* loop */
+ while (true) {
+ // do something, pass application context down the call stack
+ }
+}
+```
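+
+For illustration only, the following is a minimal, self-contained C++11 sketch of how such a
+type-erased context could be implemented. This is not the project's actual implementation and
+all names here are hypothetical:
+
+```c++
+#include <cassert>
+#include <cstdint>
+#include <map>
+#include <memory>
+#include <string>
+
+/* Base class so attributes of any type can share one container. */
+class IAttribute {
+public:
+    virtual ~IAttribute() = default;
+};
+
+template <typename T>
+class Attribute : public IAttribute {
+public:
+    explicit Attribute(T value) : m_value(value) {}
+    T m_value;
+};
+
+class MiniContext {
+public:
+    /* Saves the given value as a named attribute in the context. */
+    template <typename T>
+    void Set(const std::string& name, T object)
+    {
+        m_attributes[name] = std::make_shared<Attribute<T>>(object);
+    }
+
+    /* Gets the saved attribute from the context by the given name. */
+    template <typename T>
+    T Get(const std::string& name)
+    {
+        return std::static_pointer_cast<Attribute<T>>(m_attributes[name])->m_value;
+    }
+
+    /* Checks if an attribute with a given name exists in the context. */
+    bool Has(const std::string& name) const
+    {
+        return m_attributes.find(name) != m_attributes.end();
+    }
+
+private:
+    std::map<std::string, std::shared_ptr<IAttribute>> m_attributes;
+};
+
+int main()
+{
+    MiniContext ctx;
+    ctx.Set<uint32_t>("counter", 0);
+    assert(ctx.Has("counter"));
+    assert(!ctx.Has("platform"));
+
+    ctx.Set<uint32_t>("counter", ctx.Get<uint32_t>("counter") + 1);
+    assert(ctx.Get<uint32_t>("counter") == 1);
+    return 0;
+}
+```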
+
+## Profiler
+
+The Profiler is a helper class that assists in collecting timings and
+Ethos-U55 cycle counts for operations. It uses the platform timer to get
+system timing information.
+
+| Method name | Description |
+|----------------------|-----------------------------------------------------------|
+| StartProfiling | Starts profiling and records the starting timing data. |
+| StopProfiling | Stops profiling and records the ending timing data. |
+| Reset | Resets the profiler and clears all collected data. |
+| GetResultsAndReset | Gets the results as string and resets the profiler. |
+
+Usage example:
+
+```c++
+Profiler profiler{&platform, "Inference"};
+
+profiler.StartProfiling();
+// Code running inference to profile
+profiler.StopProfiling();
+
+info("%s\n", profiler.GetResultsAndReset().c_str());
+```
+
+## NN Model API
+
+The Model class (representing a neural network model) is an abstract class wrapping the
+underlying TensorFlow Lite Micro API. It provides methods to perform
+common operations such as TensorFlow Lite Micro framework
+initialization, inference execution, and accessing input and output tensor
+objects.
+
+To use this abstraction, include the TensorFlowLiteMicro.hpp header.
+
+| Method name | Description |
+|--------------------------|------------------------------------------------------------------------------|
+| GetInputTensor           | Returns the pointer to the model's input tensor. |
+| GetOutputTensor          | Returns the pointer to the model's output tensor. |
+| GetType                  | Returns the model's data type. |
+| GetInputShape            | Returns the pointer to the model's input shape. |
+| GetOutputShape           | Returns the pointer to the model's output shape. |
+| LogTensorInfo | Logs the tensor information to stdout for the given tensor pointer: tensor name, tensor address, tensor type, tensor memory size and quantization params. |
+| LogInterpreterInfo | Logs the interpreter information to stdout. |
+| Init                     | Initializes the TensorFlow Lite Micro framework and allocates required memory for the model. |
+| IsInited | Checks if this model object has been initialized. |
+| IsDataSigned | Checks if the model uses signed data type. |
+| RunInference | Runs the inference (invokes the interpreter). |
+| GetOpResolver            | Returns the reference to the TensorFlow Lite Micro operator resolver. |
+| EnlistOperations | Registers required operators with TensorFlow Lite Micro operator resolver. |
+| GetTensorArena | Returns pointer to memory region to be used for tensors allocations. |
+| GetActivationBufferSize | Returns the size of the tensor arena memory region. |
+
+> **Convention**: Each ML use-case must have extension of this class and implementation of the protected virtual methods:
+>
+>```c++
+>virtual const tflite::MicroOpResolver& GetOpResolver() = 0;
+>virtual bool EnlistOperations() = 0;
+>virtual uint8_t* GetTensorArena() = 0;
+>virtual size_t GetActivationBufferSize() = 0;
+>```
+>
+>Network models have different set of operators that must be registered with
+tflite::MicroMutableOpResolver object in the EnlistOperations method.
+Network models could require different size of activation buffer that is returned as
+tensor arena memory for TensorFlow Lite Micro framework by the GetTensorArena
+and GetActivationBufferSize methods.
+
+Please see MobileNetModel.hpp and MobileNetModel.cc files from image
+classification ML application use-case as an example of the model base
+class extension.
+
+## Adding custom ML use case
+
+This section describes how to implement additional use-case and compile
+it into the binary executable to run with Fast Model or MPS3 FPGA board.
+It covers common major steps: application main loop creation,
+description of the NN model, inference execution.
+
+In addition, few useful examples are provided: reading user input,
+printing into console, drawing images into MPS3 LCD.
+
+```tree
+use_case
+ └──hello_world
+ ├── include
+ └── src
+```
+
+Start with creation of a sub-directory under the *use_case* directory and
+two other directories *src* and *include* as described in
+[Software project description](#software-project-description) section:
+
+## Implementing main loop
+
+The use-case main loop is the place for the use-case's main logic. Essentially,
+it is an infinite loop that reacts to user input, triggers use-case
+conditional logic based on that input, and presents results back to the
+user. However, it could also be simple logic that runs a single inference
+and then exits.
+
+Main loop has knowledge about the platform and has access to the
+platform components through the hardware abstraction layer (referred to as HAL).
+
+Create a *MainLoop.cc* file in the *src* directory (the one created under
+[Adding custom ML use case](#adding-custom-ml-use-case)), the name is not
+important. Define *main_loop* function with the signature described in
+[Main loop function](#main-loop-function):
+
+```c++
+#include "hal.h"
+
+void main_loop(hal_platform& platform) {
+ printf("Hello world!");
+}
+```
+
+The above is already a working use-case. If you compile and run it (see
+[Building custom use case](#building-custom-use-case)), the application will start, print the
+message to the console, and exit straight away.
+
+Now, you can start filling this function with logic.
+
+## Implementing custom NN model
+
+Before an inference can be run with a custom NN model, the TensorFlow Lite
+Micro framework must know about the operators/layers included in the
+model. The developer must register the operators using the *MicroMutableOpResolver*
+API.
+
+Ethos-U55 code samples project has an abstraction around TensorFlow
+Lite Micro API (see [NN model API](#nn-model-api)). Create *HelloWorld.hpp* in
+the use-case include sub-directory, extend Model abstract class and
+declare required methods.
+
+For example:
+
+```c++
+#include "Model.hpp"
+
+namespace arm {
+namespace app {
+
+class HelloWorldModel: public Model {
+ protected:
+ /** @brief Gets the reference to op resolver interface class. */
+ const tflite::MicroOpResolver& GetOpResolver() override;
+
+ /** @brief Adds operations to the op resolver instance. */
+ bool EnlistOperations() override;
+
+ const uint8_t* ModelPointer() override;
+
+ size_t ModelSize() override;
+
+ private:
+ /* Maximum number of individual operations that can be enlisted. */
+ static constexpr int _m_maxOpCnt = 5;
+
+ /* A mutable op resolver instance. */
+    tflite::MicroMutableOpResolver<_m_maxOpCnt> _m_opResolver;
+ };
+} /* namespace app */
+} /* namespace arm */
+```
+
+Create `HelloWorld.cc` file in the `src` sub-directory and define the methods
+there. Include `HelloWorldModel.hpp` created earlier. Note that `Model.hpp`
+included in the header provides access to TensorFlow Lite Micro's operation
+resolver API.
+
+Please see `use_case/img_class/src/MobileNetModel.cc` for
+code examples.\
+If you are using a TensorFlow Lite model compiled with Vela, it is important to add
+custom Ethos-U55 operator to the operators list.
+
+The following example shows how to add the custom Ethos-U55 operator with
+TensorFlow Lite Micro framework. We will use the ARM_NPU define to exclude
+the code if the application was built without NPU support.
+
+```c++
+#include "HelloWorldModel.hpp"
+
+bool arm::app::HelloWorldModel::EnlistOperations() {
+
+#if defined(ARM_NPU)
+    if (kTfLiteOk == this->_m_opResolver.AddEthosU()) {
+        info("Added %s support to op resolver\n",
+            tflite::GetString_ETHOSU());
+    } else {
+        printf_err("Failed to add Arm NPU support to op resolver.");
+        return false;
+    }
+#endif /* ARM_NPU */
+
+    return true;
+}
+```
+
+To minimize the application memory footprint, it is advised to register only
+the operators used by the NN model.
+
+Define the `ModelPointer` and `ModelSize` methods. These functions are wrappers
+around the functions generated in the C++ file containing the neural network
+model as an array. The generation of this C++ array from the `.tflite` file
+needs to be defined in the `usecase.cmake` file for this `HelloWorld` example.
+
+For more details on `usecase.cmake`, see [Building custom use case](#building-custom-use-case).
+For details on the code generation flow in general, see [Automatic file generation](./building.md#Automatic-file-generation).
+
+The TensorFlow Lite model data is read during the `Model::Init()` method
+execution; see *application/tensorflow-lite-micro/Model.cc* for more details.
+`Model` invokes the `ModelPointer()` function, which calls the
+`GetModelPointer()` function to get the neural network model data memory
+address. The `GetModelPointer()` function is generated during the build and
+can be found in the file `build/generated/hello_world/src/<model_file_name>.cc`.
+The generated file is added to the compilation automatically.
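As an illustration, the generated file typically contains something like the following sketch. The array contents and the exact accessor names here are assumptions based on the description above; check the actual generated file in your build tree.

```cpp
#include <cstddef>
#include <cstdint>

/* Hypothetical sketch of build/generated/hello_world/src/<model_file_name>.cc:
 * the .tflite file content is baked in as a byte array with accessor
 * functions. The first bytes shown include the "TFL3" flatbuffer identifier
 * that appears at offset 4 of a .tflite file. */
static const uint8_t nn_model[] = {
    0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33 /* , ... */
};

const uint8_t* GetModelPointer()
{
    return nn_model;
}

size_t GetModelLen()
{
    return sizeof(nn_model);
}
```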
+
+Use the `${use_case}_MODEL_TFLITE_PATH` build parameter to include a custom
+model in the generation/compilation process (see [Build options](./building.md/#build-options)).
+
+## Executing inference
+
+To run an inference successfully, it is required to have:
+
+- a TensorFlow Lite model file,
+- an extended Model class,
+- a place to add the code to invoke inference,
+- a main loop function,
+- and some input data.
+
+For the hello_world example below, the input array is not populated.
+However, for real-world scenarios, this data should either be read from
+an on-board device or be prepared in the form of C++ sources before
+compilation and be baked into the application.
+
+For example, the image classification application has extra build steps
+to generate C++ sources from the provided images with
+*generate_images_code* CMake function.
+
+> **Note:**
+Check that the input data type of your NN model and the input array data type are the same.
+For example, the generated C++ sources for images store image data as a uint8 array. For models that were
+quantized to the int8 data type, it is important to convert the image data to int8 correctly before inference
+execution. Conversion from the asymmetric uint8 type to the symmetric int8 type involves re-positioning the
+zero point, i.e. subtracting a fixed offset from each uint8 value. Please check the image classification
+application source for a code example (the `ConvertImgToInt8` function).
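As a minimal sketch of such a conversion, assuming the common case of a fixed zero-point offset of 128 (the function name here is hypothetical; see `ConvertImgToInt8` for the project's actual implementation):

```cpp
#include <cstddef>
#include <cstdint>

/* Convert uint8 image data to int8 in place by re-positioning the zero
 * point: subtracting 128 maps the uint8 range [0, 255] onto the int8
 * range [-128, 127] while keeping the quantization scale unchanged. */
void ConvertUint8ToInt8(void* data, size_t numBytes)
{
    auto* bytes = static_cast<uint8_t*>(data);
    for (size_t i = 0; i < numBytes; ++i) {
        /* Subtract the offset and store the int8 bit pattern back. */
        bytes[i] = static_cast<uint8_t>(static_cast<int32_t>(bytes[i]) - 128);
    }
}
```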
+
+The following code adds inference invocation to the main loop function:
+
+```c++
+#include "hal.h"
+#include "HelloWorldModel.hpp"
+
+void main_loop(hal_platform& platform) {
+
+ /* model wrapper object */
+ arm::app::HelloWorldModel model;
+
+ /* Load the model */
+ if (!model.Init()) {
+ printf_err("failed to initialise model\n");
+ return;
+ }
+
+ TfLiteTensor *outputTensor = model.GetOutputTensor();
+ TfLiteTensor *inputTensor = model.GetInputTensor();
+
+ /* Dummy input data (not populated in this example). */
+ uint8_t inputData[1000] = {0};
+
+ memcpy(inputTensor->data.data, inputData, 1000);
+
+ /* run inference */
+ model.RunInference();
+
+ const uint32_t tensorSz = outputTensor->bytes;
+ const uint8_t* outputData = tflite::GetTensorData<uint8_t>(outputTensor);
+}
+```
+
+The code snippet has several important blocks:
+
+- Creating HelloWorldModel object and initializing it.
+
+ ```c++
+ arm::app::HelloWorldModel model;
+
+ /* Load the model */
+ if (!model.Init()) {
+ printf_err("failed to initialise model\n");
+ return;
+ }
+ ```
+
+- Getting pointers to allocated input and output tensors.
+
+ ```c++
+ TfLiteTensor *outputTensor = model.GetOutputTensor();
+ TfLiteTensor *inputTensor = model.GetInputTensor();
+ ```
+
+- Copying input data to the input tensor. We assume the input tensor size
+  to be 1000 uint8 elements.
+
+ ```c++
+ memcpy(inputTensor->data.data, inputData, 1000);
+ ```
+
+- Running inference
+
+ ```c++
+ model.RunInference();
+ ```
+
+- Reading inference results: data and data size from the output
+  tensor. We assume that the output layer has a uint8 data type.
+
+ ```c++
+ const uint32_t tensorSz = outputTensor->bytes;
+
+ const uint8_t* outputData = tflite::GetTensorData<uint8_t>(outputTensor);
+ ```
+
+Adding profiling for the Ethos-U55 is straightforward: include the
+`Profiler.hpp` header and invoke `StartProfiling` and `StopProfiling`
+around the inference execution.
+
+```c++
+Profiler profiler{&platform, "Inference"};
+
+profiler.StartProfiling();
+model.RunInference();
+profiler.StopProfiling();
+std::string profileResults = profiler.GetResultsAndReset();
+
+info("%s\n", profileResults.c_str());
+```
+
+## Printing to console
+
+Provided examples already used some function to print messages to the
+console. The full list of available functions:
+
+- `printf`
+- `trace` - printf wrapper for tracing messages
+- `debug` - printf wrapper for debug messages
+- `info` - printf wrapper for informational messages
+- `warn` - printf wrapper for warning messages
+- `printf_err` - printf wrapper for error messages
+
+The `printf` wrappers can be switched off with the `LOG_LEVEL` define:
+
+trace (0) < debug (1) < info (2) < warn (3) < error (4).
+
+The default output level is info (level 2); messages below the configured level are suppressed.
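A minimal sketch of how such level-gated wrappers can be implemented is shown below. This is an illustrative assumption, not the project's actual macro definitions:

```cpp
#include <cstdio>

#ifndef LOG_LEVEL
#define LOG_LEVEL 2 /* Default output level: info. */
#endif

/* A message is emitted only if its level is at or above LOG_LEVEL:
 * trace (0) < debug (1) < info (2) < warn (3) < error (4). */
constexpr bool LogEnabled(int level) { return level >= LOG_LEVEL; }

#define trace(...)      do { if (LogEnabled(0)) { printf("TRACE - " __VA_ARGS__); } } while (0)
#define debug(...)      do { if (LogEnabled(1)) { printf("DEBUG - " __VA_ARGS__); } } while (0)
#define info(...)       do { if (LogEnabled(2)) { printf("INFO - "  __VA_ARGS__); } } while (0)
#define warn(...)       do { if (LogEnabled(3)) { printf("WARN - "  __VA_ARGS__); } } while (0)
#define printf_err(...) do { if (LogEnabled(4)) { printf("ERROR - " __VA_ARGS__); } } while (0)
```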
+
+## Reading user input from console
+
+The platform data acquisition module has a `get_input` function to read
+keyboard input from the UART. It can be used as follows:
+
+```c++
+char ch_input[128];
+platform.data_acq->get_input(ch_input, sizeof(ch_input));
+```
+
+The function blocks until the user provides an input.
+
+## Output to MPS3 LCD
+
+The platform presentation module has functions to print text or an image to
+the board's LCD:
+
+- `present_data_text`
+- `present_data_image`
+
+The text presentation function takes the following parameters:
+
+- `const char* str`: string to print.
+- `const uint32_t str_sz`: string size.
+- `const uint32_t pos_x`: x coordinate of the first letter in pixels.
+- `const uint32_t pos_y`: y coordinate of the first letter in pixels.
+- `const uint32_t allow_multiple_lines`: signals whether the text is
+  allowed to span multiple lines on the screen, or should be truncated
+  to the current line.
+
+This function does not wrap text: if the given string cannot fit on the
+screen, it will go outside the screen boundary.
+
+Example that prints "Hello world" on the LCD:
+
+```c++
+std::string hello("Hello world");
+platform.data_psn->present_data_text(hello.c_str(), hello.size(), 10, 35, 0);
+```
+
+The image presentation function takes the following parameters:
+
+- `uint8_t* data`: image data pointer.
+- `const uint32_t width`: image width.
+- `const uint32_t height`: image height.
+- `const uint32_t channels`: number of channels. Only 1 and 3 channels are currently supported.
+- `const uint32_t pos_x`: x coordinate of the first pixel.
+- `const uint32_t pos_y`: y coordinate of the first pixel.
+- `const uint32_t downsample_factor`: the factor by which the image is to be down sampled.
+
+For example, the following code snippet visualizes the input tensor data
+for MobileNet v2 224 (downsampling it by a factor of 2):
+
+```c++
+platform.data_psn->present_data_image((uint8_t *) inputTensor->data.data, 224, 224, 3, 10, 35, 2);
+```
+
+Please see [hal-api](#hal-api) section for other data presentation
+functions.
+
+## Building custom use case
+
+There is one last thing to do before building and running a use-case
+application: create a `usecase.cmake` file in the root of your use-case
+(the name of the file is not important).
+
+> **Convention:** The build system searches for CMake file in each use-case directory and includes it into the build
+> flow. This file could be used to specify additional application specific build options, add custom build steps or
+> override standard compilation and linking flags.
+> Use `USER_OPTION` function to add additional build option. Prefix variable name with `${use_case}` (use-case name) to
+> avoid names collisions with other CMake variables.
+> Some useful variable names visible in use-case CMake file:
+>
+> - `DEFAULT_MODEL_PATH` – default model path to use if use-case specific `${use_case}_MODEL_TFLITE_PATH` is not set
+>in the build arguments.
+>- `TARGET_NAME` – name of the executable.
+> - `use_case` – name of the current use-case.
+> - `UC_SRC` – list of use-case sources.
+> - `UC_INCLUDE` – path to the use-case headers.
+> - `ETHOS_U55_ENABLED` – flag indicating if the current build supports Ethos-U55.
+> - `TARGET_PLATFORM` – Target platform being built for.
+> - `TARGET_SUBSYSTEM` – If target platform supports multiple subsystems, this is the name of the subsystem.
+> - All standard build options.
+> - `CMAKE_CXX_FLAGS` and `CMAKE_C_FLAGS` – compilation flags.
+> - `CMAKE_EXE_LINKER_FLAGS` – linker flags.
+
+For the hello world use-case it is enough to create a `helloworld.cmake`
+file and set `DEFAULT_MODEL_PATH`:
+
+```cmake
+if (ETHOS_U55_ENABLED EQUAL 1)
+ set(DEFAULT_MODEL_PATH ${DEFAULT_MODEL_DIR}/helloworldmodel_uint8_vela.tflite)
+else()
+ set(DEFAULT_MODEL_PATH ${DEFAULT_MODEL_DIR}/helloworldmodel_uint8.tflite)
+endif()
+```
+
+This can then be used in a subsequent section, for example:
+
+```cmake
+USER_OPTION(${use_case}_MODEL_TFLITE_PATH "Neural network model in tflite format."
+ ${DEFAULT_MODEL_PATH}
+ FILEPATH
+ )
+
+# Generate model file
+generate_tflite_code(
+ MODEL_PATH ${${use_case}_MODEL_TFLITE_PATH}
+ DESTINATION ${SRC_GEN_DIR}
+ )
+```
+
+This ensures that the model path pointed by `${use_case}_MODEL_TFLITE_PATH` is converted to a C++ array and is picked
+up by the build system. More information on auto-generations is available under section
+[Automatic file generation](./building.md#Automatic-file-generation).
+
+To build your application, follow the general instructions from
+[Add Custom inputs](#add-custom-inputs) and specify the name of the use-case in the
+build command:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DUSE_CASE_BUILD=hello_world \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+For Windows, add `-G "MinGW Makefiles"` to the CMake command.
+
+As a result, `ethos-u-hello_world.axf` should be created. The MPS3 build
+will also produce a `sectors/hello_world` directory with binaries and an
+`images-hello_world.txt` file to be copied to the board's MicroSD card.
+
+Next section of the documentation: [Testing and benchmarking](../documentation.md#Testing-and-benchmarking).
diff --git a/docs/sections/deployment.md b/docs/sections/deployment.md
new file mode 100644
index 0000000..354d30b
--- /dev/null
+++ b/docs/sections/deployment.md
@@ -0,0 +1,281 @@
+# Deployment
+
+- [Fixed Virtual Platform](#fixed-virtual-platform)
+ - [Setting up the MPS3 Arm Corstone-300 FVP](#setting-up-the-mps3-arm-corstone-300-fvp)
+ - [Deploying on an FVP emulating MPS3](#deploying-on-an-fvp-emulating-mps3)
+- [MPS3 board](#mps3-board)
+ - [Deployment on MPS3 board](#deployment-on-mps3-board)
+
+The sample application for Arm® Ethos™-U55 can be deployed on two
+target platforms, both of which implement the Arm® Corstone™-300 design (see
+<https://www.arm.com/products/iot/soc/corstone-300>):
+
+- A physical Arm MPS3 FPGA prototyping board
+
+- An MPS3 FVP
+
+## Fixed Virtual Platform
+
+The FVP is available publicly from [Arm Ecosystem FVP downloads
+](https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps).
+Download the correct archive from the list under `Arm Corstone-300`. We need the one which:
+
+- Emulates MPS3 board (not for MPS2 FPGA board)
+- Contains support for Arm® Ethos™-U55
+
+> **Note:** Currently, the FVP only has a Linux OS version. Also, there are no FVPs available for `SSE-200`
+> which satisfy the above conditions.
+
+For the FVP, the elf or axf file can be run using the Fast Model executable
+as outlined under [Starting Fast Model simulation](./setup.md/#starting-fast-model-simulation),
+except that the binary pointed at here is the one just built using the steps
+in the previous section.
+
+### Setting up the MPS3 Arm Corstone-300 FVP
+
+For Ethos-U55 sample application, please download the MPS3 version of the
+Arm® Corstone™-300 model that contains Ethos-U55 and Arm® Cortex®-M55. The model is
+currently only supported on Linux based machines. To install the FVP:
+
+- Unpack the archive
+
+- Run the install script in the extracted package
+
+ `./FVP_Corstone_SSE-300_Ethos-U55.sh`
+
+- Follow the instructions to install the FVP to your desired location
+
+### Deploying on an FVP emulating MPS3
+
+This section assumes that the FVP has been installed (see [Setting up the MPS3 Arm Corstone-300 FVP](#Setting-up-the-MPS3-Arm-Corstone-300-FVP)) to the user's home directory `~/FVP_Corstone_SSE-300_Ethos-U55`.
+
+The installation will typically place the executable under the
+`~/FVP_Corstone_SSE-300_Ethos-U55/models/<OS>_<compiler-version>/`
+directory. For the example below, we assume it to be
+`~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4`.
+
+To run a use case on the FVP, from the [Build directory](../sections/building.md#Create-a-build-directory):
+
+```commandline
+~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -a ./bin/ethos-u-<use_case>.axf
+telnetterminal0: Listening for serial connection on port 5000
+telnetterminal1: Listening for serial connection on port 5001
+telnetterminal2: Listening for serial connection on port 5002
+telnetterminal5: Listening for serial connection on port 5003
+
+ Ethos-U rev 0 --- Oct 13 2020 11:27:45
+ (C) COPYRIGHT 2019-2020 Arm Limited
+ ALL RIGHTS RESERVED
+```
+
+This will also launch a telnet window with the sample application's standard output and error log entries containing
+information about the pre-built application version, TensorFlow Lite Micro library version used, data type as well as
+the input and output tensor sizes of the model compiled into the executable binary.
+
+After the application has started it outputs a menu and waits for the user input from telnet terminal.
+
+For example, the image classification use case can be started by:
+
+```commandline
+~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -a ./bin/ethos-u-img_class.axf
+```
+
+The FVP supports many command line parameters:
+
+- passed by using `-C <param>=<value>`. The most important ones are:
+ - `ethosu.num_macs`: Sets the Ethos-U55 configuration for the model. Valid values are `32`, `64`, `256`,
+ and the default `128`. The number signifies the number of 8x8 MACs performed per cycle by the hardware.
+ - `cpu0.CFGITCMSZ`: ITCM size for the Cortex-M CPU. Size of ITCM is pow(2, CFGITCMSZ - 1) KB
+ - `cpu0.CFGDTCMSZ`: DTCM size for the Cortex-M CPU. Size of DTCM is pow(2, CFGDTCMSZ - 1) KB
+ - `mps3_board.telnetterminal0.start_telnet`: Starts the telnet session if nothing is connected.
+ - `mps3_board.uart0.out_file`: Sets the output file to hold data written by the UART
+ (use '-' to send all output to stdout, empty by default).
+ - `mps3_board.uart0.shutdown_on_eot`: Shuts down the simulation when an EOT (ASCII 4) character is transmitted.
+ - `mps3_board.visualisation.disable-visualisation`: Enables or disables visualisation (disabled by default).
+
+ To start the model in `128` mode for Ethos-U55:
+
+ ```commandline
+ ~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -a ./bin/ethos-u-img_class.axf -C ethosu.num_macs=128
+ ```
+
+- `-l`: shows the full list of supported parameters
+
+ ```commandline
+ ~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -l
+ ```
+
+- `--stat`: prints some run statistics on simulation exit
+
+ ```commandline
+ ~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 --stat
+ ```
+
+- `--timelimit`: sets the number of wall clock seconds for the simulator to run, excluding startup and shutdown.
+
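As a worked example of the TCM size formula given for `cpu0.CFGITCMSZ` and `cpu0.CFGDTCMSZ` above, a hypothetical helper (not part of the FVP) computing `pow(2, value - 1)` KiB:

```cpp
#include <cstdint>

/* Size in KiB of a TCM for a given cpu0.CFGITCMSZ / cpu0.CFGDTCMSZ
 * register value, following the pow(2, value - 1) KiB formula from the
 * FVP's parameter list. */
constexpr uint32_t TcmSizeKiB(uint32_t cfgValue)
{
    return 1u << (cfgValue - 1);
}
```

For example, a value of 10 corresponds to 2^9 = 512 KiB, matching the 512 kiB ITCM/DTCM regions of the Corstone-300 memory map in the appendix.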
+## MPS3 board
+
+> **Note:** Before proceeding, make sure you have the MPS3 board powered on,
+and USB A to B connected between your machine and the MPS3.
+The connector on the MPS3 is marked as "Debug USB".
+
+![MPS3](../media/mps3.png)
+
+Figure 1. MPS3 board top view.
+
+Once the board has booted, the micro SD card will enumerate as a mass
+storage device. On most systems this will be automatically mounted, but
+you might need to mount it manually.
+
+Also, there should be four serial-over-USB ports available for use via
+this connection. On Linux based machines, these would typically be
+*/dev/ttyUSB\<n\>* to */dev/ttyUSB\<n+3\>*.
+
+The default configuration for all of them is 115200, 8/N/1 (115200 baud,
+8 bits, no parity and 1 stop bit) with no flow control.
+
+> **Note:** For Windows machines, additional FTDI drivers might need to be installed
+for these serial ports to be available.
+For more information on getting started with an MPS3 board, please refer to
+<https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/MPS3GettingStarted.pdf>
+
+### Deployment on MPS3 board
+
+> **NOTE**: These instructions are valid only if the evaluation is being
+ done using the MPS3 FPGA platform using either `SSE-200` or `SSE-300`.
+
+To run the application on the MPS3 platform, first make sure that the
+platform has been set up using the correct configuration.
+For details on platform set-up, please see the relevant documentation. For `Arm Corstone-300`, this is available
+[here](https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/DAI0547B_SSE300_PLUS_U55_FPGA_for_mps3.pdf?revision=d088d931-03c7-40e4-9045-31ed8c54a26f&la=en&hash=F0C7837C8ACEBC3A0CF02D871B3A6FF93E09C6B8).
+
+For the MPS3 board, instead of loading the axf file directly, the executable
+blobs generated under the *sectors/<use_case>* subdirectory need to be
+copied over to the MPS3 board's micro SD card. Also, every use case build
+generates a corresponding images.txt file which is used by the MPS3 to
+understand which memory regions the blobs are to be loaded into.
+
+Once the USB A <--> B cable between the MPS3 and the development machine
+is connected and the MPS3 board powered on, the board should enumerate
+as a mass storage device over this USB connection.
+Two devices might also enumerate, depending on the version of the board
+you are using; the device named `V2M-MPS3` or `V2MMPS3` is the SD card.
+
+If the axf/elf file is within 1 MiB, it can be flashed into the FPGA
+memory directly without having to break it down into separate load
+region specific blobs. However, with neural network models exceeding
+this size, it becomes necessary to use the separate blobs as in the steps below.
+
+1. For example, the image classification use case will produce:
+
+ ```tree
+ ./bin/sectors/
+ └── img_class
+ ├── dram.bin
+ └── itcm.bin
+ ```
+
+ For example, if the micro SD card is mounted at
+ /media/user/V2M-MPS3/:
+
+ ```commandline
+ cp -av ./bin/sectors/img_class/* /media/user/V2M-MPS3/SOFTWARE/
+ ```
+
+2. The generated `images-<use_case>.txt` file needs to be copied
+over to the MPS3. The exact location for the destination will depend
+on the MPS3 board's version and the application note for the bit
+file in use.
+For example, for MPS3 board hardware revision C, using an
+application note directory named "ETHOSU", to replace the images.txt
+file:
+
+ ```commandline
+ cp ./bin/images-img_class.txt /media/user/V2M-MPS3/MB/HBI0309C/ETHOSU/images.txt
+ ```
+
+3. Open the first serial port available from MPS3, for example,
+"/dev/ttyUSB0". This can be typically done using minicom, screen or
+Putty application. Make sure the flow control setting is switched
+off.
+
+ ```commandline
+ minicom -D /dev/ttyUSB0
+ ```
+
+ ```log
+ Welcome to minicom 2.7.1
+ OPTIONS: I18n
+ Compiled on Aug 13 2017, 15:25:34.
+ Port /dev/ttyUSB0, 16:05:34
+ Press CTRL-A Z for help on special keys
+ Cmd>
+ ```
+
+4. In another terminal, open the second serial port, for example,
+ "/dev/ttyUSB1":
+
+ ```commandline
+ minicom -D /dev/ttyUSB1
+ ```
+
+5. On the first serial port, issue a "reboot" command and press the
+ return key
+
+ ```commandline
+ $ Cmd> reboot
+ ```
+
+ ```log
+ Rebooting...Disabling debug USB..Board rebooting...
+
+ ARM V2M-MPS3 Firmware v1.3.2
+ Build Date: Apr 20 2018
+
+ Powering up system...
+ Switching on main power...
+ Configuring motherboard (rev C, var A)...
+ ```
+
+ This will go on to reboot the board and prime the application to run by
+ flashing the binaries into their respective FPGA memory locations. For example:
+
+ ```log
+ Reading images file \MB\HBI0309C\ETHOSU\images.txt
+ Writing File \SOFTWARE\itcm.bin to Address 0x00000000
+
+ ............
+
+ File \SOFTWARE\itcm.bin written to memory address 0x00000000
+ Image loaded from \SOFTWARE\itcm.bin
+ Writing File \SOFTWARE\dram.bin to Address 0x08000000
+
+ ..........................................................................
+
+
+ File \SOFTWARE\dram.bin written to memory address 0x08000000
+ Image loaded from \SOFTWARE\dram.bin
+ ```
+
+6. When the reboot from previous step is completed, issue a reset
+ command on the command prompt.
+
+ ``` commandline
+ $ Cmd> reset
+ ```
+
+ This will trigger the application to start, and the output should be visible on the second serial connection.
+
+7. On the second serial port, an output similar to the following should be visible:
+
+ ```log
+ [INFO] Setting up system tick IRQ (for NPU)
+ [INFO] V2M-MPS3 revision C
+ [INFO] Application Note AN540, Revision B
+ [INFO] FPGA build 1
+ [INFO] Core clock has been set to: 32000000 Hz
+ [INFO] CPU ID: 0x410fd220
+ [INFO] CPU: Cortex-M55 r0p0
+ ...
+ ```
+
+
+Next section of the main documentation, [Running code samples applications](../documentation.md#Running-code-samples-applications).
diff --git a/docs/sections/run.md b/docs/sections/run.md
new file mode 100644
index 0000000..90ee7c8
--- /dev/null
+++ b/docs/sections/run.md
@@ -0,0 +1,42 @@
+
+# Running Ethos-U55 Code Samples
+
+- [Starting Fast Model simulation](#starting-fast-model-simulation)
+
+This section covers the process for getting started with pre-built binaries for the Code Samples.
+
+## Starting Fast Model simulation
+
+Once the application binaries have been built, and assuming the FVP install
+location is `~/FVP_install_location`, the simulation can be started by:
+
+```commandline
+~/FVP_install_location/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 \
+    ./bin/mps3-sse-300/ethos-u-<use_case>.axf
+```
+
+This will start the Fast Model simulation for the chosen use-case.
+
+A log output should appear on the terminal:
+
+```log
+telnetterminal0: Listening for serial connection on port 5000
+telnetterminal1: Listening for serial connection on port 5001
+telnetterminal2: Listening for serial connection on port 5002
+telnetterminal5: Listening for serial connection on port 5003
+```
+
+This will also launch a telnet window with the sample application's
+standard output and error log entries containing information about the
+pre-built application version, TensorFlow Lite Micro library version
+used, data type as well as the input and output tensor sizes of the
+model compiled into the executable binary.
+
+![FVP](../media/fvp.png)
+
+![FVP Terminal](../media/fvpterminal.png)
+
+> **Note:**
+For details on the specific use-case follow the instructions in the corresponding documentation.
+
+Next section of the documentation: [Implementing custom ML application](../documentation.md#Implementing-custom-ML-application).
diff --git a/docs/sections/testing_benchmarking.md b/docs/sections/testing_benchmarking.md
new file mode 100644
index 0000000..43bb7f4
--- /dev/null
+++ b/docs/sections/testing_benchmarking.md
@@ -0,0 +1,87 @@
+# Testing and benchmarking
+
+- [Testing](#testing)
+- [Benchmarking](#benchmarking)
+
+## Testing
+
+The `tests` folder has the following structure:
+
+```tree
+.
+├── common
+│ └── ...
+├── use_case
+│ ├── <usecase1>
+│ │ └── ...
+│ ├── <usecase2>
+│ │ └── ...
+└── utils
+ └── ...
+```
+
+Where:
+
+- `common`: contains tests for generic and common application functions.
+- `use_case`: contains all the use case specific tests in the respective folders.
+- `utils`: contains utilities sources used only within the tests.
+
+When [configuring](./building.md#configuring-the-build-native-unit-test) and
+[building](./building.md#Building-the-configured-project) for the `native` target platform, the results of the
+build will be placed under the `build/bin/` folder, for example:
+
+```tree
+.
+├── dev_ethosu_eval-<usecase1>-tests
+├── dev_ethosu_eval-<usecase2>-tests
+├── ethos-u-<usecase1>
+└── ethos-u-<usecase2>
+```
+
+To execute unit-tests for a specific use-case in addition to the common tests:
+
+```commandline
+dev_ethosu_eval-<use_case>-tests
+```
+
+```log
+[INFO] native platform initialised
+[INFO] ARM Ethos-U55 Evaluation application for MPS3 FPGA Prototyping Board and FastModel
+
+...
+===============================================================================
+ All tests passed (37 assertions in 7 test cases)
+```
+
+The test output could contain `[ERROR]` messages; this is fine - they come from tests of negative scenarios.
+
+## Benchmarking
+
+Profiling is enabled by default when configuring the project. This will enable displaying:
+
+- the active and idle NPU cycle counts when Arm® Ethos™-U55 is enabled (see `-DETHOS_U55_ENABLED` in
+  [Build options](./building.md#build-options)).
+- CPU cycle counts and/or time elapsed in milliseconds for the inferences performed, if CPU profiling is enabled
+  (see `-DCPU_PROFILE_ENABLED` in [Build options](./building.md#build-options)). This should be done only
+  when running on a physical FPGA board, as the FVP does not contain a cycle-approximate or cycle-accurate Cortex-M model.
+
+For example:
+
+- On the FVP:
+
+```log
+ Active NPU cycles: 5475412
+ Idle NPU cycles: 702
+```
+
+- For MPS3 platform, the time duration in milliseconds is also reported when `-DCPU_PROFILE_ENABLED=1` is added to
+ CMake configuration command:
+
+```log
+ Active NPU cycles: 5629033
+ Idle NPU cycles: 1005276
+ Active CPU cycles: 993553 (approx)
+ Time in ms: 210
+```
+
+Next section of the main documentation: [Troubleshooting](../documentation.md#Troubleshooting).
diff --git a/docs/sections/troubleshooting.md b/docs/sections/troubleshooting.md
new file mode 100644
index 0000000..40b975a
--- /dev/null
+++ b/docs/sections/troubleshooting.md
@@ -0,0 +1,27 @@
+# Troubleshooting
+
+- [Inference results are incorrect for my custom files](#inference-results-are-incorrect-for-my-custom-files)
+- [The application does not work with my custom model](#the-application-does-not-work-with-my-custom-model)
+
+## Inference results are incorrect for my custom files
+
+Ensure that the files you are using match the requirements of the model
+and that the CMake parameters are set accordingly. More
+information on these CMake parameters is detailed in their separate
+sections. Note that preprocessing of the files could also affect the
+inference result, such as the rescaling and padding operations done for
+image classification.
+
+## The application does not work with my custom model
+
+Ensure that your model is in a fully quantized `.tflite` file format,
+either uint8 or int8, and has successfully been run through the Vela
+compiler.
+
+Check that the CMake parameters match your new model's input requirements.
+
+> **Note:** The Vela tool is not available within this software project.
+It is a Python tool available from <https://pypi.org/project/ethos-u-vela/>.
+The source code is hosted on <https://git.mlplatform.org/ml/ethos-u/ethos-u-vela.git/>.
+
+Next section of the documentation: [Contribution guidelines](../documentation.md#Contribution-guidelines).