author: alexander <alexander.efremov@arm.com> 2021-03-26 21:42:19 +0000
committer: Kshitij Sisodia <kshitij.sisodia@arm.com> 2021-03-29 16:29:55 +0100
commit: 3c79893217bc632c9b0efa815091bef3c779490c (patch)
tree: ad06b444557eb8124652b45621d736fa1b92f65d /docs
parent: 6ad6d55715928de72979b04194da1bdf04a4c51b (diff)
download: ml-embedded-evaluation-kit-3c79893217bc632c9b0efa815091bef3c779490c.tar.gz

Opensource ML embedded evaluation kit (21.03)
Change-Id: I12e807f19f5cacad7cef82572b6dd48252fd61fd
Diffstat (limited to 'docs')
-rw-r--r--  docs/documentation.md                    390
-rw-r--r--  docs/media/APIs_description.png          bin 0 -> 10802 bytes
-rw-r--r--  docs/media/ASR_preprocessing.png         bin 0 -> 57362 bytes
-rw-r--r--  docs/media/F1.png                        bin 0 -> 7340 bytes
-rw-r--r--  docs/media/F2.png                        bin 0 -> 8763 bytes
-rw-r--r--  docs/media/F3.png                        bin 0 -> 7194 bytes
-rw-r--r--  docs/media/F4.png                        bin 0 -> 10987 bytes
-rw-r--r--  docs/media/KWS_preprocessing.png         bin 0 -> 59428 bytes
-rw-r--r--  docs/media/fvp.png                       bin 0 -> 18892 bytes
-rw-r--r--  docs/media/fvpterminal.png               bin 0 -> 50997 bytes
-rw-r--r--  docs/media/mps3.png                      bin 0 -> 426865 bytes
-rw-r--r--  docs/media/vela_flow.jpg                 bin 0 -> 15226 bytes
-rw-r--r--  docs/quick_start.md                      95
-rw-r--r--  docs/sections/appendix.md                20
-rw-r--r--  docs/sections/building.md                1023
-rw-r--r--  docs/sections/coding_guidelines.md       323
-rw-r--r--  docs/sections/customizing.md             731
-rw-r--r--  docs/sections/deployment.md              281
-rw-r--r--  docs/sections/run.md                     42
-rw-r--r--  docs/sections/testing_benchmarking.md    87
-rw-r--r--  docs/sections/troubleshooting.md         27
-rw-r--r--  docs/use_cases/ad.md                     523
-rw-r--r--  docs/use_cases/asr.md                    529
-rw-r--r--  docs/use_cases/img_class.md              446
-rw-r--r--  docs/use_cases/inference_runner.md       296
-rw-r--r--  docs/use_cases/kws.md                    474
-rw-r--r--  docs/use_cases/kws_asr.md                589
27 files changed, 5876 insertions, 0 deletions
diff --git a/docs/documentation.md b/docs/documentation.md
new file mode 100644
index 0000000..655ef27
--- /dev/null
+++ b/docs/documentation.md
@@ -0,0 +1,390 @@
+# Arm Ethos-U55 NPU Code Samples
+
+## Table of Contents
+
+- [Arm Ethos-U55 NPU Code Samples](./documentation.md#arm-ethos-u55-npu-code-samples)
+ - [Table of Contents](./documentation.md#table-of-contents)
+ - [Trademarks](./documentation.md#trademarks)
+ - [Prerequisites](./documentation.md#prerequisites)
+ - [Additional reading](./documentation.md#additional-reading)
+ - [Repository structure](./documentation.md#repository-structure)
+ - [Models and resources](./documentation.md#models-and-resources)
+ - [Building](./documentation.md#building)
+ - [Deployment](./documentation.md#deployment)
+ - [Running code sample applications](./documentation.md#running-code-sample-applications)
+ - [Implementing custom ML application](./documentation.md#implementing-custom-ml-application)
+ - [Testing and benchmarking](./documentation.md#testing-and-benchmarking)
+ - [Troubleshooting](./documentation.md#troubleshooting)
+ - [Coding standards and guidelines](./documentation.md#coding-standards-and-guidelines)
+ - [Code Reviews](./documentation.md#code-reviews)
+ - [Testing](./documentation.md#testing)
+ - [Appendix](./documentation.md#appendix)
+
+## Trademarks
+
+- Arm® and Cortex® are registered trademarks of Arm® Limited (or its subsidiaries) in the US and/or elsewhere.
+- Arm® and Ethos™ are registered trademarks or trademarks of Arm® Limited (or its subsidiaries) in the US and/or elsewhere.
+- Arm® and Corstone™ are registered trademarks or trademarks of Arm® Limited (or its subsidiaries) in the US and/or elsewhere.
+- TensorFlow™, the TensorFlow logo and any related marks are trademarks of Google Inc.
+
+## Prerequisites
+
+Before starting the setup process, please make sure that you have:
+
+- A Linux x86_64 based machine or Windows Subsystem for Linux is
+ preferable. Windows can be used as a build environment but cannot
+ run Fast Model simulations.
+
+- Arm Compiler license (version 6.14 or above).
+
+ - [Arm Compiler Download
+ Page](https://developer.arm.com/tools-and-software/embedded/arm-compiler/downloads/)
+
+- An Arm® MPS3 FPGA prototyping board and components for FPGA evaluation or a `Fixed Virtual Platform` binary:
+ - An MPS3 board loaded with Arm® Corstone™-300 reference package (`AN547`) from:
+ <https://developer.arm.com/tools-and-software/development-boards/fpga-prototyping-boards/download-fpga-images>.
+ You will also need a USB connection between your machine and the MPS3 board for the UART menu and for
+ deploying the application.
+ - `Arm Corstone-300` based FVP for MPS3 is available from: <https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps>.
+
+### Additional reading
+
+This document contains information that is specific to Arm® Ethos™-U55 products.
+See the following documents for other relevant information:
+
+- ML platform overview: <https://mlplatform.org/>
+
+- Arm® ML processors technical overview: <https://developer.arm.com/ip-products/processors/machine-learning>
+
+- Arm® Cortex-M55® processor: <https://www.arm.com/products/silicon-ip-cpu/cortex-m/cortex-m55>
+
+- ML processor, also referred to as a Neural Processing Unit (NPU) - Arm® Ethos™-U55:
+ <https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-u55>
+
+- Arm® MPS3 FPGA Prototyping Board:
+ <https://developer.arm.com/tools-and-software/development-boards/fpga-prototyping-boards/mps3>
+
+- Arm® ML-Zoo: <https://github.com/ARM-software/ML-zoo/>
+
+See <http://developer.arm.com> for access to Arm documentation.
+
+
+## Repository structure
+
+The repository has the following structure:
+
+```tree
+.
+├── dependencies
+├── docs
+├── scripts
+│ └── ...
+├── source
+│ ├── application
+│ │ ├── hal
+│ │ ├── main
+│ │ └── tensorflow-lite-micro
+│ └── use_case
+│ └── <usecase_name>
+│   ├── include
+│   ├── src
+│   └── usecase.cmake
+├── tests
+│ └── ...
+└── CMakeLists.txt
+```
+
+Where:
+
+- `dependencies`: contains all the third party dependencies for this project.
+
+- `docs`: contains the documentation for the ML applications.
+
+- `scripts`: contains build related and source generation scripts.
+
+- `source`: contains C/C++ sources for the platform and ML applications.
+ Common code related to the Ethos-U55 NPU software
+ framework resides in *application* sub-folder with the following
+ structure:
+
+ - `application`: contains all the sources that form the *core* of the application.
+ The `use case` part of the sources depends on sources here.
+
+ - `hal`: contains hardware abstraction layer sources providing a
+ platform agnostic API to access hardware platform specific functions.
+
+ - `main`: contains the main function and calls to platform initialization
+ logic to set things up before launching the main loop.
+ It also contains sources common to all use case implementations.
+
+ - `tensorflow-lite-micro`: contains abstraction around TensorFlow Lite Micro API
+ implementing common functions to initialize a neural network model, run an inference, and
+ access inference results.
+
+ - `use_case`: contains the ML use-case specific logic. Having this as a separate sub-folder isolates ML specific
+ application logic with the assumption that the `application` will do all the required set up for logic here to run.
+ It also makes it easier to add a new use case block.
+
+- `tests`: contains the x86 tests for the use case applications.
+
+The hardware abstraction layer has the following structure:
+
+```tree
+hal
+├── hal.c
+├── include
+│   └── ...
+└── platforms
+ ├── bare-metal
+ │   ├── bsp
+ │   │   ├── bsp-core
+ │   │   │   └── include
+ │   │   ├── bsp-packs
+ │   │   │   └── mps3
+ │   │   ├── cmsis-device
+ │   │   ├── include
+ │   │   └── mem_layout
+ │   ├── data_acquisition
+ │   ├── data_presentation
+ │   │   ├── data_psn.c
+ │   │   └── lcd
+ │   │   └── include
+ │   ├── images
+ │   ├── timer
+ │   └── utils
+ └── native
+```
+
+- `include` and `hal.c`: contains the hardware abstraction layer (HAL) top level platform API and data acquisition, data
+presentation and timer interfaces.
+ > Note: the files here and lower in the hierarchy have been written in
+ C and this layer is a clean C/C++ boundary in the sources.
+
+- `platforms/bare-metal/data_acquisition`\
+`platforms/bare-metal/data_presentation`\
+`platforms/bare-metal/timer`\
+`platforms/bare-metal/utils`: contain the bare-metal HAL support layer and platform initialisation helpers. Function calls
+ are routed to platform-specific logic at this level. For example, for data presentation, an `lcd` module has been used.
+ This wraps the LCD driver calls for the actual hardware (for example, MPS3).
+
+- `platforms/bare-metal/bsp/bsp-packs`: contains the core low-level drivers (written in C) for the platform.
+ For the supplied examples this happens to be an MPS3 board, but support could be added here for other platforms too.
+ The functions defined in this space are wired to the higher-level functions under HAL (such as those at the `platforms/bare-metal/` level).
+
+- `platforms/bare-metal/bsp/bsp-packs/mps3/include`\
+`platforms/bare-metal/bsp/bsp-packs/mps3`: contains the peripheral (LCD, UART and timer) drivers specific to the MPS3 board.
+
+- `platforms/bare-metal/bsp/bsp-core`\
+`platforms/bare-metal/bsp/include`: contains the BSP core sources common to all BSPs. These include a UART header
+ (only the implementation of this is platform specific, but the API is common) and "re-targeting" of the standard output
+ and error streams to the UART block.
+
+- `platforms/bare-metal/bsp/cmsis-device`: contains the CMSIS template implementation for the CPU and also device
+ initialisation routines. It is also where the system interrupts are set up and handlers are overridden.
+ The main entry point of a bare metal application will most likely reside in this space. This entry point is
+ responsible for setting things up before calling the user-defined "main" function in the higher level `application` logic.
+
+- `platforms/bare-metal/bsp/mem_layout`: contains the platform specific linker scripts.
+
+### Models and resources
+
+The models used in the use cases implemented in this project can be downloaded
+from [Arm ML-Zoo](https://github.com/ARM-software/ML-zoo/).
+
+- [Mobilenet V2](https://github.com/ARM-software/ML-zoo/blob/master/models/image_classification/mobilenet_v2_1.0_224/tflite_uint8).
+- [DS-CNN](https://github.com/ARM-software/ML-zoo/blob/master/models/keyword_spotting/ds_cnn_large/tflite_clustered_int8).
+- [Wav2Letter](https://github.com/ARM-software/ML-zoo/blob/master/models/speech_recognition/wav2letter/tflite_int8).
+- Anomaly Detection (coming soon).
+
+When using the Ethos-U55 backend, the NN model is assumed to have been optimized by the Vela compiler.
+However, even if it has not been, it will fall back to the CPU and execute, if supported by TensorFlow Lite Micro.
+
+![Vela compiler](./media/vela_flow.jpg)
+
+The Vela compiler is a tool that can optimize a neural network model
+into a version that can run on an embedded system containing Ethos-U55.
+
+The optimized model will contain custom operators for sub-graphs of the
+model that can be accelerated by Ethos-U55, the remaining layers that
+cannot be accelerated are left unchanged and will run on the CPU using
+optimized (CMSIS-NN) or reference kernels provided by the inference
+engine.
+
+For detailed information see [Optimize model with Vela compiler](./sections/building.md#Optimize-custom-model-with-Vela-compiler).
+
+## Building
+
+This section describes how to build the code sample applications from sources - illustrating the build
+options and the process.
+
+The project can be built for MPS3 FPGA and FVP emulating MPS3. Default values for configuration parameters
+will build executable models with Ethos-U55 support.
+See:
+
+- [Building](./sections/building.md)
+ - [Build prerequisites](./sections/building.md#build-prerequisites)
+ - [Build options](./sections/building.md#build-options)
+ - [Build Process](./sections/building.md#build-process)
+ - [Preparing build environment](./sections/building.md#Preparing-build-environment)
+ - [Create a build directory](./sections/building.md#Create-a-build-directory)
+ - [Configuring the build for `MPS3: SSE-300`](./sections/building.md#Configuring-the-build-for-`MPS3:-SSE-300`)
+ - [Configuring build for different Arm Ethos-U55 configurations](./sections/building.md#Configuring-build-for-different-Arm-Ethos-U55-configurations)
+ - [Configuring the build for `MPS3: SSE-200`](./sections/building.md#Configuring-the-build-for-`MPS3:-SSE-200`)
+ - [Configuring the build for native unit-tests](./sections/building.md#configuring-the-build-for-native-unit-tests)
+ - [Configuring the build for `simple_platform`](./sections/building.md#configuring-the-build-for-`simple_platform`)
+ - [Building the configured project](./sections/building.md#Building-the-configured-project)
+ - [Building timing adapter with custom options](./sections/building.md#building-timing-adapter-with-custom-options)
+ - [Add custom inputs](./sections/building.md#add-custom-inputs)
+ - [Add custom model](./sections/building.md#add-custom-model)
+ - [Optimize custom model with Vela compiler](./sections/building.md#Optimize-custom-model-with-Vela-compiler)
+ - [Memory constraints](./sections/building.md#memory-constraints)
+ - [Automatic file generation](./sections/building.md#automatic-file-generation)
+
+## Deployment
+
+This section describes how to deploy the code sample applications on the Fixed Virtual Platform or the MPS3 board.
+See:
+
+- [Deployment](./sections/deployment.md)
+ - [Fixed Virtual Platform](./sections/deployment.md#fixed-Virtual-Platform)
+ - [Setting up the MPS3 Corstone-300 FVP](./sections/deployment.md#Setting-up-the-MPS3-Corstone-300-FVP)
+ - [Deploying on an FVP emulating MPS3](./sections/deployment.md#Deploying-on-an-FVP-emulating-MPS3)
+ - [MPS3 board](./sections/deployment.md#MPS3-board)
+ - [Deployment on MPS3 board](./sections/deployment.md#Deployment-on-MPS3-board)
+
+## Running code sample applications
+
+This section covers the process for getting started with pre-built binaries for the code samples.
+See [Running applications](./sections/run.md).
+
+## Implementing custom ML application
+
+This section describes how to implement a custom Machine Learning application running
+on a platform supported by the repository (Fixed Virtual Platform or an MPS3 board).
+
+The Ethos-U55 NPU Code Samples software project offers a simple way to incorporate additional
+use-case code into the existing infrastructure and provides a build
+system that automatically picks up added functionality and produces a
+corresponding executable for each use-case.
+
+See:
+
+- [Customizing](./sections/customizing.md)
+ - [Software project description](./sections/customizing.md#Software-project-description)
+ - [HAL API](./sections/customizing.md#hal-api)
+ - [Main loop function](./sections/customizing.md#main-loop-function)
+ - [Application context](./sections/customizing.md#application-context)
+ - [Profiler](./sections/customizing.md#Profiler)
+ - [NN Model API](./sections/customizing.md#NN-model-API)
+ - [Adding custom ML use-case](./sections/customizing.md#Adding-custom-ML-use-case)
+ - [Implementing main loop](./sections/customizing.md#Implementing-main-loop)
+ - [Implementing custom NN model](./sections/customizing.md#Implementing-custom-NN-model)
+ - [Executing inference](./sections/customizing.md#executing-inference)
+ - [Printing to console](./sections/customizing.md#printing-to-console)
+ - [Reading user input from console](./sections/customizing.md#reading-user-input-from-console)
+ - [Output to MPS3 LCD](./sections/customizing.md#output-to-MPS3-LCD)
+ - [Building custom use-case](./sections/customizing.md#building-custom-use-case)
+
+## Testing and benchmarking
+
+See [Testing and benchmarking](./sections/testing_benchmarking.md).
+
+## Troubleshooting
+
+See:
+
+- [Troubleshooting](./sections/troubleshooting.md)
+ - [Inference results are incorrect for my custom files](./sections/troubleshooting.md#Inference-results-are-incorrect-for-my-custom-files)
+ - [The application does not work with my custom model](./sections/troubleshooting.md#The-application-does-not-work-with-my-custom-model)
+
+## Appendix
+
+See:
+
+- [Appendix](./sections/appendix.md)
+ - [Cortex-M55 Memory map overview](./sections/appendix.md#cortex-m55-memory-map-overview)
+
+## Contribution guidelines
+
+Contributions are only accepted under the following conditions:
+
+- The contribution is certified as originating from you and you give us your permission. To manage this process we use
+ [Developer Certificate of Origin (DCO) V1.1](https://developercertificate.org/).
+ To indicate that contributors agree to the terms of the DCO, it is necessary to "sign off" the
+ contribution by adding a line with your name and e-mail address to every git commit message:
+
+ ```log
+ Signed-off-by: John Doe <john.doe@example.org>
+ ```
+
+ This can be done automatically by adding the `-s` option to your `git commit` command (see the example after this list).
+ You must use your real name; no pseudonyms or anonymous contributions are accepted.
+
+- You give permission according to the [Apache License 2.0](../LICENSE_APACHE_2.0.txt).
+
+ In each source file, include the following copyright notice:
+
+ ```copyright
+ /*
+ * Copyright (c) <years additions were made to project> <your name>, Arm Limited. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+ ```
+
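+As a minimal example of the sign-off flow described in the first condition above:
+
+```commandline
+git commit -s -m "Fix typo in documentation"
+```
+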
+### Coding standards and guidelines
+
+This repository follows a set of guidelines, best practices, programming styles and conventions,
+see:
+
+- [Coding standards and guidelines](./sections/coding_guidelines.md)
+ - [Introduction](./sections/coding_guidelines.md#introduction)
+ - [Language version](./sections/coding_guidelines.md#language-version)
+ - [File naming](./sections/coding_guidelines.md#file-naming)
+ - [File layout](./sections/coding_guidelines.md#file-layout)
+ - [Block Management](./sections/coding_guidelines.md#block-management)
+ - [Naming Conventions](./sections/coding_guidelines.md#naming-conventions)
+ - [C++ language naming conventions](./sections/coding_guidelines.md#c_language-naming-conventions)
+ - [C language naming conventions](./sections/coding_guidelines.md#c-language-naming-conventions)
+ - [Layout and formatting conventions](./sections/coding_guidelines.md#layout-and-formatting-conventions)
+ - [Language usage](./sections/coding_guidelines.md#language-usage)
+
+### Code Reviews
+
+Contributions must go through code review. Code reviews are performed through the
+[mlplatform.org Gerrit server](https://review.mlplatform.org). Contributors need to signup to this
+Gerrit server with their GitHub account credentials.
+In order to be merged, a patch needs to:
+
+- get a "+1 Verified" from the pre-commit job.
+- get a "+2 Code-review" from a reviewer, which means the patch has final approval.
+
+### Testing
+
+Prior to submitting a patch for review, please make sure that all build variants work and the unit tests pass.
+Contributions go through testing in the continuous integration system. All builds, tests and checks must pass before a
+contribution gets merged to the master branch.
+
+## Licenses
+
+The ML Embedded applications samples are provided under the Apache 2.0 license, see [License Apache 2.0](../LICENSE_APACHE_2.0.txt).
+
+Application input data sample files are provided under their original license:
+
+| Samples | Licence | Provenance |
+|---------------|---------|---------|
+| [Automatic Speech Recognition Samples](../resources/asr/samples/files.md) | [Creative Commons Attribution 4.0 International Public License](../resources/LICENSE_CC_4.0.txt) | <http://www.openslr.org/12/> |
+| [Image Classification Samples](../resources/img_class/samples/files.md) | [Creative Commons Attribution 1.0](../resources/LICENSE_CC_1.0.txt) | <https://www.pexels.com> |
+| [Keyword Spotting Samples](../resources/kws/samples/files.md) | [Creative Commons Attribution 4.0 International Public License](../resources/LICENSE_CC_4.0.txt) | <http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz> |
+| [Keyword Spotting and Automatic Speech Recognition Samples](../resources/kws_asr/samples/files.md) | [Creative Commons Attribution 4.0 International Public License](../resources/LICENSE_CC_4.0.txt) | <http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz> |
diff --git a/docs/media/APIs_description.png b/docs/media/APIs_description.png
new file mode 100644
index 0000000..57e2b32
--- /dev/null
+++ b/docs/media/APIs_description.png
Binary files differ
diff --git a/docs/media/ASR_preprocessing.png b/docs/media/ASR_preprocessing.png
new file mode 100644
index 0000000..3383a2e
--- /dev/null
+++ b/docs/media/ASR_preprocessing.png
Binary files differ
diff --git a/docs/media/F1.png b/docs/media/F1.png
new file mode 100644
index 0000000..b843e1e
--- /dev/null
+++ b/docs/media/F1.png
Binary files differ
diff --git a/docs/media/F2.png b/docs/media/F2.png
new file mode 100644
index 0000000..ab903e8
--- /dev/null
+++ b/docs/media/F2.png
Binary files differ
diff --git a/docs/media/F3.png b/docs/media/F3.png
new file mode 100644
index 0000000..0effcb7
--- /dev/null
+++ b/docs/media/F3.png
Binary files differ
diff --git a/docs/media/F4.png b/docs/media/F4.png
new file mode 100644
index 0000000..c7f6ac1
--- /dev/null
+++ b/docs/media/F4.png
Binary files differ
diff --git a/docs/media/KWS_preprocessing.png b/docs/media/KWS_preprocessing.png
new file mode 100644
index 0000000..7a6f3fd
--- /dev/null
+++ b/docs/media/KWS_preprocessing.png
Binary files differ
diff --git a/docs/media/fvp.png b/docs/media/fvp.png
new file mode 100644
index 0000000..ca4ffa5
--- /dev/null
+++ b/docs/media/fvp.png
Binary files differ
diff --git a/docs/media/fvpterminal.png b/docs/media/fvpterminal.png
new file mode 100644
index 0000000..ff39152
--- /dev/null
+++ b/docs/media/fvpterminal.png
Binary files differ
diff --git a/docs/media/mps3.png b/docs/media/mps3.png
new file mode 100644
index 0000000..3fb0dff
--- /dev/null
+++ b/docs/media/mps3.png
Binary files differ
diff --git a/docs/media/vela_flow.jpg b/docs/media/vela_flow.jpg
new file mode 100644
index 0000000..1f052ee
--- /dev/null
+++ b/docs/media/vela_flow.jpg
Binary files differ
diff --git a/docs/quick_start.md b/docs/quick_start.md
new file mode 100644
index 0000000..f557c72
--- /dev/null
+++ b/docs/quick_start.md
@@ -0,0 +1,95 @@
+# Quick start example ML application
+
+This is a quick start guide that will show you how to run the keyword spotting example application. The aim of this guide
+is to illustrate the flow of running an application on the evaluation kit rather than showing the keyword spotting
+functionality or performance. All use cases in the evaluation kit follow the same steps.
+
+1. Verify you have installed [the required prerequisites](sections/building.md#Build-prerequisites).
+
+2. Clone the Ethos-U55 evaluation kit repository.
+
+ ```commandline
+ git clone "https://review.mlplatform.org/ml/ethos-u/ml-embedded-evaluation-kit"
+ cd ml-embedded-evaluation-kit
+ ```
+
+3. Pull all the external dependencies with the command below:
+
+ ```commandline
+ git submodule update --init
+ ```
+
+4. Next, you will need to get a neural network model. For the purpose of this quick start guide, we'll use the
+ `ds_cnn_clustered_int8` keyword spotting model from the [Arm public model zoo](https://github.com/ARM-software/ML-zoo)
+ and the principle remains the same for all of the other use cases. Download the `ds_cnn_clustered_int8.tflite` model
+ file with the curl command below:
+
+ ```commandline
+ curl -L https://github.com/ARM-software/ML-zoo/blob/master/models/keyword_spotting/ds_cnn_large/tflite_clustered_int8/ds_cnn_clustered_int8.tflite?raw=true --output ds_cnn_clustered_int8.tflite
+ ```
+
+5. [Vela](https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-vela) is an open-source Python tool that converts a
+ TensorFlow Lite for Microcontrollers neural network model into an optimized model that can run on an embedded system
+ containing an Ethos-U55 NPU. It is worth noting that, in order to take full advantage of the capabilities of the NPU, the
+ neural network operators should be [supported by Vela](https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-vela/+/HEAD/SUPPORTED_OPS.md).
+ In this step, you will compile the model with Vela.
+
+ For this step, you need to ensure you have [correctly installed the Vela package](https://pypi.org/project/ethos-u-vela/):
+
+ ```commandline
+ python3 -m venv env
+ source ./env/bin/activate
+ pip install --upgrade pip
+ pip install ethos-u-vela
+ ```
+
+ In the command below, we specify that we are using the Arm® Ethos™-U55 NPU with 128 Multiply-Accumulate units
+ (MAC units), configured for a High End Embedded use case. The [building section](sections/building.md#Optimize-custom-model-with-Vela-compiler)
+ has a more detailed explanation of Vela usage.
+
+ ```commandline
+ vela ds_cnn_clustered_int8.tflite \
+ --accelerator-config=ethos-u55-128 \
+ --block-config-limit=0 \
+ --config scripts/vela/vela.ini \
+ --memory-mode Shared_Sram \
+ --system-config Ethos_U55_High_End_Embedded
+ ```
+
+ An optimized model file for Ethos-U55 is generated in a folder named `output`.
+
+6. Create a `build` folder in the root level of the evaluation kit.
+
+ ```commandline
+ mkdir build && cd build
+ ```
+
+7. Generate the makefiles with `CMake`, as shown in the command below. The [build process section](sections/building.md#Build-process)
+ gives an in-depth explanation about the meaning of every parameter. For the time being, note that we point to the Vela
+ optimized model from stage 5 with the `-Dkws_MODEL_TFLITE_PATH` parameter.
+
+ ```commandline
+ cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=kws \
+ -Dkws_MODEL_TFLITE_PATH=output/ds_cnn_clustered_int8_vela.tflite \
+ ..
+ ```
+
+8. Compile the project with `make`. Details about this stage can be found in the [building part of the documentation](sections/building.md#Building-the-configured-project).
+
+ ```commandline
+ make -j4
+ ```
+
+9. Launch the project as explained [here](sections/deployment.md#Deployment). In this quick-start guide, we'll use the Fixed
+ Virtual Platform. Pass the `bin/ethos-u-kws.axf` file generated in stage 8 to the FVP that you downloaded when
+ installing the prerequisites.
+
+ ```commandline
+ <path_to_FVP>/FVP_Corstone_SSE-300_Ethos-U55 -a ./bin/ethos-u-kws.axf
+ ```
+
+10. A telnet window is launched through which you can interact with the application and obtain performance figures.
diff --git a/docs/sections/appendix.md b/docs/sections/appendix.md
new file mode 100644
index 0000000..7b56faa
--- /dev/null
+++ b/docs/sections/appendix.md
@@ -0,0 +1,20 @@
+# Appendix
+
+## Arm® Cortex®-M55 Memory map overview for Corstone™-300 reference design
+
+The table below is the memory mapping information specific to the Arm® Cortex®-M55.
+
+| Name | Base address | Limit address | Size | IDAU | Remarks |
+|-------|--------------|---------------|-----------|------|-----------------------------------------------------------|
+| ITCM | 0x0000_0000 | 0x0007_FFFF | 512 kiB | NS | ITCM code region |
+| BRAM | 0x0100_0000 | 0x0120_0000 | 2 MiB | NS | FPGA data SRAM region |
+| DTCM | 0x2000_0000 | 0x2007_FFFF | 512 kiB | NS | 4 banks for 128 kiB each |
+| SRAM | 0x2100_0000 | 0x213F_FFFF | 4 MiB | NS | 2 banks of 2 MiB each as SSE-300 internal SRAM region |
+| DDR | 0x6000_0000 | 0x6FFF_FFFF | 256 MiB | NS | DDR memory region |
+| ITCM | 0x1000_0000 | 0x1007_FFFF | 512 kiB | S | ITCM code region |
+| BRAM | 0x1100_0000 | 0x1120_0000 | 2 MiB | S | FPGA data SRAM region |
+| DTCM | 0x3000_0000 | 0x3007_FFFF | 512 kiB | S | 4 banks for 128 kiB each |
+| SRAM | 0x3100_0000 | 0x313F_FFFF | 4 MiB | S | 2 banks of 2 MiB each as SSE-300 internal SRAM region |
+| DDR | 0x7000_0000 | 0x7FFF_FFFF | 256 MiB | S | DDR memory region |
+
+The default memory map can be found here: <https://developer.arm.com/documentation/101051/0002/Memory-model/Memory-map>
\ No newline at end of file
diff --git a/docs/sections/building.md b/docs/sections/building.md
new file mode 100644
index 0000000..56771b8
--- /dev/null
+++ b/docs/sections/building.md
@@ -0,0 +1,1023 @@
+# Building the Code Samples application from sources
+
+## Contents
+
+- [Building the Code Samples application from sources](#building-the-code-samples-application-from-sources)
+ - [Contents](#contents)
+ - [Build prerequisites](#build-prerequisites)
+ - [Build options](#build-options)
+ - [Build process](#build-process)
+ - [Preparing build environment](#preparing-build-environment)
+ - [Create a build directory](#create-a-build-directory)
+ - [Configuring the build for `MPS3: SSE-300`](#configuring-the-build-for-mps3-sse-300)
+ - [Configuring the build for `MPS3: SSE-200`](#configuring-the-build-for-mps3-sse-200)
+ - [Configuring the build for native unit-tests](#configuring-the-build-for-native-unit-tests)
+ - [Configuring the build for `simple_platform`](#configuring-the-build-for-simple_platform)
+ - [Building the configured project](#building-the-configured-project)
+ - [Building timing adapter with custom options](#building-timing-adapter-with-custom-options)
+ - [Add custom inputs](#add-custom-inputs)
+ - [Add custom model](#add-custom-model)
+ - [Optimize custom model with Vela compiler](#optimize-custom-model-with-vela-compiler)
+ - [Memory constraints](#memory-constraints)
+ - [Automatic file generation](#automatic-file-generation)
+
+This section assumes the use of an **x86 Linux** build machine.
+
+## Build prerequisites
+
+Before proceeding, please make sure that the following prerequisites
+are fulfilled:
+
+- Arm Compiler version 6.14 or above is installed and available on the
+ path.
+
+ Test the compiler by running:
+
+ ```commandline
+ armclang -v
+ ```
+
+ ```log
+ Product: ARM Compiler 6.14 Professional
+ Component: ARM Compiler 6.14
+ ```
+
+ > **Note:** Add compiler to the path, if needed:
+ >
+ > `export PATH=/path/to/armclang/bin:$PATH`
+
+- Compiler license is configured correctly
+
+- CMake version 3.15 or above is installed and available on the path.
+ Test CMake by running:
+
+ ```commandline
+ cmake --version
+ ```
+
+ ```log
+ cmake version 3.16.2
+ ```
+
+ > **Note:** Add cmake to the path, if needed:
+ >
+ > `export PATH=/path/to/cmake/bin:$PATH`
+
+- Python 3.6 or above is installed. Test python version by running:
+
+ ```commandline
+ python3 --version
+ ```
+
+ ```log
+ Python 3.6.8
+ ```
+
+- The build system will create a Python virtual environment during the build
+ process. Please make sure that the Python virtual environment module is
+ installed:
+
+ ```commandline
+ python3 -m venv
+ ```
+
+- Make, or MinGW make for Windows, is installed and available on the path. Test it by running:
+
+ ```commandline
+ make --version
+ ```
+
+ ```log
+ GNU Make 4.1
+
+ ...
+ ```
+
+ > **Note:** Add it to the path environment variable, if needed.
+
+- Access to the Internet to download the third party dependencies, specifically: TensorFlow Lite Micro, Arm Ethos-U55
+driver and CMSIS. Instructions for downloading these are listed under [preparing build environment](#preparing-build-environment).
+
+## Build options
+
+The project build system allows the user to specify a custom NN
+model (in `.tflite` format) or images, and to compile the application binary from
+sources.
+
+The build system uses pre-built TensorFlow Lite for Microcontrollers
+library and Arm® Ethos™-U55 driver libraries from the delivery package.
+
+The build script is parameterized to support different options. Default
+values for build parameters will build the executable compatible with
+the Ethos-U55 Fast Model.
+
+The build parameters are:
+
+- `TARGET_PLATFORM`: Target platform to execute application:
+ - `mps3`
+ - `native`
+ - `simple_platform`
+
+- `TARGET_SUBSYSTEM`: Platform target subsystem; this specifies the
+ design implementation for the deployment target. For both, the MPS3
+ FVP and the MPS3 FPGA, this should be left to the default value of
+ SSE-300:
+ - `sse-300` (default - [Arm® Corstone™-300](https://developer.arm.com/ip-products/subsystem/corstone/corstone-300))
+ - `sse-200`
+
+- `TENSORFLOW_SRC_PATH`: Path to the root of the TensorFlow directory.
+ The default value points to the TensorFlow submodule in the
+ [ethos-u](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) `dependencies` folder.
+
+- `ETHOS_U55_DRIVER_SRC_PATH`: Path to the Ethos-U55 core driver sources.
+ The default value points to the core_driver submodule in the
+ [ethos-u](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) `dependencies` folder.
+
+- `CMSIS_SRC_PATH`: Path to the CMSIS sources to be used to build TensorFlow
+ Lite Micro library. This parameter is optional and valid only for
+ Arm® Cortex®-M CPU targeted configurations. The default value points to the CMSIS submodule in the
+ [ethos-u](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) `dependencies` folder.
+
+- `ETHOS_U55_ENABLED`: Sets whether the use of Ethos-U55 is available for
+ the deployment target. By default, this is set, and therefore the
+ application is built with Ethos-U55 support.
+
+- `CPU_PROFILE_ENABLED`: Sets whether profiling information for the CPU
+ core should be displayed. By default, this is set to false, but can
+ be turned on for FPGA targets. For the FVP, the CPU core's cycle
+ counts are not meaningful and should not be used.
+
+- `LOG_LEVEL`: Sets the verbosity level for the application's output
+ over UART/stdout. Valid values are `LOG_LEVEL_TRACE`, `LOG_LEVEL_DEBUG`,
+ `LOG_LEVEL_INFO`, `LOG_LEVEL_WARN` and `LOG_LEVEL_ERROR`. By default, it
+ is set to `LOG_LEVEL_INFO`.
+
+- `<use_case>_MODEL_TFLITE_PATH`: Path to the model file that will be
+ processed and included into the application axf file. The default
+ value points to one of the delivered set of models. Make sure the
+ model chosen is aligned with the `ETHOS_U55_ENABLED` setting.
+
+ - When using the Ethos-U55 backend, the NN model is assumed to have been
+ optimized by the Vela compiler.
+ However, even if it has not been, it will fall back to the CPU and execute,
+ if supported by TensorFlow Lite Micro.
+
+ - When use of Ethos-U55 is disabled, and if a Vela optimized model
+ is provided, the application will report a failure at runtime.
+
+- `USE_CASE_BUILD`: specifies the list of applications to build. By
+ default, the build system scans sources to identify available ML
+ applications and produces executables for all detected use-cases.
+ This parameter can accept a single value, for example,
+ `USE_CASE_BUILD=img_class`, or multiple values, for example,
+ `USE_CASE_BUILD="img_class;kws"`. See the example sketch after this list.
+
+- `ETHOS_U55_TIMING_ADAPTER_SRC_PATH`: Path to timing adapter sources.
+ The default value points to the `timing_adapter` dependencies folder.
+
+- `TA_CONFIG_FILE`: Path to the CMake configuration file containing the
+ timing adapter parameters. Used only if the timing adapter build is
+ enabled.
+
+- `TENSORFLOW_LITE_MICRO_CLEAN_BUILD`: Optional parameter to enable/disable
+ "cleaning" prior to building for the TensorFlow Lite Micro library.
+ It is enabled by default.
+
+- `TENSORFLOW_LITE_MICRO_CLEAN_DOWNLOADS`: Optional parameter to enable wiping
+ out TPIP downloads from TensorFlow source tree prior to each build.
+ It is disabled by default.
+
+- `ARMCLANG_DEBUG_DWARF_LEVEL`: When the CMake build type is specified as `Debug`
+ and when armclang toolchain is being used to build for a Cortex-M CPU target,
+ this optional argument can be set to specify the DWARF format.
+ By default, this is set to 4 and is synonymous with passing `-g`
+ flag to the compiler. This is compatible with Arm-DS and other tools
+ which can interpret the latest DWARF format. To allow debugging using
+ the Model Debugger from Arm FastModel Tools Suite, this argument can be used
+ to pass DWARF format version as "3". Note: this option is only available
+ when the CMake project is configured with the `-DCMAKE_BUILD_TYPE=Debug` argument.
+ Also, the same DWARF format is used for building the TensorFlow Lite Micro library.
+
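+For instance, a minimal sketch combining a few of these options (the values shown are illustrative, not required):
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD="img_class;kws" \
+ -DLOG_LEVEL=LOG_LEVEL_DEBUG ..
+```
+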
+> **Note:** For details on the specific use case build options, follow the
+> instructions in the use-case specific documentation.
+> Also, when setting any of the CMake configuration parameters that expect a directory/file path, it is advised
+> to **use absolute paths instead of relative paths**.
+
+## Build process
+
+The build process can be summarized in three major steps:
+
+- Prepare the build environment by downloading third party sources required, see
+[Preparing build environment](#preparing-build-environment).
+
+- Configure the build for the platform chosen.
+This stage includes:
+ - CMake options configuration
+ - When the `<use_case>_MODEL_TFLITE_PATH` build options aren't provided, default neural network models are downloaded
+from [Arm ML-Zoo](https://github.com/ARM-software/ML-zoo/). In the case of a native build, the networks' input and output data
+for tests are downloaded as well.
+ - Some files such as neural network models, network's inputs and output labels are automatically converted
+ into C/C++ arrays, see [Automatic file generation](#automatic-file-generation).
+
+- Build the application.\
+During this stage, the application and third-party libraries are built; see [Building the configured project](#building-the-configured-project).
+
+### Preparing build environment
+
+Certain third party sources are required to be present on the development machine so that the example sources in this
+repository can link against them.
+
+1. [TensorFlow Lite Micro repository](https://github.com/tensorflow/tensorflow)
+2. [Ethos-U55 core driver repository](https://review.mlplatform.org/admin/repos/ml/ethos-u/ethos-u-core-driver)
+3. [CMSIS-5](https://github.com/ARM-software/CMSIS_5.git)
+
+These are part of the [ethos-u repository](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) and set as
+submodules of this project.
+
+To pull the submodules:
+
+```sh
+git submodule update --init
+```
+
+This will download all the required components and place them in a tree like:
+
+```tree
+dependencies
+└── ethos-u
+    ├── cmsis
+    ├── core_driver
+    ├── tensorflow
+    └── ...
+```
+
+> **NOTE**: The default source paths for the TPIP sources assume the above directory structure, but all of the relevant
+>paths can be overridden by CMake configuration arguments `TENSORFLOW_SRC_PATH`, `ETHOS_U55_DRIVER_SRC_PATH`,
+>and `CMSIS_SRC_PATH`.
+
+### Create a build directory
+
+Create a build directory in the root of the project and navigate inside:
+
+```commandline
+mkdir build && cd build
+```
+
+### Configuring the build for `MPS3: SSE-300`
+
+On Linux, execute the following command to build the application to run
+on the Ethos-U55 when providing only the mandatory arguments for CMake configuration:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+For Windows, add `-G "MinGW Makefiles"`:
+
+```commandline
+cmake \
+ -G "MinGW Makefiles" \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+Toolchain option `CMAKE_TOOLCHAIN_FILE` points to the toolchain specific
+file to set the compiler and platform specific parameters.
+
+To configure a build that can be debugged using Arm-DS, we can just specify
+the build type as `Debug`:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug ..
+```
+
+To configure a build that can be debugged using a tool that only supports
+DWARF format 3 (the Model Debugger, for example), we can use:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug \
+ -DARMCLANG_DEBUG_DWARF_LEVEL=3 ..
+```
+
+If the TensorFlow source tree is not in its default expected location,
+set the path using `TENSORFLOW_SRC_PATH`.
+Similarly, if the Ethos-U55 driver and CMSIS are not in the default location,
+`ETHOS_U55_DRIVER_SRC_PATH` and `CMSIS_SRC_PATH` can be used to configure their location. For example:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTENSORFLOW_SRC_PATH=/my/custom/location/tensorflow \
+ -DETHOS_U55_DRIVER_SRC_PATH=/my/custom/location/core_driver \
+ -DCMSIS_SRC_PATH=/my/custom/location/cmsis ..
+```
+
+> **Note:** If re-building with changed parameter values, it is
+highly advised to clean the build directory and re-run the CMake command.
+
+### Configuring the build for `MPS3: SSE-200`
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-200 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+For Windows, add `-G "MinGW Makefiles"`:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-200 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -G "MinGW Makefiles ..
+```
+
+### Configuring the build for native unit-tests
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=native \
+ -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/native-toolchain.cmake ..
+```
+
+For Windows, add `-G "MinGW Makefiles"`:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=native \
+ -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/native-toolchain.cmake \
+ -G "MinGW Makefiles ..
+```
+
+Results of the build will be placed under the `build/bin/` folder:
+
+```tree
+ bin
+ |- dev_ethosu_eval-tests
+ |_ ethos-u
+```
+
+### Configuring the build for `simple_platform`
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=simple_platform \
+ -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+For Windows add `-G "MinGW Makefiles"`:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=simple_platform \
+ -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/bare-metal-toolchain.cmake \
+ -G "MinGW Makefiles" ..
+```
+
+### Building the configured project
+
+If the CMake command succeeds, build the application as follows:
+
+```commandline
+make -j4
+```
+
+or for Windows:
+
+```commandline
+mingw32-make -j4
+```
+
+Add `VERBOSE=1` to see compilation and link details.
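+
+For example, to run a parallel build while echoing the full compile and link commands:
+
+```commandline
+make VERBOSE=1 -j4
+```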
+
+Results of the build will be placed under the `build/bin` folder, for
+example:
+
+```tree
+bin
+ ├── ethos-u-<use_case_name>.axf
+ ├── ethos-u-<use_case_name>.htm
+ ├── ethos-u-<use_case_name>.map
+ ├── images-<use_case_name>.txt
+ └── sectors
+ └── <use_case>
+ ├── dram.bin
+ └── itcm.bin
+```
+
+Where, for each implemented use-case under the `source/use_case` directory,
+the following build artefacts will be created:
+
+- `ethos-u-<use case name>.axf`: The built application binary for a ML
+ use case.
+
+- `ethos-u-<use case name>.map`: Information from building the
+ application (e.g. libraries used, what was optimized, location of
+ objects).
+
+- `ethos-u-<use case name>.htm`: Human readable file containing the
+ call graph of application functions.
+
+- `sectors/`: Folder containing the built application, split into files
+ for loading into different FPGA memory regions.
+
+- `images-<use case name>.txt`: Tells the FPGA which memory regions to
+ use for loading the binaries in the `sectors/` folder.
+
+> **Note:** For the specific use case commands, see the relevant section
+in the use case documentation.
+
+## Building timing adapter with custom options
+
+The sources also contain the configuration for a timing adapter utility
+for the Ethos-U55 driver. The timing adapter allows the platform to simulate user
+provided memory bandwidth and latency constraints.
+
+The timing adapter driver controls the behavior of the two AXI buses
+used by Ethos-U55. One is for the SRAM memory region and the other is for
+flash or DRAM. The SRAM is where intermediate buffers are expected to be
+allocated and therefore, this region can serve frequent R/W traffic
+generated by computation operations while executing a neural network
+inference. The flash or DDR is where we expect to store the model
+weights and therefore, this bus would typically be used only for R/O
+traffic.
+
+It is used for the MPS3 FPGA as well as for the Fast Model environment.
+
+The CMake build framework exposes the following parameters to control
+the behavior of each bus:
+
+- `MAXR`: Maximum number of pending read operations allowed. 0 is
+ inferred as infinite, and the default value is 4.
+
+- `MAXW`: Maximum number of pending write operations allowed. 0 is
+ inferred as infinite, and the default value is 4.
+
+- `MAXRW`: Maximum number of pending read+write operations allowed. 0 is
+ inferred as infinite, and the default value is 8.
+
+- `RLATENCY`: Minimum latency, in cycle counts, for a read operation.
+ This is the duration between ARVALID and RVALID signals. The default
+ value is 50.
+
+- `WLATENCY`: Minimum latency, in cycle counts, for a write operation.
+ This is the duration between WVALID + WLAST and BVALID being
+ de-asserted. The default value is 50.
+
+- `PULSE_ON`: Number of cycles during which addresses are let through.
+ The default value is 5100.
+
+- `PULSE_OFF`: Number of cycles during which addresses are blocked. The
+ default value is 5100.
+
+- `BWCAP`: Maximum number of 64-bit words transferred per pulse cycle. A
+ pulse cycle is PULSE_ON + PULSE_OFF. 0 is inferred as infinite, and
+ the default value is 625.
+
+- `MODE`: Timing adapter operation mode. The default value is 0:
+
+ - Bit 0: 0=simple; 1=latency-deadline QoS throttling of read vs.
+ write
+
+ - Bit 1: 1=enable random AR reordering (0=default),
+
+ - Bit 2: 1=enable random R reordering (0=default),
+
+ - Bit 3: 1=enable random B reordering (0=default)
+
+For the timing adapter's CMake build configuration, the SRAM AXI bus is
+assigned index 0 and the flash/DRAM AXI bus index 1. To change a bus
+parameter for the build, a `TA<index>_` prefix should be added
+to the parameter name. For example, `TA0_MAXR=10` will set the SRAM AXI
+bus's maximum pending reads to 10.
+
+As an example, if we have the following parameters for flash/DRAM
+region:
+
+- `TA1_MAXR` = "2"
+
+- `TA1_MAXW` = "0"
+
+- `TA1_MAXRW` = "0"
+
+- `TA1_RLATENCY` = "64"
+
+- `TA1_WLATENCY` = "32"
+
+- `TA1_PULSE_ON` = "320"
+
+- `TA1_PULSE_OFF` = "80"
+
+- `TA1_BWCAP` = "50"
+
+For a clock rate of 500MHz, this would translate to:
+
+- The maximum duty cycle for any operation is:\
+![Maximum duty cycle formula](../media/F1.png)
+
+- Maximum bit rate for this bus (64-bit wide) is:\
+![Maximum bit rate formula](../media/F2.png)
+
+- With a read latency of 64 cycles, and maximum pending reads as 2,
+ each read could be a maximum of 64 or 128 bytes, as defined by the
+ Ethos-U55's AXI bus attributes.
+
+ The bandwidth calculated solely from the read parameters is ![Bandwidth formula](
+ ../media/F3.png)
+
+ This is higher than the overall bandwidth dictated by the bus parameters
+ of \
+ ![Overall bandwidth formula](../media/F4.png)
+
+This suggests that the read operation is limited only by the overall bus
+bandwidth.
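+
+As a worked sketch of the arithmetic behind the formula images above (this assumes the duty-cycle and bandwidth-cap interpretations described earlier, the example `TA1_*` values, and the 500MHz clock):
+
+```math
+\begin{aligned}
+\text{duty cycle} &= \frac{320}{320 + 80} = 80\% \\
+\text{bus bandwidth cap} &= \frac{50 \times 8\,\text{B}}{(320 + 80)/500\,\text{MHz}} = \frac{400\,\text{B}}{0.8\,\mu\text{s}} = 500\,\text{MB/s} \\
+\text{read-side limit} &= \frac{2 \times 128\,\text{B}}{64/500\,\text{MHz}} = \frac{256\,\text{B}}{128\,\text{ns}} = 2\,\text{GB/s}
+\end{aligned}
+```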
+
+The timing adapter requires recompilation to change parameters. The default timing
+adapter configuration file, pointed to by the `TA_CONFIG_FILE` build parameter, is
+located in the `scripts/cmake` folder and contains all the options for AXI0 and
+AXI1 described above.
+
+An example of `scripts/cmake/ta_config.cmake`:
+
+```cmake
+# Timing adapter options
+set(TA_INTERACTIVE OFF)
+
+# Timing adapter settings for AXI0
+set(TA0_MAXR "8")
+set(TA0_MAXW "8")
+set(TA0_MAXRW "0")
+set(TA0_RLATENCY "32")
+set(TA0_WLATENCY "32")
+set(TA0_PULSE_ON "3999")
+set(TA0_PULSE_OFF "1")
+set(TA0_BWCAP "4000")
+...
+```
+
+An example of the build with custom timing adapter configuration:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTA_CONFIG_FILE=scripts/cmake/my_ta_config.cmake ..
+```
+
+## Add custom inputs
+
+The application performs inference on input data found in the folder set
+by the CMake parameters; for more information, see the relevant section in the
+specific use case documentation.
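+
+As a hedged sketch for the image classification use case, the input folder can be pointed at a custom directory as follows (the `img_class_FILE_PATH` option also appears in the generation log under [Automatic file generation](#automatic-file-generation); the path shown is a placeholder):
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -Dimg_class_FILE_PATH=/path/to/custom/samples ..
+```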
+
+## Add custom model
+
+The application performs inference using the model pointed to by the
+CMake parameter `MODEL_TFLITE_PATH`.
+
+> **Note:** If you want to run the model using Ethos-U55, ensure your custom
+model has been run through the Vela compiler successfully before continuing.
+
+To run the application with a custom model you will need to provide a
+`labels_<model_name>.txt` file of labels associated with the model.
+Each line of the file should correspond to one of the outputs in your
+model. See the provided `labels_mobilenet_v2_1.0_224.txt` file in the
+`img_class` use case for an example.
+
+Then, you must set `<use_case>_MODEL_TFLITE_PATH` to the location of
+the Vela processed model file and `<use_case>_LABELS_TXT_FILE` to the
+location of the associated labels file:
+
+```commandline
+cmake \
+ -D<use_case>_MODEL_TFLITE_PATH=<path/to/custom_model_after_vela.tflite> \
+ -D<use_case>_LABELS_TXT_FILE=<path/to/labels_custom_model.txt> \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+> **Note:** For the specific use case command see the relative section in the use case documentation.
+
+For Windows, add `-G "MinGW Makefiles"` to the CMake command.
+
+> **Note:** Clean the build directory before re-running the CMake command.
+
+The TensorFlow Lite for Microcontrollers model pointed to by `<use_case>_MODEL_TFLITE_PATH` and the
+labels text file pointed to by `<use_case>_LABELS_TXT_FILE` will be
+converted to C++ files during the CMake configuration stage and then
+compiled into the application for performing inference.
+
+The log from the configuration stage should tell you what model path and
+labels file have been used:
+
+```log
+-- User option TARGET_PLATFORM is set to mps3
+-- User option <use_case>_MODEL_TFLITE_PATH is set to
+<path/to/custom_model_after_vela.tflite>
+...
+-- User option <use_case>_LABELS_TXT_FILE is set to
+<path/to/labels_custom_model.txt>
+...
+-- Using <path/to/custom_model_after_vela.tflite>
+++ Converting custom_model_after_vela.tflite to custom_model_after_vela.tflite.cc
+-- Generating labels file from <path/to/labels_custom_model.txt>
+-- writing to <path/to/build>/generated/include/Labels.hpp and <path/to/build>/generated/src/Labels.cc
+...
+```
+
+After compiling, your custom model will have now replaced the default
+one in the application.
+
+## Optimize custom model with Vela compiler
+
+> **Note:** This tool is not available within this project.
+It is a Python tool available from <https://pypi.org/project/ethos-u-vela/>.
+The source code is hosted on <https://git.mlplatform.org/ml/ethos-u/ethos-u-vela.git/>.
+
+The Vela compiler is a tool that can optimize a neural network model
+into a version that can run on an embedded system containing Ethos-U55.
+
+The optimized model will contain custom operators for sub-graphs of the
+model that can be accelerated by Ethos-U55, the remaining layers that
+cannot be accelerated are left unchanged and will run on the CPU using
+optimized (CMSIS-NN) or reference kernels provided by the inference
+engine.
+
+After the compilation, the optimized model can only be executed on a
+system with Ethos-U55.
+
+> **Note:** The NN model provided during the build and compiled into the application
+executable binary defines whether the CPU or NPU is used to execute workloads.
+If an unoptimized model is used, then inference will run on the Cortex-M CPU.
+
+The Vela compiler accepts parameters that influence the model optimization. The
+model provided within this project has been optimized with
+the following parameters:
+
+```commandline
+vela \
+ --accelerator-config=ethos-u55-128 \
+ --block-config-limit=0 \
+ --config my_vela_cfg.ini \
+ --memory-mode Shared_Sram \
+ --system-config Ethos_U55_High_End_Embedded \
+ <model>.tflite
+```
+
+Where:
+
+- `--accelerator-config`: Specifies the accelerator configuration to use:
+ ethos-u55-256, ethos-u55-128, ethos-u55-64 or ethos-u55-32.
+- `--block-config-limit`: Limit block config search space, use zero for
+ unlimited.
+- `--config`: Specifies the path to the Vela configuration file. The format of the file is a Python ConfigParser `.ini` file.
+ An example can be found at [scripts/vela/vela.ini](../../scripts/vela/vela.ini).
+- `--memory-mode`: Selects the memory mode to use as specified in the Vela configuration file.
+- `--system-config`: Selects the system configuration to use as specified in the Vela configuration file.
+
+The Vela compiler accepts a `.tflite` file as input and saves the optimized network
+model as a `.tflite` file.
+
+Using `--show-cpu-operations` and `--show-subgraph-io-summary` will show
+all the operations that fall back to the CPU and a summary of all the
+subgraphs and their inputs and outputs.
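+
+For example, a sketch of such an inspection run (these flags can also be combined with the compilation options shown earlier; `<model>.tflite` is a placeholder):
+
+```commandline
+vela <model>.tflite \
+ --show-cpu-operations \
+ --show-subgraph-io-summary
+```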
+
+To see the Vela help for all the parameters, use `vela --help`.
+
+Please get in touch with your Arm representative to request access to the
+Vela compiler documentation for more details.
+
+> **Note:** By default, use of the Ethos-U55 is enabled in the CMake configuration.
+This can be changed by passing `-DETHOS_U55_ENABLED=OFF`.
+
+## Memory constraints
+
+Both the MPS3 Fixed Virtual Platform and the MPS3 FPGA platform share
+the linker script (scatter file) for SSE-300 design. The design is set
+by the CMake configuration parameter `TARGET_SUBSYSTEM` as described in
+the previous section.
+
+The memory map exposed by this design is presented in Appendix 1. This
+can be used as a reference when editing the scatter file, especially to
+make sure that region boundaries are respected. The snippet from MPS3's
+scatter file is presented below:
+
+```
+;---------------------------------------------------------
+; First load region
+;---------------------------------------------------------
+LOAD_REGION_0 0x00000000 0x00080000
+{
+ ;-----------------------------------------------------
+ ; First part of code mem -- 512kiB
+ ;-----------------------------------------------------
+ itcm.bin 0x00000000 0x00080000
+ {
+ *.o (RESET, +First)
+ * (InRoot$$Sections)
+ .ANY (+RO)
+ }
+
+ ;-----------------------------------------------------
+ ; 128kiB of 512kiB bank is used for any other RW or ZI
+ ; data. Note: this region is internal to the Cortex-M CPU
+ ;-----------------------------------------------------
+ dtcm.bin 0x20000000 0x00020000
+ {
+ .ANY(+RW +ZI)
+ }
+
+ ;-----------------------------------------------------
+ ; 128kiB of stack space within the DTCM region
+ ;-----------------------------------------------------
+ ARM_LIB_STACK 0x20020000 EMPTY ALIGN 8 0x00020000
+ {}
+
+ ;-----------------------------------------------------
+ ; 256kiB of heap space within the DTCM region
+ ;-----------------------------------------------------
+
+ ARM_LIB_HEAP 0x20040000 EMPTY ALIGN 8 0x00040000
+ {}
+
+ ;-----------------------------------------------------
+ ; SSE-300's internal SRAM
+ ;-----------------------------------------------------
+ isram.bin 0x21000000 UNINIT ALIGN 16 0x00080000
+ {
+ ; activation buffers a.k.a tensor arena
+ *.o (.bss.NoInit.activation_buf)
+ }
+}
+
+;---------------------------------------------------------
+; Second load region
+;---------------------------------------------------------
+LOAD_REGION_1 0x60000000 0x02000000
+{
+ ;-----------------------------------------------------
+ ; 32 MiB of DRAM space for nn model and input vectors
+ ;-----------------------------------------------------
+ dram.bin 0x60000000 ALIGN 16 0x02000000
+ {
+ ; nn model's baked in input matrices
+ *.o (ifm)
+
+ ; nn model
+ *.o (nn_model)
+
+ ; if the activation buffer (tensor arena) doesn't
+ ; fit in the SRAM region, we accommodate it here
+ *.o (activation_buf)
+ }
+}
+```
+
+It is worth noting that in the bitfile implementation, only the BRAM,
+internal SRAM and DDR memory regions are accessible to the Ethos-U55
+block. In the above snippet, the internal SRAM region is used for the
+activation buffers, with a limit of 512kiB. If used, this region will be
+written to frequently by the Ethos-U55 block. A bigger region of memory
+for storing the model is placed in the DDR region, under LOAD_REGION_1.
+The two load regions are necessary because the MPS3's motherboard
+configuration controller limits the load size at address 0x00000000 to
+512kiB. This has implications on how the application **is deployed** on
+MPS3, as explained under [Deployment on MPS3 board](./deployment.md#deployment-on-mps3-board).
+
+## Automatic file generation
+
+As mentioned in the previous sections, some files such as neural network
+models, network's inputs, and output labels are automatically converted
+into C/C++ arrays during the CMake project configuration stage.
+Additionally, some code is generated to allow access to these arrays.
+
+An example:
+
+```log
+-- Building use-cases: img_class.
+-- Found sources for use-case img_class
+-- User option img_class_FILE_PATH is set to /tmp/samples
+-- User option img_class_IMAGE_SIZE is set to 224
+-- User option img_class_LABELS_TXT_FILE is set to /tmp/labels/labels_model.txt
+-- Generating image files from /tmp/samples
+++ Converting cat.bmp to cat.cc
+++ Converting dog.bmp to dog.cc
+-- Skipping file /tmp/samples/files.md due to unsupported image format.
+++ Converting kimono.bmp to kimono.cc
+++ Converting tiger.bmp to tiger.cc
+++ Generating /tmp/build/generated/img_class/include/InputFiles.hpp
+-- Generating labels file from /tmp/labels/labels_model.txt
+-- writing to /tmp/build/generated/img_class/include/Labels.hpp and /tmp/build/generated/img_class/src/Labels.cc
+-- User option img_class_ACTIVATION_BUF_SZ is set to 0x00200000
+-- User option img_class_MODEL_TFLITE_PATH is set to /tmp/models/model.tflite
+-- Using /tmp/models/model.tflite
+++ Converting model.tflite to model.tflite.cc
+...
+```
+
+In particular, the build options pointing to the input files (`<use_case>_FILE_PATH`),
+the model (`<use_case>_MODEL_TFLITE_PATH`) and the labels text file (`<use_case>_LABELS_TXT_FILE`)
+are used by Python scripts to generate not only the converted array files,
+but also headers with utility functions.
+
+For example, the generated utility functions for image classification are:
+
+- `build/generated/include/InputFiles.hpp`
+
+```c++
+#ifndef GENERATED_IMAGES_H
+#define GENERATED_IMAGES_H
+
+#include <cstdint>
+
+#define NUMBER_OF_FILES (2U)
+#define IMAGE_DATA_SIZE (150528U)
+
+extern const uint8_t im0[IMAGE_DATA_SIZE];
+extern const uint8_t im1[IMAGE_DATA_SIZE];
+
+const char* get_filename(const uint32_t idx);
+const uint8_t* get_img_array(const uint32_t idx);
+
+#endif /* GENERATED_IMAGES_H */
+```
+
+- `build/generated/src/InputFiles.cc`
+
+```c++
+#include "InputFiles.hpp"
+
+static const char *img_filenames[] = {
+ "img1.bmp",
+ "img2.bmp",
+};
+
+static const uint8_t *img_arrays[] = {
+ im0,
+ im1
+};
+
+const char* get_filename(const uint32_t idx)
+{
+ if (idx < NUMBER_OF_FILES) {
+ return img_filenames[idx];
+ }
+ return nullptr;
+}
+
+const uint8_t* get_img_array(const uint32_t idx)
+{
+ if (idx < NUMBER_OF_FILES) {
+ return img_arrays[idx];
+ }
+ return nullptr;
+}
+```
+
+These headers are generated using Python templates located in `scripts/py/templates/*.template`:
+
+```tree
+scripts/
+├── cmake
+│ ├── ...
+│ ├── subsystem-profiles
+│ │ ├── corstone-sse-200.cmake
+│ │ └── corstone-sse-300.cmake
+│ ├── templates
+│ │ ├── mem_regions.h.template
+│ │ ├── peripheral_irqs.h.template
+│ │ └── peripheral_memmap.h.template
+│ └── ...
+└── py
+ ├── <generation scripts>
+ ├── requirements.txt
+ └── templates
+ ├── audio.cc.template
+ ├── AudioClips.cc.template
+ ├── AudioClips.hpp.template
+ ├── default.hpp.template
+ ├── header_template.txt
+ ├── image.cc.template
+ ├── Images.cc.template
+ ├── Images.hpp.template
+ ├── Labels.cc.template
+ ├── Labels.hpp.template
+ ├── testdata.cc.template
+ ├── TestData.cc.template
+ ├── TestData.hpp.template
+ └── tflite.cc.template
+```
+
+Based on the use-case type, the appropriate conversion is invoked in the use-case CMake file
+(audio conversion for voice use-cases and image conversion for vision use-cases).
+For example, the generation calls for image classification (`source/use_case/img_class/usecase.cmake`) are:
+
+```cmake
+# Generate input files
+generate_images_code("${${use_case}_FILE_PATH}"
+ ${SRC_GEN_DIR}
+ ${INC_GEN_DIR}
+ "${${use_case}_IMAGE_SIZE}")
+
+# Generate labels file
+set(${use_case}_LABELS_CPP_FILE Labels)
+generate_labels_code(
+ INPUT "${${use_case}_LABELS_TXT_FILE}"
+ DESTINATION_SRC ${SRC_GEN_DIR}
+ DESTINATION_HDR ${INC_GEN_DIR}
+ OUTPUT_FILENAME "${${use_case}_LABELS_CPP_FILE}"
+)
+
+...
+
+# Generate model file
+generate_tflite_code(
+ MODEL_PATH ${${use_case}_MODEL_TFLITE_PATH}
+ DESTINATION ${SRC_GEN_DIR}
+)
+```
+
+> **Note:** When required, for models and labels conversion it's possible to add extra parameters such
+> as extra code to put in `<model>.cc` file or namespaces.
+>
+> ```cmake
+> set(${use_case}_LABELS_CPP_FILE Labels)
+> generate_labels_code(
+> INPUT "${${use_case}_LABELS_TXT_FILE}"
+> DESTINATION_SRC ${SRC_GEN_DIR}
+> DESTINATION_HDR ${INC_GEN_DIR}
+> OUTPUT_FILENAME "${${use_case}_LABELS_CPP_FILE}"
+> NAMESPACE "namespace1" "namespace2"
+> )
+>
+> ...
+>
+> set(EXTRA_MODEL_CODE
+> "/* Model parameters for ${use_case} */"
+> "extern const int g_myvariable2 = value1"
+> "extern const int g_myvariable2 = value2"
+> )
+>
+> generate_tflite_code(
+> MODEL_PATH ${${use_case}_MODEL_TFLITE_PATH}
+> DESTINATION ${SRC_GEN_DIR}
+> EXPRESSIONS ${EXTRA_MODEL_CODE}
+> NAMESPACE "namespace1" "namespace2"
+> )
+> ```
+
+In addition to the input file conversions, the correct platform/system profile is selected
+(in `scripts/cmake/subsystem-profiles/*.cmake`) based on the `TARGET_SUBSYSTEM` build option.
+The variables set there define the memory region sizes, base addresses and IRQ numbers
+used to generate the mem_regions.h, peripheral_irqs.h and peripheral_memmap.h headers.
+Templates from `scripts/cmake/templates/*.template` are used to generate these header files.
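+
+As an illustration, an excerpt of a generated memory regions header could look like the
+following sketch (the macro names are illustrative assumptions; the values match the ITCM
+and DTCM regions from the scatter file shown earlier):
+
+```c++
+/* Illustrative excerpt only: the real header is generated from the
+ * subsystem profile selected by TARGET_SUBSYSTEM. */
+#define ITCM_BASE   (0x00000000)
+#define ITCM_SIZE   (0x00080000)    /* 512 kiB */
+#define DTCM_BASE   (0x20000000)
+#define DTCM_SIZE   (0x00020000)    /* 128 kiB */
+```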
+
+After the build, the files generated in the build folder are:
+
+```tree
+build/generated/
+├── bsp
+│ ├── mem_regions.h
+│ ├── peripheral_irqs.h
+│ └── peripheral_memmap.h
+├── <use_case_name1>
+│ ├── include
+│ │ ├── InputFiles.hpp
+│ │ └── Labels.hpp
+│ └── src
+│ ├── <uc1_input_file1>.cc
+│ ├── <uc1_input_file2>.cc
+│ ├── InputFiles.cc
+│ ├── Labels.cc
+│ └── <uc1_model_name>.tflite.cc
+└── <use_case_name2>
+ ├── include
+ │ ├── InputFiles.hpp
+ │ └── Labels.hpp
+ └── src
+ ├── <uc2_input_file1>.cc
+ ├── <uc2_input_file2>.cc
+ ├── InputFiles.cc
+ ├── Labels.cc
+ └── <uc2_model_name>.tflite.cc
+```
+
+Next section of the documentation: [Deployment](../documentation.md#deployment).
diff --git a/docs/sections/coding_guidelines.md b/docs/sections/coding_guidelines.md
new file mode 100644
index 0000000..f1813d3
--- /dev/null
+++ b/docs/sections/coding_guidelines.md
@@ -0,0 +1,323 @@
+# Coding standards and guidelines
+
+## Contents
+
+- [Introduction](#introduction)
+- [Language version](#language-version)
+- [File naming](#file-naming)
+- [File layout](#file-layout)
+- [Block Management](#block-management)
+- [Naming Conventions](#naming-conventions)
+  - [C++ language naming conventions](#c-language-naming-conventions)
+  - [C language naming conventions](#c-language-naming-conventions-1)
+- [Layout and formatting conventions](#layout-and-formatting-conventions)
+- [Language usage](#language-usage)
+
+## Introduction
+
+This document presents some standard coding guidelines to be followed for contributions to this repository. Most of the
+code is written in C++, but there is some written in C as well. There is a clear C/C++ boundary at the Hardware
+Abstraction Layer (HAL). Both these languages follow different naming conventions within this repository, by design, to:
+
+- have clearly distinguishable C and C++ sources.
+- make cross-language function calls stand out. Mostly, these will be C++ calls to the HAL functions written in C.
+However, because we also issue function calls to third-party APIs (and they may not follow these conventions), the
+intended outcome may not be fully realised in all of the cases.
+
+## Language version
+
+For this project, code written in C++ shall use a subset of the C++11 feature set, and software
+may be written using the C++11 language standard. Code written in C should be compatible
+with the C99 standard.
+
+Software components written in C/C++ may use the language features allowed and encouraged by this documentation.
+
+## File naming
+
+- C files should have the `.c` extension.
+- C++ files should have the `.cc` or `.cpp` extension.
+- Header files for functions implemented in C should have the `.h` extension.
+- Header files for functions implemented in C++ should have the `.hpp` extension.
+
+## File layout
+
+- Standard copyright notice must be included in all files:
+
+ ```copyright
+ /*
+ * Copyright (c) <years additions were made to project> <your name>, Arm Limited. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+ ```
+
+- Source lines must be no longer than 120 characters. Prefer to spread code out vertically rather than horizontally,
+ wherever it makes sense:
+
+ ```C++
+  // This is significantly easier to read
+ enum class SomeEnum1
+ {
+ ENUM_VALUE_1,
+ ENUM_VALUE_2,
+ ENUM_VALUE_3
+ };
+
+  // than this
+ enum class SomeEnum2 { ENUM_VALUE_1, ENUM_VALUE_2, ENUM_VALUE_3 };
+ ```
+
+- Block indentation should use 4 characters, no tabs.
+
+- Each statement must be on a separate line.
+
+ ```C++
+ int a, b; // Error prone
+ int c, *d;
+
+ int e = 0; // GOOD
+ int *p = nullptr; // GOOD
+ ```
+
+- Source must not contain commented out code or unreachable code.
+
+## Block Management
+
+- Blocks must use braces, and brace placement must be consistent.
+  - Each function has its opening brace on the next line, at the same indentation level as its header; the code within
+    the braces is indented and the closing brace at the end is at the same level as the opening one.
+    For compactness, if the class/function body is empty, braces on the same line are accepted.
+
+  - Conditional statements and loops, even with just a single-statement body, need to be surrounded by braces; the
+opening brace is on the same line, and the closing brace is on the next line at the same indentation level as its header.
+The same rule applies to classes.
+
+ ```C++
+ class Class1 {
+ public:
+ Class1();
+ private:
+ int element;
+ };
+
+ void NotEmptyFunction()
+ {
+ if (condition) {
+ // [...]
+ } else {
+ // [...]
+ }
+ // [...]
+ for(start_cond; end_cond; step_cond) {
+ // [...]
+ }
+ }
+
+ void EmptyFunction() {}
+ ```
+
+  - Cases within a switch are indented and enclosed in braces:
+
+ ```C++
+ switch (option)
+ {
+ case 1:
+ {
+ // handle option 1
+ break;
+ }
+ case 2:
+ {
+ // handle option 2
+ break;
+ }
+ default:
+ {
+ break;
+ }
+ }
+ ```
+
+## Naming Conventions
+
+### C++ language naming conventions
+
+- Type (class, struct, enum) and function names must be `PascalCase`:
+
+ ```C++
+ class SomeClass
+ {
+ // [...]
+ };
+ void SomeFunction()
+ {
+ // [...]
+ }
+ ```
+
+- Variables and parameter names must be `camelCase`:
+
+ ```C++
+ int someVariable;
+
+ void SomeFunction(int someParameter) {}
+ ```
+
+- Macros, pre-processor definitions, and enumeration values should use upper case names:
+
+ ```C++
+ #define SOME_DEFINE
+
+ enum class SomeEnum
+ {
+ ENUM_VALUE_1,
+ ENUM_VALUE_2
+ };
+ ```
+
+- Namespace names must be lower case
+
+ ```C++
+ namespace nspace
+ {
+ void FunctionInNamespace();
+ };
+ ```
+
+- Source code should use Hungarian notation to annotate the name of a variable with information about its meaning.
+
+ | Prefix | Class | Description |
+ | ------ | ----- | ----------- |
+ | p | Type | Pointer to any other type |
+ | k | Qualifier | Constant |
+ | v | Qualifier | Volatile |
+ | m | Scope | Member of a class or struct |
+ | s | Scope | Static |
+ | g | Scope | Used to indicate variable has scope beyond the current function: file-scope or externally visible scope|
+
+The following examples show one possible set of Hungarian notation uses:
+
+ ```C++
+  int g_GlobalInt = 123;
+  char* m_pNameOfMemberPointer = nullptr;
+  const float g_kSomeGlobalConstant = 1.234f;
+  static float ms_MyStaticMember = 4.321f;
+  bool myLocalVariable = true;
+ ```
+
+### C language naming conventions
+
+For C sources, we follow the Linux variant of the K&R style wherever possible.
+
+- For function and variable names we use `snake_case` convention:
+
+ ```C
+ int some_variable;
+
+ void some_function(int some_parameter) {}
+ ```
+
+- Macros, pre-processor definitions, and enumeration values should use upper case names:
+
+ ```C
+ #define SOME_DEFINE
+
+ enum some_enum
+ {
+ ENUM_VALUE_1,
+ ENUM_VALUE_2
+ };
+ ```
+
+## Layout and formatting conventions
+
+- C++ class code layout:
+  Public function definitions should be at the top of a class definition, since they are the things most likely to be used
+by other people.
+  Private functions and member variables should come last.
+  Class functions and member variables should be laid out logically, in blocks of related functionality.
+
+- Class inheritance keywords are not indented.
+
+ ```C++
+ class MyClass
+ {
+ public:
+ int m_PublicMember;
+ protected:
+ int m_ProtectedMember;
+ private:
+ int m_PrivateMember;
+ };
+ ```
+
+- Don't leave trailing spaces at the end of lines.
+
+- Empty lines should have no trailing spaces.
+
+- For pointers and references, the symbols `*` and `&` should be adjacent to the name of the type, not the name
+ of the variable.
+
+ ```C++
+ char* someText = "abc";
+
+ void SomeFunction(const SomeObject& someObject) {}
+ ```
+
+## Language usage
+
+- Header `#include` statements should be minimized.
+ Inclusion of unnecessary headers slows down compilation, and can hide errors where a function calls a
+ subroutine which it should not be using if the unnecessary header defining this subroutine is included.
+
+ Header statements should be included in the following order:
+
+ - Header file corresponding to the current source file (if applicable)
+ - Headers from the same component
+ - Headers from other components
+ - Third-party headers
+ - System headers
+
+  > **Note:** Leave one blank line between each of these groups for readability.
+  > Use quotes for headers from within the same project and angle brackets for third-party and system headers.
+  > Do not use paths relative to the current source file, such as `../Header.hpp`. Instead, configure your include paths
+  > in the project makefiles.
+
+ ```C++
+ #include "ExampleClass.hpp" // Own header
+
+ #include "Header1.hpp" // Header from same component
+ #include "Header1.hpp" // Header from same component
+
+ #include "other/Header3.hpp" // Header from other component
+
+ #include <ThirdParty.hpp> // Third-party headers
+
+ #include <vector> // System header
+
+ // [...]
+ ```
+
+- C++ casts should use the template-styled cast syntax
+
+ ```C++
+ int a = 100;
+ float b = (float)a; // Not OK
+ float c = static_cast<float>(a); // OK
+ ```
+
+- Use the `const` keyword to declare constants instead of `#define`.
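+
+  For example (the names are illustrative):
+
+  ```C++
+  #define MAX_COUNT 10        // Not OK
+
+  const int kMaxCount = 10;   // OK
+  ```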
+
+- Use `nullptr` instead of `NULL`.
+  C++11 introduced the `nullptr` type to distinguish null pointer constants from the integer 0.
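+
+  For example:
+
+  ```C++
+  char* ptr1 = NULL;      // Not OK
+  char* ptr2 = nullptr;   // OK
+  ```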
diff --git a/docs/sections/customizing.md b/docs/sections/customizing.md
new file mode 100644
index 0000000..e92c327
--- /dev/null
+++ b/docs/sections/customizing.md
@@ -0,0 +1,731 @@
+# Implementing custom ML application
+
+- [Software project description](#software-project-description)
+- [HAL API](#hal-api)
+- [Main loop function](#main-loop-function)
+- [Application context](#application-context)
+- [Profiler](#profiler)
+- [NN Model API](#nn-model-api)
+- [Adding custom ML use case](#adding-custom-ml-use-case)
+- [Implementing main loop](#implementing-main-loop)
+- [Implementing custom NN model](#implementing-custom-nn-model)
+- [Executing inference](#executing-inference)
+- [Printing to console](#printing-to-console)
+- [Reading user input from console](#reading-user-input-from-console)
+- [Output to MPS3 LCD](#output-to-mps3-lcd)
+- [Building custom use case](#building-custom-use-case)
+
+This section describes how to implement a custom Machine Learning
+application running on Fast Model FVP or on the Arm MPS3 FPGA prototyping board.
+
+The Arm® Ethos™-U55 code sample software project offers a simple way to incorporate
+additional use-case code into the existing infrastructure. It provides a build
+system that automatically picks up added functionality and produces a corresponding
+executable for each use-case. This is achieved by following certain configuration
+and code implementation conventions.
+
+The following sign will indicate the important conventions to apply:
+
+> **Convention:** The code is developed using C++11 and C99 standards.
+This is governed by TensorFlow Lite for Microcontrollers framework.
+
+## Software project description
+
+As mentioned in the [Repository structure](../documentation.md#repository-structure) section, project sources are:
+
+```tree
+├── docs
+│ ├── ...
+│   └── documentation.md
+├── resources
+│ └── img_class
+│ └── ...
+├── scripts
+│ └── ...
+├── source
+│ ├── application
+│ │ ├── hal
+│ │ ├── main
+│ │ └── tensorflow-lite-micro
+│ └── use_case
+│ └──img_class
+├── CMakeLists.txt
+└── Readme.md
+```
+
+Where `source` contains C/C++ sources for the platform and ML applications.
+Common code related to the Ethos-U55 code samples software
+framework resides in the *application* sub-folder, and ML application-specific logic (use-cases)
+sources are in the *use_case* sub-folder.
+
+> **Convention**: Separate use-cases must be organized in sub-folders under the use-case folder.
+The name of the directory is used as a name for this use-case and could be provided
+as a `USE_CASE_BUILD` parameter value.
+It is expected by the build system that sources for the use-case are structured as follows:
+headers in an include directory, C/C++ sources in a src directory.
+For example:
+>
+>```tree
+>use_case
+> └──img_class
+> ├── include
+> │ └── *.hpp
+> └── src
+> └── *.cc
+>```
+
+## HAL API
+
+The hardware abstraction layer is represented by the following interfaces.
+To access them, include the hal.h header.
+
+- *hal_platform* structure:\
+ Structure that defines a platform context to be used by the application
+
+ | Attribute name | Description |
+ |--------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+  | inited             | Initialization flag. Is set after the platform_init() function is called. |
+  | plat_name          | Platform name. It is set to "mps3-bare" for MPS3 build and "FVP" for Fast Model build. |
+ | data_acq | Pointer to data acquisition module responsible for user interaction and other data collection for the application logic. |
+ | data_psn | Pointer to data presentation module responsible for data output through components available in the selected platform: LCD -- for MPS3, console -- for Fast Model. |
+ | timer | Pointer to platform timer implementation (see platform_timer) |
+ | platform_init | Pointer to platform initialization function. |
+ | platform_release | Pointer to platform release function |
+
+- *hal_init* function:\
+ Initializes the HAL structure based on compile time config. This
+ should be called before any other function in this API.
+
+ | Parameter name | Description|
+ |------------------|-----------------------------------------------------|
+  | platform         | Pointer to a pre-allocated *hal_platform* struct.   |
+ | data_acq | Pointer to a pre-allocated data acquisition module |
+ | data_psn | Pointer to a pre-allocated data presentation module |
+ | timer | Pointer to a pre-allocated timer module |
+ | return | zero if successful, error code otherwise |
+
+- *hal_platform_init* function:\
+ Initializes the HAL platform and all the modules on the platform the
+ application requires to run.
+
+ | Parameter name | Description |
+ | ----------------| ------------------------------------------------------------------- |
+  | platform        | Pointer to a pre-allocated and initialized *hal_platform* struct.   |
+ | return | zero if successful, error code otherwise. |
+
+- *hal_platform_release* function\
+ Releases the HAL platform. This should release resources acquired.
+
+ | Parameter name | Description |
+ | ----------------| ------------------------------------------------------------------- |
+  | platform        | Pointer to a pre-allocated and initialized *hal_platform* struct.   |
+
+- *data_acq_module* structure:\
+ Structure to encompass the data acquisition module and it's
+ methods.
+
+ | Attribute name | Description |
+ |----------------|----------------------------------------------------|
+ | inited | Initialization flag. Is set after the system_init () function is called. |
+  | system_name    | Channel name. It is set to "UART" for both MPS3 and Fast Model builds. |
+  | system_init    | Pointer to data acquisition module initialization function. The pointer is set according to the platform selected during the build. This function is called by the platform initialization routines. |
+ | get_input | Pointer to a function reading user input. The pointer is set according to the selected platform during the build. For MPS3 and fastmodel environments, the function reads data from UART. |
+
+- *data_psn_module* structure:\
+ Structure to encompass the data presentation module and its methods.
+
+ | Attribute name | Description |
+ |--------------------|------------------------------------------------|
+ | inited | Initialization flag. It is set after the system_init () function is called. |
+ | system_name | System component name used to present data. It is set to "lcd" for MPS3 build and to "log_psn" for fastmodel build. In case of fastmodel, all pixel drawing functions are replaced by console output of the data summary. |
+ | system_init | Pointer to data presentation module initialization function. The pointer is set according to the platform selected during the build. This function is called by the platform initialization routines. |
+ | present_data_image | Pointer to a function to draw an image. The pointer is set according to the selected platform during the build. For MPS3, the image will be drawn on the LCD; for fastmodel image summary will be printed in the UART (coordinates, channel info, downsample factor) |
+ | present_data_text | Pointer to a function to print a text. The pointer is set according to the selected platform during the build. For MPS3, the text will be drawn on the LCD; for fastmodel text will be printed in the UART. |
+ | present_box | Pointer to a function to draw a rectangle. The pointer is set according to the selected platform during the build. For MPS3, the image will be drawn on the LCD; for fastmodel image summary will be printed in the UART. |
+ | clear | Pointer to a function to clear the output. The pointer is set according to the selected platform during the build. For MPS3, the function will clear the LCD; for fastmodel will do nothing. |
+ | set_text_color | Pointer to a function to set text color for the next call of present_data_text() function. The pointer is set according to the selected platform during the build. For MPS3, the function will set the color for the text printed on the LCD; for fastmodel -- will do nothing. |
+ | set_led | Pointer to a function controlling an LED (led_num) with on/off |
+
+- *platform_timer* structure:\
+ Structure to hold a platform specific timer implementation.
+
+ | Attribute name | Description |
+ |--------------------|------------------------------------------------|
+ | inited | Initialization flag. It is set after the timer is initialized by the *hal_platform_init* function. |
+ | reset | Pointer to a function to reset a timer. |
+ | get_time_counter | Pointer to a function to get current time counter. |
+ | get_duration_ms | Pointer to a function to calculate duration between two time-counters in milliseconds. |
+ | get_duration_us | Pointer to a function to calculate duration between two time-counters in microseconds |
+ | get_npu_cycle_diff | Pointer to a function to calculate duration between two time-counters in Ethos-U55 cycles. Available only when project is configured with ETHOS_U55_ENABLED set. |
+
+Example of the API initialization in the main function:
+
+```c++
+#include "hal.h"
+
+int main()
+{
+    hal_platform platform;
+    data_acq_module dataAcq;
+    data_psn_module dataPsn;
+    platform_timer timer;
+
+    /* Initialise the HAL and platform */
+    hal_init(&platform, &dataAcq, &dataPsn, &timer);
+    hal_platform_init(&platform);
+
+    /* ... application logic ... */
+
+    hal_platform_release(&platform);
+
+    return 0;
+}
+```
+
+## Main loop function
+
+The code samples application's main function delegates the use-case
+logic execution to the main loop function, which must be implemented for
+each custom ML scenario.
+
+The main loop function takes a reference to the initialized *hal_platform*
+structure as an argument.
+
+The main loop function has external linkage, and the main executable for the
+use-case will have a reference to the function defined in the use-case
+code.
+
+```c++
+void main_loop(hal_platform& platform)
+{
+    /* ... */
+}
+```
+
+## Application context
+
+The application context can be used as a holder for state between main
+loop iterations. Include AppContext.hpp to use the ApplicationContext class.
+
+| Method name | Description |
+|--------------|-----------------------------------------------------------------|
+| Set | Saves given value as a named attribute in the context. |
+| Get | Gets the saved attribute from the context by the given name. |
+| Has | Checks if an attribute with a given name exists in the context. |
+
+For example:
+
+```c++
+#include "hal.h"
+#include "AppContext.hpp"
+
+void main_loop(hal_platform& platform) {
+
+ /* Instantiate application context */
+ arm::app::ApplicationContext caseContext;
+ caseContext.Set<hal_platform&>("platform", platform);
+ caseContext.Set<uint32_t>("counter", 0);
+
+ /* loop */
+ while (true) {
+ // do something, pass application context down the call stack
+ }
+}
+```
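+
+The saved state can then be retrieved further down the call stack. A minimal sketch
+using the methods from the table above:
+
+```c++
+auto& platform = caseContext.Get<hal_platform&>("platform");
+
+if (caseContext.Has("counter")) {
+    uint32_t counter = caseContext.Get<uint32_t>("counter");
+    caseContext.Set<uint32_t>("counter", counter + 1);
+}
+```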
+
+## Profiler
+
+The Profiler is a helper class assisting in the collection of timings and
+Ethos-U55 cycle counts for operations. It uses the platform timer to get
+system timing information.
+
+| Method name | Description |
+|----------------------|-----------------------------------------------------------|
+| StartProfiling | Starts profiling and records the starting timing data. |
+| StopProfiling | Stops profiling and records the ending timing data. |
+| Reset | Resets the profiler and clears all collected data. |
+| GetResultsAndReset | Gets the results as string and resets the profiler. |
+
+Usage example:
+
+```c++
+Profiler profiler{&platform, "Inference"};
+
+profiler.StartProfiling();
+// Code running inference to profile
+profiler.StopProfiling();
+
+info("%s\n", profiler.GetResultsAndReset().c_str());
+```
+
+## NN Model API
+
+Model (referring to a neural network model) is an abstract class wrapping the
+underlying TensorFlow Lite Micro API. It provides methods to perform
+common operations such as TensorFlow Lite Micro framework
+initialization, inference execution, and accessing input and output tensor
+objects.
+
+To use this abstraction, include the TensorFlowLiteMicro.hpp header.
+
+| Method name | Description |
+|--------------------------|------------------------------------------------------------------------------|
+| GetInputTensor           | Returns the pointer to the model's input tensor.                              |
+| GetOutputTensor          | Returns the pointer to the model's output tensor.                             |
+| GetType                  | Returns the model's data type.                                                 |
+| GetInputShape            | Returns the pointer to the model's input shape.                               |
+| GetOutputShape           | Returns the pointer to the model's output shape.                              |
+| LogTensorInfo | Logs the tensor information to stdout for the given tensor pointer: tensor name, tensor address, tensor type, tensor memory size and quantization params. |
+| LogInterpreterInfo | Logs the interpreter information to stdout. |
+| Init                     | Initializes the TensorFlow Lite Micro framework, allocates the required memory for the model. |
+| IsInited | Checks if this model object has been initialized. |
+| IsDataSigned | Checks if the model uses signed data type. |
+| RunInference | Runs the inference (invokes the interpreter). |
+| GetOpResolver            | Returns the reference to the TensorFlow Lite Micro operator resolver.         |
+| EnlistOperations | Registers required operators with TensorFlow Lite Micro operator resolver. |
+| GetTensorArena | Returns pointer to memory region to be used for tensors allocations. |
+| GetActivationBufferSize | Returns the size of the tensor arena memory region. |
+
+> **Convention**: Each ML use-case must have extension of this class and implementation of the protected virtual methods:
+>
+>```c++
+>virtual const tflite::MicroOpResolver& GetOpResolver() = 0;
+>virtual bool EnlistOperations() = 0;
+>virtual uint8_t* GetTensorArena() = 0;
+>virtual size_t GetActivationBufferSize() = 0;
+>```
+>
+>Network models have different sets of operators that must be registered with
+the tflite::MicroMutableOpResolver object in the EnlistOperations method.
+Network models could also require different sizes of activation buffer, which is returned as
+tensor arena memory for the TensorFlow Lite Micro framework by the GetTensorArena
+and GetActivationBufferSize methods.
+
+Please see MobileNetModel.hpp and MobileNetModel.cc files from image
+classification ML application use-case as an example of the model base
+class extension.
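+
+As an illustration, a minimal sketch of the two buffer-related hooks for a custom model
+could look like the following (the `ACTIVATION_BUF_SZ` macro, the `MyModel` class name and
+the alignment are assumptions for illustration only):
+
+```c++
+/* Illustrative sketch only. */
+static uint8_t tensorArena[ACTIVATION_BUF_SZ] __attribute__((aligned(16)));
+
+uint8_t* arm::app::MyModel::GetTensorArena()
+{
+    return tensorArena;
+}
+
+size_t arm::app::MyModel::GetActivationBufferSize()
+{
+    return ACTIVATION_BUF_SZ;
+}
+```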
+
+## Adding custom ML use case
+
+This section describes how to implement an additional use-case and compile
+it into a binary executable to run with the Fast Model or the MPS3 FPGA board.
+It covers the common major steps: application main loop creation,
+description of the NN model, and inference execution.
+
+In addition, a few useful examples are provided: reading user input,
+printing to the console, and drawing images on the MPS3 LCD.
+
+```tree
+use_case
+ └──hello_world
+ ├── include
+ └── src
+```
+
+Start by creating a sub-directory under the *use_case* directory, with two
+directories *src* and *include* inside it, as described in the
+[Software project description](#software-project-description) section and shown in the tree above.
+
+## Implementing main loop
+
+The use-case main loop is the place for the use-case's main logic. Essentially,
+it is an infinite loop that reacts to user input, triggers use-case
+conditional logic based on the input, and presents results back to the
+user. However, it could also be simple logic that runs a single inference
+and then exits.
+
+Main loop has knowledge about the platform and has access to the
+platform components through the hardware abstraction layer (referred to as HAL).
+
+Create a *MainLoop.cc* file in the *src* directory (the one created under
+[Adding custom ML use case](#adding-custom-ml-use-case)); the file name is not
+important. Define the *main_loop* function with the signature described in
+[Main loop function](#main-loop-function):
+
+```c++
+#include "hal.h"
+
+void main_loop(hal_platform& platform) {
+ printf("Hello world!");
+}
+```
+
+The above is already a working use-case. If you compile and run it (see
+[Building custom use case](#building-custom-use-case)), the application will start, print the
+message to the console and exit straight away.
+
+Now, you can start filling this function with logic.
+
+## Implementing custom NN model
+
+Before an inference can be run with a custom NN model, the TensorFlow Lite
+Micro framework must learn about the operators/layers included in the
+model. The developer must register the operators using the *MicroMutableOpResolver*
+API.
+
+The Ethos-U55 code samples project has an abstraction around the TensorFlow
+Lite Micro API (see [NN model API](#nn-model-api)). Create *HelloWorldModel.hpp* in
+the use-case include sub-directory, extend the Model abstract class and
+declare the required methods.
+
+For example:
+
+```c++
+#include "Model.hpp"
+
+namespace arm {
+namespace app {
+
+class HelloWorldModel: public Model {
+ protected:
+ /** @brief Gets the reference to op resolver interface class. */
+ const tflite::MicroOpResolver& GetOpResolver() override;
+
+ /** @brief Adds operations to the op resolver instance. */
+ bool EnlistOperations() override;
+
+ const uint8_t* ModelPointer() override;
+
+ size_t ModelSize() override;
+
+ private:
+ /* Maximum number of individual operations that can be enlisted. */
+ static constexpr int _m_maxOpCnt = 5;
+
+ /* A mutable op resolver instance. */
+    tflite::MicroMutableOpResolver<_m_maxOpCnt> _m_opResolver;
+ };
+} /* namespace app */
+} /* namespace arm */
+```
+
+Create `HelloWorld.cc` file in the `src` sub-directory and define the methods
+there. Include `HelloWorldModel.hpp` created earlier. Note that `Model.hpp`
+included in the header provides access to TensorFlow Lite Micro's operation
+resolver API.
+
+Please see `use_case/img_class/src/MobileNetModel.cc` for
+code examples.\
+If you are using a TensorFlow Lite model compiled with Vela, it is important to add
+custom Ethos-U55 operator to the operators list.
+
+The following example shows how to add the custom Ethos-U55 operator with
+the TensorFlow Lite Micro framework. We use the ARM_NPU define to exclude
+the code if the application was built without NPU support.
+
+```c++
+#include "HelloWorldModel.hpp"
+
+bool arm::app::HelloWorldModel::EnlistOperations() {
+
+ #if defined(ARM_NPU)
+    if (kTfLiteOk == this->_m_opResolver.AddEthosU()) {
+ info("Added %s support to op resolver\n",
+ tflite::GetString_ETHOSU());
+ } else {
+ printf_err("Failed to add Arm NPU support to op resolver.");
+ return false;
+ }
+ #endif /* ARM_NPU */
+
+ return true;
+}
+```
+
+To minimize the application memory footprint, it is advised to register only
+the operators used by the NN model.
+
+Define the `ModelPointer` and `ModelSize` methods. These functions are wrappers around the
+functions generated in the C++ file containing the neural network model as an array.
+The logic generating this C++ array from the `.tflite` file needs to be defined in
+the `usecase.cmake` file for this `HelloWorld` example.
+
+For more details on `usecase.cmake`, see [Building custom use case](#building-custom-use-case).
+For details on the code generation flow in general, see [Automatic file generation](./building.md#automatic-file-generation).
+
+The TensorFlow Lite model data is read during the Model::Init() method execution, see
+*application/tensorflow-lite-micro/Model.cc* for more details. Model invokes the
+`ModelPointer()` function, which calls the `GetModelPointer()` function to get the
+neural network model data memory address. The `GetModelPointer()` function
+is generated during the build and can be found in the
+file `build/generated/hello_world/src/<model_file_name>.cc`. The generated
+file is added to the compilation automatically.
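+
+For illustration, the generated model source essentially boils down to a C++ array with an
+accessor, along the following lines (the array contents shown are placeholders, not real
+model bytes):
+
+```c++
+#include <cstdint>
+
+/* Illustrative sketch only: the real array is generated from the .tflite file. */
+static const uint8_t nn_model[] = {0x1c, 0x00, 0x00, 0x00};
+
+const uint8_t* GetModelPointer()
+{
+    return nn_model;
+}
+```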
+
+Use the `${use_case}_MODEL_TFLITE_PATH` build parameter to include a custom
+model in the generation/compilation process (see [Build options](./building.md#build-options)).
+
+## Executing inference
+
+To run an inference successfully, it is required to have:
+
+- a TensorFlow Lite model file
+- an extended Model class
+- a place to add the code to invoke inference
+- a main loop function
+- and some input data.
+
+For the hello_world example below, the input array is not populated.
+However, for real-world scenarios, this data should either be read from
+an on-board device or be prepared in the form of C++ sources before
+compilation and be baked into the application.
+
+For example, the image classification application has extra build steps
+to generate C++ sources from the provided images with
+*generate_images_code* CMake function.
+
+> **Note:**
+Check that the input data type for your NN model and the input array data type are the same.
+For example, generated C++ sources for images store image data as a uint8 array. For models that were
+quantized to the int8 data type, it is important to convert image data to int8 correctly before inference execution.
+Conversion from an asymmetric data type to a symmetric one involves shifting the zero point, i.e. subtracting an
+offset from the uint8 values. Please check the image classification application source for the code example
+(the ConvertImgToInt8 function).
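+
+A sketch of such a conversion, based on the offset subtraction described in the note above
+(the function name and the in-place approach are illustrative; see the image classification
+sources for the real implementation):
+
+```c++
+#include <cstddef>
+#include <cstdint>
+
+/* Convert asymmetric uint8 data to int8 in place by shifting the zero point. */
+static void ConvertToInt8(uint8_t* data, size_t size)
+{
+    int8_t* signedData = reinterpret_cast<int8_t*>(data);
+    for (size_t i = 0; i < size; ++i) {
+        signedData[i] = static_cast<int8_t>(static_cast<int>(data[i]) - 128);
+    }
+}
+```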
+
+The following code adds inference invocation to the main loop function:
+
+```c++
+#include "hal.h"
+#include "HelloWorldModel.hpp"
+
+void main_loop(hal_platform& platform) {
+
+    /* Model wrapper object */
+    arm::app::HelloWorldModel model;
+
+    /* Load the model */
+    if (!model.Init()) {
+        printf_err("failed to initialise model\n");
+        return;
+    }
+
+    TfLiteTensor* outputTensor = model.GetOutputTensor();
+    TfLiteTensor* inputTensor = model.GetInputTensor();
+
+    /* Dummy input data */
+    uint8_t inputData[1000];
+
+    memcpy(inputTensor->data.data, inputData, 1000);
+
+    /* Run inference */
+    model.RunInference();
+
+    const uint32_t tensorSz = outputTensor->bytes;
+    const uint8_t* outputData = tflite::GetTensorData<uint8_t>(outputTensor);
+}
+```
+
+The code snippet has several important blocks:
+
+- Creating HelloWorldModel object and initializing it.
+
+ ```c++
+ arm::app::HelloWorldModel model;
+
+ /* Load the model */
+  if (!model.Init()) {
+      printf_err("failed to initialise model\n");
+      return;
+  }
+ ```
+
+- Getting pointers to allocated input and output tensors.
+
+ ```c++
+ TfLiteTensor *outputTensor = model.GetOutputTensor();
+ TfLiteTensor *inputTensor = model.GetInputTensor();
+ ```
+
+- Copying input data to the input tensor. We assume input tensor size
+ to be 1000 uint8 elements.
+
+ ```c++
+ memcpy(inputTensor->data.data, inputData, 1000);
+ ```
+
+- Running inference
+
+ ```c++
+ model.RunInference();
+ ```
+
+- Reading inference results: data and data size from the output
+  tensor. We assume that the output layer has the uint8 data type.
+
+ ```c++
+  const uint32_t tensorSz = outputTensor->bytes;
+
+  const uint8_t* outputData = tflite::GetTensorData<uint8_t>(outputTensor);
+ ```
+
+Adding profiling for Ethos-U55 is easy. Include `Profiler.hpp` header and
+invoke `StartProfiling` and `StopProfiling` around inference
+execution.
+
+```c++
+Profiler profiler{&platform, "Inference"};
+
+profiler.StartProfiling();
+model.RunInference();
+profiler.StopProfiling();
+std::string profileResults = profiler.GetResultsAndReset();
+
+info("%s\n", profileResults.c_str());
+```
+
+## Printing to console
+
+The provided examples already use some functions to print messages to the
+console. The full list of available functions is:
+
+- `printf`
+- `trace` - printf wrapper for tracing messages
+- `debug` - printf wrapper for debug messages
+- `info` - printf wrapper for informational messages
+- `warn` - printf wrapper for warning messages
+- `printf_err` - printf wrapper for error messages
+
+The `printf` wrappers can be switched off with the `LOG_LEVEL` define; the levels are
+ordered as follows:
+
+trace (0) < debug (1) < info (2) < warn (3) < error (4).
+
+The default output level is info = level 2.
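+
+For example (a minimal sketch; the messages are illustrative):
+
+```c++
+trace("Entering main loop\n");      /* printed only when LOG_LEVEL is trace (0) */
+debug("Reading input\n");           /* suppressed at the default level */
+info("Inference completed\n");      /* printed at the default level */
+printf_err("Model init failed\n");  /* error message, printed at every supported level */
+```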
+
+## Reading user input from console
+
+The platform data acquisition module has a get_input function to read keyboard
+input from the UART. It can be used as follows:
+
+```c++
+char ch_input[128];
+platform.data_acq->get_input(ch_input, sizeof(ch_input));
+```
+
+The function will block until the user provides an input.
+
+## Output to MPS3 LCD
+
+The platform presentation module has functions to print text or an image on
+the board's LCD:
+
+- `present_data_text`
+- `present_data_image`
+
+The text presentation function takes the following parameters:
+
+- `const char* str`: string to print.
+- `const uint32_t str_sz`: string size.
+- `const uint32_t pos_x`: x coordinate of the first letter in pixels.
+- `const uint32_t pos_y`: y coordinate of the first letter in pixels.
+- `const uint32_t allow_multiple_lines`: signals whether the text is
+  allowed to span multiple lines on the screen, or should be truncated
+  to the current line.
+
+This function does not wrap text; if the given string cannot fit on the
+screen, it will go outside the screen boundary.
+
+Example that prints "Hello world" on the LCD:
+
+```c++
+std::string hello("Hello world");
+platform.data_psn->present_data_text(hello.c_str(), hello.size(), 10, 35, 0);
+```
+
+The image presentation function takes the following parameters:
+
+- `uint8_t* data`: image data pointer.
+- `const uint32_t width`: image width.
+- `const uint32_t height`: image height.
+- `const uint32_t channels`: number of channels. Only 1 and 3 channels are currently supported.
+- `const uint32_t pos_x`: x coordinate of the first pixel.
+- `const uint32_t pos_y`: y coordinate of the first pixel.
+- `const uint32_t downsample_factor`: the factor by which the image is to be down sampled.
+
+For example, the following code snippet visualizes the input tensor data
+for MobileNet v2 224 (downsampling it by a factor of 2):
+
+```c++
+platform.data_psn->present_data_image((uint8_t *) inputTensor->data.data, 224, 224, 3, 10, 35, 2);
+```
+
+Please see the [HAL API](#hal-api) section for other data presentation
+functions.
+
+## Building custom use case
+
+There is one last thing to do before building and running a use-case
+application: create a `usecase.cmake` file in the root of your use-case;
+the name of the file is not important.
+
+> **Convention:** The build system searches for CMake file in each use-case directory and includes it into the build
+> flow. This file could be used to specify additional application specific build options, add custom build steps or
+> override standard compilation and linking flags.
+> Use `USER_OPTION` function to add additional build option. Prefix variable name with `${use_case}` (use-case name) to
+> avoid names collisions with other CMake variables.
+> Some useful variable names visible in use-case CMake file:
+>
+> - `DEFAULT_MODEL_PATH` – default model path to use if use-case specific `${use_case}_MODEL_TFLITE_PATH` is not set
+>in the build arguments.
+>- `TARGET_NAME` – name of the executable.
+> - `use_case` – name of the current use-case.
+> - `UC_SRC` – list of use-case sources.
+> - `UC_INCLUDE` – path to the use-case headers.
+> - `ETHOS_U55_ENABLED` – flag indicating if the current build supports Ethos-U55.
+> - `TARGET_PLATFORM` – Target platform being built for.
+> - `TARGET_SUBSYSTEM` – If target platform supports multiple subsystems, this is the name of the subsystem.
+> - All standard build options.
+> - `CMAKE_CXX_FLAGS` and `CMAKE_C_FLAGS` – compilation flags.
+> - `CMAKE_EXE_LINKER_FLAGS` – linker flags.
+
+For the hello world use-case, it will be enough to create a
+`helloworld.cmake` file and set `DEFAULT_MODEL_PATH`:
+
+```cmake
+if (ETHOS_U55_ENABLED EQUAL 1)
+ set(DEFAULT_MODEL_PATH ${DEFAULT_MODEL_DIR}/helloworldmodel_uint8_vela.tflite)
+else()
+ set(DEFAULT_MODEL_PATH ${DEFAULT_MODEL_DIR}/helloworldmodel_uint8.tflite)
+endif()
+```
+
+This can then be used later in the same file, for example:
+
+```cmake
+USER_OPTION(${use_case}_MODEL_TFLITE_PATH "Neural network model in tflite format."
+ ${DEFAULT_MODEL_PATH}
+ FILEPATH
+ )
+
+# Generate model file
+generate_tflite_code(
+ MODEL_PATH ${${use_case}_MODEL_TFLITE_PATH}
+ DESTINATION ${SRC_GEN_DIR}
+ )
+```
+
+This ensures that the model path pointed to by `${use_case}_MODEL_TFLITE_PATH` is converted to a C++ array and is picked
+up by the build system. More information on auto-generation is available under the section
+[Automatic file generation](./building.md#automatic-file-generation).
+
+To build your application, follow the general instructions from
+[Add Custom inputs](#add-custom-inputs) and specify the name of the use-case in the
+build command:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DUSE_CASE_BUILD=hello_world \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+For Windows, add `-G "MinGW Makefiles"` to the CMake command.
+
+As a result, `ethos-u-hello_world.axf` should be created, MPS3 build
+will also produce `sectors/hello_world` directory with binaries and
+`images-hello_world.txt` to be copied to the board MicroSD card.
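+
+The resulting binary can then be run on the FVP in the usual way (a sketch assuming the FVP
+install location of `~/FVP_install_location`, as in [Starting Fast Model simulation](./run.md#starting-fast-model-simulation)):
+
+```commandline
+~/FVP_install_location/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -a ./bin/ethos-u-hello_world.axf
+```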
+
+Next section of the documentation: [Testing and benchmarking](../documentation.md#testing-and-benchmarking).
diff --git a/docs/sections/deployment.md b/docs/sections/deployment.md
new file mode 100644
index 0000000..354d30b
--- /dev/null
+++ b/docs/sections/deployment.md
@@ -0,0 +1,281 @@
+# Deployment
+
+- [Fixed Virtual Platform](#fixed-virtual-platform)
+ - [Setting up the MPS3 Arm Corstone-300 FVP](#setting-up-the-mps3-arm-corstone-300-fvp)
+ - [Deploying on an FVP emulating MPS3](#deploying-on-an-fvp-emulating-mps3)
+- [MPS3 board](#mps3-board)
+ - [Deployment on MPS3 board](#deployment-on-mps3-board)
+
+The sample application for Arm® Ethos™-U55 can be deployed on two
+target platforms, both of which implement the Arm® Corstone™-300 design (see
+<https://www.arm.com/products/iot/soc/corstone-300>):
+
+- A physical Arm MPS3 FPGA prototyping board
+
+- An MPS3 FVP
+
+## Fixed Virtual Platform
+
+The FVP is available publicly from [Arm Ecosystem FVP downloads
+](https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps).
+Download the correct archive from the list under `Arm Corstone-300`. We need the one which:
+
+- Emulates the MPS3 board (not the MPS2 FPGA board)
+- Contains support for Arm® Ethos™-U55
+
+> **Note:** Currently, the FVP only has a Linux OS version. Also, there are no FVPs available for `SSE-200`
+> which satisfy the above conditions.
+
+For the FVP, the elf or the axf file can be run using the Fast Model
+executable as outlined under [Starting Fast Model simulation](./run.md#starting-fast-model-simulation),
+except that the binary pointed at here
+is the one just built using the steps in the previous section.
+
+### Setting up the MPS3 Arm Corstone-300 FVP
+
+For the Ethos-U55 sample application, please download the MPS3 version of the
+Arm® Corstone™-300 model that contains the Ethos-U55 and the Arm® Cortex®-M55. The model is
+currently only supported on Linux-based machines. To install the FVP:
+
+- Unpack the archive
+
+- Run the install script in the extracted package
+
+ `./FVP_Corstone_SSE-300_Ethos-U55.sh`
+
+- Follow the instructions to install the FVP to your desired location
+
+### Deploying on an FVP emulating MPS3
+
+This section assumes that the FVP has been installed (see [Setting up the MPS3 Arm Corstone-300 FVP](#Setting-up-the-MPS3-Arm-Corstone-300-FVP)) to the user's home directory `~/FVP_Corstone_SSE-300_Ethos-U55`.
+
+The installation will typically have the executable under the `~/FVP_Corstone_SSE-300_Ethos-U55/models/<OS>_<compiler-version>/`
+directory. For the example below, we assume it to be `~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4`.
+
+To run a use case on the FVP, from the [Build directory](../sections/building.md#Create-a-build-directory):
+
+```commandline
+~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -a ./bin/ethos-u-<use_case>.axf
+telnetterminal0: Listening for serial connection on port 5000
+telnetterminal1: Listening for serial connection on port 5001
+telnetterminal2: Listening for serial connection on port 5002
+telnetterminal5: Listening for serial connection on port 5003
+
+ Ethos-U rev 0 --- Oct 13 2020 11:27:45
+ (C) COPYRIGHT 2019-2020 Arm Limited
+ ALL RIGHTS RESERVED
+```
+
+This will also launch a telnet window with the sample application's standard output and error log entries containing
+information about the pre-built application version, TensorFlow Lite Micro library version used, data type as well as
+the input and output tensor sizes of the model compiled into the executable binary.
+
+After the application has started it outputs a menu and waits for the user input from telnet terminal.
+
+For example, the image classification use case can be started by:
+
+```commandline
+~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -a ./bin/ethos-u-img_class.axf
+```
+
+The FVP supports many command line parameters:
+
+- passed by using `-C <param>=<value>`. The most important ones are:
+  - `ethosu.num_macs`: Sets the Ethos-U55 configuration for the model. Valid parameters are `32`, `64`, `256`,
+    and the default `128`. The number signifies how many 8x8 MACs the hardware performs per cycle.
+ - `cpu0.CFGITCMSZ`: ITCM size for the Cortex-M CPU. Size of ITCM is pow(2, CFGITCMSZ - 1) KB
+ - `cpu0.CFGDTCMSZ`: DTCM size for the Cortex-M CPU. Size of DTCM is pow(2, CFGDTCMSZ - 1) KB
+  - `mps3_board.telnetterminal0.start_telnet`: Starts a telnet session if nothing is connected.
+ - `mps3_board.uart0.out_file`: Sets the output file to hold data written by the UART
+ (use '-' to send all output to stdout, empty by default).
+  - `mps3_board.uart0.shutdown_on_eot`: Shuts down the simulation when an EOT (ASCII 4) character is transmitted.
+ - `mps3_board.visualisation.disable-visualisation`: Enables or disables visualisation (disabled by default).
+
+ To start the model in `128` mode for Ethos-U55:
+
+ ```commandline
+ ~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -a ./bin/ethos-u-img_class.axf -C ethosu.num_macs=128
+ ```
+
+- `-l`: shows the full list of supported parameters
+
+ ```commandline
+ ~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -l
+ ```
+
+- `--stat`: prints some run statistics on simulation exit
+
+ ```commandline
+ ~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 --stat
+ ```
+
+- `--timelimit`: sets the number of wall clock seconds for the simulator to run, excluding startup and shutdown.
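+
+Several of these options can be combined in a single invocation, for example (paths as
+assumed earlier in this section):
+
+```commandline
+~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 \
+    -a ./bin/ethos-u-img_class.axf \
+    -C ethosu.num_macs=256 \
+    -C mps3_board.uart0.out_file='-' \
+    --stat
+```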
+
+## MPS3 board
+
+> **Note:** Before proceeding, make sure you have the MPS3 board powered on,
+and USB A to B connected between your machine and the MPS3.
+The connector on the MPS3 is marked as "Debug USB".
+
+![MPS3](../media/mps3.png)
+
+1. MPS3 board top view.
+
+Once the board has booted, the micro SD card will enumerate as a mass
+storage device. On most systems this will be automatically mounted, but
+you might need to mount it manually.
+
+Also, there should be four serial-over-USB ports available for use via
+this connection. On Linux based machines, these would typically be
+*/dev/ttyUSB\<n\>* to */dev/ttyUSB\<n+3\>*.
+
+The default configuration for all of them is 115200, 8/N/1 (115200 baud,
+8 bits, no parity and 1 stop bit) with no flow control.
+
+> **Note:** For Windows machines, additional FTDI drivers might need to be installed
+for these serial ports to be available.
+For more information on getting started with an MPS3 board, please refer to
+<https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/MPS3GettingStarted.pdf>
+
+### Deployment on MPS3 board
+
+> **NOTE**: These instructions are valid only if the evaluation is being
+ done using the MPS3 FPGA platform using either `SSE-200` or `SSE-300`.
+
+To run the application on the MPS3 platform, it is first necessary to make sure
+that the platform has been set up using the correct configuration.
+For details on platform set-up, please see the relevant documentation. For `Arm Corstone-300`, this is available
+[here](https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/DAI0547B_SSE300_PLUS_U55_FPGA_for_mps3.pdf?revision=d088d931-03c7-40e4-9045-31ed8c54a26f&la=en&hash=F0C7837C8ACEBC3A0CF02D871B3A6FF93E09C6B8).
+
+For the MPS3 board, instead of loading the axf file directly, the executable blobs
+generated under the *sectors/<use_case>* subdirectory need to be
+copied over to the MPS3 board's micro SD card. Also, every use-case build
+generates a corresponding images.txt file which is used by the MPS3 to
+understand which memory regions the blobs are to be loaded into.
+
+Once the USB A <--> B cable between the MPS3 and the development machine
+is connected and the MPS3 board powered on, the board should enumerate
+as a mass storage device over this USB connection.
+There might also be two devices, depending on the version of the board
+you are using. The device named `V2M-MPS3` or `V2MMPS3` is the SD card.
+
+If the axf/elf file is within 1MiB, it can be flashed into the FPGA
+memory directly without having to break it down into separate load
+region specific blobs. However, with neural network models exceeding
+this size, it becomes necessary to follow this approach.
+
+1. For example, the image classification use case will produce:
+
+ ```tree
+ ./bin/sectors/
+ └── img_class
+ ├── dram.bin
+ └── itcm.bin
+ ```
+
+ For example, if the micro SD card is mounted at
+ /media/user/V2M-MPS3/:
+
+ ```commandline
+ cp -av ./bin/sectors/img_class/* /media/user/V2M-MPS3/SOFTWARE/
+ ```
+
+2. The generated `images-<use_case>.txt` file needs to be copied
+over to the MPS3. The exact location for the destination will depend
+on the MPS3 board's version and the application note for the bit
+file in use.
+For example, for MPS3 board hardware revision C, using an
+application note directory named "ETHOSU", to replace the images.txt
+file:
+
+ ```commandline
+ cp ./bin/images-img_class.txt /media/user/V2M-MPS3/MB/HBI0309C/ETHOSU/images.txt
+ ```
+
+3. Open the first serial port available from the MPS3, for example
+"/dev/ttyUSB0". This can typically be done using the minicom, screen or
+PuTTY applications. Make sure the flow control setting is switched
+off.
+
+ ```commandline
+    minicom -D /dev/ttyUSB0
+ ```
+
+ ```log
+ Welcome to minicom 2.7.1
+ OPTIONS: I18n
+ Compiled on Aug 13 2017, 15:25:34.
+ Port /dev/ttyUSB0, 16:05:34
+ Press CTRL-A Z for help on special keys
+ Cmd>
+ ```
+
+4. In another terminal, open the second serial port, for example,
+ "/dev/ttyUSB1":
+
+ ```commandline
+    minicom -D /dev/ttyUSB1
+ ```
+
+5. On the first serial port, issue a "reboot" command and press the
+ return key
+
+ ```commandline
+ $ Cmd> reboot
+ ```
+
+ ```log
+ Rebooting...Disabling debug USB..Board rebooting...
+
+ ARM V2M-MPS3 Firmware v1.3.2
+ Build Date: Apr 20 2018
+
+ Powering up system...
+ Switching on main power...
+ Configuring motherboard (rev C, var A)...
+ ```
+
+ This will go on to reboot the board and prime the application to run by
+ flashing the binaries into their respective FPGA memory locations. For example:
+
+ ```log
+ Reading images file \MB\HBI0309C\ETHOSU\images.txt
+ Writing File \SOFTWARE\itcm.bin to Address 0x00000000
+
+ ............
+
+ File \SOFTWARE\itcm.bin written to memory address 0x00000000
+ Image loaded from \SOFTWARE\itcm.bin
+ Writing File \SOFTWARE\dram.bin to Address 0x08000000
+
+ ..........................................................................
+
+
+ File \SOFTWARE\dram.bin written to memory address 0x08000000
+ Image loaded from \SOFTWARE\dram.bin
+ ```
+
+6. When the reboot from previous step is completed, issue a reset
+ command on the command prompt.
+
+ ``` commandline
+ $ Cmd> reset
+ ```
+
+ This will trigger the application to start, and the output should be visible on the second serial connection.
+
+7. On the second serial port, output similar to the following should be visible:
+
+ ```log
+ [INFO] Setting up system tick IRQ (for NPU)
+ [INFO] V2M-MPS3 revision C
+ [INFO] Application Note AN540, Revision B
+ [INFO] FPGA build 1
+ [INFO] Core clock has been set to: 32000000 Hz
+ [INFO] CPU ID: 0x410fd220
+ [INFO] CPU: Cortex-M55 r0p0
+ ...
+ ```
+
+Next section of the main documentation: [Running code samples applications](../documentation.md#running-code-samples-applications).
diff --git a/docs/sections/run.md b/docs/sections/run.md
new file mode 100644
index 0000000..90ee7c8
--- /dev/null
+++ b/docs/sections/run.md
@@ -0,0 +1,42 @@
+
+# Running Ethos-U55 Code Samples
+
+- [Starting Fast Model simulation](#starting-fast-model-simulation)
+
+This section covers the process for getting started with pre-built binaries for the Code Samples.
+
+## Starting Fast Model simulation
+
+Once the application binaries have been built, and assuming the install location of the FVP
+was set to `~/FVP_install_location`, the simulation can be started by:
+
+```commandline
+~/FVP_install_location/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -a ./bin/mps3-sse-300/ethos-u-<use_case>.axf
+```
+
+This will start the Fast Model simulation for the chosen use-case.
+
+A log output should appear on the terminal:
+
+```log
+telnetterminal0: Listening for serial connection on port 5000
+telnetterminal1: Listening for serial connection on port 5001
+telnetterminal2: Listening for serial connection on port 5002
+telnetterminal5: Listening for serial connection on port 5003
+```
+
+This will also launch a telnet window with the sample application's
+standard output and error log entries. These contain information about the
+pre-built application version, the TensorFlow Lite Micro library version
+used, the data type, and the input and output tensor sizes of the
+model compiled into the executable binary.
+
+![FVP](../media/fvp.png)
+
+![FVP Terminal](../media/fvpterminal.png)
+
+> **Note:**
+> For details on the specific use-case, follow the instructions in the corresponding documentation.
+
+Next section of the documentation: [Implementing custom ML application](../documentation.md#Implementing-custom-ML-application).
diff --git a/docs/sections/testing_benchmarking.md b/docs/sections/testing_benchmarking.md
new file mode 100644
index 0000000..43bb7f4
--- /dev/null
+++ b/docs/sections/testing_benchmarking.md
@@ -0,0 +1,87 @@
+# Testing and benchmarking
+
+- [Testing](#testing)
+- [Benchmarking](#benchmarking)
+
+## Testing
+
+The `tests` folder has the following structure:
+
+```tree
+.
+├── common
+│ └── ...
+├── use_case
+│ ├── <usecase1>
+│ │ └── ...
+│ ├── <usecase2>
+│ │ └── ...
+└── utils
+ └── ...
+```
+
+Where:
+
+- `common`: contains tests for generic and common application functions.
+- `use_case`: contains all the use case specific tests in the respective folders.
+- `utils`: contains utilities sources used only within the tests.
+
+When [configuring](./building.md#configuring-the-build-native-unit-test) and
+[building](./building.md#Building-the-configured-project) for the `native` target platform, the results of the build will
+be placed under the `build/bin/` folder, for example:
+
+```tree
+.
+├── dev_ethosu_eval-<usecase1>-tests
+├── dev_ethosu_eval-<usecase2>-tests
+├── ethos-u-<usecase1>
+└── ethos-u-<usecase2>
+```
+
+To execute unit-tests for a specific use-case in addition to the common tests:
+
+```commandline
+dev_ethosu_eval-<use_case>-tests
+```
+
+```log
+[INFO] native platform initialised
+[INFO] ARM Ethos-U55 Evaluation application for MPS3 FPGA Prototyping Board and FastModel
+
+...
+===============================================================================
+ All tests passed (37 assertions in 7 test cases)
+```
+
+Test output may contain `[ERROR]` messages. This is expected - they come from tests that exercise negative scenarios.
+
+## Benchmarking
+
+Profiling is enabled by default when configuring the project. This will enable displaying:
+
+- the active and idle NPU cycle counts when Arm® Ethos™-U55 is enabled (see `-DETHOS_U55_ENABLED` in
+  [Build options](./building.md#build-options)).
+- CPU cycle counts and/or time elapsed in milliseconds for the inferences performed, if CPU profiling is enabled
+  (see `-DCPU_PROFILE_ENABLED` in [Build options](./building.md#build-options)). This should be done only
+  when running on a physical FPGA board as the FVP does not contain a cycle-approximate or cycle-accurate Cortex-M model.
+
+For example:
+
+- On the FVP:
+
+```log
+ Active NPU cycles: 5475412
+ Idle NPU cycles: 702
+```
+
+- For the MPS3 platform, the time duration in milliseconds is also reported when `-DCPU_PROFILE_ENABLED=1` is added to
+  the CMake configuration command:
+
+```log
+ Active NPU cycles: 5629033
+ Idle NPU cycles: 1005276
+ Active CPU cycles: 993553 (approx)
+ Time in ms: 210
+```
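+
+As a quick sanity check, the NPU cycle counts can be converted into wall-clock time. A minimal Python sketch, assuming
+the 32 MHz core clock reported in the MPS3 boot log shown in the deployment section:
+
+```python
+# Convert NPU cycle counts to elapsed time, assuming a 32 MHz core clock.
+CLOCK_HZ = 32_000_000
+active_cycles, idle_cycles = 5_629_033, 1_005_276
+
+elapsed_ms = (active_cycles + idle_cycles) * 1000 / CLOCK_HZ
+print(f"{elapsed_ms:.0f} ms")  # ~207 ms, in line with the 210 ms reported above
+```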
+
+Next section of the main documentation: [Troubleshooting](../documentation.md#Troubleshooting).
diff --git a/docs/sections/troubleshooting.md b/docs/sections/troubleshooting.md
new file mode 100644
index 0000000..40b975a
--- /dev/null
+++ b/docs/sections/troubleshooting.md
@@ -0,0 +1,27 @@
+# Troubleshooting
+
+- [Inference results are incorrect for my custom files](#inference-results-are-incorrect-for-my-custom-files)
+- [The application does not work with my custom model](#the-application-does-not-work-with-my-custom-model)
+
+## Inference results are incorrect for my custom files
+
+Ensure that the files you are using match the requirements of the model
+you are using and that the CMake parameters are set accordingly. More
+information on these CMake parameters is detailed in their separate
+sections. Note that preprocessing of the files could also affect the
+inference result, such as the rescaling and padding operations done for
+image classification.
+
+## The application does not work with my custom model
+
+Ensure that your model is in a fully quantized `.tflite` file format,
+either uint8 or int8, and has successfully been run through the Vela
+compiler.
+
+Check that cmake parameters match your new models input requirements.
+
+> **Note:** The Vela tool is not available within this software project.
+> It is a Python tool available from <https://pypi.org/project/ethos-u-vela/>.
+> The source code is hosted on <https://git.mlplatform.org/ml/ethos-u/ethos-u-vela.git/>.
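+
+One quick, if partial, check is to inspect the tensor types of your model before Vela compilation. A minimal sketch,
+assuming the `tensorflow` Python package is installed and using a placeholder model file name:
+
+```python
+# Inspect a .tflite model (before Vela compilation) to confirm it is quantized.
+import tensorflow as tf
+
+interpreter = tf.lite.Interpreter(model_path="custom_model.tflite")  # placeholder path
+for detail in interpreter.get_input_details() + interpreter.get_output_details():
+    # For a fully quantized model, expect numpy int8 or uint8 types here.
+    print(detail["name"], detail["dtype"])
+```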
+
+Next section of the documentation: [Contribution guidelines](../documentation.md#Contribution-guidelines).
diff --git a/docs/use_cases/ad.md b/docs/use_cases/ad.md
new file mode 100644
index 0000000..ca95af8
--- /dev/null
+++ b/docs/use_cases/ad.md
@@ -0,0 +1,523 @@
+# Anomaly Detection Code Sample
+
+ - [Introduction](#introduction)
+ - [Prerequisites](#prerequisites)
+ - [Building the code sample application from sources](#building-the-code-sample-application-from-sources)
+ - [Build options](#build-options)
+ - [Build process](#build-process)
+ - [Add custom input](#add-custom-input)
+ - [Add custom model](#add-custom-model)
+ - [Setting-up and running Ethos-U55 Code Sample](#setting-up-and-running-ethos-u55-code-sample)
+ - [Setting up the Ethos-U55 Fast Model](#setting-up-the-ethos-u55-fast-model)
+ - [Starting Fast Model simulation](#starting-fast-model-simulation)
+ - [Running Anomaly Detection](#running-anomaly-detection)
+ - [Anomaly Detection processing information](#anomaly-detection-processing-information)
+ - [Preprocessing and feature extraction](#preprocessing-and-feature-extraction)
+ - [Postprocessing](#postprocessing)
+
+## Introduction
+
+This document describes the process of setting up and running the Arm® Ethos™-U55 Anomaly Detection example.
+
+Use case code can be found in the [source/use_case/ad](../../source/use_case/ad) directory.
+
+### Preprocessing and feature extraction
+
+The Anomaly Detection model that is used with the Code Samples expects audio data to be preprocessed
+in a specific way before performing an inference. This section aims to provide an overview of the feature extraction
+process used.
+
+First the audio data is normalized to the range (-1, 1).
+
+Next, a window of 1024 audio samples is taken from the start of the audio clip. From these 1024 samples we calculate 64
+Log Mel Energies that form part of a Log Mel Spectrogram.
+
+The window is shifted by 512 audio samples and another 64 Log Mel Energies are calculated. This is repeated until we
+have 64 sets of Log Mel Energies.
+
+This 64x64 matrix of values is downscaled by a factor of 2, resulting in a 32x32 matrix of values.
+
+The average of the training dataset is subtracted from this 32x32 matrix and an inference can then be performed.
+
+We then start this process again, shifting the start of the window by 20\*512 = 10240 audio samples. This keeps repeating
+until enough inferences have been performed to cover the whole audio clip, as sketched below.
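+
+A minimal Python sketch of this windowing scheme, assuming librosa and numpy are available; the Mel filterbank settings,
+downscaling method and dataset average are illustrative placeholders rather than the exact values used in training:
+
+```python
+import librosa
+import numpy as np
+
+audio, sr = librosa.load("anomaly_id_00_00000000.wav", sr=16000, mono=True)
+audio = audio / np.max(np.abs(audio))                    # normalize to (-1, 1)
+
+# 64 Log Mel Energies per 1024-sample window, shifted by 512 samples each time.
+mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=1024, hop_length=512, n_mels=64)
+log_mel = np.log(mel + 1e-6)
+
+training_mean = np.zeros((32, 32))                       # placeholder for the training set average
+for start in range(0, log_mel.shape[1] - 64 + 1, 20):    # hop 20 frames = 10240 samples
+    spectrogram = log_mel[:, start:start + 64]           # one 64x64 Log Mel Spectrogram
+    small = spectrogram[::2, ::2]                        # crude 2x downscale to 32x32
+    features = small - training_mean                     # input for one inference
+```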
+
+### Postprocessing
+
+Softmax is applied to the result of each inference. Based on the machine ID of the wav clip being processed, we look at a
+specific index in each output vector. An average of the negative of the value at this index across all the inferences
+performed for the audio clip is taken. If this average value is greater than the chosen threshold score, then the machine
+in the clip is behaving anomalously; if it is lower than the threshold, the machine is behaving normally.
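+
+A minimal numpy sketch of this scoring step; the raw model outputs and machine index below are placeholders:
+
+```python
+import numpy as np
+
+def softmax(x):
+    e = np.exp(x - np.max(x))
+    return e / e.sum()
+
+THRESHOLD = -0.8                                          # chosen anomaly threshold
+model_outputs = [np.random.randn(8) for _ in range(13)]   # placeholder raw output vectors
+machine_idx = 0                                           # derived from the clip's machine ID
+
+score = -np.mean([softmax(o)[machine_idx] for o in model_outputs])
+print("Anomaly detected!" if score > THRESHOLD else "Everything fine, no anomaly detected!")
+```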
+
+### Prerequisites
+
+See [Prerequisites](../documentation.md#prerequisites)
+
+## Building the code sample application from sources
+
+### Build options
+
+In addition to the build options already specified in the main documentation, the Anomaly Detection use case adds:
+
+- `ad_MODEL_TFLITE_PATH` - Path to the NN model file in TFLite format. The model will be processed and included into
+  the application axf file. The default value points to one of the delivered set of models. Note that the parameters
+  `ad_LABELS_TXT_FILE`, `TARGET_PLATFORM` and `ETHOS_U55_ENABLED` should be aligned with the chosen model, i.e.:
+ - if `ETHOS_U55_ENABLED` is set to `On` or `1`, the NN model is assumed to be optimized. The model will naturally fall
+back to the Arm® Cortex®-M CPU if an unoptimized model is supplied.
+ - if `ETHOS_U55_ENABLED` is set to `Off` or `0`, the NN model is assumed to be unoptimized. Supplying an optimized
+model in this case will result in a runtime error.
+
+- `ad_FILE_PATH`: Path to the directory containing audio files, or a path to a single WAV file, to be used in the
+ application. The default value points to the resources/ad/samples folder containing the delivered set of audio clips.
+
+- `ad_AUDIO_RATE`: Input data sampling rate. Each audio file from `ad_FILE_PATH` is preprocessed during the build to
+  match the NN model input requirements. Default value is 16000.
+
+- `ad_AUDIO_MONO`: If set to ON the audio data will be converted to mono. Default is ON.
+
+- `ad_AUDIO_OFFSET`: Start loading audio data starting from this offset (in seconds). Default value is 0.
+
+- `ad_AUDIO_DURATION`: Length of the audio data to be used in the application in seconds. Default is 0 meaning the
+ whole audio file will be taken.
+
+- `ad_AUDIO_MIN_SAMPLES`: Minimum number of samples required by the network model. If the audio clip is shorter than
+ this number, it is padded with zeros. Default value is 16000.
+
+- `ad_MODEL_SCORE_THRESHOLD`: Threshold value applied to the average softmax score over the clip; if the score is larger
+  than this threshold, an anomaly is reported.
+
+- `ad_ACTIVATION_BUF_SZ`: The intermediate/activation buffer size reserved for the NN model. By default, it is set to
+ 2MiB and should be enough for most models.
+
+In order to build **ONLY** the Anomaly Detection example application, add `-DUSE_CASE_BUILD=ad` to the `cmake` command line specified in [Building](../documentation.md#Building).
+
+### Build process
+
+> **Note:** This section describes the process for configuring the build for the `MPS3: SSE-300` target. For different
+> target platforms, see [Building](../documentation.md#Building).
+
+Create a build directory folder and navigate inside:
+
+```commandline
+mkdir build_ad && cd build_ad
+```
+
+On Linux, execute the following command to build **only** Anomaly Detection application to run on the Ethos-U55 Fast Model when providing only the mandatory arguments for CMake configuration:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=./scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=ad ..
+```
+
+For Windows, add `-G "MinGW Makefiles"`:
+
+```commandline
+cmake \
+ -G "MinGW Makefiles" \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=./scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=ad ..
+```
+
+Toolchain option `CMAKE_TOOLCHAIN_FILE` points to the toolchain specific file to set the compiler and platform specific
+parameters.
+
+To configure a build that can be debugged using Arm-DS, we can just specify
+the build type as `Debug`:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug \
+ -DUSE_CASE_BUILD=ad ..
+```
+
+To configure a build that can be debugged using a tool that only supports
+DWARF format 3 (Modeldebugger for example), we can use:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug \
+ -DARMCLANG_DEBUG_DWARF_LEVEL=3 \
+ -DUSE_CASE_BUILD=ad ..
+```
+
+> **Note:** If building for different Ethos-U55 configurations, see
+[Configuring build for different Arm Ethos-U55 configurations](../sections/building.md#Configuring-build-for-different-Arm-Ethos-U55-configurations):
+
+If the TensorFlow source tree is not in its default expected location,
+set the path using `TENSORFLOW_SRC_PATH`.
+Similarly, if the Ethos-U55 driver is not in the default location,
+`ETHOS_U55_DRIVER_SRC_PATH` can be used to configure the location. For example:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTENSORFLOW_SRC_PATH=/my/custom/location/tensorflow \
+ -DETHOS_U55_DRIVER_SRC_PATH=/my/custom/location/core_driver \
+ -DUSE_CASE_BUILD=ad ..
+```
+
+Also, `CMSIS_SRC_PATH` parameter can be used to override the CMSIS sources used for compilation used by TensorFlow by
+default. For example, to use the CMSIS sources fetched by the ethos-u helper script, we can use:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTENSORFLOW_SRC_PATH=../ethos-u/core_software/tensorflow \
+ -DETHOS_U55_DRIVER_SRC_PATH=../ethos-u/core_software/core_driver \
+ -DCMSIS_SRC_PATH=../ethos-u/core_software/cmsis \
+ -DUSE_CASE_BUILD=ad ..
+```
+
+> **Note:** If re-building with changed parameters values, it is highly advised to clean the build directory and re-run the CMake command.
+
+If the CMake command succeeded, build the application as follows:
+
+```commandline
+make -j4
+```
+
+For Windows, use `mingw32-make`.
+
+Add `VERBOSE=1` to see compilation and link details.
+
+Results of the build will be placed under `build/bin` folder:
+
+```tree
+bin
+  ├── ethos-u-ad.axf
+  ├── ethos-u-ad.htm
+  ├── ethos-u-ad.map
+ ├── images-ad.txt
+ └── sectors
+ └── ad
+ ├── dram.bin
+ └── itcm.bin
+```
+
+Where:
+
+- `ethos-u-ad.axf`: The built application binary for the Anomaly Detection use case.
+
+- `ethos-u-ad.map`: Information from building the application (e.g. libraries used, what was optimized, location of
+ objects)
+
+- `ethos-u-ad.htm`: Human readable file containing the call graph of application functions.
+
+- `sectors/`: Folder containing the built application, split into files for loading into different FPGA memory regions.
+
+- `images-ad.txt`: Tells the FPGA which memory regions to use for loading the binaries in the `sectors/ad` folder.
+
+### Add custom input
+
+The application performs anomaly detection on audio data found in the folder, or an individual file, set by the CMake
+parameter `ad_FILE_PATH`.
+
+To run the application with your own audio clips first create a folder to hold them and then copy the custom clips into
+this folder:
+
+```commandline
+mkdir /tmp/custom_files
+
+cp custom_id_00.wav /tmp/custom_files/
+```
+
+> **Note:** The data used for this example comes from
+[https://zenodo.org/record/3384388#.X6GILFNKiqA](https://zenodo.org/record/3384388#.X6GILFNKiqA)
+and the model included in this example is trained on the ‘Slider’ part of the dataset.
+The machine ID (00, 02, 04, 06) the clip comes from must be in the file name for the application to work.
+The file name should have a pattern that matches,
+e.g. `<any>_<text>_00_<here>.wav` if the audio was from machine ID 00,
+or `<any>_<text>_02_<here>.wav` if it was from machine ID 02, etc. (see the sketch after the note below).
+>
+> **Note:** Clean the build directory before re-running the CMake command.
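+
+A minimal Python sketch of extracting a machine ID from such a file name; the regular expression is illustrative and
+not necessarily the application's actual parsing logic:
+
+```python
+import re
+
+def machine_id_from_filename(name: str) -> str:
+    # Expect a two-digit machine ID between underscores, e.g. anomaly_id_00_00000000.wav
+    match = re.search(r"_(\d{2})_", name)
+    if match is None:
+        raise ValueError(f"No machine ID found in {name!r}")
+    return match.group(1)
+
+print(machine_id_from_filename("anomaly_id_02_00000076.wav"))  # prints: 02
+```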
+
+Next, set `ad_FILE_PATH` to the location of this folder when building:
+
+```commandline
+cmake \
+ -Dad_FILE_PATH=/tmp/custom_files/ \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=ad ..
+```
+
+For Windows, add `-G "MinGW Makefiles"` to the CMake command.
+
+The audio files found in the `ad_FILE_PATH` folder will be picked up and automatically converted to C++ files during the
+CMake configuration stage and then compiled into the application during the build phase for performing inference with.
+
+The log from the configuration stage should tell you what audio directory path has been used:
+
+```log
+-- User option ad_FILE_PATH is set to /tmp/custom_files
+```
+
+After compiling, your custom inputs will have now replaced the default ones in the application.
+
+### Add custom model
+
+The application performs inference using the model pointed to by the CMake parameter ``ad_MODEL_TFLITE_PATH``.
+
+> **Note:** If you want to run the model using Ethos-U55, ensure your custom model has been run through the Vela compiler
+>successfully before continuing. See [Optimize model with Vela compiler](../sections/building.md#Optimize-custom-model-with-Vela-compiler).
+
+An example:
+
+```commandline
+cmake \
+ -Dad_MODEL_TFLITE_PATH=<path/to/custom_ad_model_after_vela.tflite> \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=ad ..
+```
+
+For Windows, add `-G "MinGW Makefiles"` to the CMake command.
+
+> **Note:** Clean the build directory before re-running the CMake command.
+
+The `.tflite` model file pointed to by `ad_MODEL_TFLITE_PATH` will be converted
+to C++ files during the CMake configuration
+stage and then compiled into the application for performing inference with.
+
+The log from the configuration stage should tell you what model path has been used:
+
+```log
+-- User option TARGET_PLATFORM is set to mps3
+-- User option ad_MODEL_TFLITE_PATH is set to <path/to/custom_ad_model_after_vela.tflite>
+...
+-- Using <path/to/custom_ad_model_after_vela.tflite>
+++ Converting custom_ad_model_after_vela.tflite to custom_ad_model_after_vela.tflite.cc
+...
+```
+
+After compiling, your custom model will have now replaced the default one in the application.
+
+> **Note:** In order to successfully run the model, the NPU needs to be enabled, `TARGET_PLATFORM` set to `mps3`,
+> and `TARGET_SUBSYSTEM` set to `sse-200` or `sse-300`.
+
+## Setting-up and running Ethos-U55 Code Sample
+
+### Setting up the Ethos-U55 Fast Model
+
+The FVP is available publicly from [Arm Ecosystem FVP downloads](https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps).
+
+For Ethos-U55 evaluation, please download the MPS3 version of the Arm® Corstone™-300 model that contains Ethos-U55 and
+Cortex-M55. The model is currently only supported on Linux based machines. To install the FVP:
+
+- Unpack the archive
+
+- Run the install script in the extracted package
+
+```commandline
+./FVP_Corstone_SSE-300_Ethos-U55.sh
+```
+
+- Follow the instructions to install the FVP to your desired location
+
+### Starting Fast Model simulation
+
+> **Note:** The anomaly detection example does not come pre-built. You will first need to follow the instructions in
+> the section above for building the application from source.
+
+After building, and assuming the install location of the FVP was set to ~/FVP_install_location, the simulation can be
+started by:
+
+```commandline
+~/FVP_install_location/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 ./bin/ethos-u-ad.axf
+```
+
+A log output should appear on the terminal:
+
+```log
+telnetterminal0: Listening for serial connection on port 5000
+telnetterminal1: Listening for serial connection on port 5001
+telnetterminal2: Listening for serial connection on port 5002
+telnetterminal5: Listening for serial connection on port 5003
+```
+
+This will also launch a telnet window with the sample application's standard output and error log entries containing
+information about the pre-built application version, TensorFlow Lite Micro library version used, data type as well as
+the input and output tensor sizes of the model compiled into the executable binary.
+
+After the application has started, if `ad_FILE_PATH` points to a single file (or a folder containing a single input file),
+the inference starts immediately. If there are multiple inputs, the application outputs a menu and waits for the user
+input from the telnet terminal:
+
+```log
+User input required
+Enter option number from:
+
+1. Classify next audio clip
+2. Classify audio clip at chosen index
+3. Run classification on all audio clips
+4. Show NN model info
+5. List audio clips
+
+Choice:
+
+```
+
+1. “Classify next audio clip” menu option will run a single inference on the next audio clip in line.
+
+2. “Classify audio clip at chosen index” menu option will run inference on the chosen audio clip.
+
+    > **Note:** Please make sure to select an audio clip index within the range of audio clips supplied during the
+    > application build. By default, the pre-built application has 4 files, with indexes from 0 to 3.
+
+3. “Run classification on all audio clips” menu option triggers sequential inference executions on all built-in audio clips.
+
+4. “Show NN model info” menu option prints information about model data type, input and output tensor sizes:
+
+ ```log
+ [INFO] uTFL version: 2.5.0
+ [INFO] Model info:
+ [INFO] Model INPUT tensors:
+ [INFO] tensor type is INT8
+ [INFO] tensor occupies 1024 bytes with dimensions
+ [INFO] 0: 1
+ [INFO] 1: 32
+ [INFO] 2: 32
+ [INFO] 3: 1
+ [INFO] Quant dimension: 0
+ [INFO] Scale[0] = 0.192437
+ [INFO] ZeroPoint[0] = 11
+ [INFO] Model OUTPUT tensors:
+ [INFO] tensor type is INT8
+ [INFO] tensor occupies 8 bytes with dimensions
+ [INFO] 0: 1
+ [INFO] 1: 8
+ [INFO] Quant dimension: 0
+ [INFO] Scale[0] = 0.048891
+ [INFO] ZeroPoint[0] = -30
+ [INFO] Activation buffer (a.k.a tensor arena) size used: 198016
+ [INFO] Number of operators: 1
+ [INFO] Operator 0: ethos-u
+ [INFO] Use of Arm uNPU is enabled
+
+ ```
+
+5. “List audio clips” menu option prints a list of audio clip indexes paired with the original filenames embedded in the application:
+
+ ```log
+ [INFO] List of Files:
+    [INFO] 0 => anomaly_id_00_00000000.wav
+    [INFO] 1 => anomaly_id_02_00000076.wav
+    [INFO] 2 => normal_id_00_00000004.wav
+    [INFO] 3 => normal_id_02_00000001.wav
+ ```
+
+### Running Anomaly Detection
+
+Please select the first menu option to execute Anomaly Detection.
+
+The following example illustrates application output:
+
+```log
+[INFO] Running inference on audio clip 0 => anomaly_id_00_00000000.wav
+[INFO] Inference 1/13
+[INFO] Profile for Inference:
+ Active NPU cycles: 1081154
+ Idle NPU cycles: 1012
+
+[INFO] Inference 2/13
+[INFO] Profile for Inference:
+ Active NPU cycles: 1080934
+ Idle NPU cycles: 232
+
+[INFO] Inference 3/13
+[INFO] Profile for Inference:
+ Active NPU cycles: 1081332
+ Idle NPU cycles: 834
+
+[INFO] Inference 4/13
+[INFO] Profile for Inference:
+ Active NPU cycles: 1080748
+ Idle NPU cycles: 418
+
+[INFO] Inference 5/13
+[INFO] Profile for Inference:
+ Active NPU cycles: 1080728
+ Idle NPU cycles: 438
+
+[INFO] Inference 6/13
+[INFO] Profile for Inference:
+ Active NPU cycles: 1081144
+ Idle NPU cycles: 1022
+
+[INFO] Inference 7/13
+[INFO] Profile for Inference:
+ Active NPU cycles: 1080924
+ Idle NPU cycles: 242
+
+[INFO] Inference 8/13
+[INFO] Profile for Inference:
+ Active NPU cycles: 1081322
+ Idle NPU cycles: 844
+
+[INFO] Inference 9/13
+[INFO] Profile for Inference:
+ Active NPU cycles: 1080738
+ Idle NPU cycles: 428
+
+[INFO] Inference 10/13
+[INFO] Profile for Inference:
+ Active NPU cycles: 1080718
+ Idle NPU cycles: 448
+
+[INFO] Inference 11/13
+[INFO] Profile for Inference:
+ Active NPU cycles: 1081134
+ Idle NPU cycles: 1032
+
+[INFO] Inference 12/13
+[INFO] Profile for Inference:
+ Active NPU cycles: 1080914
+ Idle NPU cycles: 252
+
+[INFO] Inference 13/13
+[INFO] Profile for Inference:
+ Active NPU cycles: 1081312
+ Idle NPU cycles: 854
+
+[INFO] Average anomaly score is: -0.024493
+Anomaly threshold is: -0.800000
+Anomaly detected!
+
+```
+
+As multiple inferences have to be run for one clip, it will take around a minute or so for all inferences to complete.
+
+For the anomaly_id_00_00000000.wav clip, after averaging results across all inferences the score is greater than the
+chosen anomaly threshold so an anomaly was detected with the machine in this clip.
+
+The profiling section of the log shows the cycle counts for each inference. For the last inference, the profiling reports:
+
+- Ethos-U55's PMU report:
+
+ - 1,081,312 active cycles: number of cycles that were used for computation
+
+ - 854 idle cycles: number of cycles for which the NPU was idle
+
+- For FPGA platforms, CPU cycle count can also be enabled. For FVP, however, CPU cycle counters should not be used as
+ the CPU model is not cycle-approximate or cycle-accurate.
diff --git a/docs/use_cases/asr.md b/docs/use_cases/asr.md
new file mode 100644
index 0000000..d224aca
--- /dev/null
+++ b/docs/use_cases/asr.md
@@ -0,0 +1,529 @@
+# Automatic Speech Recognition Code Sample
+
+- [Introduction](#introduction)
+ - [Prerequisites](#prerequisites)
+- [Building the code sample application from sources](#building-the-code-sample-application-from-sources)
+ - [Build options](#build-options)
+ - [Build process](#build-process)
+ - [Add custom input](#add-custom-input)
+ - [Add custom model](#add-custom-model)
+- [Setting-up and running Ethos-U55 Code Sample](#setting-up-and-running-ethos-u55-code-sample)
+ - [Setting up the Ethos-U55 Fast Model](#setting-up-the-ethos-u55-fast-model)
+ - [Starting Fast Model simulation](#starting-fast-model-simulation)
+ - [Running Automatic Speech Recognition](#running-automatic-speech-recognition)
+- [Automatic Speech Recognition processing information](#automatic-speech-recognition-processing-information)
+ - [Preprocessing and feature extraction](#preprocessing-and-feature-extraction)
+ - [Postprocessing](#postprocessing)
+
+## Introduction
+
+This document describes the process of setting up and running the Arm® Ethos™-U55 Automatic Speech Recognition example.
+
+Use case code can be found in the [source/use_case/asr](../../source/use_case/asr) directory.
+
+### Preprocessing and feature extraction
+
+The wav2letter automatic speech recognition model that is used with the Code Samples expects audio data to be
+preprocessed in a specific way before performing an inference. This section aims to provide an overview of the feature
+extraction process used.
+
+First the audio data is normalized to the range (-1, 1).
+
+> **Note:** Mel-frequency cepstral coefficients (MFCCs) are a common feature extracted from audio data and can be used as
+>input for machine learning tasks like keyword spotting and speech recognition. See source/application/main/include/Mfcc.hpp
+>for implementation details.
+
+Next, a window of 512 audio samples is taken from the start of the audio clip. From these 512 samples we calculate 13
+MFCC features.
+
+The whole window is shifted to the right by 160 audio samples and 13 new MFCC features are calculated. This process of
+shifting and calculating is repeated until enough audio samples to perform an inference have been processed. In total
+this will be 296 windows that each have 13 MFCC features calculated for them.
+
+After extracting MFCC features the first and second order derivatives of these features with respect to time are
+calculated. These derivative features are then standardized and concatenated with the MFCC features (which also get
+standardized). At this point the input tensor will have a shape of 296x39.
+
+These extracted features are quantized, and an inference is performed.
+
+![ASR preprocessing](../media/ASR_preprocessing.png)
+
+For longer audio clips where multiple inferences need to be performed, then the initial starting position is offset by
+(100*160) = 16000 audio samples. From this new starting point, MFCC and derivative features are calculated as before
+until there is enough to perform another inference. Padding can be used if there are not enough audio samples for at
+least 1 inference. This step is repeated until the whole audio clip has been processed. If there are not enough audio
+samples for a final complete inference the MFCC features will be padded by repeating the last calculated feature until
+an inference can be performed.
+
+> **Note:** Parameters of the MFCC feature extraction such as window size, stride, number of features etc. all depend on
+>what was used during model training. These values are specific to each model. If you switch to a different ASR model
+>than the one supplied, then the feature extraction process could be completely different to the one currently implemented.
+
+The number of audio samples we offset by for long audio clips is specific to the included wav2letter model. A sketch of
+the whole feature extraction flow follows.
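+
+A minimal Python sketch of this feature extraction, assuming librosa and numpy are available; the parameters mirror the
+description above, but the exact settings used when training the model may differ:
+
+```python
+import librosa
+import numpy as np
+
+audio, sr = librosa.load("itellyou.wav", sr=16000, mono=True)
+audio = audio / np.max(np.abs(audio))                     # normalize to (-1, 1)
+
+# 13 MFCC features per 512-sample window, shifted by 160 samples each time.
+mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13, n_fft=512, hop_length=160)
+delta1 = librosa.feature.delta(mfcc)                      # first order derivatives
+delta2 = librosa.feature.delta(mfcc, order=2)             # second order derivatives
+
+def standardize(x):
+    return (x - x.mean(axis=1, keepdims=True)) / (x.std(axis=1, keepdims=True) + 1e-6)
+
+features = np.concatenate([standardize(mfcc), standardize(delta1), standardize(delta2)])
+first_window = features[:, :296].T                        # 296x39 input for one inference
+```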
+
+### Postprocessing
+
+After performing an inference, the raw output needs to be postprocessed to get a usable result.
+
+The raw output from the model is a tensor of shape 148x29 where each row is a probability distribution over the possible
+29 characters that can appear at each of the 148 time steps.
+
+This wav2letter model is trained using context windows, which means that only certain parts of the output are usable,
+depending on the part of the audio clip that is currently being processed.
+
+If this is the first inference and multiple inferences are required, then ignore the final 49 rows of the output.
+Similarly, if this is the final inference from multiple inferences then ignore the first 49 rows of the output. Finally,
+if this inference is not the last or first inference then ignore the first and last 49 rows of the model output.
+
+> **Note:** If the audio clip is small enough then the whole of the model output is usable and there is no need to throw
+>away any of the output before continuing.
+
+Once any rows have been removed, the final processing can be done. To process the output, first the letter with the
+highest probability at each time step is found. Next, any letters that are repeated multiple times in a row are removed
+(e.g. [t, t, t, o, p, p] becomes [t, o, p]). Finally, the 29th blank token letter is removed from the output.
+
+For the final output, the results from all inferences are combined before decoding, as sketched below. What you are left
+with is then displayed to the console.
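+
+A minimal numpy sketch of these decoding steps; the context size and blank index follow the description above, while
+the `alphabet` argument is a placeholder for the model's label set:
+
+```python
+import numpy as np
+
+CONTEXT = 49     # rows to drop at clip-internal edges of each inference output
+BLANK_IDX = 28   # the 29th label is the blank token
+
+def trim_context(output, is_first, is_last):
+    start = 0 if is_first else CONTEXT
+    end = len(output) if is_last else len(output) - CONTEXT
+    return output[start:end]
+
+def decode(output, alphabet):
+    best = np.argmax(output, axis=1)                      # most likely letter per time step
+    kept = [k for i, k in enumerate(best) if i == 0 or k != best[i - 1]]
+    return "".join(alphabet[k] for k in kept if k != BLANK_IDX)
+```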
+
+### Prerequisites
+
+See [Prerequisites](../documentation.md#prerequisites)
+
+## Building the code sample application from sources
+
+### Build options
+
+In addition to the build options already specified in the main documentation, the Automatic Speech Recognition use case
+adds:
+
+- `asr_MODEL_TFLITE_PATH` - Path to the NN model file in TFLite format. The model will be processed and included into the
+application axf file. The default value points to one of the delivered set of models. Note that the parameters
+`asr_LABELS_TXT_FILE`, `TARGET_PLATFORM` and `ETHOS_U55_ENABLED` should be aligned with the chosen model, i.e.:
+ - if `ETHOS_U55_ENABLED` is set to `On` or `1`, the NN model is assumed to be optimized. The model will naturally
+fall back to the Arm® Cortex®-M CPU if an unoptimized model is supplied.
+ - if `ETHOS_U55_ENABLED` is set to `Off` or `0`, the NN model is assumed to be unoptimized. Supplying an optimized
+model in this case will result in a runtime error.
+
+- `asr_FILE_PATH`: Path to the directory containing audio files, or a path to a single WAV file, to be used in the
+  application. The default value points to the resources/asr/samples folder containing the delivered set of audio clips.
+
+- `asr_LABELS_TXT_FILE`: Path to the labels' text file. The file is used to map letter class index to the text label.
+ The default value points to the delivered labels.txt file inside the delivery package.
+
+- `asr_AUDIO_RATE`: Input data sampling rate. Each audio file from asr_FILE_PATH is preprocessed during the build to
+ match NN model input requirements. Default value is 16000.
+
+- `asr_AUDIO_MONO`: If set to ON the audio data will be converted to mono. Default is ON.
+
+- `asr_AUDIO_OFFSET`: Start loading audio data starting from this offset (in seconds). Default value is 0.
+
+- `asr_AUDIO_DURATION`: Length of the audio data to be used in the application in seconds. Default is 0 meaning the
+ whole audio file will be taken.
+
+- `asr_AUDIO_MIN_SAMPLES`: Minimum number of samples required by the network model. If the audio clip is shorter than
+ this number, it is padded with zeros. Default value is 16000.
+
+- `asr_MODEL_SCORE_THRESHOLD`: Threshold value that must be applied to the inference results for a label to be
+ deemed valid. Default is 0.5.
+
+- `asr_ACTIVATION_BUF_SZ`: The intermediate/activation buffer size reserved for the NN model. By default, it is set
+ to 2MiB and should be enough for most models.
+
+In order to build **ONLY** the automatic speech recognition example application, add `-DUSE_CASE_BUILD=asr` to the
+`cmake` command line specified in [Building](../documentation.md#Building).
+
+### Build process
+
+> **Note:** This section describes the process for configuring the build for the `MPS3: SSE-300` target. For different
+> target platforms, see the [Building](../documentation.md#Building) section.
+
+In order to build **only** the automatic speech recognition example, create a build directory and navigate inside:
+
+```commandline
+mkdir build_asr && cd build_asr
+```
+
+On Linux, execute the following command to build **only** Automatic Speech Recognition application to run on the
+Ethos-U55 Fast Model when providing only the mandatory arguments for CMake configuration:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=./scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=asr ..
+```
+
+For Windows, add `-G "MinGW Makefiles"`:
+
+```commandline
+cmake \
+ -G "MinGW Makefiles" \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=./scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=asr ..
+```
+
+Toolchain option `CMAKE_TOOLCHAIN_FILE` points to the toolchain specific file to set the compiler and platform specific
+parameters.
+
+To configure a build that can be debugged using Arm-DS, we can just specify
+the build type as `Debug`:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug \
+ -DUSE_CASE_BUILD=asr ..
+```
+
+To configure a build that can be debugged using a tool that only supports
+DWARF format 3 (Modeldebugger for example), we can use:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug \
+ -DARMCLANG_DEBUG_DWARF_LEVEL=3 \
+ -DUSE_CASE_BUILD=asr ..
+```
+
+> **Note:** If building for different Ethos-U55 configurations, see
+>[Configuring build for different Arm Ethos-U55 configurations](../sections/building.md#Configuring-build-for-different-Arm-Ethos-U55-configurations):
+
+If the TensorFlow source tree is not in its default expected location,
+set the path using `TENSORFLOW_SRC_PATH`.
+Similarly, if the Ethos-U55 driver is not in the default location,
+`ETHOS_U55_DRIVER_SRC_PATH` can be used to configure the location. For example:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTENSORFLOW_SRC_PATH=/my/custom/location/tensorflow \
+ -DETHOS_U55_DRIVER_SRC_PATH=/my/custom/location/core_driver \
+ -DUSE_CASE_BUILD=asr ..
+```
+
+Also, `CMSIS_SRC_PATH` parameter can be used to override the CMSIS sources used for compilation used by TensorFlow by
+default. For example, to use the CMSIS sources fetched by the ethos-u helper script, we can use:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTENSORFLOW_SRC_PATH=../ethos-u/core_software/tensorflow \
+ -DETHOS_U55_DRIVER_SRC_PATH=../ethos-u/core_software/core_driver \
+ -DCMSIS_SRC_PATH=../ethos-u/core_software/cmsis \
+ -DUSE_CASE_BUILD=asr ..
+```
+
+> **Note:** If re-building with changed parameters values, it is highly advised to clean the build directory and re-run
+>the CMake command.
+
+If the CMake command succeeded, build the application as follows:
+
+```commandline
+make -j4
+```
+
+For Windows, use `mingw32-make`.
+
+Add `VERBOSE=1` to see compilation and link details.
+
+Results of the build will be placed under `build/bin` folder:
+
+```tree
+bin
+ ├── ethos-u-asr.axf
+ ├── ethos-u-asr.htm
+ ├── ethos-u-asr.map
+ ├── images-asr.txt
+ └── sectors
+ └── asr
+ ├── dram.bin
+ └── itcm.bin
+```
+
+Where:
+
+- `ethos-u-asr.axf`: The built application binary for the Automatic Speech Recognition use case.
+
+- `ethos-u-asr.map`: Information from building the application (e.g. libraries used, what was optimized, location of
+ objects)
+
+- `ethos-u-asr.htm`: Human readable file containing the call graph of application functions.
+
+- `sectors/`: Folder containing the built application, split into files for loading into different FPGA memory regions.
+
+- `images-asr.txt`: Tells the FPGA which memory regions to use for loading the binaries in the `sectors/asr` folder.
+
+### Add custom input
+
+The application performs inference on audio data found in the folder, or an individual file, set by the CMake parameter
+`asr_FILE_PATH`.
+
+To run the application with your own audio clips first create a folder to hold them and then copy the custom audio clips
+into this folder:
+
+```commandline
+mkdir /tmp/custom_wavs
+
+cp my_clip.wav /tmp/custom_wavs/
+```
+
+> **Note:** Clean the build directory before re-running the CMake command.
+
+Next set `asr_FILE_PATH` to the location of this folder when building:
+
+```commandline
+cmake \
+ -Dasr_FILE_PATH=/tmp/custom_wavs/ \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DUSE_CASE_BUILD=asr \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+For Windows, add `-G "MinGW Makefiles"` to the CMake command.
+
+The audio clips found in the `asr_FILE_PATH` folder will be picked up and automatically converted to C++ files during the
+CMake configuration stage and then compiled into the application during the build phase for performing inference with.
+
+The log from the configuration stage should tell you what audio clip directory path has been used:
+
+```log
+-- User option asr_FILE_PATH is set to /tmp/custom_wavs
+-- Generating audio files from /tmp/custom_wavs
+++ Converting my_clip.wav to my_clip.cc
+++ Generating build/generated/asr/include/InputFiles.hpp
+++ Generating build/generated/asr/src/InputFiles.cc
+-- Defined build user options:
+-- asr_FILE_PATH=/tmp/custom_wavs
+```
+
+After compiling, your custom inputs will have now replaced the default ones in the application.
+
+> **Note:** The CMake parameter `asr_AUDIO_MIN_SAMPLES` determines the minimum number of input samples. When building the
+> application, if an audio clip is shorter than `asr_AUDIO_MIN_SAMPLES`, it will be padded with zeros to meet this length.
+
+### Add custom model
+
+The application performs inference using the model pointed to by the CMake parameter `asr_MODEL_TFLITE_PATH`.
+
+> **Note:** If you want to run the model using Ethos-U55, ensure your custom model has been run through the Vela
+>compiler successfully before continuing. See [Optimize model with Vela compiler](../sections/building.md#Optimize-custom-model-with-Vela-compiler).
+
+To run the application with a custom model you will need to provide a `labels_<model_name>.txt` file of labels
+associated with the model. Each line of the file should correspond to one of the outputs in your model. See the provided
+`labels_wav2letter.txt` file for an example.
+
+Then, you must set `asr_MODEL_TFLITE_PATH` to the location of the Vela processed model file and `asr_LABELS_TXT_FILE` to
+the location of the associated labels file.
+
+An example:
+
+```commandline
+cmake \
+ -Dasr_MODEL_TFLITE_PATH=<path/to/custom_model_after_vela.tflite> \
+ -Dasr_LABELS_TXT_FILE=<path/to/labels_custom_model.txt> \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+    -DUSE_CASE_BUILD=asr ..
+```
+
+For Windows, add `-G "MinGW Makefiles"` to the CMake command.
+
+> **Note:** Clean the build directory before re-running the CMake command.
+
+The `.tflite` model file pointed to by `asr_MODEL_TFLITE_PATH` and labels text file pointed to by `asr_LABELS_TXT_FILE`
+will be converted to C++ files during the CMake configuration stage and then compiled into the application for performing
+inference with.
+
+The log from the configuration stage should tell you what model path and labels file have been used:
+
+```log
+-- User option TARGET_PLATFORM is set to mps3
+-- User option asr_MODEL_TFLITE_PATH is set to <path/to/custom_model_after_vela.tflite>
+...
+-- User option asr_LABELS_TXT_FILE is set to <path/to/labels_custom_model.txt>
+...
+-- Using <path/to/custom_model_after_vela.tflite>
+++ Converting custom_model_after_vela.tflite to custom_model_after_vela.tflite.cc
+-- Generating labels file from <path/to/labels_custom_model.txt>
+-- writing to <path/to/build/generated/src/Labels.cc>
+...
+```
+
+After compiling, your custom model will have now replaced the default one in the application.
+
+## Setting-up and running Ethos-U55 Code Sample
+
+### Setting up the Ethos-U55 Fast Model
+
+The FVP is available publicly from [Arm Ecosystem FVP downloads](https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps).
+
+For Ethos-U55 evaluation, please download the MPS3 version of the Arm® Corstone™-300 model that contains Ethos-U55 and
+Cortex-M55. The model is currently only supported on Linux based machines. To install the FVP:
+
+- Unpack the archive
+
+- Run the install script in the extracted package
+
+```commandline
+./FVP_Corstone_SSE-300_Ethos-U55.sh
+```
+
+- Follow the instructions to install the FVP to your desired location
+
+### Starting Fast Model simulation
+
+Once the build step has completed, the application binary ethos-u-asr.axf can be found in the `build/bin` folder.
+Assuming the install location of the FVP was set to ~/FVP_install_location, the simulation can be started by:
+
+```commandline
+~/FVP_install_location/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 ./bin/mps3-sse-300/ethos-u-asr.axf
+```
+
+A log output should appear on the terminal:
+
+```log
+telnetterminal0: Listening for serial connection on port 5000
+telnetterminal1: Listening for serial connection on port 5001
+telnetterminal2: Listening for serial connection on port 5002
+telnetterminal5: Listening for serial connection on port 5003
+```
+
+This will also launch a telnet window with the sample application's standard output and error log entries containing
+information about the pre-built application version, TensorFlow Lite Micro library version used, data type as well as
+the input and output tensor sizes of the model compiled into the executable binary.
+
+After the application has started, if `asr_FILE_PATH` points to a single file (or a folder containing a single input file),
+the inference starts immediately. If there are multiple inputs, the application outputs a menu and waits for the user
+input from the telnet terminal:
+
+```log
+User input required
+Enter option number from:
+
+1. Classify next audio clip
+2. Classify audio clip at chosen index
+3. Run classification on all audio clips
+4. Show NN model info
+5. List audio clips
+
+Choice:
+
+```
+
+1. “Classify next audio clip” menu option will run inference on the next in line voice clip from the collection of the
+ compiled audio.
+
+    > **Note:** If the clip is over a certain length, the application will invoke multiple inference runs to
+    > cover the entire file.
+
+2. “Classify audio clip at chosen index” menu option will run inference on the chosen audio clip.
+
+    > **Note:** Please make sure to select an audio clip index within the range of audio clips supplied during the
+    > application build. By default, the pre-built application has 4 files, with indexes from 0 to 3.
+
+3. “Run classification on all audio clips” menu option triggers sequential inference executions on all built-in voice
+ samples.
+
+4. “Show NN model info” menu option prints information about model data type, input and output tensor sizes:
+
+ ```log
+ [INFO] uTFL version: 2.5.0
+ [INFO] Model info:
+ [INFO] Model INPUT tensors:
+ [INFO] tensor type is INT8
+ [INFO] tensor occupies 11544 bytes with dimensions
+ [INFO] 0: 1
+ [INFO] 1: 296
+ [INFO] 2: 39
+ [INFO] Quant dimension: 0
+ [INFO] Scale[0] = 0.110316
+ [INFO] ZeroPoint[0] = -11
+ [INFO] Model OUTPUT tensors:
+ [INFO] tensor type is INT8
+ [INFO] tensor occupies 4292 bytes with dimensions
+ [INFO] 0: 1
+ [INFO] 1: 1
+ [INFO] 2: 148
+ [INFO] 3: 29
+ [INFO] Quant dimension: 0
+ [INFO] Scale[0] = 0.003906
+ [INFO] ZeroPoint[0] = -128
+ [INFO] Activation buffer (a.k.a tensor arena) size used: 783168
+ [INFO] Number of operators: 1
+ [INFO] Operator 0: ethos-u
+ [INFO] Use of Arm uNPU is enabled
+ ```
+
+5. “List audio clips” menu option prints a list of audio clip indexes paired with the original filenames embedded in the application:
+
+ ```log
+ [INFO] List of Files:
+ [INFO] 0 => anotherdoor.wav
+ [INFO] 1 => anotherengineer.wav
+ [INFO] 2 => itellyou.wav
+ [INFO] 3 => testingroutine.wav
+ ```
+
+### Running Automatic Speech Recognition
+
+Please select the first menu option to execute Automatic Speech Recognition.
+
+The following example illustrates application output:
+
+```log
+[INFO] Running inference on audio clip 0 => anotherdoor.wav
+[INFO] Inference 1/2
+[INFO] Profile for pre-processing:
+ Active NPU cycles: 0
+ Idle NPU cycles: 6
+
+[INFO] Profile for Inference:
+ Active NPU cycles: 28924342
+ Idle NPU cycles: 824
+
+[INFO] Inference 2/2
+[INFO] Profile for pre-processing:
+ Active NPU cycles: 0
+ Idle NPU cycles: 6
+
+[INFO] Profile for Inference:
+ Active NPU cycles: 28924298
+ Idle NPU cycles: 868
+
+[INFO] Result for inf 0: and he walked immediately out o t
+[INFO] Result for inf 1: he aparctment by anoer dor
+[INFO] Final result: and he walked immediately out o the aparctment by anoer dor
+```
+
+It could take several minutes to complete each inference (average time is 5-7 minutes), and on this audio clip multiple
+inferences were required to cover the whole clip.
+
+The profiling section of the log shows that for the last inference:
+
+- Ethos-U55's PMU report:
+
+ - 28,924,298 active cycles: number of NPU cycles that were used for computation
+
+ - 868 idle cycles: number of cycles for which the NPU was idle
+
+- For FPGA platforms, CPU cycle count can also be enabled. For FVP, however, CPU cycle counters should not be used as
+ the CPU model is not cycle-approximate or cycle-accurate.
+
+The application prints the decoded output from each of the inference runs as well as the final combined result.
diff --git a/docs/use_cases/img_class.md b/docs/use_cases/img_class.md
new file mode 100644
index 0000000..7a409f2
--- /dev/null
+++ b/docs/use_cases/img_class.md
@@ -0,0 +1,446 @@
+# Image Classification Code Sample
+
+- [Introduction](#introduction)
+ - [Prerequisites](#prerequisites)
+- [Building the code sample application from sources](#building-the-code-sample-application-from-sources)
+ - [Build options](#build-options)
+ - [Build process](#build-process)
+ - [Add custom input](#add-custom-input)
+ - [Add custom model](#add-custom-model)
+- [Setting-up and running Ethos-U55 code sample](#setting-up-and-running-ethos-u55-code-sample)
+ - [Setting up the Ethos-U55 Fast Model](#setting-up-the-ethos-u55-fast-model)
+ - [Starting Fast Model simulation](#starting-fast-model-simulation)
+ - [Running Image Classification](#running-image-classification)
+
+## Introduction
+
+This document describes the process of setting up and running the Arm® Ethos™-U55 Image Classification
+example.
+
+This use case solves a classical computer vision problem: image classification. The ML sample was developed using the
+MobileNet v2 model trained on the ImageNet dataset.
+
+Use case code can be found in the [source/use_case/img_class](../../source/use_case/img_class) directory.
+
+### Prerequisites
+
+See [Prerequisites](../documentation.md#prerequisites)
+
+## Building the code sample application from sources
+
+### Build options
+
+In addition to the build options already specified in the main documentation, the Image Classification use case specifies:
+
+- `img_class_MODEL_TFLITE_PATH` - Path to the NN model file in TFLite format. Model will be processed and included into
+ the application axf file. The default value points to one of the delivered set of models. Note that the parameters
+ `img_class_LABELS_TXT_FILE`,`TARGET_PLATFORM` and `ETHOS_U55_ENABLED` should be aligned with the chosen model, i.e.:
+ - if `ETHOS_U55_ENABLED` is set to `On` or `1`, the NN model is assumed to be optimized. The model will naturally
+ fall back to the Arm® Cortex®-M CPU if an unoptimized model is supplied.
+ - if `ETHOS_U55_ENABLED` is set to `Off` or `0`, the NN model is assumed to be unoptimized. Supplying an optimized
+ model in this case will result in a runtime error.
+
+- `img_class_FILE_PATH`: Path to the directory containing images, or path to a single image file, to be used in
+ the application. The default value points to the resources/img_class/samples folder containing the delivered
+ set of images. See more in the [Add custom input data section](#add-custom-input).
+
+- `img_class_IMAGE_SIZE`: The NN model requires input images to be of a specific size. This parameter defines the
+ size of the image side in pixels. Images are considered squared. Default value is 224, which is what the supplied
+ MobilenetV2-1.0 model expects.
+
+- `img_class_LABELS_TXT_FILE`: Path to the labels' text file to be baked into the application. The file is used to
+ map classified classes index to the text label. Change this parameter to point to the custom labels file to map
+ custom NN model output correctly.\
+ The default value points to the delivered labels.txt file inside the delivery package.
+
+- `img_class_ACTIVATION_BUF_SZ`: The intermediate/activation buffer size reserved for the NN model. By default, it
+ is set to 2MiB and should be enough for most models.
+
+- `USE_CASE_BUILD`: set to img_class to build only this example.
+
+In order to build **ONLY** the Image Classification example application, add `-DUSE_CASE_BUILD=img_class` to the `cmake`
+command line specified in [Building](../documentation.md#Building).
+
+### Build process
+
+> **Note:** This section describes the process for configuring the build for the `MPS3: SSE-300` target. For different
+> target platforms, see [Building](../documentation.md#Building).
+
+Create a build directory folder and navigate inside:
+
+```commandline
+mkdir build_img_class && cd build_img_class
+```
+
+On Linux, execute the following command to build **only** Image Classification application to run on the Ethos-U55 Fast
+Model when providing only the mandatory arguments for CMake configuration:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=./scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=img_class ..
+```
+
+For Windows, add `-G "MinGW Makefiles"`:
+
+```commandline
+cmake \
+ -G "MinGW Makefiles" \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=./scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=img_class ..
+```
+
+Toolchain option `CMAKE_TOOLCHAIN_FILE` points to the toolchain specific file to set the compiler and platform specific
+parameters.
+
+To configure a build that can be debugged using Arm-DS, we can just specify
+the build type as `Debug`:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug \
+ -DUSE_CASE_BUILD=img_class ..
+```
+
+To configure a build that can be debugged using a tool that only supports
+DWARF format 3 (Modeldebugger for example), we can use:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug \
+ -DARMCLANG_DEBUG_DWARF_LEVEL=3 \
+ -DUSE_CASE_BUILD=img_class ..
+```
+
+> **Note:** If building for different Ethos-U55 configurations, see
+>[Configuring build for different Arm Ethos-U55 configurations](../sections/building.md#Configuring-build-for-different-Arm-Ethos-U55-configurations):
+
+If the TensorFlow source tree is not in its default expected location,
+set the path using `TENSORFLOW_SRC_PATH`.
+Similarly, if the Ethos-U55 driver is not in the default location,
+`ETHOS_U55_DRIVER_SRC_PATH` can be used to configure the location. For example:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTENSORFLOW_SRC_PATH=/my/custom/location/tensorflow \
+ -DETHOS_U55_DRIVER_SRC_PATH=/my/custom/location/core_driver \
+ -DUSE_CASE_BUILD=img_class ..
+```
+
+Also, `CMSIS_SRC_PATH` parameter can be used to override the CMSIS sources used for compilation used by TensorFlow by
+default. For example, to use the CMSIS sources fetched by the ethos-u helper script, we can use:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTENSORFLOW_SRC_PATH=../ethos-u/core_software/tensorflow \
+ -DETHOS_U55_DRIVER_SRC_PATH=../ethos-u/core_software/core_driver \
+ -DCMSIS_SRC_PATH=../ethos-u/core_software/cmsis \
+ -DUSE_CASE_BUILD=img_class ..
+```
+
+> **Note:** If re-building with changed parameters values, it is highly advised to clean the build directory and re-run
+>the CMake command.
+
+If the CMake command succeeds, build the application as follows:
+
+```commandline
+make -j4
+```
+
+For Windows, use `mingw32-make`.
+
+Add `VERBOSE=1` to see compilation and link details.
+
+Results of the build will be placed under `build/bin` folder:
+
+```tree
+bin
+ ├── ethos-u-img_class.axf
+ ├── ethos-u-img_class.htm
+ ├── ethos-u-img_class.map
+ ├── images-img_class.txt
+ └── sectors
+ └── img_class
+ ├── dram.bin
+ └── itcm.bin
+```
+
+Where:
+
+- `ethos-u-img_class.axf`: The built application binary for the Image Classification use case.
+
+- `ethos-u-img_class.map`: Information from building the application (e.g. libraries used, what was optimized, location
+ of objects)
+
+- `ethos-u-img_class.htm`: Human readable file containing the call graph of application functions.
+
+- `sectors/`: Folder containing the built application, split into files for loading into different FPGA memory regions.
+
+- `images-img_class.txt`: Tells the FPGA which memory regions to use for loading the binaries in the `sectors/img_class` folder.
+
+### Add custom input
+
+The application performs inference on input data found in the folder, or an individual file, set by the CMake parameter
+`img_class_FILE_PATH`.
+
+To run the application with your own images, first create a folder to hold them and then copy the custom images into
+this folder, for example:
+
+```commandline
+mkdir /tmp/custom_images
+
+cp custom_image1.bmp /tmp/custom_images/
+```
+
+> **Note:** Clean the build directory before re-running the CMake command.
+
+Next set `img_class_FILE_PATH` to the location of this folder when building:
+
+```commandline
+cmake \
+ -Dimg_class_FILE_PATH=/tmp/custom_images/ \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=img_class ..
+```
+
+For Windows, add `-G "MinGW Makefiles"` to the CMake command.
+
+The images found in the `img_class_FILE_PATH` folder will be picked up and automatically converted to C++ files during
+the CMake configuration stage and then compiled into the application during the build phase for performing inference
+with.
+
+The log from the configuration stage should tell you what image directory path has been used:
+
+```log
+-- User option img_class_FILE_PATH is set to /tmp/custom_images
+-- User option img_class_IMAGE_SIZE is set to 224
+...
+-- Generating image files from /tmp/custom_images
+++ Converting custom_image1.bmp to custom_image1.cc
+...
+-- Defined build user options:
+...
+-- img_class_FILE_PATH=/tmp/custom_images
+-- img_class_IMAGE_SIZE=224
+```
+
+After compiling, your custom images will have now replaced the default ones in the application.
+
+> **Note:** The CMake parameter `img_class_IMAGE_SIZE` should match the model input size. When building the application,
+> if the size of any image does not match `img_class_IMAGE_SIZE`, it will be rescaled and padded so that it does,
+> broadly as sketched below.
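+
+A minimal Pillow sketch of such a rescale-and-pad step; the resampling and padding choices here are illustrative, and
+the generator scripts may do this differently:
+
+```python
+from PIL import Image
+
+SIZE = 224  # img_class_IMAGE_SIZE
+
+def rescale_and_pad(path: str) -> Image.Image:
+    img = Image.open(path).convert("RGB")
+    scale = SIZE / max(img.size)                          # preserve aspect ratio
+    img = img.resize((round(img.width * scale), round(img.height * scale)))
+    canvas = Image.new("RGB", (SIZE, SIZE))               # black padding
+    canvas.paste(img, ((SIZE - img.width) // 2, (SIZE - img.height) // 2))
+    return canvas
+```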
+
+### Add custom model
+
+The application performs inference using the model pointed to by the CMake parameter `img_class_MODEL_TFLITE_PATH`.
+
+> **Note:** If you want to run the model using Ethos-U55, ensure your custom model has been run through the Vela compiler
+>successfully before continuing. See [Optimize model with Vela compiler](../sections/building.md#Optimize-custom-model-with-Vela-compiler).
+
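+For reference, a minimal Vela invocation might look like the following. This is an illustrative sketch only (it assumes
+a 128-MAC Ethos-U55 configuration); the exact arguments for your target are covered in the building documentation:
+
+```commandline
+vela custom_model.tflite --accelerator-config=ethos-u55-128
+```
+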
+To run the application with a custom model you will need to provide a `labels_<model_name>.txt` file of labels
+associated with the model. Each line of the file should correspond to one of the outputs in your model. See the provided
+`labels_mobilenet_v2_1.0_224.txt` file for an example.
+
+Then, you must set `img_class_MODEL_TFLITE_PATH` to the location of the Vela processed model file and
+`img_class_LABELS_TXT_FILE` to the location of the associated labels file.
+
+An example:
+
+```commandline
+cmake \
+ -Dimg_class_MODEL_TFLITE_PATH=<path/to/custom_model_after_vela.tflite> \
+ -Dimg_class_LABELS_TXT_FILE=<path/to/labels_custom_model.txt> \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=img_class ..
+```
+
+For Windows, add `-G "MinGW Makefiles"` to the CMake command.
+
+> **Note:** Clean the build directory before re-running the CMake command.
+
+The `.tflite` model file pointed to by `img_class_MODEL_TFLITE_PATH` and labels text file pointed to by
+`img_class_LABELS_TXT_FILE` will be converted to C++ files during the CMake configuration stage and then compiled into
+the application for performing inference with.
+
+The log from the configuration stage should tell you what model path and labels file have been used:
+
+```log
+-- User option img_class_MODEL_TFLITE_PATH is set to <path/to/custom_model_after_vela.tflite>
+...
+-- User option img_class_LABELS_TXT_FILE is set to <path/to/labels_custom_model.txt>
+...
+-- Using <path/to/custom_model_after_vela.tflite>
+++ Converting custom_model_after_vela.tflite to\
+custom_model_after_vela.tflite.cc
+-- Generating labels file from <path/to/labels_custom_model.txt>
+-- writing to <path/to/build/generated/src/Labels.cc>
+...
+```
+
+After compiling, your custom model will have now replaced the default one in the application.
+
+## Setting-up and running Ethos-U55 code sample
+
+### Setting up the Ethos-U55 Fast Model
+
+The FVP is available publicly from [Arm Ecosystem FVP downloads](https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps).
+
+For Ethos-U55 evaluation, please download the MPS3 version of the Arm® Corstone™-300 model that contains Ethos-U55 and
+Cortex-M55. The model is currently only supported on Linux based machines. To install the FVP:
+
+- Unpack the archive
+
+- Run the install script in the extracted package
+
+```commandline
+./FVP_Corstone_SSE-300_Ethos-U55.sh
+```
+
+- Follow the instructions to install the FVP to your desired location
+
+### Starting Fast Model simulation
+
+Pre-built application binary `ethos-u-img_class.axf` can be found in the `bin/mps3-sse-300` folder of the delivery package.
+Assuming the install location of the FVP was set to `~/FVP_install_location`, the simulation can be started by:
+
+```commandline
+~/FVP_install_location/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 \
+    ./bin/mps3-sse-300/ethos-u-img_class.axf
+```
+
+A log output should appear on the terminal:
+
+```log
+telnetterminal0: Listening for serial connection on port 5000
+telnetterminal1: Listening for serial connection on port 5001
+telnetterminal2: Listening for serial connection on port 5002
+telnetterminal5: Listening for serial connection on port 5003
+```
+
+This will also launch a telnet window with the sample application's standard output and error log entries containing
+information about the pre-built application version, TensorFlow Lite Micro library version used, data type as well as
+the input and output tensor sizes of the model compiled into the executable binary.
+
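+If a telnet window does not open automatically, the standard output can also be reached by connecting to the first
+listed serial port manually, for example:
+
+```commandline
+telnet localhost 5000
+```
+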
+After the application has started, if `img_class_FILE_PATH` points to a single file (or a folder containing a single image)
+the inference starts immediately. If multiple inputs are available, the application outputs a menu and waits for user
+input from the telnet terminal:
+
+```log
+User input required
+Enter option number from:
+
+1. Classify next image
+2. Classify image at chosen index
+3. Run classification on all images
+4. Show NN model info
+5. List images
+
+Choice:
+
+```
+
+1. “Classify next image” menu option will run a single inference on the next image in line from the collection of
+   compiled images.
+
+2. “Classify image at chosen index” menu option will run a single inference on the chosen image.
+
+   > **Note:** Please make sure to select an image index within the range of images supplied during the application build.
+   By default, the pre-built application has 4 images, with indexes from 0 to 3.
+
+3. “Run classification on all images” menu option triggers sequential inference executions on all built-in images.
+
+4. “Show NN model info” menu option prints information about model data type, input and output tensor sizes (the
+   quantization parameters shown are explained after this list):
+
+ ```log
+ [INFO] uTFL version: 2.5.0
+ [INFO] Model info:
+ [INFO] Model INPUT tensors:
+ [INFO] tensor type is UINT8
+ [INFO] tensor occupies 150528 bytes with dimensions
+ [INFO] 0: 1
+ [INFO] 1: 224
+ [INFO] 2: 224
+ [INFO] 3: 3
+ [INFO] Quant dimension: 0
+ [INFO] Scale[0] = 0.007812
+ [INFO] ZeroPoint[0] = 128
+ [INFO] Model OUTPUT tensors:
+ [INFO] tensor type is UINT8
+ [INFO] tensor occupies 1001 bytes with dimensions
+ [INFO] 0: 1
+ [INFO] 1: 1001
+ [INFO] Quant dimension: 0
+ [INFO] Scale[0] = 0.098893
+ [INFO] ZeroPoint[0] = 58
+ [INFO] Activation buffer (a.k.a tensor arena) size used: 521760
+ [INFO] Number of operators: 1
+ [INFO] Operator 0: ethos-u
+ [INFO] Use of Arm uNPU is enabled
+ ```
+
+5. “List Images” menu option prints a list of image index and original filename pairs embedded in the application:
+
+ ```log
+ [INFO] List of Files:
+ [INFO] 0 => cat.bmp
+ [INFO] 1 => dog.bmp
+ [INFO] 2 => kimono.bmp
+ [INFO] 3 => tiger.bmp
+ ```
+
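+The `Scale` and `ZeroPoint` values in the model info output describe the affine quantization of each tensor. As a rough
+sketch using the standard TensorFlow Lite quantization formula (illustrative, not code from this kit), a raw quantized
+value maps back to a real value as follows:
+
+```C++
+#include <cstdint>
+#include <cstdio>
+
+// Standard TFLite affine dequantization: real = scale * (quantized - zero_point).
+// The constants below are the input tensor parameters from the log above.
+int main() {
+    const float scale = 0.007812f;
+    const int zeroPoint = 128;
+    const uint8_t quantized = 255;  // an example raw input byte
+    const float real = scale * (static_cast<int>(quantized) - zeroPoint);
+    std::printf("real value: %f\n", real);  // ~0.99 for this example
+    return 0;
+}
+```
+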
+### Running Image Classification
+
+Please select the first menu option to execute Image Classification.
+
+The following example illustrates application output for classification:
+
+```log
+[INFO] Running inference on image 0 => cat.bmp
+[INFO] Profile for Inference:
+ Active NPU cycles: 7622641
+ Idle NPU cycles: 525
+
+[INFO] 0) 282 (14.636096) -> tabby, tabby cat
+[INFO] 1) 286 (14.537203) -> Egyptian cat
+[INFO] 2) 283 (12.757138) -> tiger cat
+[INFO] 3) 458 (7.021370) -> bow tie, bow-tie, bowtie
+[INFO] 4) 288 (7.021370) -> lynx, catamount
+```
+
+It could take several minutes to complete one inference run (average time is 2-3 minutes).
+
+The log shows the inference results for “image 0” (index 0), which corresponds to “cat.bmp” in the sample image resource
+folder.
+
+The profiling section of the log shows that for this inference:
+
+- Ethos-U55's PMU report:
+
+ - 7,622,641 active cycles: number of NPU cycles that were used for computation
+
+ - 525 idle cycles: number of cycles for which the NPU was idle
+
+- For FPGA platforms, CPU cycle count can also be enabled. For FVP, however, CPU cycle counters should not be used as
+ the CPU model is not cycle-approximate or cycle-accurate.
+
+The application prints the top 5 classes with their indexes, confidence scores and labels from the associated
+`labels_mobilenet_v2_1.0_224.txt` file. The FVP window also shows the output on its LCD section.
diff --git a/docs/use_cases/inference_runner.md b/docs/use_cases/inference_runner.md
new file mode 100644
index 0000000..ffb205e
--- /dev/null
+++ b/docs/use_cases/inference_runner.md
@@ -0,0 +1,296 @@
+# Inference Runner Code Sample
+
+- [Introduction](#introduction)
+ - [Prerequisites](#prerequisites)
+- [Building the Code Samples application from sources](#building-the-code-samples-application-from-sources)
+ - [Build options](#build-options)
+ - [Build process](#build-process)
+ - [Add custom model](#add-custom-model)
+- [Setting-up and running Ethos-U55 code sample](#setting-up-and-running-ethos-u55-code-sample)
+ - [Setting up the Ethos-U55 Fast Model](#setting-up-the-ethos-u55-fast-model)
+ - [Starting Fast Model simulation](#starting-fast-model-simulation)
+ - [Running Inference Runner](#running-inference-runner)
+- [Inference Runner processing information](#inference-runner-processing-information)
+
+## Introduction
+
+This document describes the process of setting up and running the Arm® Ethos™-U55 NPU Inference Runner.
+The inference runner is intended for quickly checking profiling results for any desired network, providing it has been
+processed by the Vela compiler.
+
+A simple model is provided with the Inference Runner as an example, but it is expected that the user will replace this
+model with one they wish to profile, see [Add custom model](./inference_runner.md#Add-custom-model) for more details.
+
+The inference runner will populate all input tensors for the provided model with randomly generated data and an
+inference is then performed. Profiling results are then displayed in the console.
+
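+As a rough illustration of what this means (a sketch, not the kit's actual implementation), filling a TensorFlow Lite
+Micro input tensor with random bytes could look like this:
+
+```C++
+#include <cstdint>
+#include <cstdlib>
+
+#include "tensorflow/lite/c/common.h"
+
+// Hedged sketch: fill an input tensor with random data before invoking the
+// interpreter. 'input' would come from the interpreter, e.g. interpreter.input(0).
+void FillWithRandomData(TfLiteTensor* input) {
+    for (size_t i = 0; i < input->bytes; ++i) {
+        input->data.uint8[i] = static_cast<uint8_t>(std::rand() % 256);
+    }
+}
+```
+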
+Use case code can be found in the [source/use_case/inference_runner](../../source/use_case/inference_runner) directory.
+
+### Prerequisites
+
+See [Prerequisites](../documentation.md#prerequisites)
+
+## Building the Code Samples application from sources
+
+### Build options
+
+In addition to the already specified build options in the main documentation, the Inference Runner use case adds:
+
+- `inference_runner_MODEL_TFLITE_PATH` - Path to the NN model file in TFLite format. Model will be processed and
+ included into the application axf file. The default value points to one of the delivered set of models.
+ Note that the parameters `TARGET_PLATFORM` and `ETHOS_U55_ENABLED` should be aligned with the chosen model, i.e.:
+ - if `ETHOS_U55_ENABLED` is set to `On` or `1`, the NN model is assumed to be optimized. The model will naturally
+    fall back to the Arm® Cortex®-M CPU if an unoptimized model is supplied.
+ - if `ETHOS_U55_ENABLED` is set to `Off` or `0`, the NN model is assumed to be unoptimized. Supplying an optimized model
+ in this case will result in a runtime error.
+
+- `inference_runner_ACTIVATION_BUF_SZ`: The intermediate/activation buffer size reserved for the NN model. By
+  default, it is set to 2MiB and should be enough for most models (see the example after this list).
+
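+For example, to reserve a larger activation buffer for a bigger model, the option can be passed alongside the usual
+arguments (the value here is illustrative):
+
+```commandline
+cmake \
+    -DTARGET_PLATFORM=mps3 \
+    -DTARGET_SUBSYSTEM=sse-300 \
+    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+    -Dinference_runner_ACTIVATION_BUF_SZ=0x00400000 \
+    -DUSE_CASE_BUILD=inference_runner ..
+```
+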
+In order to build **ONLY** the Inference Runner example application, add `-DUSE_CASE_BUILD=inference_runner` to the
+`cmake` command line specified in [Building](../documentation.md#Building).
+
+### Build process
+
+> **Note:** This section describes the process for configuring the build for `MPS3: SSE-300`. For a different target
+>platform, see the [Building](../documentation.md#Building) section.
+
+Create a build directory and navigate inside:
+
+```commandline
+mkdir build_inference_runner && cd build_inference_runner
+```
+
+On Linux, execute the following command to build **only** Inference Runner application to run on the Ethos-U55 Fast
+Model when providing only the mandatory arguments for CMake configuration:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=./scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=inference_runner ..
+```
+
+For Windows, add `-G "MinGW Makefiles"`:
+
+```commandline
+cmake \
+ -G "MinGW Makefiles" \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=./scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=inference_runner ..
+```
+
+Toolchain option `CMAKE_TOOLCHAIN_FILE` points to the toolchain specific file to set the compiler and platform specific
+parameters.
+
+To configure a build that can be debugged using Arm-DS, we can just specify
+the build type as `Debug`:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug \
+ -DUSE_CASE_BUILD=inference_runner ..
+```
+
+To configure a build that can be debugged using a tool that only supports
+DWARF format 3 (Modeldebugger for example), we can use:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug \
+ -DARMCLANG_DEBUG_DWARF_LEVEL=3 \
+ -DUSE_CASE_BUILD=inference_runner ..
+```
+
+> **Note:** If building for different Ethos-U55 configurations, see
+>[Configuring build for different Arm Ethos-U55 configurations](../sections/building.md#Configuring-build-for-different-Arm-Ethos-U55-configurations):
+
+If the TensorFlow source tree is not in its default expected location,
+set the path using `TENSORFLOW_SRC_PATH`.
+Similarly, if the Ethos-U55 driver is not in the default location,
+`ETHOS_U55_DRIVER_SRC_PATH` can be used to configure the location. For example:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTENSORFLOW_SRC_PATH=/my/custom/location/tensorflow \
+ -DETHOS_U55_DRIVER_SRC_PATH=/my/custom/location/core_driver \
+ -DUSE_CASE_BUILD=inference_runner ..
+```
+
+Also, `CMSIS_SRC_PATH` parameter can be used to override the CMSIS sources used for compilation used by TensorFlow by
+default. For example, to use the CMSIS sources fetched by the ethos-u helper script, we can use:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTENSORFLOW_SRC_PATH=../ethos-u/core_software/tensorflow \
+ -DETHOS_U55_DRIVER_SRC_PATH=../ethos-u/core_software/core_driver \
+ -DCMSIS_SRC_PATH=../ethos-u/core_software/cmsis \
+ -DUSE_CASE_BUILD=inference_runner ..
+```
+
+> **Note:** If re-building with changed parameter values, it is highly advised to clean the build directory and re-run
+>the CMake command.
+
+If the CMake command succeeded, build the application as follows:
+
+```commandline
+make -j4
+```
+
+For Windows, use `mingw32-make`.
+
+Add `VERBOSE=1` to see compilation and link details.
+
+Results of the build will be placed under the `build/bin` folder:
+
+```tree
+bin
+ ├── ethos-u-inference_runner.axf
+ ├── ethos-u-inference_runner.htm
+ ├── ethos-u-inference_runner.map
+ ├── images-inference_runner.txt
+ └── sectors
+      └── inference_runner
+ ├── dram.bin
+ └── itcm.bin
+```
+
+Where:
+
+- `ethos-u-inference_runner.axf`: The built application binary for the Inference Runner use case.
+
+- `ethos-u-inference_runner.map`: Information from building the application (e.g. libraries used, what was optimized,
+  location of objects).
+
+- `ethos-u-inference_runner.htm`: Human readable file containing the call graph of application functions.
+
+- `sectors/`: Folder containing the built application, split into files for loading into different FPGA memory regions.
+
+- `images-inference_runner.txt`: Tells the FPGA which memory regions to use for loading the binaries in the `sectors/`
+  folder.
+
+### Add custom model
+
+The application performs inference using the model pointed to by the CMake parameter `inference_runner_MODEL_TFLITE_PATH`.
+
+> **Note:** If you want to run the model using Ethos-U55, ensure your custom model has been run through the Vela compiler
+>successfully before continuing. See [Optimize model with Vela compiler](../sections/building.md#Optimize-custom-model-with-Vela-compiler).
+
+Then, you must set `inference_runner_MODEL_TFLITE_PATH` to the location of the Vela processed model file.
+
+An example:
+
+```commandline
+cmake \
+ -Dinference_runner_MODEL_TFLITE_PATH=<path/to/custom_model_after_vela.tflite> \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+For Windows, add `-G "MinGW Makefiles"` to the CMake command.
+
+> **Note:** Clean the build directory before re-running the CMake command.
+
+The `.tflite` model file pointed to by `inference_runner_MODEL_TFLITE_PATH` will be converted to C++ files during the CMake
+configuration stage and then compiled into the application for performing inference with.
+
+The log from the configuration stage should tell you what model path has been used:
+
+```stdout
+-- User option inference_runner_MODEL_TFLITE_PATH is set to <path/to/custom_model_after_vela.tflite>
+...
+-- Using <path/to/custom_model_after_vela.tflite>
+++ Converting custom_model_after_vela.tflite to\
+custom_model_after_vela.tflite.cc
+...
+```
+
+After compiling, your custom model will have now replaced the default one in the application.
+
+## Setting-up and running Ethos-U55 code sample
+
+### Setting up the Ethos-U55 Fast Model
+
+The FVP is available publicly from
+[Arm Ecosystem FVP downloads](https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps).
+
+For Ethos-U55 evaluation, please download the MPS3 version of the Arm® Corstone™-300 model that contains Ethos-U55 and
+Cortex-M55. The model is currently only supported on Linux based machines. To install the FVP:
+
+- Unpack the archive
+
+- Run the install script in the extracted package
+
+```commandline
+./FVP_Corstone_SSE-300_Ethos-U55.sh
+```
+
+- Follow the instructions to install the FVP to your desired location
+
+### Starting Fast Model simulation
+
+Once the building step is completed, the application binary `ethos-u-inference_runner.axf` can be found in the `build/bin` folder.
+Assuming the install location of the FVP was set to `~/FVP_install_location`, the simulation can be started by:
+
+```commandline
+~/FVP_install_location/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 \
+    ./bin/mps3-sse-300/ethos-u-inference_runner.axf
+```
+
+A log output should appear on the terminal:
+
+```log
+telnetterminal0: Listening for serial connection on port 5000
+telnetterminal1: Listening for serial connection on port 5001
+telnetterminal2: Listening for serial connection on port 5002
+telnetterminal5: Listening for serial connection on port 5003
+```
+
+This will also launch a telnet window with the sample application's standard output and error log entries containing
+information about the pre-built application version, TensorFlow Lite Micro library version used, data type as well as
+the input and output tensor sizes of the model compiled into the executable binary.
+
+### Running Inference Runner
+
+After the application has started the inference starts immediately and it outputs the results on the telnet terminal.
+
+The following example illustrates application output:
+
+```log
+[INFO] Profile for Inference:
+ Active NPU cycles: 26976
+ Idle NPU cycles: 196
+```
+
+After running an inference on randomly generated data, the log output shows the profiling results for this
+inference:
+
+- Ethos-U55's PMU report:
+
+ - 26,976 active cycles: number of cycles that were used for computation
+
+ - 196 idle cycles: number of cycles for which the NPU was idle
+
+- For FPGA platforms, CPU cycle count can also be enabled. For FVP, however, CPU cycle counters should not be used as
+ the CPU model is not cycle-approximate or cycle-accurate.
diff --git a/docs/use_cases/kws.md b/docs/use_cases/kws.md
new file mode 100644
index 0000000..316b501
--- /dev/null
+++ b/docs/use_cases/kws.md
@@ -0,0 +1,474 @@
+# Keyword Spotting Code Sample
+
+- [Introduction](#introduction)
+ - [Prerequisites](#prerequisites)
+- [Building the code sample application from sources](#building-the-code-sample-application-from-sources)
+ - [Build options](#build-options)
+ - [Build process](#build-process)
+ - [Add custom input](#add-custom-input)
+ - [Add custom model](#add-custom-model)
+- [Setting-up and running Ethos-U55 code sample](#setting-up-and-running-ethos-u55-code-sample)
+ - [Setting up the Ethos-U55 Fast Model](#setting-up-the-ethos-u55-fast-model)
+ - [Starting Fast Model simulation](#starting-fast-model-simulation)
+ - [Running Keyword Spotting](#running-keyword-spotting)
+- [Keyword Spotting processing information](#keyword-spotting-processing-information)
+ - [Preprocessing and feature extraction](#preprocessing-and-feature-extraction)
+ - [Postprocessing](#postprocessing)
+
+## Introduction
+
+This document describes the process of setting up and running the Arm® Ethos™-U55 Keyword Spotting
+example.
+
+Use case code can be found in the [source/use_case/kws](../../source/use_case/kws) directory.
+
+### Preprocessing and feature extraction
+
+The DS-CNN keyword spotting model that is supplied with the Code Samples expects audio data to be preprocessed in
+a specific way before performing an inference. This section aims to provide an overview of the feature extraction
+process used.
+
+First the audio data is normalized to the range (-1, 1).
+
+> **Note:** Mel-frequency cepstral coefficients (MFCCs) are a common feature extracted from audio data and can be used as
+>input for machine learning tasks like keyword spotting and speech recognition.
+>See source/application/main/include/Mfcc.hpp for implementation details.
+
+Next, a window of 640 audio samples is taken from the start of the audio clip. From these 640 samples we calculate 10
+MFCC features.
+
+The whole window is shifted to the right by 320 audio samples and 10 new MFCC features are calculated. This process of
+shifting and calculating is repeated until the end of the 16000 audio samples needed to perform an inference is reached.
+In total this will be 49 windows that each have 10 MFCC features calculated for them, giving an input tensor of shape
+49x10.
+
+These extracted features are quantized, and an inference is performed.
+
+![KWS preprocessing](../media/KWS_preprocessing.png)
+
+If the audio clip is longer than 16000 audio samples then the initial starting position is offset by 16000/2 = 8000
+audio samples. From this new starting point, MFCC features for the next 16000 audio samples are calculated and another
+inference is performed (i.e. do an inference for samples 8000-24000).
+
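+The window arithmetic described above can be checked with a short sketch (the constants are the figures quoted in this
+section, not configurable kit parameters):
+
+```C++
+#include <cstdio>
+
+// Sliding-window arithmetic for the KWS MFCC front end described above:
+// 640-sample window, 320-sample stride, 16000 samples per inference.
+int main() {
+    const int samplesPerInference = 16000;
+    const int windowLen = 640;
+    const int stride = 320;
+    const int numWindows = (samplesPerInference - windowLen) / stride + 1;
+    std::printf("windows per inference: %d\n", numWindows);  // 49 -> input shape 49x10
+    return 0;
+}
+```
+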
+> **Note:** Parameters of the MFCC feature extraction such as window size, stride, number of features etc. all depend on
+>what was used during model training. These values are specific to each model and if you try a different keyword spotting
+>model that uses MFCC input then values are likely to need changing to match the new model.
+>In addition, MFCC feature extraction methods can vary slightly with different normalization methods or scaling etc. being used.
+
+### Postprocessing
+
+After an inference is complete, the word detected with the highest probability is output to the console, provided its
+probability is larger than a threshold value (default 0.9).
+
+If multiple inferences are performed for an audio clip, then multiple results will be output.
+
+### Prerequisites
+
+See [Prerequisites](../documentation.md#prerequisites)
+
+## Building the code sample application from sources
+
+### Build options
+
+In addition to the already specified build options in the main documentation, the keyword spotting use case adds:
+
+- `kws_MODEL_TFLITE_PATH` - Path to the NN model file in TFLite format. Model will be processed and included into the application axf file. The default value points to one of the delivered set of models. Note that the parameters `kws_LABELS_TXT_FILE`,`TARGET_PLATFORM` and `ETHOS_U55_ENABLED` should be aligned with the chosen model, i.e.:
+ - if `ETHOS_U55_ENABLED` is set to `On` or `1`, the NN model is assumed to be optimized. The model will naturally fall back to the Arm® Cortex®-M CPU if an unoptimized model is supplied.
+ - if `ETHOS_U55_ENABLED` is set to `Off` or `0`, the NN model is assumed to be unoptimized. Supplying an optimized model in this case will result in a runtime error.
+
+- `kws_FILE_PATH`: Path to the directory containing audio files, or a path to a single WAV file, to be used in the application. The default value points
+ to the resources/kws/samples folder containing the delivered set of audio clips.
+
+- `kws_LABELS_TXT_FILE`: Path to the labels' text file. The file is used to map keyword class indexes to text
+  labels. The default value points to the delivered labels.txt file inside the delivery package.
+
+- `kws_AUDIO_RATE`: Input data sampling rate. Each audio file from `kws_FILE_PATH` is preprocessed during the build to
+ match NN model input requirements. Default value is 16000.
+
+- `kws_AUDIO_MONO`: If set to ON the audio data will be converted to mono. Default is ON.
+
+- `kws_AUDIO_OFFSET`: Start loading audio data starting from this offset (in seconds). Default value is 0.
+
+- `kws_AUDIO_DURATION`: Length of the audio data to be used in the application in seconds. Default is 0 meaning the
+ whole audio file will be taken.
+
+- `kws_AUDIO_MIN_SAMPLES`: Minimum number of samples required by the network model. If the audio clip is shorter than
+ this number, it is padded with zeros. Default value is 16000.
+
+- `kws_MODEL_SCORE_THRESHOLD`: Threshold value [0.0, 1.0] that must be applied to the inference results for a
+  label to be deemed valid. Default is 0.9 (see the example after this list).
+
+- `kws_ACTIVATION_BUF_SZ`: The intermediate/activation buffer size reserved for the NN model. By default, it is set
+ to 1MiB and should be enough for most models.
+
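+For example, to lower the keyword detection threshold, the relevant option can be passed alongside the usual arguments
+(the value here is illustrative):
+
+```commandline
+cmake \
+    -DTARGET_PLATFORM=mps3 \
+    -DTARGET_SUBSYSTEM=sse-300 \
+    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+    -Dkws_MODEL_SCORE_THRESHOLD=0.8 \
+    -DUSE_CASE_BUILD=kws ..
+```
+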
+In order to build **ONLY** the keyword spotting example application, add `-DUSE_CASE_BUILD=kws` to the `cmake` command line specified in [Building](../documentation.md#Building).
+
+### Build process
+
+> **Note:** This section describes the process for configuring the build for `MPS3: SSE-300`. For a different target platform, see the [Building](../documentation.md#Building) section.
+
+In order to build **only** the keyword spotting example, create a build directory and
+navigate inside, for example:
+
+```commandline
+mkdir build_kws && cd build_kws
+```
+
+On Linux, execute the following command to build Keyword Spotting application to run on the Ethos-U55 Fast Model when providing only the mandatory arguments for CMake configuration:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=./scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=kws ..
+```
+
+For Windows, add `-G "MinGW Makefiles"`:
+
+```commandline
+cmake \
+ -G "MinGW Makefiles" \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=./scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=kws ..
+```
+
+Toolchain option `CMAKE_TOOLCHAIN_FILE` points to the toolchain specific file to set the compiler and platform specific
+parameters.
+
+To configure a build that can be debugged using Arm-DS, we can just specify
+the build type as `Debug`:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug \
+ -DUSE_CASE_BUILD=kws ..
+```
+
+To configure a build that can be debugged using a tool that only supports
+DWARF format 3 (Modeldebugger for example), we can use:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug \
+ -DARMCLANG_DEBUG_DWARF_LEVEL=3 \
+ -DUSE_CASE_BUILD=kws ..
+```
+
+> **Note:** If building for different Ethos-U55 configurations, see [Configuring build for different Arm Ethos-U55 configurations](../sections/building.md#Configuring-build-for-different-Arm-Ethos-U55-configurations):
+
+If the TensorFlow source tree is not in its default expected location,
+set the path using `TENSORFLOW_SRC_PATH`.
+Similarly, if the Ethos-U55 driver is not in the default location,
+`ETHOS_U55_DRIVER_SRC_PATH` can be used to configure the location. For example:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTENSORFLOW_SRC_PATH=/my/custom/location/tensorflow \
+ -DETHOS_U55_DRIVER_SRC_PATH=/my/custom/location/core_driver \
+ -DUSE_CASE_BUILD=kws ..
+```
+
+Also, `CMSIS_SRC_PATH` parameter can be used to override the CMSIS sources used for compilation used by TensorFlow by default. For example, to use the CMSIS sources fetched by the ethos-u helper script, we can use:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTENSORFLOW_SRC_PATH=../ethos-u/core_software/tensorflow \
+ -DETHOS_U55_DRIVER_SRC_PATH=../ethos-u/core_software/core_driver \
+ -DCMSIS_SRC_PATH=../ethos-u/core_software/cmsis \
+ -DUSE_CASE_BUILD=kws ..
+```
+
+> **Note:** If re-building with changed parameter values, it is highly advised to clean the build directory and re-run the CMake command.
+
+If the CMake command succeeded, build the application as follows:
+
+```commandline
+make -j4
+```
+
+For Windows, use `mingw32-make`.
+
+Add `VERBOSE=1` to see compilation and link details.
+
+Results of the build will be placed under the `build/bin` folder:
+
+```tree
+bin
+ ├── ethos-u-kws.axf
+ ├── ethos-u-kws.htm
+ ├── ethos-u-kws.map
+ ├── images-kws.txt
+ └── sectors
+ └── kws
+ ├── dram.bin
+ └── itcm.bin
+```
+
+Where:
+
+- `ethos-u-kws.axf`: The built application binary for the Keyword Spotting use case.
+
+- `ethos-u-kws.map`: Information from building the application (e.g. libraries used, what was optimized, location of
+  objects).
+
+- `ethos-u-kws.htm`: Human readable file containing the call graph of application functions.
+
+- `sectors/`: Folder containing the built application, split into files for loading into different FPGA memory regions.
+
+- `images-kws.txt`: Tells the FPGA which memory regions to use for loading the binaries in the `sectors/` folder.
+
+### Add custom input
+
+The application performs inference on audio data found in the folder, or on an individual file, set by the CMake parameter `kws_FILE_PATH`.
+
+To run the application with your own audio clips first create a folder to hold them and then copy the custom audio files
+into this folder, for example:
+
+```commandline
+mkdir /tmp/custom_wavs
+
+cp my_clip.wav /tmp/custom_wavs/
+```
+
+> **Note:** Clean the build directory before re-running the CMake command.
+
+Next set `kws_FILE_PATH` to the location of this folder when building:
+
+```commandline
+cmake \
+ -Dkws_FILE_PATH=/tmp/custom_wavs/ \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DUSE_CASE_BUILD=kws \
+    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+For Windows, add `-G "MinGW Makefiles"` to the CMake command.
+
+The audio clips found in the `kws_FILE_PATH` folder will be picked up and automatically converted to C++ files during the
+CMake configuration stage and then compiled into the application during the build phase for performing inference with.
+
+The log from the configuration stage should tell you what audio clip directory path has been used:
+
+```log
+-- User option kws_FILE_PATH is set to /tmp/custom_wavs
+-- Generating audio files from /tmp/custom_wavs
+++ Converting my_clip.wav to my_clip.cc
+++ Generating build/generated/kws/include/AudioClips.hpp
+++ Generating build/generated/kws/src/AudioClips.cc
+-- Defined build user options:
+-- kws_FILE_PATH=/tmp/custom_wavs
+```
+
+After compiling, your custom inputs will have now replaced the default ones in the application.
+
+> **Note:** The CMake parameter `kws_AUDIO_MIN_SAMPLES` determines the minimum number of input samples. When building the application,
+if the size of an audio clip is less than `kws_AUDIO_MIN_SAMPLES` then it will be padded with zeros so that it matches.
+
+### Add custom model
+
+The application performs inference using the model pointed to by the CMake parameter `kws_MODEL_TFLITE_PATH`.
+
+> **Note:** If you want to run the model using Ethos-U55, ensure your custom model has been run through the Vela compiler successfully before continuing. See [Optimize model with Vela compiler](../sections/building.md#Optimize-custom-model-with-Vela-compiler).
+
+To run the application with a custom model you will need to provide a `labels_<model_name>.txt` file of labels
+associated with the model. Each line of the file should correspond to one of the outputs in your model. See the provided
+`ds_cnn_labels.txt` file for an example.
+
+Then, you must set `kws_MODEL_TFLITE_PATH` to the location of the Vela processed model file and `kws_LABELS_TXT_FILE`
+to the location of the associated labels file.
+
+An example:
+
+```commandline
+cmake \
+ -Dkws_MODEL_TFLITE_PATH=<path/to/custom_model_after_vela.tflite> \
+ -Dkws_LABELS_TXT_FILE=<path/to/labels_custom_model.txt> \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DUSE_CASE_BUILD=kws \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
+```
+
+For Windows, add `-G "MinGW Makefiles"` to the CMake command.
+
+> **Note:** Clean the build directory before re-running the CMake command.
+
+The `.tflite` model file pointed to by `kws_MODEL_TFLITE_PATH` and labels text file pointed to by `kws_LABELS_TXT_FILE` will
+be converted to C++ files during the CMake configuration stage and then compiled into the application for performing
+inference with.
+
+The log from the configuration stage should tell you what model path and labels file have been used:
+
+```log
+-- User option kws_MODEL_TFLITE_PATH is set to <path/to/custom_model_after_vela.tflite>
+...
+-- User option kws_LABELS_TXT_FILE is set to <path/to/labels_custom_model.txt>
+...
+-- Using <path/to/custom_model_after_vela.tflite>
+++ Converting custom_model_after_vela.tflite to\
+custom_model_after_vela.tflite.cc
+-- Generating labels file from <path/to/labels_custom_model.txt>
+-- writing to <path/to/build/generated/src/Labels.cc>
+...
+```
+
+After compiling, your custom model will have now replaced the default one in the application.
+
+## Setting-up and running Ethos-U55 code sample
+
+### Setting up the Ethos-U55 Fast Model
+
+The FVP is available publicly from [Arm Ecosystem FVP downloads](https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps).
+
+For Ethos-U55 evaluation, please download the MPS3 version of the Arm® Corstone™-300 model that contains Ethos-U55 and
+Cortex-M55. The model is currently only supported on Linux based machines. To install the FVP:
+
+- Unpack the archive
+
+- Run the install script in the extracted package
+
+```commandline
+./FVP_Corstone_SSE-300_Ethos-U55.sh
+```
+
+- Follow the instructions to install the FVP to your desired location
+
+### Starting Fast Model simulation
+
+Once the building step is completed, the application binary `ethos-u-kws.axf` can be found in the `build/bin` folder.
+Assuming the install location of the FVP was set to `~/FVP_install_location`, the simulation can be started by:
+
+```commandline
+~/FVP_install_location/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 \
+    ./bin/mps3-sse-300/ethos-u-kws.axf
+```
+
+A log output should appear on the terminal:
+
+```log
+telnetterminal0: Listening for serial connection on port 5000
+telnetterminal1: Listening for serial connection on port 5001
+telnetterminal2: Listening for serial connection on port 5002
+telnetterminal5: Listening for serial connection on port 5003
+```
+
+This will also launch a telnet window with the sample application's standard output and error log entries containing
+information about the pre-built application version, TensorFlow Lite Micro library version used, data type as well as
+the input and output tensor sizes of the model compiled into the executable binary.
+
+After the application has started, if `kws_FILE_PATH` points to a single file (or a folder containing a single input file)
+the inference starts immediately. If multiple inputs are available, the application outputs a menu and waits for user input from the telnet terminal:
+
+```log
+User input required
+Enter option number from:
+
+1. Classify next audio clip
+2. Classify audio clip at chosen index
+3. Run classification on all audio clips
+4. Show NN model info
+5. List audio clips
+
+Choice:
+
+```
+
+1. “Classify next audio clip” menu option will run inference on the next voice clip in line from the collection of
+   compiled audio clips.
+
+   > **Note:** If the clip is over a certain length, the application will invoke multiple inference runs to cover the entire file.
+
+2. “Classify audio clip at chosen index” menu option will run inference on the chosen audio clip.
+
+   > **Note:** Please make sure to select an audio clip index within the range of clips supplied during the application build.
+   By default, the pre-built application has 4 files, with indexes from 0 to 3.
+
+3. “Run classification on all audio clips” menu option triggers sequential inference executions on all built-in voice
+ samples.
+
+4. “Show NN model info” menu option prints information about model data type, input and output tensor sizes:
+
+ ```log
+ [INFO] uTFL version: 2.5.0
+ [INFO] Model info:
+ [INFO] Model INPUT tensors:
+ [INFO] tensor type is INT8
+ [INFO] tensor occupies 490 bytes with dimensions
+ [INFO] 0: 1
+ [INFO] 1: 1
+ [INFO] 2: 49
+ [INFO] 3: 10
+ [INFO] Quant dimension: 0
+ [INFO] Scale[0] = 1.107164
+ [INFO] ZeroPoint[0] = 95
+ [INFO] Model OUTPUT tensors:
+ [INFO] tensor type is INT8
+ [INFO] tensor occupies 12 bytes with dimensions
+ [INFO] 0: 1
+ [INFO] 1: 12
+ [INFO] Quant dimension: 0
+ [INFO] Scale[0] = 0.003906
+ [INFO] ZeroPoint[0] = -128
+ [INFO] Activation buffer (a.k.a tensor arena) size used: 72848
+ [INFO] Number of operators: 1
+ [INFO] Operator 0: ethos-u
+ [INFO] Use of Arm uNPU is enabled
+ ```
+
+5. “List audio clips” menu option prints a list of audio clip index and original filename pairs embedded in the
+   application:
+
+ ```log
+ [INFO] List of Files:
+ [INFO] 0 => down.wav
+ [INFO] 1 => rightleftup.wav
+ [INFO] 2 => yes.wav
+ [INFO] 3 => yesnogostop.wav
+ ```
+
+### Running Keyword Spotting
+
+Selecting the first option will run inference on the first file.
+
+The following example illustrates application output for classification:
+
+```log
+[INFO] Running inference on audio clip 0 => down.wav
+[INFO] Inference 1/1
+[INFO] Profile for Inference:
+ Active NPU cycles: 680400
+ Idle NPU cycles: 766
+
+[INFO] For timestamp: 0.000000 (inference #: 0); threshold: 0.900000
+[INFO] label @ 0: down, score: 0.996094
+```
+
+Each inference should take less than 30 seconds on most systems running Fast Model.
+The profiling section of the log shows that for this inference:
+
+- Ethos-U55's PMU report:
+
+ - 680,400 active cycles: number of cycles that were used for computation
+
+ - 766 idle cycles: number of cycles for which the NPU was idle
+
+- For FPGA platforms, CPU cycle count can also be enabled. For FVP, however, CPU cycle counters should not be used as
+ the CPU model is not cycle-approximate or cycle-accurate.
+
+The application prints the highest confidence score and the associated label from the `ds_cnn_labels.txt` file.
\ No newline at end of file
diff --git a/docs/use_cases/kws_asr.md b/docs/use_cases/kws_asr.md
new file mode 100644
index 0000000..e79b887
--- /dev/null
+++ b/docs/use_cases/kws_asr.md
@@ -0,0 +1,589 @@
+# Keyword Spotting and Automatic Speech Recognition Code Sample
+
+- [Introduction](#introduction)
+ - [Prerequisites](#prerequisites)
+- [Building the code sample application from sources](#building-the-code-sample-application-from-sources)
+ - [Build options](#build-options)
+ - [Build process](#build-process)
+ - [Add custom input](#add-custom-input)
+ - [Add custom model](#add-custom-model)
+- [Setting-up and running Ethos-U55 Code Samples](#setting-up-and-running-ethos-u55-code-samples)
+ - [Setting up the Ethos-U55 Fast Model](#setting-up-the-ethos-u55-fast-model)
+ - [Starting Fast Model simulation](#starting-fast-model-simulation)
+ - [Running Keyword Spotting and Automatic Speech Recognition](#running-keyword-spotting-and-automatic-speech-recognition)
+- [Keyword Spotting and Automatic Speech Recognition processing information](#keyword-spotting-and-automatic-speech-recognition-processing-information)
+ - [Preprocessing and feature extraction](#preprocessing-and-feature-extraction)
+ - [Keyword Spotting Preprocessing](#keyword-spotting-preprocessing)
+ - [Automatic Speech Recognition Preprocessing](#automatic-speech-recognition-preprocessing)
+ - [Postprocessing](#postprocessing)
+
+## Introduction
+
+This document describes the process of setting up and running an example of sequential execution of the Keyword Spotting
+and Automatic Speech Recognition models on the Cortex-M CPU and the Ethos-U NPU.
+
+The Keyword Spotting and Automatic Speech Recognition example demonstrates how to run multiple models sequentially. A
+Keyword Spotting model is first run on the CPU and, if a set keyword is detected, an Automatic Speech Recognition
+model is then run on the Ethos-U55 on the remaining audio.
+The tensor arena memory region is reused between models to optimize the application memory footprint.
+
+"Yes" key word is used to trigger full command recognition following the key word.
+Use case code could be found in [source/use_case/kws_asr](../../source/use_case/kws_asr]) directory.
+
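+As a simplified sketch of the sequential KWS-then-ASR flow described above (the types and functions below are
+hypothetical placeholders, not the kit's API):
+
+```C++
+#include <cstddef>
+#include <cstdint>
+#include <string>
+
+// Illustrative only: hypothetical stand-ins for the two stages, not kit APIs.
+struct KwsResult { std::string label; float score; std::size_t endSample; };
+
+KwsResult RunKwsInference(const int16_t* audio, std::size_t len);  // on the Cortex-M CPU
+void RunAsrInference(const int16_t* audio, std::size_t len);       // on the Ethos-U55
+
+void ProcessClip(const int16_t* audio, std::size_t len) {
+    const KwsResult kws = RunKwsInference(audio, len);
+    if (kws.label == "yes" && kws.score > 0.9f) {  // default KWS threshold is 0.9
+        RunAsrInference(audio + kws.endSample, len - kws.endSample);
+    }
+}
+```
+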
+### Preprocessing and feature extraction
+
+In this use case there are two different models being used, each with different preprocessing requirements. As such, each
+preprocessing pipeline is detailed below. Note that Automatic Speech Recognition only occurs if a keyword is detected in
+the audio clip.
+
+By default, the KWS model is run purely on the CPU and not on the Ethos-U55.
+
+#### Keyword Spotting Preprocessing
+
+The DS-CNN keyword spotting model that is used with the Code Samples expects audio data to be preprocessed in
+a specific way before performing an inference. This section aims to provide an overview of the feature extraction
+process used.
+
+First the audio data is normalized to the range (-1, 1).
+
+> **Note:** Mel-frequency cepstral coefficients (MFCCs) are a common feature extracted from audio data and can be used as input for machine learning tasks like keyword spotting and speech recognition. See source/application/main/include/Mfcc.hpp for implementation details.
+
+Next, a window of 640 audio samples is taken from the start of the audio clip. From these 640 samples we calculate 10
+MFCC features.
+
+The whole window is shifted to the right by 320 audio samples and 10 new MFCC features are calculated. This process of
+shifting and calculating is repeated until the end of the 16000 audio samples needed to perform an inference is reached.
+In total this will be 49 windows that each have 10 MFCC features calculated for them, giving an input tensor of shape
+49x10.
+
+These extracted features are quantized, and an inference is performed.
+
+If the audio clip is longer than 16000 audio samples then the initial starting position is offset by 16000/2 = 8000
+audio samples. From this new starting point, MFCC features for the next 16000 audio samples are calculated and another
+inference is performed (i.e. do an inference for samples 8000-24000).
+
+> **Note:** Parameters of the MFCC feature extraction such as window size, stride, number of features etc. all depend on what was used during model training. These values are specific to each model and if you try a different keyword spotting model that uses MFCC input then values are likely to need changing to match the new model.
+
+In addition, MFCC feature extraction methods can vary slightly with different normalization methods or scaling etc. being used.
+
+#### Automatic Speech Recognition Preprocessing
+
+The wav2letter automatic speech recognition model that is used with the Code Samples expects audio data to be
+preprocessed in a specific way before performing an inference. This section aims to provide an overview of the feature
+extraction process used.
+
+First the audio data is normalized to the range (-1, 1).
+
+> **Note:** Mel-frequency cepstral coefficients (MFCCs) are a common feature extracted from audio data and can be used as input for machine learning tasks like keyword spotting and speech recognition. See source/application/main/include/Mfcc.hpp for implementation details.
+
+Next, a window of 512 audio samples is taken from the start of the audio clip. From these 512 samples we calculate 13
+MFCC features.
+
+The whole window is shifted to the right by 160 audio samples and 13 new MFCC features are calculated. This process of
+shifting and calculating is repeated until enough audio samples to perform an inference have been processed. In total
+this will be 296 windows that each have 13 MFCC features calculated for them.
+
+After extracting MFCC features the first and second order derivatives of these features with respect to time are
+calculated. These derivative features are then standardized and concatenated with the MFCC features (which also get
+standardized). At this point the input tensor will have a shape of 296x39.
+
+These extracted features are quantized, and an inference is performed.
+
+For longer audio clips where multiple inferences need to be performed, then the initial starting position is offset by
+(100\*160) = 16000 audio samples. From this new starting point, MFCC and derivative features are calculated as before
+until there is enough to perform another inference. Padding can be used if there are not enough audio samples for at
+least 1 inference. This step is repeated until the whole audio clip has been processed. If there are not enough audio
+samples for a final complete inference the MFCC features will be padded by repeating the last calculated feature until
+an inference can be performed.
+
+> **Note:** Parameters of the MFCC feature extraction such as window size, stride, number of features etc. all depend on what was used during model training. These values are specific to each model. If you switch to a different ASR model than the one supplied, then the feature extraction process could be completely different to the one currently implemented.
+
+The number of audio samples we offset by for long audio clips is specific to the included wav2letter model.
+
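+The figures quoted above fit together as follows (a worked check of the arithmetic using only the constants from this
+section):
+
+```C++
+#include <cstdio>
+
+// Worked check of the wav2letter front-end arithmetic described above:
+// 512-sample window, 160-sample stride, 296 windows per inference.
+int main() {
+    const int windowLen = 512, stride = 160, numWindows = 296;
+    const int samplesPerInference = (numWindows - 1) * stride + windowLen;
+    std::printf("samples per inference: %d\n", samplesPerInference);      // 47712
+    std::printf("offset between inferences: %d samples\n", 100 * stride); // 16000
+    return 0;
+}
+```
+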
+### Postprocessing
+
+If a keyword is detected then the ASR process is run and the raw output of that inference needs to be postprocessed to
+get a usable result.
+
+The raw output from the model is a tensor of shape 148x29, where each row is a probability distribution over the 29
+possible characters that can appear at each of the 148 time steps.
+
+This wav2letter model is trained using context windows; this means that only certain parts of the output are usable,
+depending on the part of the audio clip that is currently being processed.
+
+If this is the first inference and multiple inferences are required, then ignore the final 49 rows of the output.
+Similarly, if this is the final inference from multiple inferences then ignore the first 49 rows of the output. Finally,
+if this inference is not the last or first inference then ignore the first and last 49 rows of the model output.
+
+> **Note:** If the audio clip is small enough then the whole of the model output is usable and there is no need to throw away any of the output before continuing.
+
+Once any rows have been removed the final processing can be done. To process the output, first the letter with the
+highest probability at each time step is found. Next, any letters that are repeated multiple times in a row are removed
+(e.g. [t, t, t, o, p, p] becomes [t, o, p]). Finally, the 29^th^ blank token letter is removed from the output.
+
+For the final output, the results from all inferences are combined before decoding. What you are left with is then
+displayed to the console.
+
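+The decoding described above is essentially greedy CTC-style decoding. A minimal sketch, assuming the blank token is
+the last of the 29 classes (illustrative, not the kit's exact implementation):
+
+```C++
+#include <algorithm>
+#include <string>
+#include <vector>
+
+// Hedged sketch of the decoding described above: take the argmax at each time
+// step, collapse consecutive repeats, then drop the blank token.
+std::string GreedyDecode(const std::vector<std::vector<float>>& probs,  // [T][29]
+                         const std::string& alphabet,                   // 28 symbols
+                         int blankIdx = 28) {
+    std::string out;
+    int prev = -1;
+    for (const auto& row : probs) {
+        const int best = static_cast<int>(
+            std::max_element(row.begin(), row.end()) - row.begin());
+        if (best != prev && best != blankIdx) {
+            out += alphabet[static_cast<size_t>(best)];
+        }
+        prev = best;
+    }
+    return out;
+}
+```
+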
+### Prerequisites
+
+See [Prerequisites](../documentation.md#prerequisites)
+
+## Building the code sample application from sources
+
+### Build options
+
+In addition to the already specified build options in the main documentation, the Keyword Spotting and Automatic Speech
+Recognition use case adds:
+
+- `kws_asr_MODEL_TFLITE_PATH_ASR` and `kws_asr_MODEL_TFLITE_PATH_KWS`: Path to the NN model files in TFLite format.
+ Models will be processed and included into the application axf file. The default value points to one of the delivered set of models.
+ Note that the parameters `kws_asr_LABELS_TXT_FILE_KWS`, `kws_asr_LABELS_TXT_FILE_ASR`,`TARGET_PLATFORM` and `ETHOS_U55_ENABLED`
+ should be aligned with the chosen model, i.e:
+ - if `ETHOS_U55_ENABLED` is set to `On` or `1`, the NN model is assumed to be optimized. The model will naturally fall back to the Arm® Cortex®-M CPU if an unoptimized model is supplied.
+ - if `ETHOS_U55_ENABLED` is set to `Off` or `0`, the NN model is assumed to be unoptimized. Supplying an optimized model in this case will result in a runtime error.
+
+- `kws_asr_FILE_PATH`: Path to the directory containing audio files, or a path to a single WAV file, to be used in the application. The default value
+ points to the resources/kws_asr/samples folder containing the delivered set of audio clips.
+
+- `kws_asr_LABELS_TXT_FILE_KWS` and `kws_asr_LABELS_TXT_FILE_ASR`: Paths to the keyword spotting and the automatic speech
+  recognition labels' text files respectively. The files are used to map class indexes to text labels. The default
+  values point to the delivered labels files inside the delivery package.
+
+- `kws_asr_AUDIO_RATE`: Input data sampling rate. Each audio file from `kws_asr_FILE_PATH` is preprocessed during the
+ build to match NN model input requirements. Default value is 16000.
+
+- `kws_asr_AUDIO_MONO`: If set to ON the audio data will be converted to mono. Default is ON.
+
+- `kws_asr_AUDIO_OFFSET`: Start loading audio data starting from this offset (in seconds). Default value is 0.
+
+- `kws_asr_AUDIO_DURATION`: Length of the audio data to be used in the application in seconds. Default is 0 meaning
+ the whole audio file will be taken.
+
+- `kws_asr_AUDIO_MIN_SAMPLES`: Minimum number of samples required by the network model. If the audio clip is shorter
+ than this number, it is padded with zeros. Default value is 16000.
+
+- `kws_asr_MODEL_SCORE_THRESHOLD_KWS`: Threshold value that must be applied to the keyword spotting inference
+ results for a label to be deemed valid. Default is 0.9.
+
+- `kws_asr_MODEL_SCORE_THRESHOLD_ASR`: Threshold value that must be applied to the automatic speech recognition
+ inference results for a label to be deemed valid. Default is 0.5.
+
+- `kws_asr_ACTIVATION_BUF_SZ`: The intermediate/activation buffer size reserved for the NN model. By default, it is
+ set to 2MiB and should be enough for most models.
+
+In order to build **ONLY** the Keyword Spotting and Automatic Speech Recognition example application, add
+`-DUSE_CASE_BUILD=kws_asr` to the `cmake` command line specified in [Building](../documentation.md#Building).
+
+### Build process
+
+> **Note:** This section describes the process for configuring the build for `MPS3: SSE-300`. For a different target platform, see [Building](../documentation.md#Building).
+
+Create a build directory and navigate inside:
+
+```commandline
+mkdir build_kws_asr && cd build_kws_asr
+```
+
+On Linux, execute the following command to build the application to run on the Ethos-U55 Fast Model when providing only the mandatory arguments for CMake configuration:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=./scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=kws_asr ..
+```
+
+For Windows, add `-G "MinGW Makefiles"`:
+
+```commandline
+cmake \
+ -G "MinGW Makefiles" \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=./scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=kws_asr ..
+```
+
+Toolchain option `CMAKE_TOOLCHAIN_FILE` points to the toolchain specific file to set the compiler and platform specific
+parameters.
+
+To configure a build that can be debugged using Arm-DS, we can just specify
+the build type as `Debug`:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug \
+ -DUSE_CASE_BUILD=kws_asr ..
+```
+
+To configure a build that can be debugged using a tool that only supports
+DWARF format 3 (Modeldebugger for example), we can use:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DCMAKE_BUILD_TYPE=Debug \
+ -DARMCLANG_DEBUG_DWARF_LEVEL=3 \
+ -DUSE_CASE_BUILD=kws_asr ..
+```
+
+> **Note:** If building for different Ethos-U55 configurations, see [Configuring build for different Arm Ethos-U55 configurations](../sections/building.md#Configuring-build-for-different-Arm-Ethos-U55-configurations):
+
+If the TensorFlow source tree is not in its default expected location,
+set the path using `TENSORFLOW_SRC_PATH`.
+Similarly, if the Ethos-U55 driver is not in the default location,
+`ETHOS_U55_DRIVER_SRC_PATH` can be used to configure the location. For example:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTENSORFLOW_SRC_PATH=/my/custom/location/tensorflow \
+ -DETHOS_U55_DRIVER_SRC_PATH=/my/custom/location/core_driver \
+ -DUSE_CASE_BUILD=kws_asr ..
+```
+
+Also, `CMSIS_SRC_PATH` parameter can be used to override the CMSIS sources used for compilation used by TensorFlow by default. For example, to use the CMSIS sources fetched by the ethos-u helper script, we can use:
+
+```commandline
+cmake \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DTENSORFLOW_SRC_PATH=../ethos-u/core_software/tensorflow \
+ -DETHOS_U55_DRIVER_SRC_PATH=../ethos-u/core_software/core_driver \
+ -DCMSIS_SRC_PATH=../ethos-u/core_software/cmsis \
+ -DUSE_CASE_BUILD=kws_asr ..
+```
+
+> **Note:** If re-building with changed parameter values, it is highly advised to clean the build directory and re-run the CMake command.
+
+If the CMake command succeeded, build the application as follows:
+
+```commandline
+make -j4
+```
+
+For Windows, use `mingw32-make`.
+
+Add `VERBOSE=1` to see compilation and link details.
+
+Results of the build will be placed under the `build/bin` folder:
+
+```tree
+bin
+ ├── ethos-u-kws_asr.axf
+ ├── ethos-u-kws_asr.htm
+ ├── ethos-u-kws_asr.map
+ ├── images-kws_asr.txt
+ └── sectors
+ └── kws_asr
+ ├── dram.bin
+ └── itcm.bin
+```
+
+Where:
+
+- `ethos-u-kws_asr.axf`: The built application binary for the Keyword Spotting and Automatic Speech Recognition use
+ case.
+
+- `ethos-u-kws_asr.map`: Information from building the application (e.g. libraries used, what was optimized, location
+  of objects).
+
+- `ethos-u-kws_asr.htm`: Human readable file containing the call graph of application functions.
+
+- `sectors/`: Folder containing the built application, split into files for loading into different FPGA memory regions.
+
+- `images-kws_asr.txt`: Tells the FPGA which memory regions to use for loading the binaries in the `sectors/` folder.
+
+### Add custom input
+
+The application performs inference on data found in the folder set by the CMake parameter `kws_asr_FILE_PATH`.
+
+To run the application with your own audio clips first create a folder to hold them and then copy the custom files into
+this folder:
+
+```commandline
+mkdir /tmp/custom_files
+
+cp custom_audio1.wav /tmp/custom_files/
+```
+
+> **Note:** Clean the build directory before re-running the CMake command.
+
+Next set `kws_asr_FILE_PATH` to the location of this folder when building:
+
+```commandline
+cmake \
+ -Dkws_asr_FILE_PATH=/tmp/custom_files/ \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+    -DUSE_CASE_BUILD=kws_asr ..
+```
+
+For Windows, add `-G "MinGW Makefiles"` to the CMake command.
+
+The files found in the `kws_asr_FILE_PATH` folder will be picked up and automatically converted to C++ files during the
+CMake configuration stage and then compiled into the application during the build phase for performing inference with.
+
+The log from the configuration stage should tell you what directory path has been used:
+
+```log
+-- User option kws_asr_FILE_PATH is set to /tmp/custom_files
+```
+
+After compiling, your custom inputs will have now replaced the default ones in the application.
+
+### Add custom model
+
+The application performs KWS inference using the model pointed to by the CMake parameter `kws_asr_MODEL_TFLITE_PATH_KWS` and
+ASR inference using the model pointed to by the CMake parameter `kws_asr_MODEL_TFLITE_PATH_ASR`.
+
+This section assumes you wish to change the existing ASR model to a custom one. If instead you wish to change the KWS
+model, the instructions are the same, except that ASR changes to KWS.
+
+> **Note:** If you want to run the model using Ethos-U55, ensure your custom model has been run through the Vela compiler successfully before continuing. See [Optimize model with Vela compiler](../sections/building.md#Optimize-custom-model-with-Vela-compiler).
+
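+As a minimal sketch, a custom model could be compiled with Vela as follows. This assumes Vela has been installed (for
+example via `pip install ethos-u-vela`) and targets a 128 MAC Ethos-U55 configuration; see the linked section for the
+exact options used by this project:
+
+```commandline
+# Compile the model for a 128 MAC Ethos-U55 configuration (see the Vela CLI help for all options)
+vela --accelerator-config=ethos-u55-128 custom_asr_model.tflite
+```
+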
+To run the application with a custom model, you will need to provide a `labels_<model_name>.txt` file of labels
+associated with the model. Each line of the file should correspond to one of the outputs in your model. See the
+provided `labels_wav2letter.txt` file for an example.
+
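+For illustration only, a labels file for a hypothetical KWS model with four output classes could look like the
+following, with one label per line in output-index order (these labels are invented for the example):
+
+```
+yes
+no
+up
+down
+```
+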
+Then, you must set `kws_asr_MODEL_TFLITE_PATH_ASR` to the location of the Vela processed model file and
+`kws_asr_LABELS_TXT_FILE_ASR` to the location of the associated labels file.
+
+An example:
+
+```commandline
+cmake \
+ -Dkws_asr_MODEL_TFLITE_PATH_ASR=<path/to/custom_asr_model_after_vela.tflite> \
+ -Dkws_asr_LABELS_TXT_FILE_ASR=<path/to/labels_custom_model.txt> \
+ -DTARGET_PLATFORM=mps3 \
+ -DTARGET_SUBSYSTEM=sse-300 \
+ -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+ -DUSE_CASE_BUILD=kws_asr ..
+```
+
+For Windows, add `-G "MinGW Makefiles"` to the CMake command.
+
+> **Note:** Clean the build directory before re-running the CMake command.
+
+The `.tflite` model files pointed to by `kws_asr_MODEL_TFLITE_PATH_KWS` and `kws_asr_MODEL_TFLITE_PATH_ASR`, and the
+labels text files pointed to by `kws_asr_LABELS_TXT_FILE_KWS` and `kws_asr_LABELS_TXT_FILE_ASR`, will be converted to
+C++ files during the CMake configuration stage and then compiled into the application, ready for inference.
+
+The log from the configuration stage should tell you what model path and labels file have been used:
+
+```log
+-- User option TARGET_PLATFORM is set to mps3
+-- User option kws_asr_MODEL_TFLITE_PATH_ASR is set to <path/to/custom_asr_model_after_vela.tflite>
+...
+-- User option kws_asr_LABELS_TXT_FILE_ASR is set to <path/to/labels_custom_model.txt>
+...
+-- Using <path/to/custom_asr_model_after_vela.tflite>
+++ Converting custom_asr_model_after_vela.tflite to\
+custom_asr_model_after_vela.tflite.cc
+-- Generating labels file from <path/to/labels_custom_model.txt>
+-- writing to Labels_wav2letter
+...
+```
+
+After compiling, your custom model will have replaced the default one in the application.
+
+## Setting-up and running Ethos-U55 Code Samples
+
+### Setting up the Ethos-U55 Fast Model
+
+The FVP is available publicly from [Arm Ecosystem FVP downloads](https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps).
+
+For Ethos-U55 evaluation, please download the MPS3 version of the Arm® Corstone™-300 model that contains Ethos-U55 and
+Cortex-M55. The model is currently only supported on Linux-based machines. To install the FVP:
+
+- Unpack the archive
+
+- Run the install script in the extracted package
+
+```commandline
+./FVP_Corstone_SSE-300_Ethos-U55.sh
+```
+
+- Follow the instructions to install the FVP to your desired location
+
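+For a scripted setup, the installer can typically also be run non-interactively. The following is a sketch only: the
+exact flag names can vary between installer versions, so check `./FVP_Corstone_SSE-300_Ethos-U55.sh -h` first:
+
+```commandline
+# Accept the EULA and install without prompts (flag names assumed; verify with -h)
+./FVP_Corstone_SSE-300_Ethos-U55.sh --i-agree-to-the-contained-eula --no-interactive -d ~/FVP_install_location
+```
+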
+### Starting Fast Model simulation
+
+Once the build step has completed, the application binary `ethos-u-kws_asr.axf` can be found in the `build/bin` folder.
+Assuming the FVP install location was set to `~/FVP_install_location`, the simulation can be started by:
+
+```commandline
+$ ~/FVP_install_location/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 \
+    ./bin/mps3-sse-300/ethos-u-kws_asr.axf
+```
+
+A log output should appear on the terminal:
+
+```log
+telnetterminal0: Listening for serial connection on port 5000
+telnetterminal1: Listening for serial connection on port 5001
+telnetterminal2: Listening for serial connection on port 5002
+telnetterminal5: Listening for serial connection on port 5003
+```
+
+This will also launch a telnet window with the sample application's standard output and error log. The log contains
+information about the pre-built application version, the TensorFlow Lite Micro library version used, the data type, and
+the input and output tensor sizes of the model compiled into the executable binary.
+
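+If the telnet window does not open automatically, you can connect to the first serial port manually, assuming the
+default port from the log above:
+
+```commandline
+# Connect to the first UART exposed by the FVP
+telnet localhost 5000
+```
+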
+After the application has started, if `kws_asr_FILE_PATH` points to a single file (or a folder containing a single
+input file), the inference starts immediately. If multiple input files are available, the application outputs a menu
+and waits for user input from the telnet terminal:
+
+```log
+User input required
+Enter option number from:
+
+1. Classify next audio clip
+2. Classify audio clip at chosen index
+3. Run classification on all audio clips
+4. Show NN model info
+5. List audio clips
+
+Choice:
+
+```
+
+1. “Classify next audio clip” menu option will run a single inference on the next included file.
+
+2. “Classify audio clip at chosen index” menu option will run inference on the chosen audio clip.
+
+   > **Note:** Please make sure to select an audio clip index within the range of audio clips supplied during the application build.
+
+3. “Run classification on all audio clips” menu option triggers sequential inference executions on all built-in files.
+
+4. “Show NN model info” menu option prints information about model data type, input and output tensor sizes:
+
+ ```log
+ [INFO] uTFL version: 2.5.0
+ [INFO] Model INPUT tensors:
+ [INFO] tensor type is INT8
+ [INFO] tensor occupies 490 bytes with dimensions
+ [INFO] 0: 1
+ [INFO] 1: 1
+ [INFO] 2: 49
+ [INFO] 3: 10
+ [INFO] Quant dimension: 0
+ [INFO] Scale[0] = 1.107164
+ [INFO] ZeroPoint[0] = 95
+ [INFO] Model OUTPUT tensors:
+ [INFO] tensor type is INT8
+ [INFO] tensor occupies 12 bytes with dimensions
+ [INFO] 0: 1
+ [INFO] 1: 12
+ [INFO] Quant dimension: 0
+ [INFO] Scale[0] = 0.003906
+ [INFO] ZeroPoint[0] = -128
+ [INFO] Activation buffer (a.k.a tensor arena) size used: 123616
+ [INFO] Number of operators: 16
+ [INFO] Operator 0: RESHAPE
+ [INFO] Operator 1: CONV_2D
+ [INFO] Operator 2: DEPTHWISE_CONV_2D
+ [INFO] Operator 3: CONV_2D
+ [INFO] Operator 4: DEPTHWISE_CONV_2D
+ [INFO] Operator 5: CONV_2D
+ [INFO] Operator 6: DEPTHWISE_CONV_2D
+ [INFO] Operator 7: CONV_2D
+ [INFO] Operator 8: DEPTHWISE_CONV_2D
+ [INFO] Operator 9: CONV_2D
+ [INFO] Operator 10: DEPTHWISE_CONV_2D
+ [INFO] Operator 11: CONV_2D
+ [INFO] Operator 12: AVERAGE_POOL_2D
+ [INFO] Operator 13: RESHAPE
+ [INFO] Operator 14: FULLY_CONNECTED
+ [INFO] Operator 15: SOFTMAX
+ [INFO] Model INPUT tensors:
+ [INFO] tensor type is INT8
+ [INFO] tensor occupies 11544 bytes with dimensions
+ [INFO] 0: 1
+ [INFO] 1: 296
+ [INFO] 2: 39
+ [INFO] Quant dimension: 0
+ [INFO] Scale[0] = 0.110316
+ [INFO] ZeroPoint[0] = -11
+ [INFO] Model OUTPUT tensors:
+ [INFO] tensor type is INT8
+ [INFO] tensor occupies 4292 bytes with dimensions
+ [INFO] 0: 1
+ [INFO] 1: 1
+ [INFO] 2: 148
+ [INFO] 3: 29
+ [INFO] Quant dimension: 0
+ [INFO] Scale[0] = 0.003906
+ [INFO] ZeroPoint[0] = -128
+ [INFO] Activation buffer (a.k.a tensor arena) size used: 809808
+ [INFO] Number of operators: 1
+ [INFO] Operator 0: ethos-u
+ ```
+
+5. “List audio clips” menu option prints a list of pairs of indexes and the original filenames embedded in the application:
+
+ ```log
+ [INFO] List of Files:
+ [INFO] 0 => yesnogostop.wav
+ ```
+
+### Running Keyword Spotting and Automatic Speech Recognition
+
+Please select the first menu option to execute Keyword Spotting and Automatic Speech Recognition.
+
+The following example illustrates application output:
+
+```log
+[INFO] KWS audio data window size 16000
+[INFO] Running KWS inference on audio clip 0 => yesnogostop.wav
+[INFO] Inference 1/7
+[INFO] Profile for Inference:
+ Active NPU cycles: 0
+ Idle NPU cycles: 6
+
+[INFO] For timestamp: 0.000000 (inference #: 0); threshold: 0.900000
+[INFO] label @ 0: yes, score: 0.996094
+[INFO] Keyword spotted
+[INFO] Inference 1/2
+[INFO] Profile for Inference:
+ Active NPU cycles: 28924742
+ Idle NPU cycles: 424
+
+[INFO] Inference 2/2
+[INFO] Profile for Inference:
+ Active NPU cycles: 28924740
+ Idle NPU cycles: 426
+
+[INFO] Result for inf 0: no gow
+[INFO] Result for inf 1: stoppe
+[INFO] Final result: no gow stoppe
+```
+
+It could take several minutes to complete one inference run (average time is 2-3 minutes).
+
+Using the input “yesnogostop.wav”, the log shows inference results for the KWS operation first, detecting the
+trigger word “yes” with the stated probability score (in this case 0.99). After this, the ASR inference is run,
+printing the words recognized from the input sample.
+
+The profiling section of the log shows that for the ASR inference:
+
+- Ethos-U55's PMU report:
+
+ - 28,924,740 active cycles: number of cycles that were used for computation
+
+ - 426 idle cycles: number of cycles for which the NPU was idle
+
+- For FPGA platforms, CPU cycle count can also be enabled (see the sketch after this list). For FVP, however, CPU
+cycle counters should not be used as the CPU model is not cycle-approximate or cycle-accurate.
+
+   Note that in this example the KWS inference does not use the Ethos-U55 and runs purely on the CPU; therefore, the
+   log shows 0 Active NPU cycles.
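+
+As a sketch only, CPU cycle counting for an FPGA build could be enabled at configuration time. This assumes the project
+exposes a `CPU_PROFILE_ENABLED` CMake option; check the build documentation for the authoritative option name:
+
+```commandline
+# CPU_PROFILE_ENABLED is assumed here; verify the option name before use
+cmake \
+    -DTARGET_PLATFORM=mps3 \
+    -DTARGET_SUBSYSTEM=sse-300 \
+    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \
+    -DCPU_PROFILE_ENABLED=1 \
+    -DUSE_CASE_BUILD=kws_asr ..
+```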