Diffstat (limited to 'docs/sections')
-rw-r--r-- | docs/sections/appendix.md | 20
-rw-r--r-- | docs/sections/building.md | 1023
-rw-r--r-- | docs/sections/coding_guidelines.md | 323
-rw-r--r-- | docs/sections/customizing.md | 731
-rw-r--r-- | docs/sections/deployment.md | 281
-rw-r--r-- | docs/sections/run.md | 42
-rw-r--r-- | docs/sections/testing_benchmarking.md | 87
-rw-r--r-- | docs/sections/troubleshooting.md | 27
8 files changed, 2534 insertions, 0 deletions
diff --git a/docs/sections/appendix.md b/docs/sections/appendix.md new file mode 100644 index 0000000..7b56faa --- /dev/null +++ b/docs/sections/appendix.md @@ -0,0 +1,20 @@ +# Appendix + +## Arm® Cortex®-M55 Memory map overview for Corstone™-300 reference design + +The table below is the memory mapping information specific to the Arm® Cortex®-M55. + +| Name | Base address | Limit address | Size | IDAU | Remarks | +|-------|--------------|---------------|-----------|------|-----------------------------------------------------------| +| ITCM | 0x0000_0000 | 0x0007_FFFF | 512 kiB | NS | ITCM code region | +| BRAM | 0x0100_0000 | 0x011F_FFFF | 2 MiB | NS | FPGA data SRAM region | +| DTCM | 0x2000_0000 | 0x2007_FFFF | 512 kiB | NS | 4 banks of 128 kiB each | +| SRAM | 0x2100_0000 | 0x213F_FFFF | 4 MiB | NS | 2 banks of 2 MiB each as SSE-300 internal SRAM region | +| DDR | 0x6000_0000 | 0x6FFF_FFFF | 256 MiB | NS | DDR memory region | +| ITCM | 0x1000_0000 | 0x1007_FFFF | 512 kiB | S | ITCM code region | +| BRAM | 0x1100_0000 | 0x111F_FFFF | 2 MiB | S | FPGA data SRAM region | +| DTCM | 0x3000_0000 | 0x3007_FFFF | 512 kiB | S | 4 banks of 128 kiB each | +| SRAM | 0x3100_0000 | 0x313F_FFFF | 4 MiB | S | 2 banks of 2 MiB each as SSE-300 internal SRAM region | +| DDR | 0x7000_0000 | 0x7FFF_FFFF | 256 MiB | S | DDR memory region | + +The default memory map can be found here: https://developer.arm.com/documentation/101051/0002/Memory-model/Memory-map
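The sizes in the table follow from the base and limit addresses, with the limit address being inclusive. A small, purely illustrative Python check of that arithmetic for a few of the rows (not part of the repository):

```python
# Illustrative check that a region's size matches its inclusive
# [base, limit] address pair, as listed in the memory map table above.
def region_size_kib(base: int, limit: int) -> int:
    """Size in kiB of an inclusive [base, limit] address range."""
    return (limit - base + 1) // 1024

# ITCM (non-secure): 512 kiB
assert region_size_kib(0x0000_0000, 0x0007_FFFF) == 512
# DTCM (non-secure): 512 kiB, i.e. 4 banks of 128 kiB
assert region_size_kib(0x2000_0000, 0x2007_FFFF) == 4 * 128
# DDR (non-secure): 256 MiB
assert region_size_kib(0x6000_0000, 0x6FFF_FFFF) == 256 * 1024
print("memory map sizes consistent")
```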
\ No newline at end of file diff --git a/docs/sections/building.md b/docs/sections/building.md new file mode 100644 index 0000000..56771b8 --- /dev/null +++ b/docs/sections/building.md @@ -0,0 +1,1023 @@ +# Building the Code Samples application from sources + +## Contents + +- [Building the Code Samples application from sources](#building-the-code-samples-application-from-sources) + - [Contents](#contents) + - [Build prerequisites](#build-prerequisites) + - [Build options](#build-options) + - [Build process](#build-process) + - [Preparing build environment](#preparing-build-environment) + - [Create a build directory](#create-a-build-directory) + - [Configuring the build for `MPS3: SSE-300`](#configuring-the-build-for-mps3-sse-300) + - [Configuring the build for `MPS3: SSE-200`](#configuring-the-build-for-mps3-sse-200) + - [Configuring the build native unit-test](#configuring-the-build-native-unit-test) + - [Configuring the build for `simple_platform`](#configuring-the-build-for-simple_platform) + - [Building the configured project](#building-the-configured-project) + - [Building timing adapter with custom options](#building-timing-adapter-with-custom-options) + - [Add custom inputs](#add-custom-inputs) + - [Add custom model](#add-custom-model) + - [Optimize custom model with Vela compiler](#optimize-custom-model-with-vela-compiler) + - [Memory constraints](#memory-constraints) + - [Automatic file generation](#automatic-file-generation) + +This section assumes the use of an **x86 Linux** build machine. + +## Build prerequisites + +Before proceeding, please make sure that the following prerequisites +are fulfilled: + +- Arm Compiler version 6.14 or above is installed and available on the + path. 
+ + Test the compiler by running: + + ```commandline + armclang -v + ``` + + ```log + Product: ARM Compiler 6.14 Professional + Component: ARM Compiler 6.14 + ``` + + > **Note:** Add compiler to the path, if needed: + > + > `export PATH=/path/to/armclang/bin:$PATH` + +- Compiler license is configured correctly + +- CMake version 3.15 or above is installed and available on the path. + Test CMake by running: + + ```commandline + cmake --version + ``` + + ```log + cmake version 3.16.2 + ``` + + > **Note:** Add cmake to the path, if needed: + > + > `export PATH=/path/to/cmake/bin:$PATH` + +- Python 3.6 or above is installed. Test the Python version by running: + + ```commandline + python3 --version + ``` + + ```log + Python 3.6.8 + ``` + +- The build system will create a Python virtual environment during the build + process. Please make sure that the Python virtual environment module is + installed: + + ```commandline + python3 -m venv + ``` + +- Make (or MinGW make for Windows) is installed. Test it by running: + + ```commandline + make --version + ``` + + ```log + GNU Make 4.1 + + ... + ``` + + > **Note:** Add it to the path environment variable, if needed. + +- Access to the Internet to download the third party dependencies, specifically: TensorFlow Lite Micro, Arm Ethos-U55 +driver and CMSIS. Instructions for downloading these are listed under [preparing build environment](#preparing-build-environment). + +## Build options + +The project build system allows the user to specify a custom NN +model (in `.tflite` format) or images and compile the application binary from +sources. + +The build system uses the pre-built TensorFlow Lite for Microcontrollers +library and Arm® Ethos™-U55 driver libraries from the delivery package. + +The build script is parameterized to support different options. Default +values for build parameters will build the executable compatible with +the Ethos-U55 Fast Model. 
+ +The build parameters are: + +- `TARGET_PLATFORM`: Target platform to execute the application on: + - `mps3` + - `native` + - `simple_platform` + +- `TARGET_SUBSYSTEM`: Platform target subsystem; this specifies the + design implementation for the deployment target. For both the MPS3 + FVP and the MPS3 FPGA, this should be left at the default value of + SSE-300: + - `sse-300` (default - [Arm® Corstone™-300](https://developer.arm.com/ip-products/subsystem/corstone/corstone-300)) + - `sse-200` + +- `TENSORFLOW_SRC_PATH`: Path to the root of the TensorFlow directory. + The default value points to the TensorFlow submodule in the + [ethos-u](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) `dependencies` folder. + +- `ETHOS_U55_DRIVER_SRC_PATH`: Path to the Ethos-U55 core driver sources. + The default value points to the core_driver submodule in the + [ethos-u](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) `dependencies` folder. + +- `CMSIS_SRC_PATH`: Path to the CMSIS sources to be used to build the TensorFlow + Lite Micro library. This parameter is optional and valid only for + Arm® Cortex®-M CPU targeted configurations. The default value points to the CMSIS submodule in the + [ethos-u](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) `dependencies` folder. + +- `ETHOS_U55_ENABLED`: Sets whether the use of Ethos-U55 is available for + the deployment target. By default, this is set and therefore the + application is built with Ethos-U55 support. + +- `CPU_PROFILE_ENABLED`: Sets whether profiling information for the CPU + core should be displayed. By default, this is set to false, but can + be turned on for FPGA targets. For the FVP, the CPU core's cycle + counts are not meaningful and should not be used. + +- `LOG_LEVEL`: Sets the verbosity level for the application's output + over UART/stdout. Valid values are `LOG_LEVEL_TRACE`, `LOG_LEVEL_DEBUG`, + `LOG_LEVEL_INFO`, `LOG_LEVEL_WARN` and `LOG_LEVEL_ERROR`. 
By default, it + is set to `LOG_LEVEL_INFO`. + +- `<use_case>_MODEL_TFLITE_PATH`: Path to the model file that will be + processed and included in the application axf file. The default + value points to one of the delivered set of models. Make sure the + model chosen is aligned with the `ETHOS_U55_ENABLED` setting. + + - When using the Ethos-U55 backend, the NN model is assumed to be + optimized by the Vela compiler. + However, even if not, it will fall back on the CPU and execute, + if supported by TensorFlow Lite Micro. + + - When the use of Ethos-U55 is disabled, and if a Vela optimized model + is provided, the application will report a failure at runtime. + +- `USE_CASE_BUILD`: specifies the list of applications to build. By + default, the build system scans sources to identify available ML + applications and produces executables for all detected use-cases. + This parameter can accept a single value, for example, + `USE_CASE_BUILD=img_class` or multiple values, for example, + `USE_CASE_BUILD="img_class;kws"`. + +- `ETHOS_U55_TIMING_ADAPTER_SRC_PATH`: Path to timing adapter sources. + The default value points to the `timing_adapter` dependencies folder. + +- `TA_CONFIG_FILE`: Path to the CMake configuration file containing the + timing adapter parameters. Used only if the timing adapter build is + enabled. + +- `TENSORFLOW_LITE_MICRO_CLEAN_BUILD`: Optional parameter to enable/disable + "cleaning" prior to building for the TensorFlow Lite Micro library. + It is enabled by default. + +- `TENSORFLOW_LITE_MICRO_CLEAN_DOWNLOADS`: Optional parameter to enable wiping + out TPIP downloads from the TensorFlow source tree prior to each build. + It is disabled by default. + +- `ARMCLANG_DEBUG_DWARF_LEVEL`: When the CMake build type is specified as `Debug` + and when the armclang toolchain is used to build for a Cortex-M CPU target, + this optional argument can be set to specify the DWARF format. + By default, this is set to 4 and is synonymous with passing the `-g` + flag to the compiler. 
This is compatible with Arm-DS and other tools + which can interpret the latest DWARF format. To allow debugging using + the Model Debugger from Arm FastModel Tools Suite, this argument can be used + to pass DWARF format version as "3". Note: this option is only available + when the CMake project is configured with the `-DCMAKE_BUILD_TYPE=Debug` argument. + Also, the same DWARF format is used for building the TensorFlow Lite Micro library. + +> **Note:** For details on the specific use case build options, follow the +> instructions in the use-case specific documentation. +> Also, when setting any of the CMake configuration parameters that expect a directory/file path, it is advised +>to **use absolute paths instead of relative paths**. + +## Build process + +The build process can be summarized in three major steps: + +- Prepare the build environment by downloading the required third party sources, see +[Preparing build environment](#preparing-build-environment). + +- Configure the build for the platform chosen. +This stage includes: + - CMake options configuration + - When `<use_case>_MODEL_TFLITE_PATH` build options aren't provided, default neural network models are downloaded +from [Arm ML-Zoo](https://github.com/ARM-software/ML-zoo/). In the case of a native build, the network's input and output data +for tests are downloaded. + - Some files such as neural network models, network's inputs and output labels are automatically converted + into C/C++ arrays, see [Automatic file generation](#automatic-file-generation). + +- Build the application.\ +During this stage, the application and third party libraries are built; see [Building the configured project](#building-the-configured-project). + +### Preparing build environment + +Certain third party sources are required to be present on the development machine for the example sources in this +repository to link against. + +1. [TensorFlow Lite Micro repository](https://github.com/tensorflow/tensorflow)
[Ethos-U55 core driver repository](https://review.mlplatform.org/admin/repos/ml/ethos-u/ethos-u-core-driver) +3. [CMSIS-5](https://github.com/ARM-software/CMSIS_5.git) + +These are part of the [ethos-u repository](https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/) and set as +submodules of this project. + +To pull the submodules: + +```sh +git submodule update --init +``` + +This will download all the required components and place them in a tree like: + +```tree +dependencies + └── ethos-u + ├── cmsis + ├── core_driver + ├── tensorflow + └── ... +``` + +> **NOTE**: The default source paths for the TPIP sources assume the above directory structure, but all of the relevant +>paths can be overridden by CMake configuration arguments `TENSORFLOW_SRC_PATH`, `ETHOS_U55_DRIVER_SRC_PATH`, +>and `CMSIS_SRC_PATH`. + +### Create a build directory + +Create a build directory in the root of the project and navigate inside: + +```commandline +mkdir build && cd build +``` + +### Configuring the build for `MPS3: SSE-300` + +On Linux, execute the following command to build the application to run +on the Ethos-U55 when providing only the mandatory arguments for CMake configuration: + +```commandline +cmake \ + -DTARGET_PLATFORM=mps3 \ + -DTARGET_SUBSYSTEM=sse-300 \ + -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake .. +``` + +For Windows, add `-G "MinGW Makefiles"`: + +```commandline +cmake \ + -G "MinGW Makefiles" \ + -DTARGET_PLATFORM=mps3 \ + -DTARGET_SUBSYSTEM=sse-300 \ + -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake .. +``` + +Toolchain option `CMAKE_TOOLCHAIN_FILE` points to the toolchain specific +file to set the compiler and platform specific parameters. 
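The configuration stage throughout this section is just a list of `-D<option>=<value>` cache arguments passed to CMake. As a mental model, a small illustrative Python helper (hypothetical, not part of the repository's scripts) showing how such a command line composes:

```python
# Illustrative only: compose a CMake configure command-line (argv list)
# from a dict of cache options, mirroring the invocations in this section.
# The helper itself is hypothetical and not part of the build system.
def cmake_configure_cmd(options: dict, source_dir: str = "..") -> list:
    """Build the argv list for a CMake configuration invocation."""
    cmd = ["cmake"]
    cmd += [f"-D{name}={value}" for name, value in options.items()]
    cmd.append(source_dir)
    return cmd

argv = cmake_configure_cmd({
    "TARGET_PLATFORM": "mps3",
    "TARGET_SUBSYSTEM": "sse-300",
    "CMAKE_TOOLCHAIN_FILE": "scripts/cmake/bare-metal-toolchain.cmake",
})
assert argv[0] == "cmake" and argv[-1] == ".."
assert "-DTARGET_PLATFORM=mps3" in argv
```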
+ +To configure a build that can be debugged using Arm-DS, we can just specify +the build type as `Debug`: + +```commandline +cmake \ + -DTARGET_PLATFORM=mps3 \ + -DTARGET_SUBSYSTEM=sse-300 \ + -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \ + -DCMAKE_BUILD_TYPE=Debug .. +``` + +To configure a build that can be debugged using a tool that only supports +DWARF format 3 (Model Debugger, for example), we can use: + +```commandline +cmake \ + -DTARGET_PLATFORM=mps3 \ + -DTARGET_SUBSYSTEM=sse-300 \ + -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \ + -DCMAKE_BUILD_TYPE=Debug \ + -DARMCLANG_DEBUG_DWARF_LEVEL=3 .. +``` + +If the TensorFlow source tree is not in its default expected location, +set the path using `TENSORFLOW_SRC_PATH`. +Similarly, if the Ethos-U55 driver and CMSIS are not in the default location, +`ETHOS_U55_DRIVER_SRC_PATH` and `CMSIS_SRC_PATH` can be used to configure their location. For example: + +```commandline +cmake \ + -DTARGET_PLATFORM=mps3 \ + -DTARGET_SUBSYSTEM=sse-300 \ + -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \ + -DTENSORFLOW_SRC_PATH=/my/custom/location/tensorflow \ + -DETHOS_U55_DRIVER_SRC_PATH=/my/custom/location/core_driver \ + -DCMSIS_SRC_PATH=/my/custom/location/cmsis .. +``` + +> **Note:** If re-building with changed parameter values, it is +highly advised to clean the build directory and re-run the CMake command. + +### Configuring the build for `MPS3: SSE-200` + +```commandline +cmake \ + -DTARGET_PLATFORM=mps3 \ + -DTARGET_SUBSYSTEM=sse-200 \ + -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake .. +``` + +For Windows, add `-G "MinGW Makefiles"`: + +```commandline +cmake \ + -DTARGET_PLATFORM=mps3 \ + -DTARGET_SUBSYSTEM=sse-200 \ + -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \ + -G "MinGW Makefiles" .. 
+``` + +### Configuring the build native unit-test + +```commandline +cmake \ + -DTARGET_PLATFORM=native \ + -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/native-toolchain.cmake .. +``` + +For Windows, add `-G "MinGW Makefiles"`: + +```commandline +cmake \ + -DTARGET_PLATFORM=native \ + -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/native-toolchain.cmake \ + -G "MinGW Makefiles" .. +``` + +Results of the build will be placed in the `build/bin/` folder: + +```tree + bin + |- dev_ethosu_eval-tests + |_ ethos-u +``` + +### Configuring the build for `simple_platform` + +```commandline +cmake \ + -DTARGET_PLATFORM=simple_platform \ + -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/bare-metal-toolchain.cmake .. +``` + +For Windows, add `-G "MinGW Makefiles"`: + +```commandline +cmake \ + -DTARGET_PLATFORM=simple_platform \ + -DCMAKE_TOOLCHAIN_FILE=public/scripts/cmake/bare-metal-toolchain.cmake \ + -G "MinGW Makefiles" .. +``` + +### Building the configured project + +If the CMake command succeeds, build the application as follows: + +```commandline +make -j4 +``` + +or for Windows: + +```commandline +mingw32-make -j4 +``` + +Add `VERBOSE=1` to see compilation and link details. + +Results of the build will be placed in the `build/bin` folder, for +example: + +```tree +bin + ├── ethos-u-<use_case_name>.axf + ├── ethos-u-<use_case_name>.htm + ├── ethos-u-<use_case_name>.map + ├── images-<use_case_name>.txt + └── sectors + └── <use_case> + ├── dram.bin + └── itcm.bin +``` + +Where for each implemented use-case under the `source/use-case` directory, +the following build artefacts will be created: + +- `ethos-u-<use case name>.axf`: The built application binary for a ML + use case. + +- `ethos-u-<use case name>.map`: Information from building the + application (e.g. libraries used, what was optimized, location of + objects). + +- `ethos-u-<use case name>.htm`: Human readable file containing the + call graph of application functions. 
+ +- `sectors/`: Folder containing the built application, split into files + for loading into different FPGA memory regions. + +- `images-<use case name>.txt`: Tells the FPGA which memory regions to + use for loading the binaries in the `sectors/` folder. + +> **Note:** For the specific use case commands, see the relevant section +in the use case documentation. + +## Building timing adapter with custom options + +The sources also contain the configuration for a timing adapter utility +for the Ethos-U55 driver. The timing adapter allows the platform to simulate user +provided memory bandwidth and latency constraints. + +The timing adapter driver aims to control the behavior of two AXI buses +used by Ethos-U55. One is for the SRAM memory region and the other is for +flash or DRAM. The SRAM is where intermediate buffers are expected to be +allocated and therefore, this region can serve frequent R/W traffic +generated by computation operations while executing a neural network +inference. The flash or DDR is where we expect to store the model +weights and therefore, this bus would typically be used only for R/O +traffic. + +It is used for the MPS3 FPGA as well as for the Fast Model environment. + +The CMake build framework allows control over the behavior +of each bus with the following parameters: + +- `MAXR`: Maximum number of pending read operations allowed. 0 is + inferred as infinite, and the default value is 4. + +- `MAXW`: Maximum number of pending write operations allowed. 0 is + inferred as infinite, and the default value is 4. + +- `MAXRW`: Maximum number of pending read+write operations allowed. 0 is + inferred as infinite, and the default value is 8. + +- `RLATENCY`: Minimum latency, in cycle counts, for a read operation. + This is the duration between ARVALID and RVALID signals. The default + value is 50. + +- `WLATENCY`: Minimum latency, in cycle counts, for a write operation. + This is the duration between WVALID + WLAST and BVALID being + de-asserted. 
The default value is 50. + +- `PULSE_ON`: Number of cycles during which addresses are let through. + The default value is 5100. + +- `PULSE_OFF`: Number of cycles during which addresses are blocked. The + default value is 5100. + +- `BWCAP`: Maximum number of 64-bit words transferred per pulse cycle. A + pulse cycle is PULSE_ON + PULSE_OFF. 0 is inferred as infinite, and + the default value is 625. + +- `MODE`: Timing adapter operation mode. The default value is 0. + + - Bit 0: 0=simple; 1=latency-deadline QoS throttling of read vs. + write + + - Bit 1: 1=enable random AR reordering (0=default) + + - Bit 2: 1=enable random R reordering (0=default) + + - Bit 3: 1=enable random B reordering (0=default) + +For the timing adapter's CMake build configuration, the SRAM AXI is assigned +index 0 and the flash/DRAM AXI bus has index 1. To change a bus +parameter for the build, a `TA<index>_` prefix should be added +to the above. For example, `TA0_MAXR=10` will set the SRAM AXI bus's +maximum pending reads to 10. + +As an example, if we have the following parameters for the flash/DRAM +region: + +- `TA1_MAXR` = "2" + +- `TA1_MAXW` = "0" + +- `TA1_MAXRW` = "0" + +- `TA1_RLATENCY` = "64" + +- `TA1_WLATENCY` = "32" + +- `TA1_PULSE_ON` = "320" + +- `TA1_PULSE_OFF` = "80" + +- `TA1_BWCAP` = "50" + +For a clock rate of 500 MHz, this would translate to: + +- The maximum duty cycle for any operation is:\ +![Maximum duty cycle formula](../media/F1.png) + +- Maximum bit rate for this bus (64-bit wide) is:\ +![Maximum bit rate formula](../media/F2.png) + +- With a read latency of 64 cycles, and maximum pending reads as 2, + each read could be a maximum of 64 or 128 bytes, as defined for + Ethos-U55's AXI bus's attribute. 
+ + The bandwidth is calculated solely from the read parameters ![Bandwidth formula]( + ../media/F3.png) + + This is higher than the overall bandwidth dictated by the bus parameters + of \ + ![Overall bandwidth formula](../media/F4.png) + +This suggests that the read operation is limited only by the overall bus +bandwidth. + +The timing adapter requires recompilation to change parameters. The default timing +adapter configuration file, pointed to by the `TA_CONFIG_FILE` build parameter, is +located in the `scripts/cmake` folder and contains all options for AXI0 and +AXI1 described above. + +An example of `scripts/cmake/ta_config.cmake`: + +```cmake +# Timing adapter options +set(TA_INTERACTIVE OFF) + +# Timing adapter settings for AXI0 +set(TA0_MAXR "8") +set(TA0_MAXW "8") +set(TA0_MAXRW "0") +set(TA0_RLATENCY "32") +set(TA0_WLATENCY "32") +set(TA0_PULSE_ON "3999") +set(TA0_PULSE_OFF "1") +set(TA0_BWCAP "4000") +... +``` + +An example of the build with custom timing adapter configuration: + +```commandline +cmake \ + -DTARGET_PLATFORM=mps3 \ + -DTARGET_SUBSYSTEM=sse-300 \ + -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake \ + -DTA_CONFIG_FILE=scripts/cmake/my_ta_config.cmake .. +``` + +## Add custom inputs + +The application performs inference on input data found in the folder set +by the CMake parameters; for more information see section 3.3 in the +specific use case documentation. + +## Add custom model + +The application performs inference using the model pointed to by the +CMake parameter `MODEL_TFLITE_PATH`. + +> **Note:** If you want to run the model using Ethos-U55, ensure your custom +model has been run through the Vela compiler successfully before continuing. + +To run the application with a custom model, you will need to provide a +`labels_<model_name>.txt` file of labels associated with the model. +Each line of the file should correspond to one of the outputs in your +model. 
See the provided `labels_mobilenet_v2_1.0_224.txt` file in the +img_class use case for an example. + +Then, you must set `<use_case>_MODEL_TFLITE_PATH` to the location of +the Vela processed model file and `<use_case>_LABELS_TXT_FILE` to the +location of the associated labels file: + +```commandline +cmake \ + -D<use_case>_MODEL_TFLITE_PATH=<path/to/custom_model_after_vela.tflite> \ + -D<use_case>_LABELS_TXT_FILE=<path/to/labels_custom_model.txt> \ + -DTARGET_PLATFORM=mps3 \ + -DTARGET_SUBSYSTEM=sse-300 \ + -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake .. +``` + +> **Note:** For the specific use case command, see the relevant section in the use case documentation. + +For Windows, add `-G "MinGW Makefiles"` to the CMake command. + +> **Note:** Clean the build directory before re-running the CMake command. + +The TensorFlow Lite for Microcontrollers model pointed to by `<use_case>_MODEL_TFLITE_PATH` and +the labels text file pointed to by `<use_case>_LABELS_TXT_FILE` will be +converted to C++ files during the CMake configuration stage and then +compiled into the application to perform inference with. + +The log from the configuration stage should tell you what model path and +labels file have been used: + +```log +-- User option TARGET_PLATFORM is set to mps3 +-- User option <use_case>_MODEL_TFLITE_PATH is set to +<path/to/custom_model_after_vela.tflite> +... +-- User option <use_case>_LABELS_TXT_FILE is set to +<path/to/labels_custom_model.txt> +... +-- Using <path/to/custom_model_after_vela.tflite> +++ Converting custom_model_after_vela.tflite to custom_model_after_vela.tflite.cc +-- Generating labels file from <path/to/labels_custom_model.txt> +-- writing to <path/to/build>/generated/include/Labels.hpp and <path/to/build>/generated/src/Labels.cc +... +``` + +After compiling, your custom model will have replaced the default +one in the application. 
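The labels conversion step described above can be pictured with a short sketch: one label per line of the input file becomes one entry of a C++ string array. The function name and output format below are illustrative only; the real generator lives in the project's Python scripts and uses templates:

```python
# Illustrative sketch of the labels-to-C++ conversion performed at CMake
# configure time. Names and output layout are made up for illustration;
# they do not match the repository's actual generator or templates.
def labels_to_cpp_array(labels_txt: str, array_name: str = "labelsVec") -> str:
    """Turn a labels text file's content into a C++ string-array definition."""
    labels = [line.strip() for line in labels_txt.splitlines() if line.strip()]
    entries = ",\n".join(f'    "{label}"' for label in labels)
    return f"static const char* {array_name}[] = {{\n{entries}\n}};\n"

cpp = labels_to_cpp_array("tabby cat\ntiger\nkimono\n")
assert '"tiger"' in cpp and cpp.count('"') == 6  # 3 labels, 2 quotes each
```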
+ +## Optimize custom model with Vela compiler + +> **Note:** This tool is not available within this project. +It is a Python tool available from <https://pypi.org/project/ethos-u-vela/>. +The source code is hosted on <https://git.mlplatform.org/ml/ethos-u/ethos-u-vela.git/>. + +The Vela compiler is a tool that can optimize a neural network model +into a version that can run on an embedded system containing Ethos-U55. + +The optimized model will contain custom operators for sub-graphs of the +model that can be accelerated by Ethos-U55; the remaining layers that +cannot be accelerated are left unchanged and will run on the CPU using +optimized (CMSIS-NN) or reference kernels provided by the inference +engine. + +After the compilation, the optimized model can only be executed on a +system with Ethos-U55. + +> **Note:** The NN model provided during the build and compiled into the application +executable binary defines whether CPU or NPU is used to execute workloads. +If an unoptimized model is used, then inference will run on the Cortex-M CPU. + +The Vela compiler accepts parameters to influence the model optimization. The +model provided within this project has been optimized with +the following parameters: + +```commandline +vela \ + --accelerator-config=ethos-u55-128 \ + --block-config-limit=0 \ + --config my_vela_cfg.ini \ + --memory-mode Shared_Sram \ + --system-config Ethos_U55_High_End_Embedded \ + <model>.tflite +``` + +Where: + +- `--accelerator-config`: Specifies the accelerator configuration to use + between ethos-u55-256, ethos-u55-128, ethos-u55-64 and ethos-u55-32. +- `--block-config-limit`: Limits the block config search space; use zero for + unlimited. +- `--config`: Specifies the path to the Vela configuration file. The format of the file is a Python ConfigParser .ini file. + An example can be found in the `dependencies` folder [vela.ini](../../scripts/vela/vela.ini). +- `--memory-mode`: Selects the memory mode to use as specified in the Vela configuration file. 
+- `--system-config`: Selects the system configuration to use as specified in the Vela configuration file. + +The Vela compiler accepts a `.tflite` file as input and saves the optimized network +model as a `.tflite` file. + +Using `--show-cpu-operations` and `--show-subgraph-io-summary` will show +all the operations that fall back to the CPU and a summary of all the +subgraphs and their inputs and outputs. + +To see the Vela help for all parameters, use: `vela --help`. + +Please get in touch with your Arm representative to request access to +Vela Compiler documentation for more details. + +> **Note:** By default, use of the Ethos-U55 is enabled in the CMake configuration. +This can be changed by passing `-DETHOS_U55_ENABLED=0`. + +## Memory constraints + +Both the MPS3 Fixed Virtual Platform and the MPS3 FPGA platform share +the linker script (scatter file) for the SSE-300 design. The design is set +by the CMake configuration parameter `TARGET_SUBSYSTEM` as described in +the previous section. + +The memory map exposed by this design is presented in Appendix 1. This +can be used as a reference when editing the scatter file, especially to +make sure that region boundaries are respected. The snippet from MPS3's +scatter file is presented below: + +``` +;--------------------------------------------------------- +; First load region +;--------------------------------------------------------- +LOAD_REGION_0 0x00000000 0x00080000 +{ + ;----------------------------------------------------- + ; First part of code mem -- 512kiB + ;----------------------------------------------------- + itcm.bin 0x00000000 0x00080000 + { + *.o (RESET, +First) + * (InRoot$$Sections) + .ANY (+RO) + } + + ;----------------------------------------------------- + ; 128kiB of 512kiB bank is used for any other RW or ZI + ; data. 
Note: this region is internal to the Cortex-M CPU + ;----------------------------------------------------- + dtcm.bin 0x20000000 0x00020000 + { + .ANY(+RW +ZI) + } + + ;----------------------------------------------------- + ; 128kiB of stack space within the DTCM region + ;----------------------------------------------------- + ARM_LIB_STACK 0x20020000 EMPTY ALIGN 8 0x00020000 + {} + + ;----------------------------------------------------- + ; 256kiB of heap space within the DTCM region + ;----------------------------------------------------- + + ARM_LIB_HEAP 0x20040000 EMPTY ALIGN 8 0x00040000 + {} + + ;----------------------------------------------------- + ; SSE-300's internal SRAM + ;----------------------------------------------------- + isram.bin 0x21000000 UNINIT ALIGN 16 0x00080000 + { + ; activation buffers a.k.a tensor arena + *.o (.bss.NoInit.activation_buf) + } +} + +;--------------------------------------------------------- +; Second load region +;--------------------------------------------------------- +LOAD_REGION_1 0x60000000 0x02000000 +{ + ;----------------------------------------------------- + ; 32 MiB of DRAM space for nn model and input vectors + ;----------------------------------------------------- + dram.bin 0x60000000 ALIGN 16 0x02000000 + { + ; nn model's baked in input matrices + *.o (ifm) + + ; nn model + *.o (nn_model) + + ; if the activation buffer (tensor arena) doesn't + ; fit in the SRAM region, we accommodate it here + *.o (activation_buf) + } +} +``` + +It is worth noting that in the bitfile implementation, only the BRAM, +internal SRAM and DDR memory regions are accessible to the Ethos-U55 +block. In the above snippet, the internal SRAM region memory can be seen +to be utilized by activation buffers with a limit of 512kiB. If used, +this region will be written to by the Ethos-U55 block frequently. A bigger +region of memory for storing the model is placed in the DDR region, +under LOAD_REGION_1. 
The two load regions are necessary as the MPS3's +motherboard configuration controller limits the load size at address +0x00000000 to 512 kiB. This has implications for how the application **is +deployed** on MPS3, as explained in section 3.8.3. + +## Automatic file generation + +As mentioned in the previous sections, some files such as neural network +models, network's inputs, and output labels are automatically converted +into C/C++ arrays during the CMake project configuration stage. +Additionally, some code is generated to allow access to these arrays. + +An example: + +```log +-- Building use-cases: img_class. +-- Found sources for use-case img_class +-- User option img_class_FILE_PATH is set to /tmp/samples +-- User option img_class_IMAGE_SIZE is set to 224 +-- User option img_class_LABELS_TXT_FILE is set to /tmp/labels/labels_model.txt +-- Generating image files from /tmp/samples +++ Converting cat.bmp to cat.cc +++ Converting dog.bmp to dog.cc +-- Skipping file /tmp/samples/files.md due to unsupported image format. +++ Converting kimono.bmp to kimono.cc +++ Converting tiger.bmp to tiger.cc +++ Generating /tmp/build/generated/img_class/include/InputFiles.hpp +-- Generating labels file from /tmp/labels/labels_model.txt +-- writing to /tmp/build/generated/img_class/include/Labels.hpp and /tmp/build/generated/img_class/src/Labels.cc +-- User option img_class_ACTIVATION_BUF_SZ is set to 0x00200000 +-- User option img_class_MODEL_TFLITE_PATH is set to /tmp/models/model.tflite +-- Using /tmp/models/model.tflite +++ Converting model.tflite to model.tflite.cc +... +``` + +In particular, the build options pointing to the input files `<use_case>_FILE_PATH`, +the model `<use_case>_MODEL_TFLITE_PATH` and the labels text file `<use_case>_LABELS_TXT_FILE` +are used by Python scripts in order to generate not only the converted array files, +but also some headers with utility functions. 
+ +For example, the generated utility functions for image classification are: + +- `build/generated/include/InputFiles.hpp` + +```c++ +#ifndef GENERATED_IMAGES_H +#define GENERATED_IMAGES_H + +#include <cstdint> + +#define NUMBER_OF_FILES (2U) +#define IMAGE_DATA_SIZE (150528U) + +extern const uint8_t im0[IMAGE_DATA_SIZE]; +extern const uint8_t im1[IMAGE_DATA_SIZE]; + +const char* get_filename(const uint32_t idx); +const uint8_t* get_img_array(const uint32_t idx); + +#endif /* GENERATED_IMAGES_H */ +``` + +- `build/generated/src/InputFiles.cc` + +```c++ +#include "InputFiles.hpp" + +static const char *img_filenames[] = { + "img1.bmp", + "img2.bmp", +}; + +static const uint8_t *img_arrays[] = { + im0, + im1 +}; + +const char* get_filename(const uint32_t idx) +{ + if (idx < NUMBER_OF_FILES) { + return img_filenames[idx]; + } + return nullptr; +} + +const uint8_t* get_img_array(const uint32_t idx) +{ + if (idx < NUMBER_OF_FILES) { + return img_arrays[idx]; + } + return nullptr; +} +``` + +These headers are generated using python templates, that are in `scripts/py/templates/*.template`. + +```tree +scripts/ +├── cmake +│ ├── ... +│ ├── subsystem-profiles +│ │ ├── corstone-sse-200.cmake +│ │ └── corstone-sse-300.cmake +│ ├── templates +│ │ ├── mem_regions.h.template +│ │ ├── peripheral_irqs.h.template +│ │ └── peripheral_memmap.h.template +│ └── ... +└── py + ├── <generation scripts> + ├── requirements.txt + └── templates + ├── audio.cc.template + ├── AudioClips.cc.template + ├── AudioClips.hpp.template + ├── default.hpp.template + ├── header_template.txt + ├── image.cc.template + ├── Images.cc.template + ├── Images.hpp.template + ├── Labels.cc.template + ├── Labels.hpp.template + ├── testdata.cc.template + ├── TestData.cc.template + ├── TestData.hpp.template + └── tflite.cc.template +``` + +Based on the type of use case the correct conversion is called in the use case cmake file +(audio or image respectively for voice or vision use cases). 
+
+For example, these are the generation calls for image classification (`source/use_case/img_class/usecase.cmake`):
+
+```cmake
+# Generate input files
+generate_images_code("${${use_case}_FILE_PATH}"
+    ${SRC_GEN_DIR}
+    ${INC_GEN_DIR}
+    "${${use_case}_IMAGE_SIZE}")
+
+# Generate labels file
+set(${use_case}_LABELS_CPP_FILE Labels)
+generate_labels_code(
+    INPUT           "${${use_case}_LABELS_TXT_FILE}"
+    DESTINATION_SRC ${SRC_GEN_DIR}
+    DESTINATION_HDR ${INC_GEN_DIR}
+    OUTPUT_FILENAME "${${use_case}_LABELS_CPP_FILE}"
+)
+
+...
+
+# Generate model file
+generate_tflite_code(
+    MODEL_PATH ${${use_case}_MODEL_TFLITE_PATH}
+    DESTINATION ${SRC_GEN_DIR}
+)
+```
+
+> **Note:** When required, the model and labels conversions can take extra parameters, such as
+> extra code to put in the `<model>.cc` file, or namespaces:
+>
+> ```cmake
+> set(${use_case}_LABELS_CPP_FILE Labels)
+> generate_labels_code(
+>     INPUT           "${${use_case}_LABELS_TXT_FILE}"
+>     DESTINATION_SRC ${SRC_GEN_DIR}
+>     DESTINATION_HDR ${INC_GEN_DIR}
+>     OUTPUT_FILENAME "${${use_case}_LABELS_CPP_FILE}"
+>     NAMESPACE       "namespace1" "namespace2"
+> )
+>
+> ...
+>
+> set(EXTRA_MODEL_CODE
+>     "/* Model parameters for ${use_case} */"
+>     "extern const int g_myvariable1 = value1"
+>     "extern const int g_myvariable2 = value2"
+>     )
+>
+> generate_tflite_code(
+>     MODEL_PATH  ${${use_case}_MODEL_TFLITE_PATH}
+>     DESTINATION ${SRC_GEN_DIR}
+>     EXPRESSIONS ${EXTRA_MODEL_CODE}
+>     NAMESPACE   "namespace1" "namespace2"
+> )
+> ```
+
+In addition to the input file conversions, the correct platform/system profile is selected
+(in `scripts/cmake/subsystem-profiles/*.cmake`) based on the `TARGET_SUBSYSTEM` build option,
+and the variables set there are used to generate memory region sizes, base addresses and IRQ
+numbers, which are in turn used to generate the mem_regions.h, peripheral_irqs.h and
+peripheral_memmap.h headers.
+Templates from `scripts/cmake/templates/*.template` are used to generate these header files.
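For illustration, a source file produced by `generate_tflite_code` is broadly of the following shape. This is a simplified sketch only: the real generated file embeds the complete `.tflite` flatbuffer byte array, and the exact array name, section placement, alignment and namespaces depend on the build options passed to the generation call.

```cpp
#include <cstddef>
#include <cstdint>

namespace arm {
namespace app {

/* Illustrative stand-in for the generated data: the real array holds the
 * whole .tflite file, usually placed in a dedicated linker section. */
static const uint8_t nn_model[] __attribute__((aligned(16))) = {
    0x1c, 0x00, 0x00, 0x00,   /* flatbuffer root offset */
    0x54, 0x46, 0x4c, 0x33,   /* "TFL3" file identifier */
    /* ... remaining model bytes ... */
};

/* Accessors emitted by the template so application code never references
 * the array directly. */
const uint8_t* GetModelPointer()
{
    return nn_model;
}

size_t GetModelLen()
{
    return sizeof(nn_model);
}

} /* namespace app */
} /* namespace arm */
```

Keeping the array behind `GetModelPointer()`/`GetModelLen()` accessors means the use-case code stays independent of which particular model file was baked into the build.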
+ +After the build, the files generated in the build folder are: + +```tree +build/generated/ +├── bsp +│ ├── mem_regions.h +│ ├── peripheral_irqs.h +│ └── peripheral_memmap.h +├── <use_case_name1> +│ ├── include +│ │ ├── InputFiles.hpp +│ │ └── Labels.hpp +│ └── src +│ ├── <uc1_input_file1>.cc +│ ├── <uc1_input_file2>.cc +│ ├── InputFiles.cc +│ ├── Labels.cc +│ └── <uc1_model_name>.tflite.cc +└── <use_case_name2> + ├── include + │ ├── InputFiles.hpp + │ └── Labels.hpp + └── src + ├── <uc2_input_file1>.cc + ├── <uc2_input_file2>.cc + ├── InputFiles.cc + ├── Labels.cc + └── <uc2_model_name>.tflite.cc +``` + +Next section of the documentation: [Deployment](../documentation.md#Deployment). diff --git a/docs/sections/coding_guidelines.md b/docs/sections/coding_guidelines.md new file mode 100644 index 0000000..f1813d3 --- /dev/null +++ b/docs/sections/coding_guidelines.md @@ -0,0 +1,323 @@ +# Coding standards and guidelines + +## Contents + +- [Introduction](#introduction) +- [Language version](#language-version) +- [File naming](#file-naming) +- [File layout](#file-layout) +- [Block Management](#block-management) +- [Naming Conventions](#naming-conventions) + - [C++ language naming conventions](#c_language-naming-conventions) + - [C language naming conventions](#c-language-naming-conventions) +- [Layout and formatting conventions](#layout-and-formatting-conventions) +- [Language usage](#language-usage) + +## Introduction + +This document presents some standard coding guidelines to be followed for contributions to this repository. Most of the +code is written in C++, but there is some written in C as well. There is a clear C/C++ boundary at the Hardware +Abstraction Layer (HAL). Both these languages follow different naming conventions within this repository, by design, to: + +- have clearly distinguishable C and C++ sources. +- make cross language function calls stand out. Mostly these will be C++ function calls to the HAL functions written in C. 
+However, because we also issue function calls to third party API's (and they may not follow these conventions), the +intended outcome may not be fully realised in all of the cases. + +## Language version + +For this project, code written in C++ shall use a subset of the C++11 feature set and software +may be written using the C++11 language standard. Code written in C should be compatible +with the C99 standard. + +Software components written in C/C++ may use the language features allowed and encouraged by this documentation. + +## File naming + +- C files should have `.c` extension +- C++ files should have `.cc` or `.cpp` extension. +- Header files for functions implemented in C should have `.h` extension. +- Header files for functions implemented in C++ should have `.hpp` extension. + +## File layout + +- Standard copyright notice must be included in all files: + + ```copyright + /* + * Copyright (c) <years additions were made to project> <your name>, Arm Limited. All rights reserved. + * SPDX-License-Identifier: Apache-2.0 + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + ``` + +- Source lines must be no longer than 120 characters. 
Prefer to spread code out vertically rather than horizontally,
+  wherever it makes sense:
+
+  ```C++
+  // This is significantly easier to read
+  enum class SomeEnum1
+  {
+      ENUM_VALUE_1,
+      ENUM_VALUE_2,
+      ENUM_VALUE_3
+  };
+
+  // than this
+  enum class SomeEnum2 { ENUM_VALUE_1, ENUM_VALUE_2, ENUM_VALUE_3 };
+  ```
+
+- Block indentation should use 4 spaces, no tabs.
+
+- Each statement must be on a separate line.
+
+  ```C++
+  int a, b; // Error prone
+  int c, *d;
+
+  int e = 0; // GOOD
+  int *p = nullptr; // GOOD
+  ```
+
+- Source must not contain commented-out code or unreachable code.
+
+## Block Management
+
+- Blocks must use braces, and brace placement must be consistent.
+  - Each function has its opening brace on the next line, at the same indentation level as its header; the code within
+    the braces is indented, and the closing brace at the end is at the same level as the opening one.
+    For compactness, if the class/function body is empty, braces on the same line are accepted.
+
+  - Conditional statements and loops, even those with just a single-statement body, need to be surrounded by braces; the
+opening brace is on the same line, and the closing brace is on the next line, at the same indentation level as its header;
+the same rule applies to classes.
+
+    ```C++
+    class Class1 {
+    public:
+        Class1();
+    private:
+        int element;
+    };
+
+    void NotEmptyFunction()
+    {
+        if (condition) {
+            // [...]
+        } else {
+            // [...]
+        }
+        // [...]
+        for (start_cond; end_cond; step_cond) {
+            // [...]
+        }
+    }
+
+    void EmptyFunction() {}
+    ```
+
+  - Cases within a switch statement are indented and enclosed in braces:
+
+    ```C++
+    switch (option)
+    {
+        case 1:
+        {
+            // handle option 1
+            break;
+        }
+        case 2:
+        {
+            // handle option 2
+            break;
+        }
+        default:
+        {
+            break;
+        }
+    }
+    ```
+
+## Naming Conventions
+
+### C++ language naming conventions
+
+- Type (class, struct, enum) and function names must be `PascalCase`:
+
+  ```C++
+  class SomeClass
+  {
+      // [...]
+  };
+  void SomeFunction()
+  {
+      // [...]
+ } + ``` + +- Variables and parameter names must be `camelCase`: + + ```C++ + int someVariable; + + void SomeFunction(int someParameter) {} + ``` + +- Macros, pre-processor definitions, and enumeration values should use upper case names: + + ```C++ + #define SOME_DEFINE + + enum class SomeEnum + { + ENUM_VALUE_1, + ENUM_VALUE_2 + }; + ``` + +- Namespace names must be lower case + + ```C++ + namespace nspace + { + void FunctionInNamespace(); + }; + ``` + +- Source code should use Hungarian notation to annotate the name of a variable with information about its meaning. + + | Prefix | Class | Description | + | ------ | ----- | ----------- | + | p | Type | Pointer to any other type | + | k | Qualifier | Constant | + | v | Qualifier | Volatile | + | m | Scope | Member of a class or struct | + | s | Scope | Static | + | g | Scope | Used to indicate variable has scope beyond the current function: file-scope or externally visible scope| + +The following examples of Hungarian notation are one possible set of uses: + + ```C++ + int g_GlobalInt=123; + char* m_pNameOfMemberPointer=nullptr; + const float g_kSomeGlobalConstant = 1.234f; + static float ms_MyStaticMember = 4.321f; + bool myLocalVariable=true; + ``` + +### C language naming conventions + +For C sources, we follow the Linux variant of the K&R style wherever possible. + +- For function and variable names we use `snake_case` convention: + + ```C + int some_variable; + + void some_function(int some_parameter) {} + ``` + +- Macros, pre-processor definitions, and enumeration values should use upper case names: + + ```C + #define SOME_DEFINE + + enum some_enum + { + ENUM_VALUE_1, + ENUM_VALUE_2 + }; + ``` + +## Layout and formatting conventions + +- C++ class code layout + Public function definitions should be at the top of a class definition, since they are things most likely to be used +by other people. + Private functions and member variables should be last. 
+  Class functions and member variables should be laid out logically in blocks of related functionality.
+
+- Class access specifiers (`public`, `protected`, `private`) are not indented.
+
+  ```C++
+  class MyClass
+  {
+  public:
+      int m_PublicMember;
+  protected:
+      int m_ProtectedMember;
+  private:
+      int m_PrivateMember;
+  };
+  ```
+
+- Don't leave trailing spaces at the end of lines.
+
+- Empty lines should have no trailing spaces.
+
+- For pointers and references, the symbols `*` and `&` should be adjacent to the name of the type, not the name
+  of the variable.
+
+  ```C++
+  const char* someText = "abc";
+
+  void SomeFunction(const SomeObject& someObject) {}
+  ```
+
+## Language usage
+
+- Header `#include` statements should be minimized.
+  Inclusion of unnecessary headers slows down compilation, and can hide errors where a function calls a
+  subroutine which it should not be using, if the unnecessary header defining this subroutine is included.
+
+  Header statements should be included in the following order:
+
+  - Header file corresponding to the current source file (if applicable)
+  - Headers from the same component
+  - Headers from other components
+  - Third-party headers
+  - System headers
+
+  > **Note:** Leave one blank line between each of these groups for readability.
+  > Use quotes for headers from within the same project and angle brackets for third-party and system headers.
+  > Do not use paths relative to the current source file, such as `../Header.hpp`. Instead, configure your include paths
+  > in the project makefiles.
+
+  ```C++
+  #include "ExampleClass.hpp"     // Own header
+
+  #include "Header1.hpp"          // Header from same component
+  #include "Header2.hpp"          // Header from same component
+
+  #include "other/Header3.hpp"    // Header from other component
+
+  #include <ThirdParty.hpp>       // Third-party headers
+
+  #include <vector>               // System header
+
+  // [...]
+  ```
+
+- C++ casts should use the template-style cast syntax:
+
+  ```C++
+  int a = 100;
+  float b = (float)a;               // Not OK
+  float c = static_cast<float>(a);  // OK
+  ```
+
+- Use the `const` keyword to declare constants instead of `#define`.
+
+- Use `nullptr` instead of `NULL`.
+  C++11 introduced the `nullptr` type to distinguish null pointer constants from the integer 0.
diff --git a/docs/sections/customizing.md b/docs/sections/customizing.md
new file mode 100644
index 0000000..e92c327
--- /dev/null
+++ b/docs/sections/customizing.md
@@ -0,0 +1,731 @@
+# Implementing custom ML application
+
+- [Software project description](#software-project-description)
+- [HAL API](#hal-api)
+- [Main loop function](#main-loop-function)
+- [Application context](#application-context)
+- [Profiler](#profiler)
+- [NN Model API](#nn-model-api)
+- [Adding custom ML use case](#adding-custom-ml-use-case)
+- [Implementing main loop](#implementing-main-loop)
+- [Implementing custom NN model](#implementing-custom-nn-model)
+- [Executing inference](#executing-inference)
+- [Printing to console](#printing-to-console)
+- [Reading user input from console](#reading-user-input-from-console)
+- [Output to MPS3 LCD](#output-to-mps3-lcd)
+- [Building custom use case](#building-custom-use-case)
+
+This section describes how to implement a custom Machine Learning
+application running on the Fast Model FVP or on the Arm MPS3 FPGA prototyping board.
+
+The Arm® Ethos™-U55 code sample software project offers a simple way to incorporate
+additional use-case code into the existing infrastructure, and provides a build
+system that automatically picks up added functionality and produces a corresponding
+executable for each use-case. This is achieved by following certain configuration
+and code implementation conventions.
+
+The following notation indicates the important conventions to apply:
+
+> **Convention:** The code is developed using the C++11 and C99 standards.
+This is governed by the TensorFlow Lite for Microcontrollers framework.
+
+## Software project description
+
+As mentioned in the [Repository structure](../documentation.md#repository-structure) section, project sources are:
+
+```tree
+├── docs
+│   ├── ...
+│   └── Documentation.md
+├── resources
+│   └── img_class
+│       └── ...
+├── scripts
+│   └── ...
+├── source
+│   ├── application
+│   │   ├── hal
+│   │   ├── main
+│   │   └── tensorflow-lite-micro
+│   └── use_case
+│       └── img_class
+├── CMakeLists.txt
+└── Readme.md
+```
+
+The `source` directory contains C/C++ sources for the platform and ML applications.
+Common code related to the Ethos-U55 code samples software
+framework resides in the *application* sub-folder, and ML application-specific logic
+(use-cases) sources are in the *use_case* sub-folder.
+
+> **Convention**: Separate use-cases must be organized in sub-folders under the use_case folder.
+The name of the directory is used as the name for this use-case and can be provided
+as a `USE_CASE_BUILD` parameter value.
+The build system expects the sources for the use-case to be structured as follows:
+headers in an `include` directory, C/C++ sources in a `src` directory.
+For example:
+>
+>```tree
+>use_case
+>  └── img_class
+>        ├── include
+>        │   └── *.hpp
+>        └── src
+>            └── *.cc
+>```
+
+## HAL API
+
+The hardware abstraction layer is represented by the following interfaces.
+To access them, include the hal.h header.
+
+- *hal_platform* structure:\
+  Structure that defines a platform context to be used by the application.
+
+  | Attribute name   | Description |
+  |------------------|-------------|
+  | inited           | Initialization flag. Set after the platform_init() function is called. |
+  | plat_name        | Platform name. It is set to "mps3-bare" for MPS3 builds and "FVP" for Fast Model builds. 
| data_acq         | Pointer to the data acquisition module, responsible for user interaction and other data collection for the application logic. |
+  | data_psn         | Pointer to the data presentation module, responsible for data output through the components available on the selected platform: LCD for MPS3, console for Fast Model. |
+  | timer            | Pointer to the platform timer implementation (see platform_timer). |
+  | platform_init    | Pointer to the platform initialization function. |
+  | platform_release | Pointer to the platform release function. |
+
+- *hal_init* function:\
+  Initializes the HAL structure based on the compile-time config. This
+  should be called before any other function in this API.
+
+  | Parameter name | Description |
+  |----------------|-------------|
+  | platform       | Pointer to a pre-allocated *hal_platform* struct. |
+  | data_acq       | Pointer to a pre-allocated data acquisition module. |
+  | data_psn       | Pointer to a pre-allocated data presentation module. |
+  | timer          | Pointer to a pre-allocated timer module. |
+  | return         | Zero if successful, error code otherwise. |
+
+- *hal_platform_init* function:\
+  Initializes the HAL platform and all the modules on the platform that the
+  application requires to run.
+
+  | Parameter name | Description |
+  |----------------|-------------|
+  | platform       | Pointer to a pre-allocated and initialized *hal_platform* struct. |
+  | return         | Zero if successful, error code otherwise. |
+
+- *hal_platform_release* function:\
+  Releases the HAL platform. This should release the resources acquired.
+
+  | Parameter name | Description |
+  |----------------|-------------|
+  | platform       | Pointer to a pre-allocated and initialized *hal_platform* struct. |
+
+- *data_acq_module* structure:\
+  Structure to encompass the data acquisition module and its
+  methods. 
+
+  | Attribute name | Description |
+  |----------------|-------------|
+  | inited         | Initialization flag. Set after the system_init() function is called. |
+  | system_name    | Channel name. It is set to "UART" for both MPS3 and Fast Model builds. |
+  | system_init    | Pointer to the data acquisition module initialization function. The pointer is set according to the platform selected during the build. This function is called by the platform initialization routines. |
+  | get_input      | Pointer to a function reading user input. The pointer is set according to the platform selected during the build. For MPS3 and Fast Model environments, the function reads data from the UART. |
+
+- *data_psn_module* structure:\
+  Structure to encompass the data presentation module and its methods.
+
+  | Attribute name     | Description |
+  |--------------------|-------------|
+  | inited             | Initialization flag. It is set after the system_init() function is called. |
+  | system_name        | System component name used to present data. It is set to "lcd" for the MPS3 build and to "log_psn" for the Fast Model build. In the case of Fast Model, all pixel drawing functions are replaced by console output of the data summary. |
+  | system_init        | Pointer to the data presentation module initialization function. The pointer is set according to the platform selected during the build. This function is called by the platform initialization routines. |
+  | present_data_image | Pointer to a function to draw an image. The pointer is set according to the platform selected during the build. For MPS3, the image will be drawn on the LCD; for Fast Model, an image summary will be printed over the UART (coordinates, channel info, downsample factor). |
+  | present_data_text  | Pointer to a function to print text. The pointer is set according to the platform selected during the build. For MPS3, the text will be drawn on the LCD; for Fast Model, the text will be printed over the UART. 
| + | present_box | Pointer to a function to draw a rectangle. The pointer is set according to the selected platform during the build. For MPS3, the image will be drawn on the LCD; for fastmodel image summary will be printed in the UART. | + | clear | Pointer to a function to clear the output. The pointer is set according to the selected platform during the build. For MPS3, the function will clear the LCD; for fastmodel will do nothing. | + | set_text_color | Pointer to a function to set text color for the next call of present_data_text() function. The pointer is set according to the selected platform during the build. For MPS3, the function will set the color for the text printed on the LCD; for fastmodel -- will do nothing. | + | set_led | Pointer to a function controlling an LED (led_num) with on/off | + +- *platform_timer* structure:\ + Structure to hold a platform specific timer implementation. + + | Attribute name | Description | + |--------------------|------------------------------------------------| + | inited | Initialization flag. It is set after the timer is initialized by the *hal_platform_init* function. | + | reset | Pointer to a function to reset a timer. | + | get_time_counter | Pointer to a function to get current time counter. | + | get_duration_ms | Pointer to a function to calculate duration between two time-counters in milliseconds. | + | get_duration_us | Pointer to a function to calculate duration between two time-counters in microseconds | + | get_npu_cycle_diff | Pointer to a function to calculate duration between two time-counters in Ethos-U55 cycles. Available only when project is configured with ETHOS_U55_ENABLED set. 
|
+
+Example of the API initialization in the main function:
+
+```c++
+#include "hal.h"
+
+int main()
+{
+    hal_platform platform;
+    data_acq_module dataAcq;
+    data_psn_module dataPsn;
+    platform_timer timer;
+
+    /* Initialise the HAL and platform */
+    hal_init(&platform, &dataAcq, &dataPsn, &timer);
+    hal_platform_init(&platform);
+
+    ...
+
+    hal_platform_release(&platform);
+
+    return 0;
+}
+```
+
+## Main loop function
+
+The code sample application's main function delegates the use-case
+logic execution to the main loop function, which must be implemented for
+each custom ML scenario.
+
+The main loop function takes the initialized *hal_platform* structure
+pointer as an argument.
+
+The main loop function has external linkage, and the main executable for the
+use-case will reference the function defined in the use-case
+code.
+
+```c++
+void main_loop(hal_platform& platform) {
+
+...
+
+}
+```
+
+## Application context
+
+The application context can be used as a holder for state between main
+loop iterations. Include AppContext.hpp to use the ApplicationContext class.
+
+| Method name | Description |
+|-------------|-------------|
+| Set         | Saves the given value as a named attribute in the context. |
+| Get         | Gets the saved attribute from the context by the given name. |
+| Has         | Checks if an attribute with a given name exists in the context. |
+
+For example:
+
+```c++
+#include "hal.h"
+#include "AppContext.hpp"
+
+void main_loop(hal_platform& platform) {
+
+    /* Instantiate application context */
+    arm::app::ApplicationContext caseContext;
+    caseContext.Set<hal_platform&>("platform", platform);
+    caseContext.Set<uint32_t>("counter", 0);
+
+    /* loop */
+    while (true) {
+        // do something, pass application context down the call stack
+    }
+}
+```
+
+## Profiler
+
+The Profiler is a helper class that assists in the collection of timings and
+Ethos-U55 cycle counts for operations. 
It uses the platform timer to get
+system timing information.
+
+| Method name        | Description |
+|--------------------|-------------|
+| StartProfiling     | Starts profiling and records the starting timing data. |
+| StopProfiling      | Stops profiling and records the ending timing data. |
+| Reset              | Resets the profiler and clears all collected data. |
+| GetResultsAndReset | Gets the results as a string and resets the profiler. |
+
+Usage example:
+
+```c++
+Profiler profiler{&platform, "Inference"};
+
+profiler.StartProfiling();
+// Code running inference to profile
+profiler.StopProfiling();
+
+info("%s\n", profiler.GetResultsAndReset().c_str());
+```
+
+## NN Model API
+
+Model (which refers to a neural network model) is an abstract class wrapping the
+underlying TensorFlow Lite Micro API. It provides methods to perform
+common operations such as TensorFlow Lite Micro framework
+initialization, inference execution, and accessing input and output tensor
+objects.
+
+To use this abstraction, import the TensorFlowLiteMicro.hpp header.
+
+| Method name             | Description |
+|-------------------------|-------------|
+| GetInputTensor          | Returns the pointer to the model's input tensor. |
+| GetOutputTensor         | Returns the pointer to the model's output tensor. |
+| GetType                 | Returns the model's data type. |
+| GetInputShape           | Returns the pointer to the model's input shape. |
+| GetOutputShape          | Returns the pointer to the model's output shape. |
+| LogTensorInfo           | Logs the tensor information to stdout for the given tensor pointer: tensor name, tensor address, tensor type, tensor memory size and quantization params. |
+| LogInterpreterInfo      | Logs the interpreter information to stdout. |
+| Init                    | Initializes the TensorFlow Lite Micro framework and allocates the required memory for the model. |
+| IsInited                | Checks if this model object has been initialized. 
|
+| IsDataSigned            | Checks if the model uses a signed data type. |
+| RunInference            | Runs the inference (invokes the interpreter). |
+| GetOpResolver           | Returns the reference to the TensorFlow Lite Micro operator resolver. |
+| EnlistOperations        | Registers the required operators with the TensorFlow Lite Micro operator resolver. |
+| GetTensorArena          | Returns a pointer to the memory region to be used for tensor allocations. |
+| GetActivationBufferSize | Returns the size of the tensor arena memory region. |
+
+> **Convention**: Each ML use-case must extend this class and implement the protected virtual methods:
+>
+>```c++
+>virtual const tflite::MicroOpResolver& GetOpResolver() = 0;
+>virtual bool EnlistOperations() = 0;
+>virtual uint8_t* GetTensorArena() = 0;
+>virtual size_t GetActivationBufferSize() = 0;
+>```
+>
+>Network models have different sets of operators that must be registered with
+the tflite::MicroMutableOpResolver object in the EnlistOperations method.
+Network models may also require different activation buffer sizes; the buffer is returned as
+the tensor arena memory for the TensorFlow Lite Micro framework by the GetTensorArena
+and GetActivationBufferSize methods.
+
+Please see the MobileNetModel.hpp and MobileNetModel.cc files from the image
+classification ML application use-case as an example of extending the model base
+class.
+
+## Adding custom ML use case
+
+This section describes how to implement an additional use-case and compile
+it into a binary executable to run with Fast Model or the MPS3 FPGA board.
+It covers the major common steps: application main loop creation,
+description of the NN model, and inference execution.
+
+In addition, a few useful examples are provided: reading user input,
+printing to the console, and drawing images on the MPS3 LCD. 
+
+Start by creating a sub-directory under the *use_case* directory, with two
+further directories *src* and *include* inside it, as described in the
+[Software project description](#software-project-description) section:
+
+```tree
+use_case
+  └── hello_world
+        ├── include
+        └── src
+```
+
+## Implementing main loop
+
+The use-case main loop is the place to put the use-case main logic. Essentially,
+it is an infinite loop that reacts to user input, triggers use-case
+conditional logic based on the input, and presents results back to the
+user. However, it could also be simple logic that runs a single inference
+and then exits.
+
+The main loop has knowledge about the platform, and has access to the
+platform components through the hardware abstraction layer (referred to as HAL).
+
+Create a *MainLoop.cc* file in the *src* directory (the one created under
+[Adding custom ML use case](#adding-custom-ml-use-case)); the file name is not
+important. Define the *main_loop* function with the signature described in
+[Main loop function](#main-loop-function):
+
+```c++
+#include "hal.h"
+
+void main_loop(hal_platform& platform) {
+    printf("Hello world!");
+}
+```
+
+The above is already a working use-case. If you compile and run it (see
+[Building custom use case](#building-custom-use-case)), the application will start, print a
+message to the console and exit straight away.
+
+Now, you can start filling this function with logic.
+
+## Implementing custom NN model
+
+Before inference can be run with a custom NN model, the TensorFlow Lite
+Micro framework must learn about the operators/layers included in the
+model. The developer must register the operators using the *MicroMutableOpResolver*
+API.
+
+The Ethos-U55 code samples project has an abstraction around the TensorFlow
+Lite Micro API (see [NN model API](#nn-model-api)). Create *HelloWorldModel.hpp* in
+the use-case include sub-directory, extend the Model abstract class and
+declare the required methods. 
+
+For example:
+
+```c++
+#include "Model.hpp"
+
+namespace arm {
+namespace app {
+
+class HelloWorldModel: public Model {
+  protected:
+    /** @brief Gets the reference to op resolver interface class. */
+    const tflite::MicroOpResolver& GetOpResolver() override;
+
+    /** @brief Adds operations to the op resolver instance. */
+    bool EnlistOperations() override;
+
+    const uint8_t* ModelPointer() override;
+
+    size_t ModelSize() override;
+
+  private:
+    /* Maximum number of individual operations that can be enlisted. */
+    static constexpr int _m_maxOpCnt = 5;
+
+    /* A mutable op resolver instance. */
+    tflite::MicroMutableOpResolver<_m_maxOpCnt> _m_opResolver;
+};
+} /* namespace app */
+} /* namespace arm */
+```
+
+Create a `HelloWorld.cc` file in the `src` sub-directory and define the methods
+there. Include the `HelloWorldModel.hpp` created earlier. Note that `Model.hpp`,
+included in the header, provides access to TensorFlow Lite Micro's operation
+resolver API.
+
+Please see `use_case/img_class/src/MobileNetModel.cc` for
+code examples.\
+If you are using a TensorFlow Lite model compiled with Vela, it is important to add
+the custom Ethos-U55 operator to the operators list.
+
+The following example shows how to add the custom Ethos-U55 operator with
+the TensorFlow Lite Micro framework. We use the ARM_NPU define to exclude
+the code if the application was built without NPU support.
+
+```c++
+#include "HelloWorldModel.hpp"
+
+bool arm::app::HelloWorldModel::EnlistOperations() {
+
+#if defined(ARM_NPU)
+    if (kTfLiteOk == this->_m_opResolver.AddEthosU()) {
+        info("Added %s support to op resolver\n",
+            tflite::GetString_ETHOSU());
+    } else {
+        printf_err("Failed to add Arm NPU support to op resolver.");
+        return false;
+    }
+#endif /* ARM_NPU */
+
+    return true;
+}
+```
+
+To minimize the application memory footprint, it is advised to register only
+the operators used by the NN model.
+
+Define the `ModelPointer` and `ModelSize` methods. 
These functions are wrappers around the
functions generated in the C++ file containing the neural network model as an array.
The logic that generates this C++ array from the `.tflite` file needs to be defined in
the `usecase.cmake` file for this `HelloWorld` example.

For more details on `usecase.cmake`, see [Building custom use case](#building-custom-use-case).
For details on the code generation flow in general, see [Automatic file generation](./building.md#Automatic-file-generation).

The TensorFlow Lite model data is read during `Model::Init()` method execution; see
*application/tensorflow-lite-micro/Model.cc* for more details. The model invokes the
`ModelPointer()` function, which calls the `GetModelPointer()` function to get the
neural network model data's memory address. The `GetModelPointer()` function
is generated during the build and can be found in the
file `build/generated/hello_world/src/<model_file_name>.cc`. The generated
file is added to the compilation automatically.

Use the `${use_case}_MODEL_TFLITE_PATH` build parameter to include a custom
model in the generation/compilation process (see [Build options](./building.md#build-options)).

## Executing inference

To run an inference successfully, you must have:

- a TensorFlow Lite model file,
- an extended Model class,
- a place to add the code that invokes inference,
- a main loop function,
- and some input data.

For the hello_world example below, the input array is not populated.
However, for real-world scenarios, this data should either be read from
an on-board device or be prepared in the form of C++ sources before
compilation and baked into the application.

For example, the image classification application has extra build steps
to generate C++ sources from the provided images with the
*generate_images_code* CMake function.

> **Note:**
Check that the input data type of your NN model and the input array data type are the same.
For example, the generated C++ sources for images store image data as a uint8 array. For models that were
quantized to the int8 data type, it is important to convert the image data to int8 correctly before inference execution.
Converting an asymmetric data type to a symmetric one involves re-positioning the zero value, i.e. subtracting an
offset from the uint8 values. Please check the image classification application source for a code example
(the ConvertImgToInt8 function).

The following code adds inference invocation to the main loop function:

```c++
#include "hal.h"
#include "HelloWorldModel.hpp"

void main_loop(hal_platform& platform) {

    /* model wrapper object */
    arm::app::HelloWorldModel model;

    /* Load the model */
    if (!model.Init()) {
        printf_err("failed to initialise model\n");
        return;
    }

    TfLiteTensor* outputTensor = model.GetOutputTensor();
    TfLiteTensor* inputTensor = model.GetInputTensor();

    /* dummy input data */
    uint8_t inputData[1000];

    memcpy(inputTensor->data.data, inputData, 1000);

    /* run inference */
    model.RunInference();

    const uint32_t tensorSz = outputTensor->bytes;
    const uint8_t* outputData = tflite::GetTensorData<uint8_t>(outputTensor);
}
```

The code snippet has several important blocks:

- Creating the HelloWorldModel object and initializing it.

  ```c++
  arm::app::HelloWorldModel model;

  /* Load the model */
  if (!model.Init()) {
      printf_err("failed to initialise model\n");
      return;
  }
  ```

- Getting pointers to the allocated input and output tensors.

  ```c++
  TfLiteTensor* outputTensor = model.GetOutputTensor();
  TfLiteTensor* inputTensor = model.GetInputTensor();
  ```

- Copying input data to the input tensor. We assume the input tensor size
  to be 1000 uint8 elements.

  ```c++
  memcpy(inputTensor->data.data, inputData, 1000);
  ```

- Running inference.

  ```c++
  model.RunInference();
  ```

- Reading inference results: the data and data size from the output
  tensor.
We assume that the output layer has a uint8 data type.

  ```c++
  const uint32_t tensorSz = outputTensor->bytes;

  const uint8_t* outputData = tflite::GetTensorData<uint8_t>(outputTensor);
  ```

Adding profiling for the Ethos-U55 is easy. Include the `Profiler.hpp` header and
invoke `StartProfiling` and `StopProfiling` around the inference
execution.

```c++
Profiler profiler{&platform, "Inference"};

profiler.StartProfiling();
model.RunInference();
profiler.StopProfiling();
std::string profileResults = profiler.GetResultsAndReset();

info("%s\n", profileResults.c_str());
```

## Printing to console

The examples above already use functions that print messages to the
console. The full list of available functions:

- `printf`
- `trace` - printf wrapper for tracing messages
- `debug` - printf wrapper for debug messages
- `info` - printf wrapper for informational messages
- `warn` - printf wrapper for warning messages
- `printf_err` - printf wrapper for error messages

The `printf` wrappers can be switched off with the `LOG_LEVEL` define:

trace (0) < debug (1) < info (2) < warn (3) < error (4).

The default output level is info = level 2.

## Reading user input from console

The platform data acquisition module has a `get_input` function to read keyboard
input from the UART. It can be used as follows:

```c++
char ch_input[128];
platform.data_acq->get_input(ch_input, sizeof(ch_input));
```

The function will block until the user provides an input.

## Output to MPS3 LCD

The platform presentation module has functions to print text or an image to
the board's LCD:

- `present_data_text`
- `present_data_image`

The text presentation function has the following signature:

- `const char* str`: string to print.
- `const uint32_t str_sz`: string size.
- `const uint32_t pos_x`: x coordinate of the first letter in pixels.
- `const uint32_t pos_y`: y coordinate of the first letter in pixels.
- `const uint32_t allow_multiple_lines`: signals whether the text is
  allowed to span multiple lines on the screen, or should be truncated
  to the current line.

This function does not wrap text; if the given string cannot fit on the
screen, it will go outside the screen boundary.

Example that prints "Hello world" on the LCD:

```c++
std::string hello("Hello world");
platform.data_psn->present_data_text(hello.c_str(), hello.size(), 10, 35, 0);
```

The image presentation function has the following signature:

- `uint8_t* data`: image data pointer.
- `const uint32_t width`: image width.
- `const uint32_t height`: image height.
- `const uint32_t channels`: number of channels. Only 1 and 3 channels are currently supported.
- `const uint32_t pos_x`: x coordinate of the first pixel.
- `const uint32_t pos_y`: y coordinate of the first pixel.
- `const uint32_t downsample_factor`: the factor by which the image is to be down sampled.

For example, the following code snippet visualizes the input tensor data
for MobileNet v2 224 (down sampling it twice):

```c++
platform.data_psn->present_data_image((uint8_t *) inputTensor->data.data, 224, 224, 3, 10, 35, 2);
```

Please see the [HAL API](#hal-api) section for other data presentation
functions.

## Building custom use case

There is one last thing to do before building and running a use-case
application: create a `usecase.cmake` file in the root of your use-case.
The name of the file is not important.

> **Convention:** The build system searches for a CMake file in each use-case directory and includes it into the build
> flow. This file can be used to specify additional application-specific build options, add custom build steps or
> override standard compilation and linking flags.
> Use the `USER_OPTION` function to add an additional build option. Prefix the variable name with `${use_case}` (the
> use-case name) to avoid name collisions with other CMake variables.
> Some useful variables visible in the use-case CMake file:
>
> - `DEFAULT_MODEL_PATH` – default model path to use if the use-case specific `${use_case}_MODEL_TFLITE_PATH` is not
>   set in the build arguments.
> - `TARGET_NAME` – name of the executable.
> - `use_case` – name of the current use-case.
> - `UC_SRC` – list of use-case sources.
> - `UC_INCLUDE` – path to the use-case headers.
> - `ETHOS_U55_ENABLED` – flag indicating if the current build supports Ethos-U55.
> - `TARGET_PLATFORM` – target platform being built for.
> - `TARGET_SUBSYSTEM` – if the target platform supports multiple subsystems, the name of the subsystem.
> - All standard build options.
> - `CMAKE_CXX_FLAGS` and `CMAKE_C_FLAGS` – compilation flags.
> - `CMAKE_EXE_LINKER_FLAGS` – linker flags.

For the hello world use-case it is enough to create a
`helloworld.cmake` file and set `DEFAULT_MODEL_PATH`:

```cmake
if (ETHOS_U55_ENABLED EQUAL 1)
    set(DEFAULT_MODEL_PATH ${DEFAULT_MODEL_DIR}/helloworldmodel_uint8_vela.tflite)
else()
    set(DEFAULT_MODEL_PATH ${DEFAULT_MODEL_DIR}/helloworldmodel_uint8.tflite)
endif()
```

This can then be used when declaring the model build option, for example:

```cmake
USER_OPTION(${use_case}_MODEL_TFLITE_PATH "Neural network model in tflite format."
    ${DEFAULT_MODEL_PATH}
    FILEPATH
    )

# Generate model file
generate_tflite_code(
    MODEL_PATH ${${use_case}_MODEL_TFLITE_PATH}
    DESTINATION ${SRC_GEN_DIR}
    )
```

This ensures that the model pointed to by `${use_case}_MODEL_TFLITE_PATH` is converted to a C++ array and is picked
up by the build system. More information on auto-generation is available under the section
[Automatic file generation](./building.md#Automatic-file-generation).
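For illustration, the generated `<model_file_name>.cc` file is essentially the `.tflite` file's contents dumped as a
byte array together with accessor functions. The sketch below shows the general shape; the array contents here are
placeholders and the exact generated signatures may differ from your build:

```cpp
#include <cstddef>
#include <cstdint>

/* Placeholder bytes standing in for the real .tflite contents. A real
 * TensorFlow Lite flatbuffer carries the "TFL3" identifier at offset 4. */
static const uint8_t nn_model[] = {
    0x1c, 0x00, 0x00, 0x00, /* root table offset */
    0x54, 0x46, 0x4c, 0x33  /* "TFL3" file identifier */
};

/* Accessors used by the model wrapper class. */
const uint8_t* GetModelPointer() {
    return nn_model;
}

size_t GetModelSize() {
    return sizeof(nn_model);
}
```

The `ModelPointer()` and `ModelSize()` methods of the use-case model class then simply forward to these generated
functions.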
To build your application, follow the general instructions from
[Add custom inputs](#add-custom-inputs) and specify the name of the use-case in the
build command:

```commandline
cmake \
    -DTARGET_PLATFORM=mps3 \
    -DTARGET_SUBSYSTEM=sse-300 \
    -DUSE_CASE_BUILD=hello_world \
    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/bare-metal-toolchain.cmake ..
```

For Windows, add `-G "MinGW Makefiles"` to the CMake command.

As a result, `ethos-u-hello_world.axf` should be created; the MPS3 build
will also produce a `sectors/hello_world` directory with binaries and an
`images-hello_world.txt` file to be copied to the board's MicroSD card.

Next section of the documentation: [Testing and benchmarking](../documentation.md#Testing-and-benchmarking).
diff --git a/docs/sections/deployment.md b/docs/sections/deployment.md new file mode 100644 index 0000000..354d30b --- /dev/null +++ b/docs/sections/deployment.md @@ -0,0 +1,281 @@

# Deployment

- [Fixed Virtual Platform](#fixed-virtual-platform)
  - [Setting up the MPS3 Arm Corstone-300 FVP](#setting-up-the-mps3-arm-corstone-300-fvp)
  - [Deploying on an FVP emulating MPS3](#deploying-on-an-fvp-emulating-mps3)
- [MPS3 board](#mps3-board)
  - [Deployment on MPS3 board](#deployment-on-mps3-board)

The sample application for Arm® Ethos™-U55 can be deployed on two
target platforms, both of which implement the Arm® Corstone™-300 design (see
<https://www.arm.com/products/iot/soc/corstone-300>):

- A physical Arm MPS3 FPGA prototyping board

- An MPS3 FVP

## Fixed Virtual Platform

The FVP is available publicly from [Arm Ecosystem FVP downloads
](https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps).
Download the correct archive from the list under `Arm Corstone-300`. We need the one that:

- Emulates the MPS3 board (not the MPS2 FPGA board)
- Contains support for Arm® Ethos™-U55

> **Note:** Currently, the FVP only has a Linux OS version.
Also, there are no FVPs available for `SSE-200`
> that satisfy the above conditions.

For the FVP, the elf or axf file can be run using the Fast Model
executable as outlined under [Starting Fast Model simulation](./setup.md/#starting-fast-model-simulation),
except that the binary pointed at here is the one just built using the steps in the previous section.

### Setting up the MPS3 Arm Corstone-300 FVP

For the Ethos-U55 sample application, please download the MPS3 version of the
Arm® Corstone™-300 model that contains Ethos-U55 and Arm® Cortex®-M55. The model is
currently only supported on Linux-based machines. To install the FVP:

- Unpack the archive.

- Run the install script in the extracted package:

  `./FVP_Corstone_SSE-300_Ethos-U55.sh`

- Follow the instructions to install the FVP to your desired location.

### Deploying on an FVP emulating MPS3

This section assumes that the FVP has been installed (see [Setting up the MPS3 Arm Corstone-300 FVP](#Setting-up-the-MPS3-Arm-Corstone-300-FVP)) to the user's home directory `~/FVP_Corstone_SSE-300_Ethos-U55`.

The installation will typically place the executable under the `~/FVP_Corstone_SSE-300_Ethos-U55/models/<OS>_<compiler-version>/`
directory. For the example below, we assume it to be `~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4`.
To run a use case on the FVP, from the [Build directory](../sections/building.md#Create-a-build-directory):

```commandline
~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -a ./bin/ethos-u-<use_case>.axf
telnetterminal0: Listening for serial connection on port 5000
telnetterminal1: Listening for serial connection on port 5001
telnetterminal2: Listening for serial connection on port 5002
telnetterminal5: Listening for serial connection on port 5003

    Ethos-U rev 0 --- Oct 13 2020 11:27:45
    (C) COPYRIGHT 2019-2020 Arm Limited
    ALL RIGHTS RESERVED
```

This will also launch a telnet window with the sample application's standard output and error log entries containing
information about the pre-built application version, the TensorFlow Lite Micro library version used, the data type, as
well as the input and output tensor sizes of the model compiled into the executable binary.

After the application has started, it outputs a menu and waits for user input from the telnet terminal.

For example, the image classification use case can be started by:

```commandline
~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -a ./bin/ethos-u-img_class.axf
```

The FVP supports many command line parameters:

- passed by using `-C <param>=<value>`. The most important ones are:
  - `ethosu.num_macs`: Sets the Ethos-U55 configuration for the model. Valid parameters are `32`, `64`, `256`,
    and the default one, `128`. The number signifies the 8x8 MACs performed per cycle count available on the hardware.
  - `cpu0.CFGITCMSZ`: ITCM size for the Cortex-M CPU. The size of ITCM is pow(2, CFGITCMSZ - 1) KB.
  - `cpu0.CFGDTCMSZ`: DTCM size for the Cortex-M CPU. The size of DTCM is pow(2, CFGDTCMSZ - 1) KB.
  - `mps3_board.telnetterminal0.start_telnet`: Starts the telnet session if nothing is connected.
  - `mps3_board.uart0.out_file`: Sets the output file to hold data written by the UART
    (use '-' to send all output to stdout; empty by default).
  - `mps3_board.uart0.shutdown_on_eot`: Set to shut down the simulation when an EOT (ASCII 4) character is transmitted.
  - `mps3_board.visualisation.disable-visualisation`: Enables or disables visualisation (disabled by default).

  To start the model in `128` MACs mode for Ethos-U55:

  ```commandline
  ~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -a ./bin/ethos-u-img_class.axf -C ethosu.num_macs=128
  ```

- `-l`: shows the full list of supported parameters

  ```commandline
  ~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -l
  ```

- `--stat`: prints some run statistics on simulation exit

  ```commandline
  ~/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 --stat
  ```

- `--timelimit`: sets the number of wall clock seconds for the simulator to run, excluding startup and shutdown.

## MPS3 board

> **Note:** Before proceeding, make sure you have the MPS3 board powered on,
and a USB A to B cable connected between your machine and the MPS3.
The connector on the MPS3 is marked as "Debug USB".

![MPS3](../media/mps3.png)

1. MPS3 board top view.

Once the board has booted, the micro SD card will enumerate as a mass
storage device. On most systems this will be mounted automatically, but
you might need to mount it manually.

Also, there should be four serial-over-USB ports available for use via
this connection. On Linux-based machines, these would typically be
*/dev/ttyUSB\<n\>* to */dev/ttyUSB\<n+3\>*.

The default configuration for all of them is 115200, 8/N/1 (115200 baud,
8 bits, no parity and 1 stop bit) with no flow control.

> **Note:** For Windows machines, additional FTDI drivers might need to be installed
for these serial ports to be available.
For more information on getting started with an MPS3 board, please refer to
<https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/MPS3GettingStarted.pdf>

### Deployment on MPS3 board

> **NOTE**: These instructions are valid only if the evaluation is being
  done using the MPS3 FPGA platform using either `SSE-200` or `SSE-300`.

To run the application on the MPS3 platform, first make sure
that the platform has been set up using the correct configuration.
For details on platform set up, please see the relevant documentation. For `Arm Corstone-300`, this is available
[here](https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/DAI0547B_SSE300_PLUS_U55_FPGA_for_mps3.pdf?revision=d088d931-03c7-40e4-9045-31ed8c54a26f&la=en&hash=F0C7837C8ACEBC3A0CF02D871B3A6FF93E09C6B8).

For the MPS3 board, instead of loading the axf file directly, the executable blobs
generated under the *sectors/\<use_case\>* subdirectory need to be
copied over to the MPS3 board's micro SD card. Also, every use case build
generates a corresponding images.txt file which is used by the MPS3 to
understand which memory regions the blobs are to be loaded into.

Once the USB A <--> B cable between the MPS3 and the development machine
is connected and the MPS3 board is powered on, the board should enumerate
as a mass storage device over this USB connection.
There might also be two devices, depending on the version of the board
you are using. The device named `V2M-MPS3` or `V2MMPS3` is the `SD card`.

If the axf/elf file is within 1 MiB, it can be flashed into the FPGA
memory directly without having to be broken down into separate load
region specific blobs. However, with neural network models exceeding
this size, it becomes necessary to follow this approach.

1.
Copy the executable blobs to the micro SD card's `SOFTWARE` directory.
For example, the image classification use case will produce:

    ```tree
    ./bin/sectors/
        └── img_class
            ├── dram.bin
            └── itcm.bin
    ```

    For example, if the micro SD card is mounted at
    /media/user/V2M-MPS3/:

    ```commandline
    cp -av ./bin/sectors/img_class/* /media/user/V2M-MPS3/SOFTWARE/
    ```

2. The generated `images-<use_case>.txt` file needs to be copied
    over to the MPS3. The exact location for the destination will depend
    on the MPS3 board's version and the application note for the bit
    file in use. For example, for MPS3 board hardware revision C, using an
    application note directory named "ETHOSU", to replace the images.txt
    file:

    ```commandline
    cp ./bin/images-img_class.txt /media/user/V2M-MPS3/MB/HBI0309C/ETHOSU/images.txt
    ```

3. Open the first serial port available from the MPS3, for example
    "/dev/ttyUSB0". This can typically be done using the minicom, screen or
    PuTTY applications. Make sure the flow control setting is switched
    off.

    ```commandline
    minicom -D /dev/ttyUSB0
    ```

    ```log
    Welcome to minicom 2.7.1
    OPTIONS: I18n
    Compiled on Aug 13 2017, 15:25:34.
    Port /dev/ttyUSB0, 16:05:34
    Press CTRL-A Z for help on special keys
    Cmd>
    ```

4. In another terminal, open the second serial port, for example
    "/dev/ttyUSB1":

    ```commandline
    minicom -D /dev/ttyUSB1
    ```

5. On the first serial port, issue a "reboot" command and press the
    return key:

    ```commandline
    $ Cmd> reboot
    ```

    ```log
    Rebooting...Disabling debug USB..Board rebooting...

    ARM V2M-MPS3 Firmware v1.3.2
    Build Date: Apr 20 2018

    Powering up system...
    Switching on main power...
    Configuring motherboard (rev C, var A)...
    ```

    This will go on to reboot the board and prime the application to run by
    flashing the binaries into their respective FPGA memory locations. For example:

    ```log
    Reading images file \MB\HBI0309C\ETHOSU\images.txt
    Writing File \SOFTWARE\itcm.bin to Address 0x00000000

    ............

    File \SOFTWARE\itcm.bin written to memory address 0x00000000
    Image loaded from \SOFTWARE\itcm.bin
    Writing File \SOFTWARE\dram.bin to Address 0x08000000

    ..........................................................................


    File \SOFTWARE\dram.bin written to memory address 0x08000000
    Image loaded from \SOFTWARE\dram.bin
    ```

6. When the reboot from the previous step has completed, issue a reset
    command on the command prompt:

    ```commandline
    $ Cmd> reset
    ```

    This will trigger the application to start, and the output should be visible on the second serial connection.

7. On the second serial port, output similar to section 2.2 should be visible:

    ```log
    [INFO] Setting up system tick IRQ (for NPU)
    [INFO] V2M-MPS3 revision C
    [INFO] Application Note AN540, Revision B
    [INFO] FPGA build 1
    [INFO] Core clock has been set to: 32000000 Hz
    [INFO] CPU ID: 0x410fd220
    [INFO] CPU: Cortex-M55 r0p0
    ...
    ```

Next section of the main documentation, [Running code samples applications](../documentation.md#Running-code-samples-applications).
diff --git a/docs/sections/run.md b/docs/sections/run.md new file mode 100644 index 0000000..90ee7c8 --- /dev/null +++ b/docs/sections/run.md @@ -0,0 +1,42 @@

# Running Ethos-U55 Code Samples

- [Starting Fast Model simulation](#starting-fast-model-simulation)

This section covers the process for getting started with pre-built binaries for the Code Samples.

## Starting Fast Model simulation

Once the application binaries have been built, and assuming the install location of the FVP
was set to `~/FVP_install_location`, the simulation can be started by:

```commandline
~/FVP_install_location/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 \
    ./bin/mps3-sse-300/ethos-u-<use_case>.axf
```

This will start the Fast Model simulation for the chosen use-case.

A log output should appear on the terminal:

```log
telnetterminal0: Listening for serial connection on port 5000
telnetterminal1: Listening for serial connection on port 5001
telnetterminal2: Listening for serial connection on port 5002
telnetterminal5: Listening for serial connection on port 5003
```

This will also launch a telnet window with the sample application's
standard output and error log entries containing information about the
pre-built application version, the TensorFlow Lite Micro library version
used, the data type, as well as the input and output tensor sizes of the
model compiled into the executable binary.

![FVP](../media/fvp.png)

![FVP Terminal](../media/fvpterminal.png)

> **Note:**
For details on a specific use-case, follow the instructions in the corresponding documentation.

Next section of the documentation: [Implementing custom ML application](../documentation.md#Implementing-custom-ML-application).
diff --git a/docs/sections/testing_benchmarking.md b/docs/sections/testing_benchmarking.md new file mode 100644 index 0000000..43bb7f4 --- /dev/null +++ b/docs/sections/testing_benchmarking.md @@ -0,0 +1,87 @@

# Testing and benchmarking

- [Testing](#testing)
- [Benchmarking](#benchmarking)

## Testing

The `tests` folder has the following structure:

```tree
.
├── common
│   └── ...
├── use_case
│   ├── <usecase1>
│   │   └── ...
│   ├── <usecase2>
│   │   └── ...
└── utils
    └── ...
```

Where:

- `common`: contains tests for generic and common application functions.
- `use_case`: contains all the use case specific tests in the respective folders.
- `utils`: contains utilities sources used only within the tests.

When [configuring](./building.md#configuring-the-build-native-unit-test) and
[building](./building.md#Building-the-configured-project) for the `native` target platform, the results of the build
will be placed under the `build/bin/` folder, for example:

```tree
.
├── dev_ethosu_eval-<usecase1>-tests
├── dev_ethosu_eval-<usecase2>-tests
├── ethos-u-<usecase1>
└── ethos-u-<usecase2>
```

To execute the unit-tests for a specific use-case, in addition to the common tests:

```commandline
dev_ethosu_eval-<use_case>-tests
```

```log
[INFO] native platform initialised
[INFO] ARM Ethos-U55 Evaluation application for MPS3 FPGA Prototyping Board and FastModel

...
===============================================================================
   All tests passed (37 assertions in 7 test cases)
```

The test output may contain `[ERROR]` messages. That is alright: they come from tests for negative scenarios.

## Benchmarking

Profiling is enabled by default when configuring the project. This will enable displaying:

- the active and idle NPU cycle counts when Arm® Ethos™-U55 is enabled (see `-DETHOS_U55_ENABLED` in
  [Build options](./building.md#build-options)).
- CPU cycle counts and/or time elapsed, in milliseconds, for inferences performed if CPU profiling is enabled
  (see `-DCPU_PROFILE_ENABLED` in [Build options](./building.md#build-options)). This should be done only
  when running on a physical FPGA board, as the FVP does not contain a cycle-approximate or cycle-accurate Cortex-M model.

For example:

- On the FVP:

```log
    Active NPU cycles: 5475412
    Idle NPU cycles:   702
```

- For the MPS3 platform, the time duration in milliseconds is also reported when `-DCPU_PROFILE_ENABLED=1` is added to
  the CMake configuration command:

```log
    Active NPU cycles: 5629033
    Idle NPU cycles:   1005276
    Active CPU cycles: 993553 (approx)
    Time in ms:        210
```

Next section of the main documentation: [Troubleshooting](../documentation.md#Troubleshooting).
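As a quick sanity check when comparing benchmark runs, the two NPU counters above can be combined into a single
utilisation figure (active cycles as a fraction of total NPU cycles). A minimal sketch; the helper name is
illustrative and not part of the code samples:

```cpp
#include <cstdint>

/* NPU utilisation: fraction of total NPU cycles spent active. */
double NpuUtilisation(uint64_t activeCycles, uint64_t idleCycles) {
    return static_cast<double>(activeCycles) /
           static_cast<double>(activeCycles + idleCycles);
}
```

Applied to the example logs above, the FVP numbers give roughly 99.99% utilisation, while the MPS3 numbers give
roughly 84.8%.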
diff --git a/docs/sections/troubleshooting.md b/docs/sections/troubleshooting.md new file mode 100644 index 0000000..40b975a --- /dev/null +++ b/docs/sections/troubleshooting.md @@ -0,0 +1,27 @@

# Troubleshooting

- [Inference results are incorrect for my custom files](#inference-results-are-incorrect-for-my-custom-files)
- [The application does not work with my custom model](#the-application-does-not-work-with-my-custom-model)

## Inference results are incorrect for my custom files

Ensure that the files you are using match the requirements of the model
you are using and that the CMake parameters are set accordingly. More
information on these CMake parameters is detailed in their separate
sections. Note that preprocessing of the files can also affect the
inference result, such as the rescaling and padding operations performed
for image classification.

## The application does not work with my custom model

Ensure that your model is in a fully quantized `.tflite` file format,
either uint8 or int8, and has successfully been run through the Vela
compiler.

Also check that the CMake parameters match your new model's input requirements.

> **Note:** The Vela tool is not available within this software project.
It is a Python tool available from <https://pypi.org/project/ethos-u-vela/>.
The source code is hosted on <https://git.mlplatform.org/ml/ethos-u/ethos-u-vela.git/>.

Next section of the documentation: [Contribution guidelines](../documentation.md#Contribution-guidelines).
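Related to the input requirements above: if your model is quantized to int8 but your input data is uint8, remember to
re-centre the values by subtracting the zero-point offset before running inference, as done by the image
classification use case's ConvertImgToInt8 helper. A minimal sketch, assuming the common offset of 128 (check your
model's quantization parameters; the function name here is illustrative):

```cpp
#include <cstddef>
#include <cstdint>

/* Convert asymmetric uint8 data to int8 by subtracting an assumed
 * zero-point offset of 128, so that uint8 128 maps to int8 0. */
void ConvertUint8ToInt8(const uint8_t* src, int8_t* dst, size_t size) {
    for (size_t i = 0; i < size; ++i) {
        dst[i] = static_cast<int8_t>(static_cast<int32_t>(src[i]) - 128);
    }
}
```

With this mapping, uint8 values 0, 128 and 255 become int8 values -128, 0 and 127 respectively.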