From a08e9d43eee72bb3143c9dad304c966e700be810 Mon Sep 17 00:00:00 2001
From: Kristofer Jonsson
Date: Wed, 28 Apr 2021 12:32:28 +0200
Subject: Documenting startup and multi NPU

Adding documentation and sequence diagrams for startup and multi NPU.

Change-Id: I4a4a43e8bea089b6325f7d8285434017cbda25ec
---
 README.md | 95 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/README.md b/README.md
index 046f406..b20718a 100644
--- a/README.md
+++ b/README.md
@@ -59,6 +59,101 @@ this.
 $ FVP_Corstone_SSE-300_Ethos-U55 applications/freertos/freertos.elf
 ```
 
+# Multi NPU
+
+The TensorFlow Lite for Microcontrollers (TFLu) framework supports running
+multiple parallel inferences. Each parallel inference requires its own TFLu
+arena (which costs memory) and its own stack (which requires an RTOS). The
+examples in this repo are implemented in the application layer, which means
+that any RTOS could be used.
+
+The Ethos-U NPU driver is implemented in plain C. To enable thread safety in a
+multi-threaded environment, the driver defines a set of weak functions that the
+application is expected to override with implementations of mutex and
+semaphore primitives.
+
+The weak functions are defined in
+[ethosu_driver.c](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu_driver.c?id=35b5d0eebf9709a3439d362a0b53d6270cbc4a94#n173).
+A FreeRTOS-based example of how to override and implement these functions can
+be found in
+[applications/freertos/main.cpp](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-platform.git/tree/applications/freertos/main.cpp?id=991af2bd8fb6c79dfb317837353857f34a727b17#n108).
+
+The sequence diagram below illustrates the call stack for a multi NPU system.
+Note how the `ethosu_mutex_*` and `ethosu_semaphore_*` functions are
+implemented in the application layer. Mutexes are used for thread safety and
+semaphores to let a thread sleep while it waits for the NPU.
+
+![Multi NPU](docs/multinpu.svg "Multi NPU sequence diagram")
+
+## Tradeoffs
+
+A single Cortex-M is capable of driving multiple Ethos-U NPUs. The optimal
+number of Ethos-U NPUs is impossible to determine without knowing which
+network to run and without detailed knowledge about the limitations of the
+embedded system.
+
+Each parallel inference requires an arena. For optimal performance the arena
+should be placed in high-bandwidth, low-latency memory such as SRAM, which is
+a cost that has to be considered. The size of the arena varies greatly
+depending on the network.
+
+For networks that map fully to Ethos-U, memory bandwidth might become the
+limiting factor. For networks that run partly in software, the Cortex-M might
+become the limiting factor. The placement of the TFLu model and arena (flash,
+DRAM, SRAM, etc.) also has a big impact on performance.
+
+# Startup
+
+The applications in this repo use
+[CMSIS Device](https://github.com/ARM-software/CMSIS_5/tree/develop/Device/) to
+start up the Cortex-M. The standard procedure is to copy and modify the CMSIS
+templates, but in this repo we have chosen to include the unmodified templates
+directly from CMSIS.
+
+The sequence diagram below describes what happens after the Cortex-M reset is
+lifted, up until execution enters the application `main()`.
+
+![Startup](docs/startup.svg "Startup sequence diagram")
+
+## CMSIS Device
+
+First, the CPU loads entry 0 of the interrupt vector table into the SP register
+and entry 1 into the PC register, and then starts executing from the PC
+location.
+
+Entry 1 of the vector table (whose base address is held in VTOR) holds the
+address of the reset handler, which is responsible for initializing the CPU.
+For example, if the CPU has an FPU or the MVE extension, then these are enabled.
+
+## Compiler runtime
+
+The entry function for the compiler runtime setup varies depending on which
+compiler is used. For Arm Clang this function is called `__main()`, not to
+be confused with the application `main()`!
+
+The runtime is responsible for initializing the memory segments and setting up
+the runtime environment. Please refer to the compiler documentation for detailed
+information about the runtime setup.
+
+## Target
+
+The [`init()`](targets/common/src/init.cpp) function is defined as a
+constructor, which is called before the application `main()`. We use this
+constructor to run `targetSetup()`, which initializes the platform.
+
+For each target there is a `targets/` directory, which contains linker
+scripts and the code needed to set up the target. `targetSetup()` is
+implemented in this directory and is responsible for initializing drivers,
+configuring the MPU, enabling caches, etc.
+
+Adding a new target involves creating a new `targets/` directory,
+providing linker scripts and implementing `targetSetup()`.
+
+## Application
+
+Finally, the runtime calls the application `main()`. Ideally the application
+code should be generic and have no knowledge of which target it is executing on.
+
 # License
 
 The Arm Ethos-U core platform is provided under an Apache-2.0 license. Please
-- 
cgit v1.2.1
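As a rough illustration of the Multi NPU section in this patch, the following host-side C sketch shows the kind of mutex and counting-semaphore primitives an application supplies to the driver. It is a sketch under stated assumptions, not the FreeRTOS example from `applications/freertos/main.cpp`: POSIX threads stand in for the RTOS, and the `app_*` functions, `fake_npu_irq()` and `run_simulated_job()` are hypothetical names. In a real build these bodies would sit inside the application's overrides of the driver's weak `ethosu_mutex_*` and `ethosu_semaphore_*` functions, whose exact prototypes are defined in `ethosu_driver.c` and are not reproduced here.

```c
/* Host-side sketch: POSIX threads stand in for an RTOS. All app_* names are
 * hypothetical; real code would place these bodies in the application's
 * overrides of the driver's weak ethosu_mutex_* / ethosu_semaphore_* hooks
 * (see ethosu_driver.c for their actual prototypes). */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Mutex: serializes access to shared driver state between threads. */
void *app_mutex_create(void) {
    pthread_mutex_t *m = malloc(sizeof *m);
    if (m != NULL && pthread_mutex_init(m, NULL) != 0) {
        free(m);
        m = NULL;
    }
    return m;
}

void app_mutex_lock(void *mutex) { pthread_mutex_lock(mutex); }
void app_mutex_unlock(void *mutex) { pthread_mutex_unlock(mutex); }

void app_mutex_destroy(void *mutex) {
    pthread_mutex_destroy(mutex);
    free(mutex);
}

/* Counting semaphore built from a mutex and a condition variable.
 * take() sleeps until another context calls give(); a give() that
 * happens before take() is remembered in count, so it is not lost. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    unsigned count;
} app_semaphore_t;

void *app_semaphore_create(void) {
    app_semaphore_t *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->count = 0;
    return s;
}

void app_semaphore_take(void *sem) {
    app_semaphore_t *s = sem;
    pthread_mutex_lock(&s->lock);
    while (s->count == 0) /* loop guards against spurious wakeups */
        pthread_cond_wait(&s->cond, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

void app_semaphore_give(void *sem) {
    app_semaphore_t *s = sem;
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

void app_semaphore_destroy(void *sem) {
    app_semaphore_t *s = sem;
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->lock);
    free(s);
}

/* Hypothetical stand-in for the NPU interrupt handler: signals "job done". */
static void *fake_npu_irq(void *sem) {
    app_semaphore_give(sem);
    return NULL;
}

/* Simulates one inference job: the calling thread sleeps on the semaphore
 * until the "IRQ" thread gives it. Returns 0 on success, -1 on error. */
int run_simulated_job(void) {
    void *sem = app_semaphore_create();
    assert(sem != NULL);
    pthread_t irq;
    if (pthread_create(&irq, NULL, fake_npu_irq, sem) != 0) {
        app_semaphore_destroy(sem);
        return -1;
    }
    app_semaphore_take(sem); /* blocks until fake_npu_irq() has given */
    pthread_join(irq, NULL);
    app_semaphore_destroy(sem);
    return 0;
}
```

The semaphore counts rather than just flags, so a completion signalled before the waiting thread reaches `app_semaphore_take()` is not lost. On FreeRTOS itself, a give issued from an interrupt handler would need the ISR-safe API, such as `xSemaphoreGiveFromISR()`.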
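The CMSIS Device part of the Startup section in this patch describes what the core does when reset is lifted. The following C sketch is a host-side model of that step, not target code: it loads entry 0 of a vector table into SP and entry 1 into PC. It also models one detail the README does not mention: because Cortex-M executes only Thumb code, bit 0 of the reset vector must be set, and the core copies that bit into the Thumb state bit (EPSR.T) and clears it from the address loaded into PC. `cpu_state_t`, `model_reset()` and the example addresses are hypothetical.

```c
/* Host-side model of the Cortex-M reset step; not code that runs on the
 * target. All names here are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t sp;    /* stack pointer loaded from entry 0 */
    uint32_t pc;    /* program counter loaded from entry 1 */
    uint32_t thumb; /* models the EPSR.T (Thumb state) bit */
} cpu_state_t;

/* Models what the core does when reset is lifted: read the first two
 * words of the vector table (located at the address held in VTOR). */
void model_reset(const uint32_t *vector_table, cpu_state_t *cpu) {
    assert(vector_table != NULL && cpu != NULL);
    cpu->sp = vector_table[0];         /* entry 0: initial stack pointer */
    cpu->thumb = vector_table[1] & 1u; /* bit 0 selects Thumb state */
    cpu->pc = vector_table[1] & ~1u;   /* entry 1: reset handler address */
}
```

For example, a table whose entries are `0x20010000` and `0x00000401` would start execution at `0x00000400` with the stack pointer at `0x20010000`; both addresses are made up for illustration.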
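The Target section of this patch describes `init()` as a constructor that runs `targetSetup()` before `main()`. The C sketch below illustrates that mechanism under stated assumptions: it uses the GCC/Clang `__attribute__((constructor))` extension, and `targetSetup()` is a stub rather than the repo's implementation (the repo's `init()` lives in `targets/common/src/init.cpp`, which is C++ and may be written differently). `target_is_ready()` is a hypothetical helper that lets code in `main()` observe that setup has already run.

```c
/* Sketch of running target setup from a constructor, assuming the
 * GCC/Clang __attribute__((constructor)) extension. targetSetup() is a
 * stub; target_is_ready() is a hypothetical helper. */
#include <assert.h>
#include <stdbool.h>

static bool target_ready = false;

void targetSetup(void) {
    /* Real code would initialize drivers, configure the MPU, enable
     * caches, etc. Here we only record that setup happened. */
    target_ready = true;
}

/* Run by the C runtime after memory initialization, before main(). */
__attribute__((constructor)) static void init(void) {
    assert(!target_ready); /* expect to run exactly once */
    targetSetup();
}

/* Lets application code check, from main(), that setup already ran. */
bool target_is_ready(void) { return target_ready; }
```

Because the constructor runs before `main()`, `target_is_ready()` already returns `true` when `main()` starts, which is what lets the application code stay free of target-specific setup calls.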