Diffstat (limited to 'README.md')
-rw-r--r-- README.md 95
1 files changed, 95 insertions, 0 deletions
diff --git a/README.md b/README.md
index 046f406..b20718a 100644
--- a/README.md
+++ b/README.md
@@ -59,6 +59,101 @@ this.
$ FVP_Corstone_SSE-300_Ethos-U55 applications/freertos/freertos.elf
```
+# Multi NPU
+
+The TensorFlow Lite for Microcontrollers (TFLu) framework supports running
+multiple parallel inferences. Each parallel inference requires a TFLu arena
+(costs memory) and a stack (requires an RTOS). The examples provided in this
+repo are implemented in the application layer, which means that any RTOS could
+be used.
+
+The Ethos-U NPU driver is implemented in plain C. To enable thread safety in a
+multi-threading environment the driver defines a set of weak functions that the
+application is expected to override, providing implementations for mutex and
+semaphore primitives.
+
+The weak functions can be found in
+[ethosu_driver.c](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu_driver.c?id=35b5d0eebf9709a3439d362a0b53d6270cbc4a94#n173).
+A FreeRTOS-based example of how to override and implement these functions can
+be found in
+[applications/freertos/main.cpp](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-platform.git/tree/applications/freertos/main.cpp?id=991af2bd8fb6c79dfb317837353857f34a727b17#n108).
+
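As a rough illustration of the override pattern, the sketch below implements the hooks on top of POSIX threads, used here only as a host-side stand-in for an RTOS. The hook names and signatures are assumptions modelled on the weak functions in `ethosu_driver.c`, and the initial semaphore count of 0 is also an assumption; check both against the driver revision you build against.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

/* Assumed hook names and signatures; verify them against the weak
 * definitions in the ethosu_driver.c revision in use. */

void *ethosu_mutex_create(void) {
    pthread_mutex_t *mutex = malloc(sizeof(*mutex));
    if (mutex != NULL && pthread_mutex_init(mutex, NULL) != 0) {
        free(mutex);
        return NULL;
    }
    return mutex;
}

void ethosu_mutex_lock(void *mutex) {
    assert(mutex != NULL);
    pthread_mutex_lock((pthread_mutex_t *)mutex);
}

void ethosu_mutex_unlock(void *mutex) {
    assert(mutex != NULL);
    pthread_mutex_unlock((pthread_mutex_t *)mutex);
}

void *ethosu_semaphore_create(void) {
    sem_t *sem = malloc(sizeof(*sem));
    /* Assumption: the semaphore starts empty (count 0). */
    if (sem != NULL && sem_init(sem, 0, 0) != 0) {
        free(sem);
        return NULL;
    }
    return sem;
}

void ethosu_semaphore_take(void *sem) {
    assert(sem != NULL);
    sem_wait((sem_t *)sem); /* Sleeps until a matching give. */
}

void ethosu_semaphore_give(void *sem) {
    assert(sem != NULL);
    sem_post((sem_t *)sem);
}
```

Because the driver's own definitions are weak, strong definitions like these in the application take precedence at link time. An RTOS port would replace the pthread and `sem_*` calls with the corresponding RTOS primitives.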
+The sequence diagram below illustrates the call stack for a multi NPU system.
+Please note how the `ethosu_mutex_*` and `ethosu_semaphore_*` functions are
+implemented in the application layer. Mutexes are used for thread safety and
+semaphores for sleeping.
+
+![Multi NPU](docs/multinpu.svg "Multi NPU sequence diagram")
+
+## Tradeoffs
+
+A single Cortex-M is capable of driving multiple Ethos-U NPUs. The optimal
+number of NPUs cannot be determined without knowing which network will run and
+without detailed knowledge of the limitations of the embedded system.
+
+Each parallel inference requires an arena. For optimal performance the arena
+should be placed in a high-bandwidth, low-latency memory such as SRAM, which is
+a cost that has to be considered. The size of the arena varies greatly depending
+on the network.
+
+For networks that map fully to Ethos-U, the memory bandwidth might become a
+limiting factor. For networks that run partly in software, the Cortex-M might
+become the limiting factor. The placement of the TFLu model and arena (flash,
+DRAM, SRAM, etc.) also has a large impact on performance.
+
+# Startup
+
+The applications in this repo use
+[CMSIS Device](https://github.com/ARM-software/CMSIS_5/tree/develop/Device/) to
+start up the Cortex-M. The standard procedure is to copy and modify the CMSIS
+templates, but in this repo we have chosen to include the unmodified templates
+directly from CMSIS.
+
+The sequence diagram below describes what happens after the Cortex-M reset is
+lifted, up until the execution enters the application `main()`.
+
+![Startup](docs/startup.svg "Startup sequence diagram")
+
+## CMSIS Device
+
+The first thing that happens is that the CPU loads index 0 of the interrupt
+vector table into the SP register and index 1 into the PC register, and then
+starts executing from the PC location.
+
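The reset sequence can be modelled on a host machine with a small sketch. The type `reset_state_t` and the function `load_reset_state()` are hypothetical names used only for illustration; the real load is performed by the CPU hardware, not by software. The handling of bit 0 reflects the Thumb-state convention for Cortex-M vector entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the registers the CPU loads at reset. */
typedef struct {
    uint32_t sp;  /* Initial stack pointer, from vector table index 0. */
    uint32_t pc;  /* Address of the reset handler, from index 1. */
    bool thumb;   /* Bit 0 of the reset vector selects Thumb state. */
} reset_state_t;

/* Models what the hardware does with the first two vector table entries. */
reset_state_t load_reset_state(const uint32_t *vector_table) {
    assert(vector_table != NULL);
    reset_state_t state;
    state.sp = vector_table[0];
    state.pc = vector_table[1] & ~UINT32_C(1);
    state.thumb = (vector_table[1] & UINT32_C(1)) != 0;
    return state;
}
```

For a table whose first two words are `0x20010000` and `0x00000401`, this model yields an initial SP of `0x20010000` and execution starting at `0x00000400` in Thumb state.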
+Index 1 in the vector table is referred to as the reset handler and is
+responsible for initializing the CPU. For example, if the CPU has an FPU or MVE
+extension, these are enabled.
+
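As a hedged sketch of what "enabling" means here: on Cortex-M the FPU is typically gated by the CP10 and CP11 access fields of the CPACR register. The helper below only computes the new register value so that it can run on a host; `cpacr_enable_fpu()` is a hypothetical name, and actual startup code would write the result to the memory-mapped register through the CMSIS device header.

```c
#include <stdint.h>

/* CP10 occupies bits [21:20] and CP11 bits [23:22]; 0b11 means full access. */
#define CPACR_CP10_CP11_FULL_ACCESS \
    ((UINT32_C(3) << 20) | (UINT32_C(3) << 22))

/* Hypothetical helper: returns the CPACR value with CP10/CP11 enabled,
 * leaving all other bits unchanged. */
uint32_t cpacr_enable_fpu(uint32_t cpacr) {
    return cpacr | CPACR_CP10_CP11_FULL_ACCESS;
}
```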
+## Compiler runtime
+
+The entry function for the compiler runtime setup varies depending on which
+compiler is used. For Arm Clang this function is called `__main()`, not to
+be confused with the application `main()`!
+
+The runtime is responsible for initializing the memory segments and setting up
+the runtime environment. Please refer to the compiler documentation for detailed
+information about the runtime setup.
+
+## Target
+
+The [`init()`](targets/common/src/init.cpp) function is defined as a
+constructor, which is called before the application `main()`. We use this
+constructor to run `targetSetup()` to initialize the platform.
+
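The constructor mechanism can be sketched as follows, assuming a GCC or Clang compatible toolchain that supports `__attribute__((constructor))`. The body of `targetSetup()` and the helper `target_is_ready()` are placeholders invented for this illustration; the real implementation lives in the target directory.

```c
#include <stdbool.h>

static bool target_ready = false;

/* Placeholder for the per-target setup (drivers, MPU, caches, ...). */
void targetSetup(void) {
    target_ready = true;
}

/* The runtime runs constructor functions before it calls main(). */
__attribute__((constructor)) static void init(void) {
    targetSetup();
}

/* Hypothetical helper so the application can observe the result. */
bool target_is_ready(void) {
    return target_ready;
}
```

By the time the application `main()` starts, `target_is_ready()` in this sketch already returns `true`, which is why the application does not need to call any target-specific setup itself.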
+For each target there is a `targets/<target>` directory, which contains linker
+scripts and code needed to set up the target. `targetSetup()` is implemented in
+this directory and is responsible for initializing drivers, configuring the
+MPU, enabling caches, etc.
+
+Adding a new target involves creating a new `targets/<target>` directory,
+providing linker scripts and implementing `targetSetup()`.
+
+## Application
+
+Finally, the runtime calls the application `main()`. Ideally, the application
+code should be generic and have no knowledge of which target it is executing on.
+
# License
The Arm Ethos-U core platform is provided under an Apache-2.0 license. Please