From a08e9d43eee72bb3143c9dad304c966e700be810 Mon Sep 17 00:00:00 2001
From: Kristofer Jonsson
Date: Wed, 28 Apr 2021 12:32:28 +0200
Subject: Documenting startup and multi NPU

Adding documentation and sequence diagrams for startup and multi NPU.

Change-Id: I4a4a43e8bea089b6325f7d8285434017cbda25ec
---
 README.md          | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 docs/multinpu.puml | 82 ++++++++++++++++++++++++++++++++++++++++++++++
 docs/multinpu.svg  | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 docs/startup.puml  | 71 ++++++++++++++++++++++++++++++++++++++++
 docs/startup.svg   | 84 +++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 427 insertions(+)
 create mode 100644 docs/multinpu.puml
 create mode 100644 docs/multinpu.svg
 create mode 100644 docs/startup.puml
 create mode 100644 docs/startup.svg

diff --git a/README.md b/README.md
index 046f406..b20718a 100644
--- a/README.md
+++ b/README.md
@@ -59,6 +59,101 @@ this.
 $ FVP_Corstone_SSE-300_Ethos-U55 applications/freertos/freertos.elf
 ```
 
+# Multi NPU
+
+The TensorFlow Lite for Microcontrollers (TFLu) framework supports running
+multiple parallel inferences. Each parallel inference requires a TFLu arena
+(costs memory) and a stack (requires an RTOS). The examples provided in this
+repo are implemented in the application layer, which means that any RTOS could
+be used.
+
+The Ethos-U NPU driver is implemented in plain C. To enable thread safety in a
+multi-threaded environment the driver defines a set of weak functions that the
+application is expected to override, providing implementations for mutex and
+semaphore primitives.
+
+The weak functions can be found in
+[ethosu_driver.c](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu_driver.c?id=35b5d0eebf9709a3439d362a0b53d6270cbc4a94#n173).
+A FreeRTOS-based example of how to override and implement these functions can
+be found in
+[applications/freertos/main.cpp](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-platform.git/tree/applications/freertos/main.cpp?id=991af2bd8fb6c79dfb317837353857f34a727b17#n108).
+
+The sequence diagram below illustrates the call stack for a multi NPU system.
+Please note how the `ethosu_mutex_*` and `ethosu_semaphore_*` functions are
+implemented in the application layer. Mutexes are used for thread safety and
+semaphores for sleeping.
+
+![Multi NPU](docs/multinpu.svg "Multi NPU sequence diagram")
+
+## Tradeoffs
+
+A single Cortex-M is capable of driving multiple Ethos-U NPUs. The optimal
+number of NPUs is impossible to tell without knowing which network will run
+and without detailed knowledge about the limitations of the embedded
+system.
+
+Each parallel inference requires an arena. For optimal performance the arena
+should be placed in a high-bandwidth, low-latency memory such as SRAM, which is
+a cost that has to be considered. The size of the arena varies greatly depending
+on the network.
+
+For networks that map fully to Ethos-U, the memory bandwidth might become a
+limiting factor. For networks that run partly in software, the Cortex-M might
+become the limiting factor. The placement of the TFLu model and arena (flash,
+DRAM, SRAM, etc.) will also have a big impact on the performance.
+
+# Startup
+
+The applications in this repo use
+[CMSIS Device](https://github.com/ARM-software/CMSIS_5/tree/develop/Device/) to
+start up the Cortex-M. The standard procedure is to copy and modify the CMSIS
+templates, but in this repo we have chosen to include the unmodified templates
+directly from CMSIS.
+
+The sequence diagram below describes what happens after the Cortex-M reset is
+lifted, up until the execution enters the application `main()`.
+
+![Startup](docs/startup.svg "Startup sequence diagram")
+
+## CMSIS Device
+
+The first thing that happens is that the CPU loads index 0 from the interrupt
+vector table into the SP register and index 1 into the PC register, and then
+starts executing from the PC location.
+
+Index 1 in the vector table is referred to as the reset handler and is
+responsible for initializing the CPU. If the CPU for example has an FPU or MVE
+extension, then these are enabled.
+
+## Compiler runtime
+
+The entry function for the compiler runtime setup varies depending on which
+compiler is used. For Arm Clang this function is called `__main()`, not to
+be confused with the application `main()`!
+
+The runtime is responsible for initializing the memory segments and setting up
+the runtime environment. Please refer to the compiler documentation for detailed
+information about the runtime setup.
+
+## Target
+
+The [`init()`](targets/common/src/init.cpp) function is defined as a
+constructor, which will be called before the application `main()`. We use this
+constructor to run `targetSetup()` to initialize the platform.
+
+For each target there is a `targets/` directory, which contains linker
+scripts and code needed to set up the target. `targetSetup()` is implemented in
+this folder and is responsible for initializing drivers, configuring the MPU,
+enabling caches, etc.
+
+Adding a new target would involve creating a new `targets/` directory,
+providing linker scripts and implementing `targetSetup()`.
+
+## Application
+
+Finally, the runtime calls the application `main()`. Ideally the application
+code should be generic, with no knowledge of which target it is executing on.
+
 # License
 
 The Arm Ethos-U core platform is provided under an Apache-2.0 license. Please
diff --git a/docs/multinpu.puml b/docs/multinpu.puml
new file mode 100644
index 0000000..e5248b1
--- /dev/null
+++ b/docs/multinpu.puml
@@ -0,0 +1,82 @@
+@startuml
+
+skinparam backgroundColor #EEEBDC
+
+box "Application" #00C1DE
+participant "main()" as app
+end box
+
+box "Tensorflow" #FF6B00
+participant "TFLu" as tflu
+participant "Ethos-U custom op" as custom
+end box
+
+box "Ethos-U driver" #95D600
+participant "Driver" as driver
+end box
+
+box "Hardware" #FFC700
+participant "Cortex-M" as cortexm
+participant "Ethos-U" as ethosu
+end box
+
+app -> tflu++: Invoke()
+    tflu -> custom++: Eval()
+        custom -> driver++: ethosu_reserve_driver()
+            loop Find and reserve driver
+                driver -> app++: ethosu_mutex_lock()
+                return
+
+                driver -> driver: ethosu_find_and_reserve_driver()
+
+                driver -> app++: ethosu_mutex_unlock()
+                return
+
+                alt Found free driver
+                    note over driver
+                        Return free driver
+                    end note
+                else No driver available
+                    driver -> app++: ethosu_semaphore_take()
+                        note over app
+                            Block on semaphore
+                        end note
+                    return
+                end
+            end loop
+        return
+
+        custom -> driver++: ethosu_invoke()
+            driver -\\ ethosu: Configure NPU and trigger inference
+
+            driver -> driver++: wait_for_irq()
+                note over driver
+                    Driver sleeping waiting for IRQ
+                end note
+
+                ethosu -\\ cortexm: IRQ
+                cortexm -\\ driver: ethosu_irq_handler()
+
+                note over driver
+                    Driver woken up by IRQ handler
+                end note
+            return
+        return
+
+        custom -> driver++: ethosu_release_driver()
+            driver -> app++: ethosu_mutex_lock()
+            return
+
+            driver -> app++: ethosu_semaphore_give()
+                note over app
+                    Wake up threads blocking on the semaphore
+                end note
+            return
+
+            driver -> app++: ethosu_mutex_unlock()
+            return
+        return
+    return
+return
+
+@enduml
diff --git a/docs/multinpu.svg b/docs/multinpu.svg
new file mode 100644
index 0000000..bc3b6b0
--- /dev/null
+++ b/docs/multinpu.svg
@@ -0,0 +1,95 @@
+[docs/multinpu.svg: Multi NPU sequence diagram; SVG markup not preserved]
\ No newline at end of file
diff --git a/docs/startup.puml b/docs/startup.puml
new file mode 100644
index 0000000..f9cb528
--- /dev/null
+++ b/docs/startup.puml
@@ -0,0 +1,71 @@
+@startuml
+
+skinparam backgroundColor #EEEBDC
+
+box "Hardware" #FFC700
+participant "Cortex-M" as cortexm
+participant "Ethos-U" as ethosu
+end box
+
+box "CMSIS Device" #0091BD
+participant "__VECTOR_TABLE" as ivec
+participant "Reset_Handler()" as reset
+end box
+
+box "Compiler" #FF6B00
+participant "Runtime" as runtime
+end box
+
+box "Target" #95D600
+participant "common" as common
+participant "corstone-300" as target
+end box
+
+box "Drivers" #00C1DE
+participant "NPU" as driver
+participant "UART" as uart
+participant "MPU" as mpu
+end box
+
+box "Application" #7D868C
+participant "main()" as main
+end box
+
+cortexm -> ivec++:
+    ivec -> reset++: Reset_Handler()
+        reset -> reset++: SystemInit()
+        deactivate reset
+
+        reset -> runtime++: __main()
+            note over runtime
+                Scatter loading
+                Initializing stack and heap
+            end note
+
+            note over runtime
+                Calling constructors
+            end note
+
+            runtime -> common++: init() [constructor]
+                note over common
+                    The constructor is called after stack and heap have been initialized,
+                    but before the main() function is called
+                end note
+
+                common -> target++: targetSetup()
+                    target -> uart++: uart_init()
+                    return
+
+                    target -> driver++: ethosu_init()
+                    return
+
+                    target -> mpu++: loadAndEnableConfig()
+                    return
+                return
+            return
+
+            runtime -> main++: main()
+                note over main
+                    Running application
+                end note
+@enduml
diff --git a/docs/startup.svg b/docs/startup.svg
new file mode 100644
index 0000000..a2c9f52
--- /dev/null
+++ b/docs/startup.svg
@@ -0,0 +1,84 @@
+[docs/startup.svg: Startup sequence diagram; SVG markup not preserved]
\ No newline at end of file
-- 
cgit v1.2.1
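
The README text added by this patch expects the application to override the driver's weak `ethosu_mutex_*` and `ethosu_semaphore_*` functions. The sketch below illustrates that idea on a host machine, using POSIX threads in place of FreeRTOS so that it can be compiled and tried on a workstation. The signatures (opaque `void *` handles, no return codes), the `*_create()` functions and the `host_semaphore` helper type are assumptions made for illustration; they are not taken from the linked driver revision and should be checked against `ethosu_driver.c` before reuse.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical helper type: a counting semaphore built from a mutex and a
 * condition variable, so the sketch depends only on POSIX threads. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    unsigned count;
} host_semaphore;

/* Assumed signature: returns an opaque mutex handle, or NULL on failure. */
void *ethosu_mutex_create(void)
{
    pthread_mutex_t *mutex = malloc(sizeof(*mutex));
    if (mutex != NULL && pthread_mutex_init(mutex, NULL) != 0) {
        free(mutex);
        mutex = NULL;
    }
    return mutex;
}

/* Mutexes protect the driver's shared state (thread safety). */
void ethosu_mutex_lock(void *mutex)
{
    assert(mutex != NULL);
    pthread_mutex_lock((pthread_mutex_t *)mutex);
}

void ethosu_mutex_unlock(void *mutex)
{
    assert(mutex != NULL);
    pthread_mutex_unlock((pthread_mutex_t *)mutex);
}

/* Assumed signature: returns an opaque semaphore handle with zero tokens. */
void *ethosu_semaphore_create(void)
{
    host_semaphore *sem = malloc(sizeof(*sem));
    if (sem == NULL) {
        return NULL;
    }
    if (pthread_mutex_init(&sem->lock, NULL) != 0) {
        free(sem);
        return NULL;
    }
    if (pthread_cond_init(&sem->cond, NULL) != 0) {
        pthread_mutex_destroy(&sem->lock);
        free(sem);
        return NULL;
    }
    sem->count = 0;
    return sem;
}

/* Semaphores are used for sleeping: block until a token is available,
 * then consume it. */
void ethosu_semaphore_take(void *handle)
{
    host_semaphore *sem = handle;
    assert(sem != NULL);
    pthread_mutex_lock(&sem->lock);
    while (sem->count == 0) {
        pthread_cond_wait(&sem->cond, &sem->lock);
    }
    sem->count--;
    pthread_mutex_unlock(&sem->lock);
}

/* Add one token and wake a thread blocked in ethosu_semaphore_take(). */
void ethosu_semaphore_give(void *handle)
{
    host_semaphore *sem = handle;
    assert(sem != NULL);
    pthread_mutex_lock(&sem->lock);
    sem->count++;
    pthread_cond_signal(&sem->cond);
    pthread_mutex_unlock(&sem->lock);
}
```

On a real target the same function names would be backed by the RTOS primitives, as the FreeRTOS example referenced in the README does. Building the semaphore from a mutex and a condition variable is only a portability choice for this host sketch.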