From ff2084bc16c91ec71820785ed4f5018886375549 Mon Sep 17 00:00:00 2001
From: Kristofer Jonsson
Date: Thu, 9 Sep 2021 09:47:21 +0200
Subject: Document memory configurations

Change-Id: I165651921106acb6893750dfeabec7537188c223
---
 README.md | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index fb93008..65fcaf9 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@ inference on an Arm Ethos-U compatible platform.
 
 This repository contains target specific files, like linker scripts. Target
 agnostic software components are provided in the
-[core_software](https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-core-software)
+[core_software](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-software.git)
 repository.
 
 # Targets
@@ -117,6 +117,53 @@ be written to 0x02000000.
 Power up the board with the PBON and the application output will be seen on the
 serial console.
 
+# Memory configurations
+
+Embedded systems come in very different configurations, but typically they have
+a limited amount of high bandwidth low latency memory like SRAM, and some more
+low bandwidth high latency memory like flash or DRAM.
+
+The Tensorflow Lite for Microcontrollers (TFLu) framework needs two buffers to
+run an inference, the *model* and the *arena*. The model contains static data
+like weights and biases. The arena contains read write data like activations,
+IFM, OFM, temporary data etc. Please note that the IFM and OFM are located
+*inside* of the arena.
+
+The placement of the model and arena has a big impact on the performance. There
+are three configurations that make sense for most systems.
+
+| Model      | Arena      | Spilling | Note           |
+|------------|------------|----------|----------------|
+| SRAM       | SRAM       | No       |                |
+| Flash/DRAM | SRAM       | No       |                |
+| Flash/DRAM | Flash/DRAM | Yes      | Ethos-U65 only |
+
+## Model and arena in SRAM
+
+For optimal performance both model and arena should be placed in SRAM.
+
+## Model flash/DRAM, Arena SRAM
+
+If both model and arena do not fit in SRAM, then it makes most sense to move the
+model to flash/DRAM. The performance penalty depends on the network and will
+need to be measured. For example weight bound networks will experience a larger
+performance drop than MAC bound networks.
+
+## Model and arena in flash/DRAM (Ethos-U65 only)
+
+Moving both model and arena to flash/DRAM comes with quite a hefty performance
+penalty. To mitigate some of this *spilling* can be used.
+
+Spilling means that a small buffer is reserved in SRAM that acts like a cache
+for frequently accessed data. When spilling is enabled
+[Vela](https://git.mlplatform.org/ml/ethos-u/ethos-u-vela.git/about/) will
+prepend and append extra instructions to the command stream to DMA copy data
+between the arena and the spilling buffer.
+
+Some of the data stored in the spilling buffer must be copied back to the arena,
+which is done as DMA transfer over AXI 1. This is only supported by Ethos-U65,
+because Ethos-U55 is equipped with a readonly AXI 1 interface.
+
 # Multi NPU
 
 The Tensorflow Lite for Microcontrollers (TFLu) framework supports running
--
cgit v1.2.1
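
Note on the memory configuration table added by this patch: the order of preference it describes (everything in SRAM, then model in flash/DRAM, then both in flash/DRAM with spilling on Ethos-U65) can be sketched as a small selection function. This is only an illustration under stated assumptions. The names `mem_region_t`, `mem_config_t` and `choose_mem_config` are hypothetical and not part of the repository, the sizes are assumed to be known in bytes, and the SRAM needed for the spilling buffer itself is not modelled. Actual performance still has to be measured per network, as the README text says.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not part of the repository. */
typedef enum { MEM_SRAM, MEM_FLASH_DRAM } mem_region_t;

typedef struct {
    mem_region_t model; /* where the model (weights, biases) is placed */
    mem_region_t arena; /* where the arena (activations, IFM, OFM) is placed */
    bool spilling;      /* true only for the Ethos-U65-only row of the table */
    bool valid;         /* false if no row of the table applies */
} mem_config_t;

/*
 * Pick the first row of the table that fits, in order of preference.
 * Assumption: spilling buffer size is ignored in this simplified check.
 */
mem_config_t choose_mem_config(size_t model_size, size_t arena_size,
                               size_t sram_size, bool is_ethos_u65)
{
    mem_config_t cfg = { MEM_FLASH_DRAM, MEM_FLASH_DRAM, false, false };

    /* Row 1: model and arena both fit in SRAM (overflow-safe comparison). */
    if (model_size <= sram_size && arena_size <= sram_size - model_size) {
        cfg.model = MEM_SRAM;
        cfg.arena = MEM_SRAM;
        cfg.valid = true;
        return cfg;
    }

    /* Row 2: only the arena fits in SRAM, so the model moves to flash/DRAM. */
    if (arena_size <= sram_size) {
        cfg.arena = MEM_SRAM;
        cfg.valid = true;
        return cfg;
    }

    /* Row 3: both in flash/DRAM with spilling, listed as Ethos-U65 only. */
    if (is_ethos_u65) {
        cfg.spilling = true;
        cfg.valid = true;
    }
    return cfg;
}
```

For example, `choose_mem_config(400, 200, 512, false)` selects the second row (model in flash/DRAM, arena in SRAM), because the arena fits in SRAM but model and arena together do not.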