-rw-r--r-- | README.md | 49 |
1 files changed, 48 insertions, 1 deletions
@@ -7,7 +7,7 @@ inference on an Arm Ethos-U compatible platform.
 
 This repository contains target specific files, like linker scripts. Target
 agnostic software components are provided in the
-[core_software](https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-core-software)
+[core_software](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-software.git)
 repository.
 
 # Targets
@@ -117,6 +117,53 @@ be written to 0x02000000.
 Power up the board with the PBON and the application output will be seen on the
 serial console.
 
+# Memory configurations
+
+Embedded systems come in very different configurations, but typically they have
+a limited amount of high bandwidth low latency memory like SRAM, and some more
+low bandwidth high latency memory like flash or DRAM.
+
+The Tensorflow Lite for Microcontrollers (TFLu) framework needs two buffers to
+run an inference, the *model* and the *arena*. The model contains static data
+like weights and biases. The arena contains read write data like activations,
+IFM, OFM, temporary data etc. Please note that the IFM and OFM are located
+*inside* of the arena.
+
+The placement of the model and arena has a big impact on the performance. There
+are three configurations that make sense for most systems.
+
+| Model      | Arena      | Spilling | Note           |
+|------------|------------|----------|----------------|
+| SRAM       | SRAM       | No       |                |
+| Flash/DRAM | SRAM       | No       |                |
+| Flash/DRAM | Flash/DRAM | Yes      | Ethos-U65 only |
+
+## Model and arena in SRAM
+
+For optimal performance both model and arena should be placed in SRAM.
+
+## Model flash/DRAM, Arena SRAM
+
+If both model and arena do not fit in SRAM, then it makes most sense to move the
+model to flash/DRAM. The performance penalty depends on the network and will
+need to be measured. For example weight bound networks will experience a larger
+performance drop than MAC bound networks.
+
+## Model and arena in flash/DRAM (Ethos-U65 only)
+
+Moving both model and arena to flash/DRAM comes with quite a hefty performance
+penalty. To mitigate some of this *spilling* can be used.
+
+Spilling means that a small buffer is reserved in SRAM that acts like a cache
+for frequently accessed data. When spilling is enabled
+[Vela](https://git.mlplatform.org/ml/ethos-u/ethos-u-vela.git/about/) will
+prepend and append extra instructions to the command stream to DMA copy data
+between the arena and the spilling buffer.
+
+Some of the data stored in the spilling buffer must be copied back to the arena,
+which is done as DMA transfer over AXI 1. This is only supported by Ethos-U65,
+because Ethos-U55 is equipped with a readonly AXI 1 interface.
+
 # Multi NPU
 
 The Tensorflow Lite for Microcontrollers (TFLu) framework supports running
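
The second configuration in the new README table (model in flash/DRAM, arena in SRAM) could be expressed in application code roughly as below. This is only a sketch, assuming a GCC-compatible toolchain: the section names `.flash.model` and `.sram.arena`, the buffer names, the `PLACE_IN` macro, the 16-byte alignment and the sizes are all hypothetical and do not come from this patch. The real section names depend on the target's linker script, and the model bytes would normally be the Vela-optimized network rather than a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placement macro. The section attribute is only applied on ELF
 * targets, where arbitrary section names are accepted by the toolchain. */
#if defined(__ELF__)
#define PLACE_IN(name) __attribute__((section(name), aligned(16)))
#else
#define PLACE_IN(name) __attribute__((aligned(16)))
#endif

/* Assumed arena size; the real value depends on the network. */
#define ARENA_SIZE 4096

static_assert(ARENA_SIZE % 16 == 0, "arena size should keep 16-byte granularity");

/* Arena: read-write data (activations, IFM, OFM, temporaries).
 * ".sram.arena" is an assumed section that a linker script would map to SRAM. */
PLACE_IN(".sram.arena")
uint8_t tensor_arena[ARENA_SIZE];

/* Model: static data (weights, biases).
 * ".flash.model" is an assumed section that a linker script would map to
 * flash or DRAM. The contents here are a placeholder, not a valid network. */
PLACE_IN(".flash.model")
const uint8_t network_model[32] = {0};

/* Returns 1 if ptr is aligned to the given power-of-two alignment. */
int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}
```

Moving the model into SRAM for the first configuration would then only require changing the section name on `network_model`, provided the linker script has room for it.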