author Kristofer Jonsson <kristofer.jonsson@arm.com> 2021-09-09 09:47:21 +0200
committer Kristofer Jonsson <kristofer.jonsson@arm.com> 2021-09-09 11:01:48 +0200
commit ff2084bc16c91ec71820785ed4f5018886375549 (patch)
tree 5f3fcc77075500405d42165e4e9099cbec4ad90b
parent ce05c41cec3ec68460f377dd63b567b60f070527 (diff)
Document memory configurations
Change-Id: I165651921106acb6893750dfeabec7537188c223
-rw-r--r-- README.md 49
1 files changed, 48 insertions, 1 deletions
diff --git a/README.md b/README.md
index fb93008..65fcaf9 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@ inference on an Arm Ethos-U compatible platform.
This repository contains target specific files, like linker scripts. Target
agnostic software components are provided in the
-[core_software](https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-core-software)
+[core_software](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-software.git)
repository.
# Targets
@@ -117,6 +117,53 @@ be written to 0x02000000.
Power up the board with the PBON and the application output will be seen on the
serial console.
+# Memory configurations
+
+Embedded systems vary widely, but they typically combine a limited amount of
+high-bandwidth, low-latency memory such as SRAM with a larger amount of
+low-bandwidth, high-latency memory such as flash or DRAM.
+
+The TensorFlow Lite for Microcontrollers (TFLu) framework needs two buffers to
+run an inference: the *model* and the *arena*. The model contains static data
+like weights and biases. The arena contains read/write data like activations,
+IFM, OFM and temporary data. Please note that the IFM and OFM are located
+*inside* the arena.
+
+The placement of the model and arena has a large impact on performance. There
+are three configurations that make sense for most systems.
+
+| Model | Arena | Spilling | Note |
+|------------|------------|----------|----------------|
+| SRAM | SRAM | No | |
+| Flash/DRAM | SRAM | No | |
+| Flash/DRAM | Flash/DRAM | Yes | Ethos-U65 only |
+
+## Model and arena in SRAM
+
+For optimal performance both model and arena should be placed in SRAM.
+
+## Model in flash/DRAM, arena in SRAM
+
+If the model and arena do not both fit in SRAM, it makes most sense to move the
+model to flash/DRAM. The performance penalty depends on the network and will
+need to be measured. For example, weight-bound networks will experience a larger
+performance drop than MAC-bound networks.
+
+## Model and arena in flash/DRAM (Ethos-U65 only)
+
+Moving both model and arena to flash/DRAM comes with a substantial performance
+penalty. *Spilling* can be used to mitigate some of it.
+
+Spilling reserves a small buffer in SRAM that acts like a cache for frequently
+accessed data. When spilling is enabled,
+[Vela](https://git.mlplatform.org/ml/ethos-u/ethos-u-vela.git/about/) prepends
+and appends extra instructions to the command stream that DMA copy data between
+the arena and the spilling buffer.
+
+Some of the data stored in the spilling buffer must be copied back to the arena,
+which is done as a DMA transfer over AXI 1. This is only supported by Ethos-U65,
+because Ethos-U55 is equipped with a read-only AXI 1 interface.
+
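The three configurations in the table above can be read as a simple decision procedure. The sketch below is illustrative only: `Placement`, `choose_placement` and the NPU name strings are hypothetical and not part of this repository, sizes are in bytes, and the SRAM needed for the spilling buffer itself is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    """One row of the memory configuration table (hypothetical helper type)."""
    model: str      # "SRAM" or "Flash/DRAM"
    arena: str      # "SRAM" or "Flash/DRAM"
    spilling: bool  # True if a spilling buffer in SRAM is used


def choose_placement(model_size: int, arena_size: int,
                     sram_size: int, npu: str) -> Optional[Placement]:
    """Pick the fastest configuration from the table that fits.

    Returns None when none of the three configurations applies.
    """
    # Best case: model and arena both fit in SRAM.
    if model_size + arena_size <= sram_size:
        return Placement("SRAM", "SRAM", False)
    # Otherwise move the model out and keep the arena in SRAM.
    if arena_size <= sram_size:
        return Placement("Flash/DRAM", "SRAM", False)
    # Arena does not fit either: spilling, which requires Ethos-U65.
    if npu == "ethos-u65":
        return Placement("Flash/DRAM", "Flash/DRAM", True)
    return None


if __name__ == "__main__":
    print(choose_placement(300_000, 400_000, 512_000, "ethos-u55"))
```

The function only selects a candidate configuration. The actual performance penalty of moving the model or the arena out of SRAM depends on the network and still has to be measured.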
# Multi NPU
The TensorFlow Lite for Microcontrollers (TFLu) framework supports running