From ff2084bc16c91ec71820785ed4f5018886375549 Mon Sep 17 00:00:00 2001
From: Kristofer Jonsson <kristofer.jonsson@arm.com>
Date: Thu, 9 Sep 2021 09:47:21 +0200
Subject: Document memory configurations

Change-Id: I165651921106acb6893750dfeabec7537188c223
---
 README.md | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index fb93008..65fcaf9 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@ inference on an Arm Ethos-U compatible platform.
 
 This repository contains target specific files, like linker scripts. Target
 agnostic software components are provided in the
-[core_software](https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-core-software)
+[core_software](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-software.git)
 repository.
 
 # Targets
@@ -117,6 +117,53 @@ be written to 0x02000000.
 Power up the board with the PBON and the application output will be seen on the
 serial console.
 
+# Memory configurations
+
+Embedded systems come in many different configurations, but they typically have
+a limited amount of high-bandwidth, low-latency memory such as SRAM, and a
+larger amount of lower-bandwidth, higher-latency memory such as flash or DRAM.
+
+The TensorFlow Lite for Microcontrollers (TFLu) framework needs two buffers to
+run an inference: the *model* and the *arena*. The model contains static data
+such as weights and biases. The arena contains read-write data such as
+activations, the input and output feature maps (IFM and OFM) and temporary
+data. Note that the IFM and OFM are located *inside* the arena.
+
+The placement of the model and arena has a big impact on performance. There
+are three configurations that make sense for most systems.
+
+| Model      | Arena      | Spilling | Note           |
+|------------|------------|----------|----------------|
+| SRAM       | SRAM       | No       |                |
+| Flash/DRAM | SRAM       | No       |                |
+| Flash/DRAM | Flash/DRAM | Yes      | Ethos-U65 only |
+
+## Model and arena in SRAM
+
+For optimal performance, both the model and the arena should be placed in SRAM.
+
+## Model in flash/DRAM, arena in SRAM
+
+If the model and the arena do not both fit in SRAM, it makes most sense to move
+the model to flash/DRAM. The performance penalty depends on the network and
+must be measured. For example, weight-bound networks, whose speed is limited by
+reading weights from the model, suffer a larger drop than MAC-bound networks.
+
+## Model and arena in flash/DRAM (Ethos-U65 only)
+
+Moving both the model and the arena to flash/DRAM comes with a significant
+performance penalty. *Spilling* can be used to mitigate some of it.
+
+Spilling means that a small buffer is reserved in SRAM that acts like a cache
+for frequently accessed data. When spilling is enabled,
+[Vela](https://git.mlplatform.org/ml/ethos-u/ethos-u-vela.git/about/) adds
+extra instructions to the command stream that use DMA to copy data between
+the arena and the spilling buffer.
+
+Some of the data stored in the spilling buffer must be copied back to the arena,
+which is done with a DMA transfer over AXI 1. This is only supported by
+Ethos-U65, because the AXI 1 interface of Ethos-U55 is read-only.
+
 # Multi NPU
 
 The Tensorflow Lite for Microcontrollers (TFLu) framework supports running
-- 
cgit v1.2.1