Diffstat (limited to 'README.md')
-rw-r--r-- README.md | 66
1 files changed, 56 insertions, 10 deletions
diff --git a/README.md b/README.md
index f7ba476..7b922a6 100644
--- a/README.md
+++ b/README.md
@@ -203,7 +203,7 @@ Embedded systems come in very different configurations, but typically they have
a limited amount of high bandwidth low latency memory like SRAM, and some more
low bandwidth high latency memory like flash or DRAM.
-The Tensorflow Lite for Microcontrollers (TFLu) framework needs two buffers to
+The TensorFlow Lite for Microcontrollers (TFLM) framework needs two buffers to
run an inference, the *model* and the *arena*. The model contains static data
like weights and biases. The arena contains read write data like activations,
IFM, OFM, temporary data etc. Please note that the IFM and OFM are located
@@ -218,11 +218,15 @@ are three configurations that make sense for most systems.
| Flash/DRAM | SRAM | No | |
| Flash/DRAM | Flash/DRAM | Yes | Ethos-U65 only |
+Spilling is only available for Ethos-U65 and means that the TFLM model and arena
+are placed in slower memory like flash or DRAM, with a smaller *fast memory*
+buffer placed in faster memory like SRAM.
+
## Model and arena in SRAM
For optimal performance both model and arena should be placed in SRAM.
-## Model flash/DRAM, Arena SRAM
+## Model in flash/DRAM, arena in SRAM
If both model and arena do not fit in SRAM, then it makes most sense to move the
model to flash/DRAM. The performance penalty depends on the network and will
@@ -236,18 +240,60 @@ penalty. To mitigate some of this *spilling* can be used.
Spilling means that a small buffer is reserved in SRAM that acts like a cache
for frequently accessed data. When spilling is enabled
-[Vela](https://git.mlplatform.org/ml/ethos-u/ethos-u-vela.git/about/) will
-prepend and append extra instructions to the command stream to DMA copy data
-between the arena and the spilling buffer.
+[Vela](https://git.mlplatform.org/ml/ethos-u/ethos-u-vela.git/about/) will add
+extra instructions to the command stream to DMA copy data between the arena and
+the spilling buffer.
Some of the data stored in the spilling buffer must be copied back to the arena,
-which is done as DMA transfer over AXI 1. This is only supported by Ethos-U65,
-because Ethos-U55 is equipped with a readonly AXI 1 interface.
+which is done as a DMA transfer over AXI interface 1. This is only supported by
+Ethos-U65, because Ethos-U55 is equipped with a read-only AXI 1 interface.
+
+## NPU region configuration
+
+To achieve good performance it is important not to mix slow and fast memory
+transactions on the same AXI interface. The default setup is to map the arena to
+AXI interface 0 and the model to AXI interface 1.
+
+However, if spilling is used the arena should be moved to AXI interface 1 and
+the spilling buffer routed over interface 0. This will ensure that slow memory
+transactions are routed over AXI interface 1 and fast memory transactions over
+AXI interface 0.
+
+The routing of the arena is controlled by the define `NPU_REGIONCFG_1`. This define
+is declared in
+[ethosu_config_u55.h](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu_config_u55.h)
+or
+[ethosu_config_u65.h](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu_config_u65.h)
+depending on which NPU architecture the driver is built for. The default
+configuration is guarded by an `#ifdef`, so it can be overridden by the build
+system. Values `0` and `1` map to AXI interface 0 and `2` and `3` map to AXI
+interface 1.
+
+The routing of the model is controlled by `NPU_REGIONCFG_0` and the spilling
+buffer by `NPU_REGIONCFG_2`. For most use cases these configurations should not
+need to be changed.
+
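The region routing described above can be sketched as follows. This is a minimal illustration, not the driver's header: the default values shown are assumptions derived from the text (arena on AXI interface 0, model on AXI interface 1), the guard is assumed to be an `#ifndef`-style check, and `axi_interface_for_regioncfg` is a hypothetical helper that only encodes the stated mapping of values `0`/`1` and `2`/`3`.

```c
#include <stdint.h>

/* Spilling setup: move the arena to AXI interface 1. In a real build this
 * would typically come from the build system (for example a compiler define)
 * rather than from source code. */
#define NPU_REGIONCFG_1 2

/* Guarded defaults in the style the README describes. The values below are
 * assumptions for illustration; consult ethosu_config_u55.h or
 * ethosu_config_u65.h for the real defaults. */
#ifndef NPU_REGIONCFG_0
#define NPU_REGIONCFG_0 2 /* model: AXI interface 1 (assumed default) */
#endif
#ifndef NPU_REGIONCFG_1
#define NPU_REGIONCFG_1 0 /* arena: AXI interface 0 (assumed default) */
#endif
#ifndef NPU_REGIONCFG_2
#define NPU_REGIONCFG_2 0 /* spilling buffer: AXI interface 0 (assumed default) */
#endif

/* Hypothetical helper: values 0 and 1 map to AXI interface 0,
 * values 2 and 3 map to AXI interface 1. */
static inline int axi_interface_for_regioncfg(uint32_t cfg)
{
    return (cfg & 0x2u) ? 1 : 0;
}
```

Because the override is defined before the guarded defaults, the arena ends up on AXI interface 1 while the spilling buffer stays on AXI interface 0, which matches the recommended spilling setup.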
+## NPU burst length
+
+The NPU issues DMA bursts over the AXI interfaces. The burst length is defined
+by the `AXI_LIMIT<nr>_MAX_BEATS_BYTES` defines in
+[ethosu_config_u55.h](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu_config_u55.h)
+or
+[ethosu_config_u65.h](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu_config_u65.h)
+and is by default set to its minimum value.
+
+Longer burst lengths will in general yield higher performance. However, burst
+lengths that exceed the maximum supported burst length risk hanging the AXI bus,
+so it is important to configure this value correctly.
+
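One way to guard against an oversized burst setting is a compile-time check, sketched below. Everything platform-specific here is an assumption: `PLATFORM_AXI_MAX_BEATS_SETTING` and `clamp_axi_max_beats` are hypothetical names that do not exist in the driver, the value `0` is assumed to encode the minimum burst length, and larger values are assumed to mean longer bursts.

```c
#include <stdint.h>

/* Hypothetical platform limit: the largest burst setting the AXI
 * interconnect is known to support. Take this from the platform
 * documentation; it is not defined by the driver. */
#ifndef PLATFORM_AXI_MAX_BEATS_SETTING
#define PLATFORM_AXI_MAX_BEATS_SETTING 0
#endif

/* Burst length setting for AXI interface 0. The build system may override
 * it; the fallback of 0 is assumed to be the minimum value. */
#ifndef AXI_LIMIT0_MAX_BEATS_BYTES
#define AXI_LIMIT0_MAX_BEATS_BYTES 0
#endif

/* Fail the build instead of risking a hung AXI bus at run time. */
#if AXI_LIMIT0_MAX_BEATS_BYTES > PLATFORM_AXI_MAX_BEATS_SETTING
#error "AXI_LIMIT0_MAX_BEATS_BYTES exceeds the platform's supported burst length"
#endif

/* Hypothetical run-time variant of the same check: never request more
 * than the platform supports. */
static inline uint32_t clamp_axi_max_beats(uint32_t requested, uint32_t platform_max)
{
    return (requested > platform_max) ? platform_max : requested;
}
```

The preprocessor check keeps a misconfigured value from ever reaching hardware, at the cost of requiring the platform limit to be known at build time.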
+# Porting target
+
+Please see [PORTING.md](PORTING.md).
# Multi NPU
-The Tensorflow Lite for Microcontrollers (TFLu) framework supports running
-multiple parallel inferences. Each parallel inference requires a TFLu arena
+The TensorFlow Lite for Microcontrollers (TFLM) framework supports running
+multiple parallel inferences. Each parallel inference requires a TFLM arena
(costs memory) and a stack (requires an RTOS). The examples provided in this
repo are implemented in the application layer, which means that any RTOS could
be used.
@@ -284,7 +330,7 @@ on the network.
For networks that map fully to Ethos-U, the memory bandwidth might become a
limiting factor. For networks that run partly in software, the Cortex-M might
-become the limiting factor. The placement of the TFLu model and arena (flash,
+become the limiting factor. The placement of the TFLM model and arena (flash,
DRAM, SRAM, etc) will also have a big impact on the performance.
# Startup