author Dwight Lidman <dwight.lidman@arm.com> 2021-11-17 17:21:00 +0100
committer tim.hall <tim.hall@arm.com> 2021-11-26 13:59:29 +0000
commit 3a1cfda8343e5a1e7c4a9682a351f2afdc742ffd (patch)
tree 367d1fbecd09f055157f86640ada6f9ecc37a1d7
parent 480e31b477f877d254f95d2984c1a5b60e068450 (diff)
MLBEDSW-5514: Update PERFORMANCE.md

This commit corrects some errors and clarifies the section on cycle counts.

Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: If1198cb797ffdb2bd23b4a9624cf480a30aacaf6

-rw-r--r-- PERFORMANCE.md 21
1 files changed, 16 insertions, 5 deletions
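
The clarification this commit adds to the Cycles section can be read as a small accounting rule: within each layer only the slowest path counts, and the per-batch figure for each cycle type is the sum of the layers where that type was the slowest. The sketch below is one possible reading of that description, not Vela's actual implementation. The function name `summarise_cycles`, the dictionary layout, and the numbers are all hypothetical and chosen only for illustration.

```python
def summarise_cycles(layers):
    """Sum, per cycle type, only the cycles of each layer's slowest path."""
    # Start every cycle type at zero so types that never dominate still appear.
    per_type = {name: 0 for layer in layers for name in layer}
    for layer in layers:
        # The layer's cycle count is taken from its maximal processing path.
        bottleneck = max(layer, key=layer.get)
        per_type[bottleneck] += layer[bottleneck]
    total = sum(per_type.values())
    return per_type, total


# Hypothetical per-layer estimates, invented for illustration only.
layers = [
    {"NPU": 1000, "SRAM Access": 400, "DRAM Access": 300},
    {"NPU": 200, "SRAM Access": 900, "DRAM Access": 100},
]
per_type, total = summarise_cycles(layers)
print(per_type, total)
```

In this made-up input both layers perform DRAM transfers, yet the DRAM total stays at zero because DRAM is never the slowest path in any layer. That matches the patch's explanation of why a zero DRAM Access cycle count does not necessarily mean there was no DRAM access.
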
diff --git a/PERFORMANCE.md b/PERFORMANCE.md
index 9cd05b2..13c6750 100644
--- a/PERFORMANCE.md
+++ b/PERFORMANCE.md
@@ -210,23 +210,34 @@ Total cycles 114098 cycles/batch
Batch Inference time 0.23 ms, 4382.18 inferences/s (batch size 1)
```
-### Neural network macs
+### Neural network MACs
This shows the estimated number of MACs in the network per batch. This number
-includes MACs from convolutions and vector products, not from operations such as
-elementwise and pooling operations.
+includes MACs from convolutions, vector products and pooling operations.
+It does not include MACs from elementwise or any other type of operation.
### Network Tops/s
This shows the estimated TOPs/s for the network, which is an alternative
-representation of [Neural network macs](#Neural-network-macs)
+representation of [Neural network MACs](#Neural-network-MACs)
### Cycles
This shows the estimated number of cycles per batch for NPU, memory accesses and
in total. The total is the sum of the single action that consumes the most
cycles per pass, i.e. if memory access consumes the most cycles for a pass only
-that will account for the pass cycles in the total.
+that will account for the pass cycles in the total.
+To clarify: for each type of cycle counts, the number of cycles per batch is the
+sum of cycle counts for each layer, where each layer's cycle count is based on
+the maximal processing path. A layer consists of a feature map and an operator.
+For example, if the DMA transfer for a feature map requires less cycles than the
+cycles for the operation, then the DMA cycles will not contribute to the layer
+cycle count. As a result, it will not be part of the summed SRAM or DRAM access
+cycles.
+Looking at the example above in [Estimated performance](#Estimated-performance),
+the zero cycle count for DRAM Access cycles means that either there was no DRAM
+access or, like in our previously described example, the DMA cycles were fewer
+than for the operation for every layer that required a DMA transfer.
### Batch Inference time