author    Isabella Gottardi <isabella.gottardi@arm.com>  2021-09-16 17:54:35 +0100
committer Isabella Gottardi <isabella.gottardi@arm.com>  2021-10-05 14:00:47 +0000
commit    118f73e0396fe66ee5cc3c0daec0882c7160a7cb (patch)
tree      fa604ebef4a221844c294b76598c259a12feb61d /docs
parent    5c0ce54aaf276a13ac30902e8181faa662289b33 (diff)
download  ml-embedded-evaluation-kit-118f73e0396fe66ee5cc3c0daec0882c7160a7cb.tar.gz
MLECO-2395: Allow users to select Ethos-U memory mode
Change-Id: Icf09410f12072e8d7850dd1e540c3243af24ed09
Diffstat (limited to 'docs')
-rw-r--r-- docs/documentation.md                  |  11
-rw-r--r-- docs/quick_start.md                    |  62
-rw-r--r-- docs/sections/building.md              |  47
-rw-r--r-- docs/sections/customizing.md           |   4
-rw-r--r-- docs/sections/memory_considerations.md | 151
-rw-r--r-- docs/sections/troubleshooting.md       |  12
6 files changed, 179 insertions, 108 deletions
diff --git a/docs/documentation.md b/docs/documentation.md
index 28b9eda..a186fbb 100644
--- a/docs/documentation.md
+++ b/docs/documentation.md
@@ -306,7 +306,16 @@ Please refer to: [Testing and benchmarking](./sections/testing_benchmarking.md#t
## Memory Considerations
-Please refer to: [Memory considerations](./sections/memory_considerations.md#memory-considerations)
+Please refer to:
+
+- [Memory considerations](./sections/memory_considerations.md#memory-considerations)
+ - [Understanding memory usage from Vela output](./sections/memory_considerations.md#understanding-memory-usage-from-vela-output)
+ - [Total SRAM used](./sections/memory_considerations.md#total-sram-used)
+ - [Total Off-chip Flash used](./sections/memory_considerations.md#total-off_chip-flash-used)
+ - [Memory mode configurations](./sections/memory_considerations.md#memory-mode-configurations)
+ - [Tensor arena and neural network model memory placement](./sections/memory_considerations.md#tensor-arena-and-neural-network-model-memory-placement)
+ - [Memory usage for ML use-cases](./sections/memory_considerations.md#memory-usage-for-ml-use_cases)
+ - [Memory constraints](./sections/memory_considerations.md#memory-constraints)
## Troubleshooting
diff --git a/docs/quick_start.md b/docs/quick_start.md
index ce0b436..3488447 100644
--- a/docs/quick_start.md
+++ b/docs/quick_start.md
@@ -113,6 +113,14 @@ curl -L https://github.com/ARM-software/ML-zoo/raw/68b5fbc77ed28e67b2efc915997ea
--output-dir=resources_downloaded/kws
mv resources_downloaded/kws/ds_cnn_clustered_int8_vela.tflite resources_downloaded/kws/ds_cnn_clustered_int8_vela_H128.tflite
+. resources_downloaded/env/bin/activate && vela resources_downloaded/kws/ds_cnn_clustered_int8.tflite \
+ --accelerator-config=ethos-u65-256 \
+ --optimise Performance --config scripts/vela/default_vela.ini \
+ --memory-mode=Dedicated_Sram \
+ --system-config=Ethos_U65_High_End \
+ --output-dir=resources_downloaded/kws
+mv resources_downloaded/kws/ds_cnn_clustered_int8_vela.tflite resources_downloaded/kws/ds_cnn_clustered_int8_vela_Y256.tflite
+
. resources_downloaded/env/bin/activate && vela resources_downloaded/kws_asr/wav2letter_int8.tflite \
--accelerator-config=ethos-u55-128 \
--optimise Performance --config scripts/vela/default_vela.ini \
@@ -121,7 +129,15 @@ mv resources_downloaded/kws/ds_cnn_clustered_int8_vela.tflite resources_download
--output-dir=resources_downloaded/kws_asr
mv resources_downloaded/kws_asr/wav2letter_int8_vela.tflite resources_downloaded/kws_asr/wav2letter_int8_vela_H128.tflite
-. resources_downloaded/env/bin/activate && vela resources_downloaded/kws_asr/ds_cnn_clustered_int8.tflite -\
+. resources_downloaded/env/bin/activate && vela resources_downloaded/kws_asr/wav2letter_int8.tflite \
+ --accelerator-config=ethos-u65-256 \
+ --optimise Performance --config scripts/vela/default_vela.ini \
+ --memory-mode=Dedicated_Sram \
+ --system-config=Ethos_U65_High_End \
+ --output-dir=resources_downloaded/kws_asr
+mv resources_downloaded/kws_asr/wav2letter_int8_vela.tflite resources_downloaded/kws_asr/wav2letter_int8_vela_Y256.tflite
+
+. resources_downloaded/env/bin/activate && vela resources_downloaded/kws_asr/ds_cnn_clustered_int8.tflite \
--accelerator-config=ethos-u55-128 \
--optimise Performance --config scripts/vela/default_vela.ini \
--memory-mode=Shared_Sram \
@@ -129,7 +145,15 @@ mv resources_downloaded/kws_asr/wav2letter_int8_vela.tflite resources_downloaded
--output-dir=resources_downloaded/kws_asr
mv resources_downloaded/kws_asr/ds_cnn_clustered_int8_vela.tflite resources_downloaded/kws_asr/ds_cnn_clustered_int8_vela_H128.tflite
-. resources_downloaded/env/bin/activate && vela resources_downloaded/inference_runner/dnn_s_quantized.tflite -\
+. resources_downloaded/env/bin/activate && vela resources_downloaded/kws_asr/ds_cnn_clustered_int8.tflite \
+ --accelerator-config=ethos-u65-256 \
+ --optimise Performance --config scripts/vela/default_vela.ini \
+ --memory-mode=Dedicated_Sram \
+ --system-config=Ethos_U65_High_End \
+ --output-dir=resources_downloaded/kws_asr
+mv resources_downloaded/kws_asr/ds_cnn_clustered_int8_vela.tflite resources_downloaded/kws_asr/ds_cnn_clustered_int8_vela_Y256.tflite
+
+. resources_downloaded/env/bin/activate && vela resources_downloaded/inference_runner/dnn_s_quantized.tflite \
--accelerator-config=ethos-u55-128 \
--optimise Performance --config scripts/vela/default_vela.ini \
--memory-mode=Shared_Sram \
@@ -137,6 +161,14 @@ mv resources_downloaded/kws_asr/ds_cnn_clustered_int8_vela.tflite resources_down
--output-dir=resources_downloaded/inference_runner
mv resources_downloaded/inference_runner/dnn_s_quantized_vela.tflite resources_downloaded/inference_runner/dnn_s_quantized_vela_H128.tflite
+. resources_downloaded/env/bin/activate && vela resources_downloaded/inference_runner/dnn_s_quantized.tflite \
+ --accelerator-config=ethos-u65-256 \
+ --optimise Performance --config scripts/vela/default_vela.ini \
+ --memory-mode=Dedicated_Sram \
+ --system-config=Ethos_U65_High_End \
+ --output-dir=resources_downloaded/inference_runner
+mv resources_downloaded/inference_runner/dnn_s_quantized_vela.tflite resources_downloaded/inference_runner/dnn_s_quantized_vela_Y256.tflite
+
. resources_downloaded/env/bin/activate && vela resources_downloaded/img_class/mobilenet_v2_1.0_224_INT8.tflite \
--accelerator-config=ethos-u55-128 \
--optimise Performance --config scripts/vela/default_vela.ini \
@@ -145,6 +177,14 @@ mv resources_downloaded/inference_runner/dnn_s_quantized_vela.tflite resources_d
--output-dir=resources_downloaded/img_class
mv resources_downloaded/img_class/mobilenet_v2_1.0_224_INT8_vela.tflite resources_downloaded/img_class/mobilenet_v2_1.0_224_INT8_vela_H128.tflite
+. resources_downloaded/env/bin/activate && vela resources_downloaded/img_class/mobilenet_v2_1.0_224_INT8.tflite \
+ --accelerator-config=ethos-u65-256 \
+ --optimise Performance --config scripts/vela/default_vela.ini \
+ --memory-mode=Dedicated_Sram \
+ --system-config=Ethos_U65_High_End \
+ --output-dir=resources_downloaded/img_class
+mv resources_downloaded/img_class/mobilenet_v2_1.0_224_INT8_vela.tflite resources_downloaded/img_class/mobilenet_v2_1.0_224_INT8_vela_Y256.tflite
+
. resources_downloaded/env/bin/activate && vela resources_downloaded/asr/wav2letter_int8.tflite \
--accelerator-config=ethos-u55-128 \
--optimise Performance --config scripts/vela/default_vela.ini \
@@ -153,6 +193,14 @@ mv resources_downloaded/img_class/mobilenet_v2_1.0_224_INT8_vela.tflite resource
--output-dir=resources_downloaded/asr
mv resources_downloaded/asr/wav2letter_int8_vela.tflite resources_downloaded/asr/wav2letter_int8_vela_H128.tflite
+. resources_downloaded/env/bin/activate && vela resources_downloaded/asr/wav2letter_int8.tflite \
+ --accelerator-config=ethos-u65-256 \
+ --optimise Performance --config scripts/vela/default_vela.ini \
+ --memory-mode=Dedicated_Sram \
+ --system-config=Ethos_U65_High_End \
+ --output-dir=resources_downloaded/asr
+mv resources_downloaded/asr/wav2letter_int8_vela.tflite resources_downloaded/asr/wav2letter_int8_vela_Y256.tflite
+
. resources_downloaded/env/bin/activate && vela resources_downloaded/ad/ad_medium_int8.tflite \
--accelerator-config=ethos-u55-128 \
--optimise Performance --config scripts/vela/default_vela.ini \
@@ -161,6 +209,14 @@ mv resources_downloaded/asr/wav2letter_int8_vela.tflite resources_downloaded/asr
--output-dir=resources_downloaded/ad
mv resources_downloaded/ad/ad_medium_int8_vela.tflite resources_downloaded/ad/ad_medium_int8_vela_H128.tflite
+. resources_downloaded/env/bin/activate && vela resources_downloaded/ad/ad_medium_int8.tflite \
+ --accelerator-config=ethos-u65-256 \
+ --optimise Performance --config scripts/vela/default_vela.ini \
+ --memory-mode=Dedicated_Sram \
+ --system-config=Ethos_U65_High_End \
+ --output-dir=resources_downloaded/ad
+mv resources_downloaded/ad/ad_medium_int8_vela.tflite resources_downloaded/ad/ad_medium_int8_vela_Y256.tflite
+
mkdir cmake-build-mps3-sse-300-gnu-release && cd cmake-build-mps3-sse-300-gnu-release
cmake .. \
@@ -171,4 +227,4 @@ cmake .. \
> **Note:** If you want to change the application, then, instead of using the `build_default` Python script, follow the
> approach defined in [documentation.md](./documentation.md#arm_ml-embedded-evaluation-kit). For example, if you wanted to modify the number of
-> MAC units of the Ethos-U, or running a custom neural network.
+> MACs units of the Ethos-U, or running a custom neural network.
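
As an illustrative sketch of that approach, assuming the keyword spotting use-case and reusing the Vela options shown above (the `_H256` suffix and the `kws_MODEL_TFLITE_PATH` variable simply follow the naming conventions used elsewhere in this guide):

```commandline
. resources_downloaded/env/bin/activate && vela resources_downloaded/kws/ds_cnn_clustered_int8.tflite \
    --accelerator-config=ethos-u55-256 \
    --optimise Performance --config scripts/vela/default_vela.ini \
    --memory-mode=Shared_Sram \
    --system-config=Ethos_U55_High_End_Embedded \
    --output-dir=resources_downloaded/kws
mv resources_downloaded/kws/ds_cnn_clustered_int8_vela.tflite resources_downloaded/kws/ds_cnn_clustered_int8_vela_H256.tflite

# Then, from the build directory created above, point the build at the new model:
cmake .. -Dkws_MODEL_TFLITE_PATH=<path/to/ds_cnn_clustered_int8_vela_H256.tflite>
```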
diff --git a/docs/sections/building.md b/docs/sections/building.md
index 192c4aa..3adaa72 100644
--- a/docs/sections/building.md
+++ b/docs/sections/building.md
@@ -139,7 +139,7 @@ The build parameters are:
[bare-metal-gcc.cmake](../../scripts/cmake/toolchains/bare-metal-gcc.cmake).
- `TENSORFLOW_SRC_PATH`: the path to the root of the TensorFlow directory. The default value points to the
- `dependencies/tensorflow` git submodule. Respository is hosted here: [tensorflow](https://github.com/tensorflow/tensorflow)
+ `dependencies/tensorflow` git submodule. Repository is hosted here: [tensorflow](https://github.com/tensorflow/tensorflow)
- `ETHOS_U_NPU_DRIVER_SRC_PATH`: The path to the *Ethos-U* NPU core driver sources. The default value points to the
`dependencies/core-driver` git submodule. Repository is hosted here:
@@ -147,11 +147,23 @@ The build parameters are:
- `CMSIS_SRC_PATH`: The path to the CMSIS sources to be used to build TensorFlow Lite Micro library. This parameter is
optional and is only valid for Arm® *Cortex®-M* CPU targeted configurations. The default value points to the
- `dependencies/cmsis` git submodule. Respository is hosted here: [CMSIS-5](https://github.com/ARM-software/CMSIS_5.git)
+ `dependencies/cmsis` git submodule. Repository is hosted here: [CMSIS-5](https://github.com/ARM-software/CMSIS_5.git)
- `ETHOS_U_NPU_ENABLED`: Sets whether the use of *Ethos-U* NPU is available for the deployment target. By default, this
  is set and therefore the application is built with *Ethos-U* NPU support.
+- `ETHOS_U_NPU_ID`: The *Ethos-U* NPU processor:
+ - `U55` (default)
+ - `U65`
+
+- `ETHOS_U_NPU_MEMORY_MODE`: The *Ethos-U* NPU memory mode:
+ - `Shared_Sram` (default for *Ethos-U55* NPU)
+ - `Dedicated_Sram` (default for *Ethos-U65* NPU)
+ - `Sram_Only`
+
+  >**Note:** The `Shared_Sram` memory mode is available on both *Ethos-U55* and *Ethos-U65* NPU, `Dedicated_Sram` only
+  > for *Ethos-U65* NPU and `Sram_Only` only for *Ethos-U55* NPU (see the configuration example after this list of parameters).
+
- `CPU_PROFILE_ENABLED`: Sets whether profiling information for the CPU core should be displayed. By default, this is
  set to false, but can be turned on for FPGA targets. For the FVP, the CPU core cycle counts are not meaningful and
are not to be used.
@@ -178,7 +190,9 @@ The build parameters are:
`timing_adapter` dependencies folder.
- `TA_CONFIG_FILE`: The path to the CMake configuration file that contains the timing adapter parameters. Used only if
- the timing adapter build is enabled.
+ the timing adapter build is enabled. Default for Ethos-U55 NPU is
+ [ta_config_u55_high_end.cmake](../../scripts/timing_adapter/ta_config_u55_high_end.cmake),
+  for Ethos-U65 NPU is [ta_config_u65_high_end.cmake](../../scripts/timing_adapter/ta_config_u65_high_end.cmake).
- `TENSORFLOW_LITE_MICRO_CLEAN_BUILD`: Optional parameter to enable, or disable, "cleaning" prior to building for the
TensorFlow Lite Micro library. Enabled by default.
@@ -189,12 +203,12 @@ The build parameters are:
- `ARMCLANG_DEBUG_DWARF_LEVEL`: When the CMake build type is specified as `Debug` and when the `armclang` toolchain is
being used to build for a *Cortex-M* CPU target, this optional argument can be set to specify the `DWARF` format.
- By default, this is set to 4 and is synonymous with passing `-g` flag to the compiler. This is compatible with Arm
- DS and other tools which can interpret the latest DWARF format. To allow debugging using the Model Debugger from Arm
- Fast Model Tools Suite, this argument can be used to pass DWARF format version as "3".
+ By default, this is set to 4 and is synonymous with passing `-g` flag to the compiler. This is compatible with Arm
+ DS and other tools which can interpret the latest DWARF format. To allow debugging using the Model Debugger from Arm
+ Fast Model Tools Suite, this argument can be used to pass DWARF format version as "3".
- >**Note:** This option is only available when the CMake project is configured with the `-DCMAKE_BUILD_TYPE=Debug`
- >argument. Also, the same dwarf format is used for building TensorFlow Lite Micro library.
+ >**Note:** This option is only available when the CMake project is configured with the `-DCMAKE_BUILD_TYPE=Debug`
+ >argument. Also, the same dwarf format is used for building TensorFlow Lite Micro library.
For details on the specific use-case build options, follow the instructions in the use-case specific documentation.
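
A minimal sketch of how the two new NPU parameters combine at configure time (other options, such as the toolchain file, are omitted here for brevity):

```commandline
cmake .. \
    -DETHOS_U_NPU_ID=U65 \
    -DETHOS_U_NPU_MEMORY_MODE=Dedicated_Sram
```

Omitting `ETHOS_U_NPU_MEMORY_MODE` selects the default memory mode for the chosen NPU, as listed above.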
@@ -265,7 +279,7 @@ python3 ./set_up_default_resources.py
```
This fetches every model into the `resources_downloaded` directory. It also optimizes the models using the Vela compiler
-for the default 128 MAC configuration of the Arm® *Ethos™-U55* NPU.
+for the default 128 MACs configuration of the Arm® *Ethos™-U55* NPU and for the default 256 MACs configuration of the Arm® *Ethos™-U65* NPU.
> **Note:** This script requires Python version 3.6 or higher. Please make sure all [build prerequisites](#build-prerequisites)
> are satisfied.
@@ -507,7 +521,7 @@ The CMake build framework allows the parameters to control the behavior of each
> **Note:** The bandwidth cap `BWCAP` operates on the transaction level and, because of its simple implementation, the accuracy is limited.
> When set to a small value it allows only a small number of transactions for each pulse cycle.
> Once the counter has reached or exceeded the configured cap, no transactions will be allowed before the next pulse cycle.
- > In order to minimise this effect some possible solutions are:
+ > In order to minimize this effect some possible solutions are:
>
>- scale up all the parameters to a reasonably large value.
>- scale up `BWCAP` as a multiple of the burst length (in this case bulk traffic will not face rounding errors in the bandwidth cap).
@@ -688,7 +702,7 @@ The Vela command contains the following:
- `--accelerator-config`: Specifies the accelerator configuration to use between `ethos-u55-256`, `ethos-u55-128`,
`ethos-u55-64`, `ethos-u55-32`, `ethos-u65-256`, and `ethos-u65-512`.
- `--optimise`: Sets the optimisation strategy to Performance or Size. The Size strategy results in a model minimising the SRAM
- usage whereas the Performance strategy optimises the neural network for maximal perforamance.
+ usage whereas the Performance strategy optimises the neural network for maximal performance.
Note that if using the Performance strategy, you can also pass the `--arena-cache-size` option to Vela.
- `--config`: Specifies the path to the Vela configuration file. The format of the file is a Python ConfigParser `.ini`
file. An example can be found in the `dependencies` folder [default_vela.ini](../../scripts/vela/default_vela.ini).
@@ -714,17 +728,18 @@ using the *Ethos-U55* High End timing adapter system configuration.
To build for a different *Ethos-U* NPU variant:
- Optimize the model with Vela compiler with the correct parameters. See [Optimize custom model with Vela compiler](./building.md#optimize-custom-model-with-vela-compiler).
+- Use the correct `ETHOS_U_NPU_ID`: `U55` for *Ethos-U55* NPU, `U65` for *Ethos-U65* NPU.
- Use the Vela model as custom model in the building command. See [Add custom model](./building.md#add-custom-model)
- Use the correct timing adapter settings configuration. See [Building timing adapter with custom options](./building.md#building-timing-adapter-with-custom-options)
-For example, when building for *Ethos-U65* High End system configuration, the Vela comand will be:
+For example, when building for *Ethos-U65* High End system configuration and 512 MACs/cc, the Vela command will be:
```commandline
vela \
<model_file>.tflite \
- --accelerator-config ethos-u65-256 \
+ --accelerator-config ethos-u65-512 \
--optimise Performance \
- --memory-mode=Shared_Sram \
+ --memory-mode=Dedicated_Sram \
--system-config=Ethos_U65_High_End \
--config=../scripts/vela/default_vela.ini
```
@@ -733,8 +748,8 @@ And the cmake command:
```commandline
cmake .. \
- -D<use_case>_MODEL_TFLITE_PATH=<path/to/ethos_u65_vela_model.tflite> \
- -DTA_CONFIG_FILE=scripts/cmake/ta_config_u65_high_end.cmake
+ -DETHOS_U_NPU_ID=U65 \
+ -D<use_case>_MODEL_TFLITE_PATH=<path/to/ethos_u65_vela_model.tflite>
```
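
As a further sketch, a build targeting the *Ethos-U55* NPU with the `Sram_Only` memory mode combines the same options, assuming the model was compiled by Vela with `--memory-mode=Sram_Only`:

```commandline
cmake .. \
    -DETHOS_U_NPU_ID=U55 \
    -DETHOS_U_NPU_MEMORY_MODE=Sram_Only \
    -D<use_case>_MODEL_TFLITE_PATH=<path/to/sram_only_vela_model.tflite>
```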
## Automatic file generation
diff --git a/docs/sections/customizing.md b/docs/sections/customizing.md
index 3104986..854a3ed 100644
--- a/docs/sections/customizing.md
+++ b/docs/sections/customizing.md
@@ -671,8 +671,8 @@ For the hello world use-case, it is enough to create a `helloworld.cmake` file a
so:
```cmake
-if (ETHOS_U_NPU_ENABLED EQUAL 1)
- set(DEFAULT_MODEL_PATH ${DEFAULT_MODEL_DIR}/helloworldmodel_uint8_vela.tflite)
+if (ETHOS_U_NPU_ENABLED)
+ set(DEFAULT_MODEL_PATH ${DEFAULT_MODEL_DIR}/helloworldmodel_uint8_vela_${DEFAULT_NPU_CONFIG_ID}.tflite)
else()
set(DEFAULT_MODEL_PATH ${DEFAULT_MODEL_DIR}/helloworldmodel_uint8.tflite)
endif()
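
To illustrate how this resolves, assuming `DEFAULT_NPU_CONFIG_ID` follows the `H128`/`Y256` suffix convention used in the quick start guide (an assumption for this hypothetical hello world use-case):

```commandline
# Hypothetical configuration for Ethos-U65 with its default memory mode.
# DEFAULT_NPU_CONFIG_ID would then be expected to resolve to Y256, selecting:
#   <DEFAULT_MODEL_DIR>/helloworldmodel_uint8_vela_Y256.tflite
cmake .. -DETHOS_U_NPU_ID=U65 -DETHOS_U_NPU_MEMORY_MODE=Dedicated_Sram
```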
diff --git a/docs/sections/memory_considerations.md b/docs/sections/memory_considerations.md
index fc81f8f..89baf41 100644
--- a/docs/sections/memory_considerations.md
+++ b/docs/sections/memory_considerations.md
@@ -7,7 +7,7 @@
- [Understanding memory usage from Vela output](#understanding-memory-usage-from-vela-output)
- [Total SRAM used](#total-sram-used)
- [Total Off-chip Flash used](#total-off_chip-flash-used)
- - [Non-default configurations](#non-default-configurations)
+ - [Memory mode configurations](#memory-mode-configurations)
- [Tensor arena and neural network model memory placement](#tensor-arena-and-neural-network-model-memory-placement)
- [Memory usage for ML use-cases](#memory-usage-for-ml-use_cases)
- [Memory constraints](#memory-constraints)
@@ -94,52 +94,88 @@ buffers is. These are:
### Total SRAM used
-When the neural network model is compiled with Vela, a summary report that includes memory usage is generated. For
-example, compiling the keyword spotting model
+When the neural network model is compiled with Vela, a summary report that includes memory usage is generated.
+For example, compiling the keyword spotting model
[ds_cnn_clustered_int8](https://github.com/ARM-software/ML-zoo/blob/master/models/keyword_spotting/ds_cnn_large/tflite_clustered_int8/ds_cnn_clustered_int8.tflite)
-with Vela produces, among others, the following output:
+with the Vela command:
+
+```commandline
+vela \
+ --accelerator-config=ethos-u55-128 \
+ --optimise Performance \
+  --config scripts/vela/default_vela.ini \
+  --memory-mode=Shared_Sram \
+  --system-config=Ethos_U55_High_End_Embedded \
+ ds_cnn_clustered_int8.tflite
+```
+
+It produces, among others, the following output:
```log
-Total SRAM used 70.77 KiB
-Total Off-chip Flash used 430.78 KiB
+Total SRAM used 146.31 KiB
+Total Off-chip Flash used 452.42 KiB
```
The `Total SRAM used` here shows the required memory to store the `tensor arena` for the TensorFlow Lite Micro
framework. This is the amount of memory required to store the input, output, and intermediate buffers. In the preceding
-example, the tensor arena requires 70.77 KiB of available SRAM.
+example, the tensor arena requires 146.31 KiB of available SRAM.
> **Note:** Vela can only estimate the SRAM required for graph execution. It has no way of estimating the memory used by
> internal structures from TensorFlow Lite Micro framework.
-Therefore, we recommend that you top this memory size by at least 2KiB. We also recoomend that you also carve out the
+Therefore, we recommend that you top up this memory size by at least 2 KiB. We also recommend that you carve out the
`tensor arena` of this size, and then place it on the SRAM of the target system.
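
As an illustration only, assuming the keyword spotting use-case and the `<use_case_name>_ACTIVATION_BUF_SZ` parameter described later in this document, the figure above could be rounded up and passed to the build as follows (the exact value is a sketch, not a measured requirement):

```commandline
# 146.31 KiB reported by Vela, plus headroom for TensorFlow Lite Micro internals,
# rounded up to 152 KiB (0x26000)
cmake .. -Dkws_ACTIVATION_BUF_SZ=0x00026000
```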
### Total Off-chip Flash used
The `Total Off-chip Flash` parameter indicates the minimum amount of flash required to store the neural network model.
-In the preceding example, the system must have a minimum of 430.78 KiB of available flash memory to store the `.tflite`
+In the preceding example, the system must have a minimum of 452.42 KiB of available flash memory to store the `.tflite`
file contents.
> **Note:** The Arm® *Corstone™-300* system uses the DDR region as a flash memory. The timing adapter sets up the AXI
> bus that is wired to the DDR to mimic both bandwidth and latency characteristics of a flash memory device.
-## Non-default configurations
+## Memory mode configurations
+
+The preceding example outlines a typical configuration for the *Ethos-U55* NPU, and this corresponds to the default
+Vela memory mode setting.
+The evaluation kit supports all of the *Ethos-U* NPU memory modes:
+
+| *Ethos™-U* NPU | Default Memory Mode | Other Memory Modes supported |
+|------------------|------------------------|--------------------------------|
+| *Ethos™-U55* | `Shared_Sram` | `Sram_Only` |
+| *Ethos™-U65* | `Dedicated_Sram` | `Shared_Sram` |
-The preceding example outlines a typical configuration, and this corresponds to the default Vela setting. However, the
-system SRAM can also be used to store the neural network model along with the `tensor arena`. Vela supports optimizing
-the model for this configuration with its `Sram_Only` memory mode.
+For further information on the default settings, please refer to: [default_vela.ini](../../scripts/vela/default_vela.ini).
-For further information, please refer to: [vela.ini](../../scripts/vela/vela.ini).
+For *Ethos-U55* NPU, the system SRAM can also be used to store the neural network model along with the `tensor arena`.
+Vela supports optimizing the model for this configuration with its `Sram_Only` memory mode.
+Although the Vela settings for this configuration suggest that only the AXI0 bus is used, when compiling the model
+an informational message is generated, for example:
+
+```log
+vela \
+ --accelerator-config=ethos-u55-128 \
+ --optimise Performance \
+  --config scripts/vela/default_vela.ini \
+  --memory-mode=Sram_Only \
+  --system-config=Ethos_U55_High_End_Embedded \
+ ds_cnn_clustered_int8.tflite
+
+Info: Changing const_mem_area from Sram to OnChipFlash. This will use the same characteristics as Sram.
+```
-To make use of a neural network model that is optimized for this configuration, the linker script for the target
-platform must be changed. By default, the linker scripts are set up to support the default configuration only.
+This means that the neural network model is always placed in the flash region. In this case, the timing adapters for the
+AXI buses are set to the same values to mimic both the bandwidth and latency characteristics of an SRAM memory device.
+See [Ethos-U55 NPU timing adapter default configuration](../../scripts/cmake/timing_adapter/ta_config_u55_high_end.cmake).
For script snippets, please refer to: [Memory constraints](./memory_considerations.md#memory-constraints).
> **Note:**
>
-> 1. The the `Shared_Sram` memory mode represents the default configuration.
-> 2. The `Dedicated_Sram` mode is only applicable for the Arm® *Ethos™-U65*.
+> 1. The `Shared_Sram` memory mode represents the default configuration.
+> 2. The `Dedicated_Sram` memory mode is only applicable for the Arm® *Ethos™-U65*.
+> 3. The `Sram_Only` memory mode is only applicable for the Arm® *Ethos™-U55*.
## Tensor arena and neural network model memory placement
@@ -147,18 +183,15 @@ The evaluation kit uses the name `activation buffer` for the `tensor arena` in t
Every use-case application has a corresponding `<use_case_name>_ACTIVATION_BUF_SZ` parameter that governs the maximum
available size of the `activation buffer` for that particular use-case.
-The linker script is set up to place this memory region in SRAM. However, if the memory required is more than what the
-target platform supports, this buffer needs to be placed on flash instead. Every target platform has a profile
-definition in the form of a `CMake` file.
+The linker script is set up to place this memory region in SRAM for *Ethos-U55* and in flash for *Ethos-U65*.
+Every target platform has a profile definition in the form of a `CMake` file.
For further information and an example, please refer to: [Corstone-300 profile](../../scripts/cmake/subsystem-profiles/corstone-sse-300.cmake).
The parameter `ACTIVATION_BUF_SRAM_SZ` defines the maximum SRAM size available for the platform. This is propagated
-through the build system. If the `<use_case_name>_ACTIVATION_BUF_SZ` for a given use-case is *more* than the
-`ACTIVATION_BUF_SRAM_SZ` for the target build platform, then the `activation buffer` is placed on the flash memory
-instead.
+through the build system.
-The neural network model is always placed in the flash region. However, this can be changed in the linker script.
+The neural network model is always placed in the flash region (even in the case of the `Sram_Only` memory mode, as mentioned earlier).
## Memory usage for ML use-cases
@@ -168,12 +201,12 @@ memory requirements for the different use-cases of the evaluation kit.
> **Note:** The SRAM usage does not include memory used by TensorFlow Lite Micro and must be topped up as explained
> under [Total SRAM used](#total-sram-used).
-- [Keyword spotting model](https://github.com/ARM-software/ML-zoo/tree/master/models/keyword_spotting/ds_cnn_large/tflite_clustered_int8)
+- [Keyword spotting model](https://github.com/ARM-software/ML-zoo/tree/68b5fbc77ed28e67b2efc915997ea4477c1d9d5b//models/keyword_spotting/ds_cnn_large/tflite_clustered_int8)
requires
- 70.7 KiB of SRAM
- 430.7 KiB of flash memory.
-- [Image classification model](https://github.com/ARM-software/ML-zoo/tree/master/models/image_classification/mobilenet_v2_1.0_224/tflite_uint8)
+- [Image classification model](https://github.com/ARM-software/ML-zoo/tree/e0aa361b03c738047b9147d1a50e3f2dcb13dbcb/models/image_classification/mobilenet_v2_1.0_224/tflite_uint8)
requires
- 638.6 KiB of SRAM
- 3.1 MB of flash memory.
@@ -199,38 +232,8 @@ scatter file is as follows:
;---------------------------------------------------------
LOAD_REGION_0 0x00000000 0x00080000
{
- ;-----------------------------------------------------
- ; First part of code mem - 512kiB
- ;-----------------------------------------------------
- itcm.bin 0x00000000 0x00080000
- {
- *.o (RESET, +First)
- * (InRoot$$Sections)
-
- ; Essentially only RO-CODE, RO-DATA is in a
- ; different region.
- .ANY (+RO)
- }
-
- ;-----------------------------------------------------
- ; 128kiB of 512kiB DTCM is used for any other RW or ZI
- ; data. Note: this region is internal to the Cortex-M
- ; CPU.
- ;-----------------------------------------------------
- dtcm.bin 0x20000000 0x00020000
- {
- ; Any R/W and/or zero initialised data
- .ANY(+RW +ZI)
- }
- ;-----------------------------------------------------
- ; 384kiB of stack space within the DTCM region. See
- ; `dtcm.bin` for the first section. Note: by virtue of
- ; being part of DTCM, this region is only accessible
- ; from Cortex-M55.
- ;-----------------------------------------------------
- ARM_LIB_STACK 0x20020000 EMPTY ALIGN 8 0x00060000
- {}
+...
;-----------------------------------------------------
; SSE-300's internal SRAM of 4MiB - reserved for
@@ -240,8 +243,11 @@ LOAD_REGION_0 0x00000000 0x00080000
;-----------------------------------------------------
isram.bin 0x31000000 UNINIT ALIGN 16 0x00400000
{
- ; activation buffers a.k.a tensor arena
- *.o (.bss.NoInit.activation_buf)
+ ; Cache area (if used)
+ *.o (.bss.NoInit.ethos_u_cache)
+
+ ; activation buffers a.k.a tensor arena when memory mode sram only
+ *.o (.bss.NoInit.activation_buf_sram)
}
}
@@ -251,7 +257,7 @@ LOAD_REGION_0 0x00000000 0x00080000
LOAD_REGION_1 0x70000000 0x02000000
{
;-----------------------------------------------------
- ; 32 MiB of DRAM space for neural network model,
+ ; 32 MiB of DDR space for neural network model,
; input vectors and labels. If the activation buffer
; size required by the network is bigger than the
; SRAM size available, it is accommodated here.
@@ -261,33 +267,18 @@ LOAD_REGION_1 0x70000000 0x02000000
; nn model's baked in input matrices
*.o (ifm)
- ; nn model
+ ; nn model's default space
*.o (nn_model)
; labels
*.o (labels)
- ; if the activation buffer (tensor arena) doesn't
- ; fit in the SRAM region, we accommodate it here
- *.o (activation_buf)
+ ; activation buffers a.k.a tensor arena when memory mode dedicated sram
+ *.o (activation_buf_dram)
}
- ;-----------------------------------------------------
- ; First 256kiB of BRAM (FPGA SRAM) used for RO data.
- ; Note: Total BRAM size available is 2MiB.
- ;-----------------------------------------------------
- bram.bin 0x11000000 ALIGN 8 0x00040000
- {
- ; RO data (incl. unwinding tables for debugging)
- .ANY (+RO-DATA)
- }
+...
- ;-----------------------------------------------------
- ; Remaining part of the 2MiB BRAM used as heap space.
- ; 0x00200000 - 0x00040000 = 0x001C0000 (1.75 MiB)
- ;-----------------------------------------------------
- ARM_LIB_HEAP 0x11040000 EMPTY ALIGN 8 0x001C0000
- {}
}
```
diff --git a/docs/sections/troubleshooting.md b/docs/sections/troubleshooting.md
index b2bd421..fc81ffd 100644
--- a/docs/sections/troubleshooting.md
+++ b/docs/sections/troubleshooting.md
@@ -36,20 +36,20 @@ ERROR - Invoke failed.
ERROR - Inference failed.
```
-It shows that the configuration of the Vela compiled `.tflite` file doesn't match the number of MAC units on the FVP.
+It shows that the configuration of the Vela compiled `.tflite` file doesn't match the number of MACs units on the FVP.
The Vela configuration parameter `accelerator-config` used for producing the .`tflite` file that is used
-while building the application should match the MAC configuration that the FVP is emulating.
+while building the application should match the MACs configuration that the FVP is emulating.
For example, if the `accelerator-config` from the Vela command was `ethos-u55-128`, the FVP should be emulating the
-128 MAC configuration of the Ethos-U55 block(default FVP configuration). If the `accelerator-config` used was
+128 MACs configuration of the Ethos-U55 block(default FVP configuration). If the `accelerator-config` used was
`ethos-u55-256`, the FVP must be executed with additional command line parameter to instruct it to emulate the
-256 MAC configuration instead.
+256 MACs configuration instead.
The [deploying on an FVP emulating MPS3](./deployment.md#deploying-on-an-fvp-emulating-mps3) page provides guidance
-on how to instruct the FVP to change the number of MAC units.
+on how to instruct the FVP to change the number of MACs units.
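
For example, a sketch of launching the MPS3 FVP for a 256 MACs configuration; the binary name is a placeholder and the `ethosu.num_macs` parameter should be verified against the deployment guide:

```commandline
FVP_Corstone_SSE-300_Ethos-U55 -a <use_case_binary>.axf -C ethosu.num_macs=256
```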
Note that when the FVP is launched and the application starts executing, various parameters about the system are
-logged over UART. These include the MAC/cc configuration of the FVP.
+logged over UART. These include the MACs/cc configuration of the FVP.
```log
INFO - MPS3 core clock has been set to: 32000000Hz