author     Kshitij Sisodia <kshitij.sisodia@arm.com>  2021-09-24 14:42:08 +0100
committer  Kshitij Sisodia <kshitij.sisodia@arm.com>  2021-09-24 13:43:22 +0000
commit     aa5e1f6c960b8a88f389ba70dd200d6dacd95a03 (patch)
tree       f05ad3ee9f6eff64a41464f32387d4150fe9363a /docs/use_cases
parent     864317690dd670a18194e2a95c7c0da573613fa1 (diff)
download   ml-embedded-evaluation-kit-aa5e1f6c960b8a88f389ba70dd200d6dacd95a03.tar.gz
MLECO-2345: Adding dynamic load support for FVPs
With this patch, the generic inference runner use-case can be configured to accept the model tflite file at run-time via the FVP's command line parameters. The same is true for the IFM, and the inference results can be dumped out too. NOTE: this change is only for supporting the FVP; the FPGA implementation will not allow the additional loading needed for the changes in this patch to be useful. Change-Id: I1318bd5b0cfb7bb635ced6fe58d22c3e401d2547
Diffstat (limited to 'docs/use_cases')
-rw-r--r--  docs/use_cases/inference_runner.md | 69
1 file changed, 69 insertions, 0 deletions
diff --git a/docs/use_cases/inference_runner.md b/docs/use_cases/inference_runner.md
index 7334886..2b2013c 100644
--- a/docs/use_cases/inference_runner.md
+++ b/docs/use_cases/inference_runner.md
@@ -11,6 +11,8 @@
- [Setting up the Ethos-U NPU Fast Model](#setting-up-the-ethos_u-npu-fast-model)
- [Starting Fast Model simulation](#starting-fast-model-simulation)
- [Running Inference Runner](#running-inference-runner)
+ - [Building with dynamic model load capability](#building-with-dynamic-model-load-capability)
+ - [Running the FVP with dynamic model loading](#running-the-fvp-with-dynamic-model-loading)
## Introduction
@@ -55,6 +57,8 @@ following:
- `inference_runner_ACTIVATION_BUF_SZ`: The intermediate, or activation, buffer size reserved for the NN model. By
default, it is set to 2MiB and is enough for most models.
+- `inference_runner_DYNAMIC_MEM_LOAD_ENABLED`: Set this to ON or OFF to enable or disable the dynamic model load capability for use with MPS3 FVPs. See [Building with dynamic model load capability](#building-with-dynamic-model-load-capability) below for more details; an example configure command follows this list.
+
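+For example, a minimal configure sketch, run from a freshly created build directory: the activation buffer
+value shown simply restates the documented 2MiB default, and ON is an illustrative choice for the dynamic
+load option.
+
+```commandline
+cmake .. \
+    -Dinference_runner_ACTIVATION_BUF_SZ=0x00200000 \
+    -Dinference_runner_DYNAMIC_MEM_LOAD_ENABLED=ON
+```
+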
To build **ONLY** the Inference Runner example application, add `-DUSE_CASE_BUILD=inference_runner` to the `cmake`
command line, as specified in: [Building](../documentation.md#Building).
@@ -257,3 +261,68 @@ inference. For example:
- For FPGA platforms, a CPU cycle count can also be enabled. However, do not use cycle counters for FVP, as the CPU
model is not cycle-approximate or cycle-accurate.
+
+### Building with dynamic model load capability
+
+It is possible to build the inference runner application, targeting only the FVP environment, so that the
+TFLite model file can be loaded at runtime. In this build configuration, the model TFLite file is not
+baked into the application. Instead, the application expects the model binary to be loaded at a specific
+address by an external agent. This loading capability also extends to the input data for the model.
+
+This feature depends on these addresses being specified in the target platform's CMake description and, by
+default, it is available for use on the MPS3 FVP platform.
+
+> **NOTE**: The application built with this support will not work on the FPGA. This capability is only
+> provided for use with the FVP, to make it easier to try different ML workloads without having to build
+> the application with a different TFLite file statically baked in each time.
+> Also, this feature is not available for the `native` target.
+
+The parameter `inference_runner_DYNAMIC_MEM_LOAD_ENABLED` should be set to ON in the CMake configuration
+command to enable this feature. For example, from a freshly created build directory, run:
+
+```commandline
+cmake .. \
+ -Dinference_runner_DYNAMIC_MEM_LOAD_ENABLED=ON \
+ -DUSE_CASE_BUILD=inference_runner
+```
+
+Once the configuration completes, running:
+```commandline
+make -j
+```
+will build an application that expects the neural network model and the IFM to be loaded at
+specific addresses. These addresses are defined in
+[corstone-sse-300.cmake](../../scripts/cmake/subsystem-profiles/corstone-sse-300.cmake) for the MPS3
+target.
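+
+To quickly check how the load regions are reserved for the MPS3 target, that file can be searched directly.
+For example, run from the repository root; the search pattern here is only a guess at how the regions are
+named in that file:
+
+```commandline
+grep -in "dynamic" scripts/cmake/subsystem-profiles/corstone-sse-300.cmake
+```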
+
+### Running the FVP with dynamic model loading
+
+If the application has been built with dynamic loading capability, as described in the previous section,
+the FVP can be invoked with command line parameters that load specific data into memory. For example, the
+command below loads a custom model at address `0x90000000` and a custom input at address `0x92000000`;
+when the FVP exits, it dumps a file named `output.bin` containing the output tensors consolidated into a
+binary blob.
+
+> **NOTE**: The CMake profile for the target should also give an indication of the maximum sizes for
+> each of the regions. This is also mentioned in the linker scripts for the same target. For MPS3,
+> the model size can be a maximum of 32MiB. The IFM and OFM spaces are both reserved as 16MiB sections.
+
+```commandline
+~/FVP_install_location/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -a \
+ ./bin/ethos-u-inference_runner.axf \
+ --data /path/to/custom-model.tflite@0x90000000 \
+ --data /path/to/custom-ifm.bin@0x92000000 \
+ --dump cpu0=/path/to/output.bin@Memory:0x93000000,1024
+```
+
+The above command dumps a 1KiB (1024 bytes) file containing the output tensors as a binary blob once the
+model and IFM data have been consumed from the specified file paths and the inference has executed
+successfully.
+
+If the size of the output tensors is not known before running the FVP, the FVP can first be run without the
+`--dump` parameter and the output size checked in the application log. Alternatively, specifying a dump
+size of 16MiB will dump the whole reserved OFM section to a file.
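+
+As a quick sanity check on the host, the first bytes of the dumped blob can be inspected with a standard
+hex viewer. For example, assuming `xxd` is available and the path is the one passed to `--dump`:
+
+```commandline
+xxd -l 64 /path/to/output.bin
+```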
+
+> **NOTE**: When there are multiple input tensors, the application is set up to iterate over all of
+> them and populate each of them, in sequence, with the required amount of data. The sequence in which
+> these tensors are populated is governed by the index assigned to them within the TensorFlow Lite Micro
+> framework. So, the input binary blob should be a consolidated file containing data for all the input
+> tensors. The same packing is used for output binary dumps.
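+
+For example, a consolidated IFM blob for a model with two input tensors could be produced by concatenating
+the per-tensor binaries in tensor-index order; the input file names below are placeholders:
+
+```commandline
+cat input_tensor_0.bin input_tensor_1.bin > custom-ifm.bin
+```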