author     Kshitij Sisodia <kshitij.sisodia@arm.com>  2021-09-24 14:42:08 +0100
committer  Kshitij Sisodia <kshitij.sisodia@arm.com>  2021-09-24 13:43:22 +0000
commit     aa5e1f6c960b8a88f389ba70dd200d6dacd95a03 (patch)
tree       f05ad3ee9f6eff64a41464f32387d4150fe9363a /docs/use_cases
parent     864317690dd670a18194e2a95c7c0da573613fa1 (diff)
download   ml-embedded-evaluation-kit-aa5e1f6c960b8a88f389ba70dd200d6dacd95a03.tar.gz
MLECO-2345: Adding dynamic load support for FVPs
With this patch, the generic inference runner use-case can be configured to accept the model tflite file at run-time via the FVP's command line parameters. The same is true for the IFM, and the inference results can be dumped out too. NOTE: this change is only for supporting the FVP; the FPGA implementation will not allow the additional loading needed for the changes in this patch to be useful. Change-Id: I1318bd5b0cfb7bb635ced6fe58d22c3e401d2547
Diffstat (limited to 'docs/use_cases')
-rw-r--r--  docs/use_cases/inference_runner.md | 69
1 file changed, 69 insertions, 0 deletions
diff --git a/docs/use_cases/inference_runner.md b/docs/use_cases/inference_runner.md
index 7334886..2b2013c 100644
--- a/docs/use_cases/inference_runner.md
+++ b/docs/use_cases/inference_runner.md
@@ -11,6 +11,8 @@
- [Setting up the Ethos-U NPU Fast Model](#setting-up-the-ethos_u-npu-fast-model)
- [Starting Fast Model simulation](#starting-fast-model-simulation)
- [Running Inference Runner](#running-inference-runner)
+ - [Building with dynamic model load capability](#building-with-dynamic-model-load-capability)
+ - [Running the FVP with dynamic model loading](#running-the-fvp-with-dynamic-model-loading)
## Introduction
@@ -55,6 +57,8 @@ following:
- `inference_runner_ACTIVATION_BUF_SZ`: The intermediate, or activation, buffer size reserved for the NN model. By
default, it is set to 2MiB and is enough for most models.
+- `inference_runner_DYNAMIC_MEM_LOAD_ENABLED`: Set this to ON or OFF to enable or disable the dynamic model load capability for use with MPS3 FVPs. See [Building with dynamic model load capability](#building-with-dynamic-model-load-capability) below for more details; an example configure command follows this list.
+
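+For example, a minimal configure sketch, run from a freshly created build directory: the activation buffer
+value shown simply restates the documented 2MiB default, and ON is an illustrative choice for the dynamic
+load option.
+
+```commandline
+cmake .. \
+    -Dinference_runner_ACTIVATION_BUF_SZ=0x00200000 \
+    -Dinference_runner_DYNAMIC_MEM_LOAD_ENABLED=ON
+```
+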
To build **ONLY** the Inference Runner example application, add `-DUSE_CASE_BUILD=inference_runner` to the `cmake`
command line, as specified in: [Building](../documentation.md#Building).
@@ -257,3 +261,68 @@ inference. For example:
- For FPGA platforms, a CPU cycle count can also be enabled. However, do not use cycle counters for FVP, as the CPU
model is not cycle-approximate or cycle-accurate.
+
+### Building with dynamic model load capability
+
+It is possible to build the inference runner application, targeting only the FVP environment, so that the
+TFLite model file can be loaded at runtime. In this build configuration, the model TFLite file is not
+baked into the application. Instead, the application expects the model binary to be loaded at a specific
+address by an external agent. This loading capability also extends to the input data for the model.
+
+This feature depends on these addresses being specified in the target platform's CMake description and, by
+default, it is available for use on the MPS3 FVP platform.
+
+> **NOTE**: The application built with this support will not work on the FPGA. This capability is only
+> provided for use with the FVP, to make it easier to try different ML workloads without having to build
+> the application with a different TFLite file statically baked in each time.
+> Also, this feature is not available for the `native` target.
+
+The parameter `inference_runner_DYNAMIC_MEM_LOAD_ENABLED` should be set to ON in the CMake configuration
+command to enable this feature. For example, from a freshly created build directory, run:
+
+```commandline
+cmake .. \
+ -Dinference_runner_DYNAMIC_MEM_LOAD_ENABLED=ON \
+ -DUSE_CASE_BUILD=inference_runner
+```
+
+Once the configuration completes, running:
+```commandline
+make -j
+```
+will build an application that expects the neural network model and the IFM to be loaded at
+specific addresses. These addresses are defined in
+[corstone-sse-300.cmake](../../scripts/cmake/subsystem-profiles/corstone-sse-300.cmake) for the MPS3
+target.
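+
+To quickly check how the load regions are reserved for the MPS3 target, that file can be searched directly.
+For example, run from the repository root; the search pattern here is only a guess at how the regions are
+named in that file:
+
+```commandline
+grep -in "dynamic" scripts/cmake/subsystem-profiles/corstone-sse-300.cmake
+```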
+
+### Running the FVP with dynamic model loading
+
+If the application has been built with dynamic loading capability, as described in the previous section,
+the FVP can be invoked with command line parameters that load specific data into memory. For example, the
+command below loads a custom model at address `0x90000000` and a custom input at address `0x92000000`;
+when the FVP exits, it dumps a file named `output.bin` containing the output tensors consolidated into a
+binary blob.
+
+> **NOTE**: The CMake profile for the target should also give an indication of the maximum sizes for
+> each of the regions. This is also mentioned in the linker scripts for the same target. For MPS3,
+> the model size can be a maximum of 32MiB. The IFM and OFM spaces are both reserved as 16MiB sections.
+
+```commandline
+~/FVP_install_location/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -a \
+ ./bin/ethos-u-inference_runner.axf \
+ --data /path/to/custom-model.tflite@0x90000000 \
+ --data /path/to/custom-ifm.bin@0x92000000 \
+ --dump cpu0=/path/to/output.bin@Memory:0x93000000,1024
+```
+
+The above command dumps a 1KiB (1024 bytes) file containing the output tensors as a binary blob once the
+model and IFM data have been consumed from the specified file paths and the inference has executed
+successfully.
+
+If the size of the output tensors is not known before running the FVP, the FVP can first be run without the
+`--dump` parameter and the output size checked in the application log. Alternatively, specifying a dump
+size of 16MiB will dump the whole reserved OFM section to a file.
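+
+As a quick sanity check on the host, the first bytes of the dumped blob can be inspected with a standard
+hex viewer. For example, assuming `xxd` is available and the path is the one passed to `--dump`:
+
+```commandline
+xxd -l 64 /path/to/output.bin
+```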
+
+> **NOTE**: When there are multiple input tensors, the application is set up to iterate over all of
+> them and populate each of them, in sequence, with the required amount of data. The sequence in which
+> these tensors are populated is governed by the index assigned to them within the TensorFlow Lite Micro
+> framework. So, the input binary blob should be a consolidated file containing data for all the input
+> tensors. The same packing is used for output binary dumps.
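+
+For example, a consolidated IFM blob for a model with two input tensors could be produced by concatenating
+the per-tensor binaries in tensor-index order; the input file names below are placeholders:
+
+```commandline
+cat input_tensor_0.bin input_tensor_1.bin > custom-ifm.bin
+```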