From ca7b03e95531bb3f9e14180d1c4fa24e3a514179 Mon Sep 17 00:00:00 2001
From: Ledion Daja
Date: Fri, 20 May 2022 15:01:57 +0200
Subject: Improve documentation

- added instructions for adding the FVP executables to the PATH
  environment variable
- removed leading space from markdown table, which caused it to render
  incorrectly in Gitiles
- reworked description of run_platform.py script
- replaced SVG with PNG images to support rendering in Gitiles

Change-Id: I2f0b242138fff64b7ebc78f9ce6d76c3ef8a8e5b
---
 README.md         | 79 +++++++++++++++++++++++++++++++++++++++---------------
 docs/multinpu.png | Bin 0 -> 164970 bytes
 docs/startup.png  | Bin 0 -> 139300 bytes
 3 files changed, 58 insertions(+), 21 deletions(-)
 create mode 100644 docs/multinpu.png
 create mode 100644 docs/startup.png

diff --git a/README.md b/README.md
index 7b922a6..292d94c 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,8 @@ is a reference design of how to to build a secure System on Chip (SoC).
 A fixed virtual platform (FVP) of the Arm Corstone-300 including the Arm Ethos-U
 can be downloaded from the Ecosystem page at
 [developer.arm.com](https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps).
+Once installed, make sure to add the path of the FVP executables to your PATH
+environment variable.
 
 ## Building
 
@@ -75,22 +77,57 @@ CMD> cmake --build build
 
 ## run_platform.py
 
-There are many things to consider when deploying a network to an embedded
-system. Where should the data be placed, in SRAM, DRAM or flash? How is the
-performance affected if a fast or slower memory is used? Which Ethos-U
-performance counters should be enabled to measure the performance?
+The `scripts/run_platform.py` script deploys a network model on an FVP target
+(Corstone-300 by default). It takes a TensorFlow Lite model (.tflite file) as
+input, compiles and optimizes it with Vela, builds the baremetal application for
+the target and finally runs an inference on the FVP. The output is compared
+with reference data generated by the TensorFlow Lite Python interpreter. If no
+input data is provided, the input to the network is randomly generated. To
+provide the script with a custom set of input and output data, use the
+`--custom-input` and `--custom-output` arguments respectively, as in the
+command below:
 
-The main purpose of `scripts/run_platform.py` is to document how to go from
-tflite to an application that can be run on a an embedded platform like
-Corstone-300. It also allows users to adjust some settings like memory
-configuration, timing adapter settings or which PMU events to monitor. Please
-refer to the help message for further details about which arguments that can be
-passed to the script.
+```
+$ scripts/run_platform.py --custom-input --custom-output --network-path <.tflite network>
+```
+
+### Memory placement of data
+Both the model and the tensor arena can be placed in SRAM or DRAM. Note that
+memory placement affects performance, and that large network models might not
+fit in SRAM. Use the `--memory_model` and `--memory_arena` arguments to
+configure the memory placement of the model and the arena respectively, as in
+the example below:
+
+```
+$ scripts/run_platform.py --memory_model {sram,dram} --memory_arena {sram,dram} --network-path <.tflite network>
+```
+
+By default, the model is placed in DRAM and the arena in SRAM.
+
+### PMU event counters
+The maximum number of performance counters depends on the hardware. Ethos-U55
+and Ethos-U65 both provide 4 PMU event counters. The full list of PMU events
+can be found in the
+[ethosu55_interface.h](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu55_interface.h)
+and
+[ethosu65_interface.h](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu65_interface.h)
+header files. To enable PMU counters, pass the `--pmu` argument once per event,
+as shown below:
 
 ```
-$ scripts/run_platform.py --network-path
+$ scripts/run_platform.py --pmu --pmu --pmu --pmu --network-path <.tflite network>
 ```
 
+### Timing adapters
+The timing adapters introduce latency to AXI transactions. This is useful to
+emulate the characteristics of different memory technologies, for example when
+conducting performance measurements. Corstone-300 has two timing adapters,
+placed at the NPU's AXI interfaces. The script can configure them through
+parameters such as the number of pending transactions, the read and write
+latencies, and the number of cycles during which addresses are let through or
+blocked. Please refer to the script's help message for the full list of
+arguments accepted by the script.
+
 ## Corstone-300 FVP
 
 Assuming that the Corstone-300 FVP has been downloaded, installed and placed in
@@ -182,14 +219,14 @@ boot loader need can be found in section *MCC Memory mapping* of the
 documentation in the Corstone-300 FPGA archive. A part of the table is shown
 below.
 
- | Cortex-M55  | MMC Bootloader | Name            |
- |-------------|----------------|-----------------|
- | 0x0000_0000 | 0x0000_0000    | ITCM non secure |
- | 0x1000_0000 | 0x0100_0000    | ITCM secure     |
- | 0x0100_0000 | 0x0200_0000    | SRAM non secure |
- | 0x1100_0000 | 0x0300_0000    | SRAM secure     |
- | 0x6000_0000 | 0x0800_0000    | DDR non secure  |
- | 0x7000_0000 | 0x0c00_0000    | DDR secure      |
+| Cortex-M55  | MCC Bootloader | Name            |
+|-------------|----------------|-----------------|
+| 0x0000_0000 | 0x0000_0000    | ITCM non secure |
+| 0x1000_0000 | 0x0100_0000    | ITCM secure     |
+| 0x0100_0000 | 0x0200_0000    | SRAM non secure |
+| 0x1100_0000 | 0x0300_0000    | SRAM secure     |
+| 0x6000_0000 | 0x0800_0000    | DDR non secure  |
+| 0x7000_0000 | 0x0c00_0000    | DDR secure      |
 
 For example, the binary that the Cortex-M55 CPU expects at address 0x1000_0000
 must therefor be written by the MCC to 0x0100_0000.
@@ -314,7 +351,7 @@ Please note how the `ethosu_mutex_*` and `ethosu_semaphore_*` functions are
 implemented in the application layer. Mutexes are used for thread safety and
 semaphores for sleeping.
 
-![Multi NPU](docs/multinpu.svg "Multi NPU sequence diagram")
+![Multi NPU](docs/multinpu.png "Multi NPU sequence diagram")
 
 ## Multi NPU tradeoffs
 
@@ -344,7 +381,7 @@ directly from CMSIS.
 The sequence diagram below describes what happens after the Cortex-M reset is
 lifted, up until the execution enters the application `main()`.
 
-![Startup](docs/startup.svg "Startup sequence diagram")
+![Startup](docs/startup.png "Startup sequence diagram")
 
 ## CMSIS Device
 
diff --git a/docs/multinpu.png b/docs/multinpu.png
new file mode 100644
index 0000000..6ac3f32
Binary files /dev/null and b/docs/multinpu.png differ
diff --git a/docs/startup.png b/docs/startup.png
new file mode 100644
index 0000000..0e730cf
Binary files /dev/null and b/docs/startup.png differ
-- 
cgit v1.2.1
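The Corstone-300 memory mapping table in the README lends itself to a small worked example. The Python sketch below translates a Cortex-M55 address into the address the MCC boot loader writes to, using only the six rows of that table. The helper name `cpu_to_mcc` is hypothetical, and the sketch assumes that each region is a plain base-plus-offset window reaching up to the next listed Cortex-M55 base. The table gives no region sizes, so out-of-range addresses are not detected.

```python
from typing import List, Tuple

# Rows copied from the Corstone-300 memory mapping table:
# (Cortex-M55 base address, MCC boot loader base address, region name).
MEMORY_MAP: List[Tuple[int, int, str]] = [
    (0x0000_0000, 0x0000_0000, "ITCM non secure"),
    (0x1000_0000, 0x0100_0000, "ITCM secure"),
    (0x0100_0000, 0x0200_0000, "SRAM non secure"),
    (0x1100_0000, 0x0300_0000, "SRAM secure"),
    (0x6000_0000, 0x0800_0000, "DDR non secure"),
    (0x7000_0000, 0x0c00_0000, "DDR secure"),
]


def cpu_to_mcc(cpu_addr: int) -> Tuple[int, str]:
    """Translate a Cortex-M55 address to the MCC boot loader address.

    Hypothetical helper. Assumes each region is a base-plus-offset window
    that reaches up to the next listed Cortex-M55 base; region sizes are
    not part of the table, so out-of-range addresses are not rejected.
    """
    # Regions whose Cortex-M55 base lies at or below the address.
    candidates = [row for row in MEMORY_MAP if row[0] <= cpu_addr]
    if not candidates:
        raise ValueError(f"address {cpu_addr:#x} is below every listed region")
    # The closest base from below identifies the region.
    cpu_base, mcc_base, name = max(candidates, key=lambda row: row[0])
    # Keep the offset into the region and rebase it onto the MCC view.
    return mcc_base + (cpu_addr - cpu_base), name
```

For instance, `cpu_to_mcc(0x1000_0000)` yields MCC address 0x0100_0000 in the "ITCM secure" region, matching the example given after the table.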