From 8df12f37531d57a10cba2f8b2e8b6a9065202dd5 Mon Sep 17 00:00:00 2001 From: Isabella Gottardi Date: Wed, 7 Apr 2021 17:15:31 +0100 Subject: MLECO-1870: Cherry pick profiling changes from dev to open source repo * Documentation update Change-Id: If85e7ebc44498840b291c408f14e66a5a5faa424 Signed-off-by: Isabella Gottardi --- docs/sections/customizing.md | 19 ++-- docs/sections/deployment.md | 14 +-- docs/sections/testing_benchmarking.md | 4 +- docs/use_cases/ad.md | 162 ++++++++++++------------------ docs/use_cases/asr.md | 112 +++++++++++---------- docs/use_cases/img_class.md | 96 ++++++++++-------- docs/use_cases/inference_runner.md | 25 ++++- docs/use_cases/kws.md | 92 +++++++++-------- docs/use_cases/kws_asr.md | 184 ++++++++++++++++++---------------- 9 files changed, 363 insertions(+), 345 deletions(-) (limited to 'docs') diff --git a/docs/sections/customizing.md b/docs/sections/customizing.md index e92c327..346a34c 100644 --- a/docs/sections/customizing.md +++ b/docs/sections/customizing.md @@ -243,12 +243,14 @@ Profiler is a helper class assisting in collection of timings and Ethos-U55 cycle counts for operations. It uses platform timer to get system timing information. -| Method name | Description | -|----------------------|-----------------------------------------------------------| -| StartProfiling | Starts profiling and records the starting timing data. | -| StopProfiling | Stops profiling and records the ending timing data. | -| Reset | Resets the profiler and clears all collected data. | -| GetResultsAndReset | Gets the results as string and resets the profiler. | +| Method name | Description | +|-------------------------|----------------------------------------------------------------| +| StartProfiling | Starts profiling and records the starting timing data. | +| StopProfiling | Stops profiling and records the ending timing data. | +| StopProfilingAndReset | Stops the profiling and internally resets the platform timers. 
| +| Reset | Resets the profiler and clears all collected data. | +| GetAllResultsAndReset | Gets the results as string and resets the profiler. | +| SetName | Set the profiler name. | Usage example: @@ -259,7 +261,7 @@ profiler.StartProfiling(); // Code running inference to profile profiler.StopProfiling(); -info("%s\n", profiler.GetResultsAndReset().c_str()); +profiler.PrintProfilingResult(); ``` ## NN Model API @@ -571,9 +573,8 @@ Profiler profiler{&platform, "Inference"}; profiler.StartProfiling(); model.RunInference(); profiler.StopProfiling(); -std::string profileResults = profiler.GetResultsAndReset(); -info("%s\n", profileResults.c_str()); +profiler.PrintProfilingResult(); ``` ## Printing to console diff --git a/docs/sections/deployment.md b/docs/sections/deployment.md index 354d30b..3d5796f 100644 --- a/docs/sections/deployment.md +++ b/docs/sections/deployment.md @@ -267,13 +267,13 @@ off. 7. On the second serial port, output similar to section 2.2 should be visible: ```log - [INFO] Setting up system tick IRQ (for NPU) - [INFO] V2M-MPS3 revision C - [INFO] Application Note AN540, Revision B - [INFO] FPGA build 1 - [INFO] Core clock has been set to: 32000000 Hz - [INFO] CPU ID: 0x410fd220 - [INFO] CPU: Cortex-M55 r0p0 + INFO - Setting up system tick IRQ (for NPU) + INFO - V2M-MPS3 revision C + INFO - Application Note AN540, Revision B + INFO - FPGA build 1 + INFO - Core clock has been set to: 32000000 Hz + INFO - CPU ID: 0x410fd220 + INFO - CPU: Cortex-M55 r0p0 ... 
``` diff --git a/docs/sections/testing_benchmarking.md b/docs/sections/testing_benchmarking.md index 43bb7f4..0c7c675 100644 --- a/docs/sections/testing_benchmarking.md +++ b/docs/sections/testing_benchmarking.md @@ -45,8 +45,8 @@ dev_ethosu_eval--tests ``` ```log -[INFO] native platform initialised -[INFO] ARM Ethos-U55 Evaluation application for MPS3 FPGA Prototyping Board and FastModel +INFO - native platform initialised +INFO - ARM Ethos-U55 Evaluation application for MPS3 FPGA Prototyping Board and FastModel ... =============================================================================== diff --git a/docs/use_cases/ad.md b/docs/use_cases/ad.md index ca95af8..1ff9c4f 100644 --- a/docs/use_cases/ad.md +++ b/docs/use_cases/ad.md @@ -390,41 +390,39 @@ Choice: 4. “Show NN model info” menu option prints information about model data type, input and output tensor sizes: ```log - [INFO] uTFL version: 2.5.0 - [INFO] Model info: - [INFO] Model INPUT tensors: - [INFO] tensor type is INT8 - [INFO] tensor occupies 1024 bytes with dimensions - [INFO] 0: 1 - [INFO] 1: 32 - [INFO] 2: 32 - [INFO] 3: 1 - [INFO] Quant dimension: 0 - [INFO] Scale[0] = 0.192437 - [INFO] ZeroPoint[0] = 11 - [INFO] Model OUTPUT tensors: - [INFO] tensor type is INT8 - [INFO] tensor occupies 8 bytes with dimensions - [INFO] 0: 1 - [INFO] 1: 8 - [INFO] Quant dimension: 0 - [INFO] Scale[0] = 0.048891 - [INFO] ZeroPoint[0] = -30 - [INFO] Activation buffer (a.k.a tensor arena) size used: 198016 - [INFO] Number of operators: 1 - [INFO] Operator 0: ethos-u - [INFO] Use of Arm uNPU is enabled - + INFO - uTFL version: 2.5.0 + INFO - Model info: + INFO - Model INPUT tensors: + INFO - tensor type is INT8 + INFO - tensor occupies 1024 bytes with dimensions + INFO - 0: 1 + INFO - 1: 32 + INFO - 2: 32 + INFO - 3: 1 + INFO - Quant dimension: 0 + INFO - Scale[0] = 0.192437 + INFO - ZeroPoint[0] = 11 + INFO - Model OUTPUT tensors: + INFO - tensor type is INT8 + INFO - tensor occupies 8 bytes with dimensions + INFO 
- 0: 1 + INFO - 1: 8 + INFO - Quant dimension: 0 + INFO - Scale[0] = 0.048891 + INFO - ZeroPoint[0] = -30 + INFO - Activation buffer (a.k.a tensor arena) size used: 198016 + INFO - Number of operators: 1 + INFO - Operator 0: ethos-u ``` 5. “List” menu option prints a list of pair ... indexes - the original filenames embedded in the application: ```log - [INFO] List of Files: - [INFO] 0 => anomaly_id_00_00000000.wav - [INFO] 1 => anomaly_id_02_00000076.wav - [INFO] 2 => normal_id_00_00000004.wav - [INFO] 3 => normal_id_02_00000001.wav + INFO - List of Files: + INFO - 0 => anomaly_id_00_00000000.wav + INFO - 1 => anomaly_id_02_00000076.wav + INFO - 2 => normal_id_00_00000004.wav + INFO - 3 => normal_id_02_00000001.wav ``` ### Running Anomaly Detection @@ -434,76 +432,30 @@ Please select the first menu option to execute Anomaly Detection. The following example illustrates application output: ```log -[INFO] Running inference on audio clip 0 => anomaly_id_00_00000000.wav -[INFO] Inference 1/13 -[INFO] Profile for Inference: - Active NPU cycles: 1081154 - Idle NPU cycles: 1012 - -[INFO] Inference 2/13 -[INFO] Profile for Inference: - Active NPU cycles: 1080934 - Idle NPU cycles: 232 - -[INFO] Inference 3/13 -[INFO] Profile for Inference: - Active NPU cycles: 1081332 - Idle NPU cycles: 834 - -[INFO] Inference 4/13 -[INFO] Profile for Inference: - Active NPU cycles: 1080748 - Idle NPU cycles: 418 - -[INFO] Inference 5/13 -[INFO] Profile for Inference: - Active NPU cycles: 1080728 - Idle NPU cycles: 438 - -[INFO] Inference 6/13 -[INFO] Profile for Inference: - Active NPU cycles: 1081144 - Idle NPU cycles: 1022 - -[INFO] Inference 7/13 -[INFO] Profile for Inference: - Active NPU cycles: 1080924 - Idle NPU cycles: 242 - -[INFO] Inference 8/13 -[INFO] Profile for Inference: - Active NPU cycles: 1081322 - Idle NPU cycles: 844 - -[INFO] Inference 9/13 -[INFO] Profile for Inference: - Active NPU cycles: 1080738 - Idle NPU cycles: 428 - -[INFO] Inference 10/13 -[INFO]
Profile for Inference: - Active NPU cycles: 1080718 - Idle NPU cycles: 448 - -[INFO] Inference 11/13 -[INFO] Profile for Inference: - Active NPU cycles: 1081134 - Idle NPU cycles: 1032 - -[INFO] Inference 12/13 -[INFO] Profile for Inference: - Active NPU cycles: 1080914 - Idle NPU cycles: 252 - -[INFO] Inference 13/13 -[INFO] Profile for Inference: - Active NPU cycles: 1081312 - Idle NPU cycles: 854 - -[INFO] Average anomaly score is: -0.024493 +INFO - Running inference on audio clip 0 => anomaly_id_00_00000000.wav +INFO - Inference 1/13 +INFO - Inference 2/13 +INFO - Inference 3/13 +INFO - Inference 4/13 +INFO - Inference 5/13 +INFO - Inference 6/13 +INFO - Inference 7/13 +INFO - Inference 8/13 +INFO - Inference 9/13 +INFO - Inference 10/13 +INFO - Inference 11/13 +INFO - Inference 12/13 +INFO - Inference 13/13 +INFO - Average anomaly score is: -0.024493 Anomaly threshold is: -0.800000 Anomaly detected! - +INFO - Profile for Inference: +INFO - NPU AXI0_RD_DATA_BEAT_RECEIVED cycles: 628122 +INFO - NPU AXI0_WR_DATA_BEAT_WRITTEN cycles: 135087 +INFO - NPU AXI1_RD_DATA_BEAT_RECEIVED cycles: 62870 +INFO - NPU ACTIVE cycles: 1081007 +INFO - NPU IDLE cycles: 626 +INFO - NPU total cycles: 1081634 ``` As multiple inferences have to be run for one clip it will take around a minute or so for all inferences to complete. @@ -515,9 +467,19 @@ The profiling section of the log shows that for each inference. For the last inf - Ethos-U55's PMU report: - - 1,081,312 active cycles: number of cycles that were used for computation + - 1,081,634 total cycles: The number of NPU cycles + + - 1,081,007 active cycles: number of NPU cycles that were used for computation + + - 626 idle cycles: number of cycles for which the NPU was idle + + - 628,122 AXI0 read cycles: The number of cycles the NPU spends to execute AXI0 read transactions. + AXI0 is the bus where Ethos-U55 NPU reads and writes to the computation buffers (activation buf/tensor arenas).
+ + - 135,087 AXI0 write cycles: The number of cycles the NPU spends to execute AXI0 write transactions. - - 854 idle cycles: number of cycles for which the NPU was idle + - 62,870 AXI1 read cycles: The number of cycles the NPU spends to execute AXI1 read transactions. + AXI1 is the bus where Ethos-U55 NPU reads the model (read only) - For FPGA platforms, CPU cycle count can also be enabled. For FVP, however, CPU cycle counters should not be used as the CPU model is not cycle-approximate or cycle-accurate. diff --git a/docs/use_cases/asr.md b/docs/use_cases/asr.md index d224aca..4600698 100644 --- a/docs/use_cases/asr.md +++ b/docs/use_cases/asr.md @@ -431,7 +431,7 @@ Choice: compiled audio. > **Note:** Note that if the clip is over a certain length, the application will invoke multiple inference runs to ->cover the entire file. + >cover the entire file. 2. “Classify audio clip at chosen index” menu option will run inference on the chosen audio clip. @@ -444,41 +444,40 @@ Choice: 4. “Show NN model info” menu option prints information about model data type, input and output tensor sizes: ```log - [INFO] uTFL version: 2.5.0 - [INFO] Model info: - [INFO] Model INPUT tensors: - [INFO] tensor type is INT8 - [INFO] tensor occupies 11544 bytes with dimensions - [INFO] 0: 1 - [INFO] 1: 296 - [INFO] 2: 39 - [INFO] Quant dimension: 0 - [INFO] Scale[0] = 0.110316 - [INFO] ZeroPoint[0] = -11 - [INFO] Model OUTPUT tensors: - [INFO] tensor type is INT8 - [INFO] tensor occupies 4292 bytes with dimensions - [INFO] 0: 1 - [INFO] 1: 1 - [INFO] 2: 148 - [INFO] 3: 29 - [INFO] Quant dimension: 0 - [INFO] Scale[0] = 0.003906 - [INFO] ZeroPoint[0] = -128 - [INFO] Activation buffer (a.k.a tensor arena) size used: 783168 - [INFO] Number of operators: 1 - [INFO] Operator 0: ethos-u - [INFO] Use of Arm uNPU is enabled + INFO - uTFL version: 2.5.0 + INFO - Model info: + INFO - Model INPUT tensors: + INFO - tensor type is INT8 + INFO - tensor occupies 11544 bytes with dimensions + INFO - 0: 1 
+ INFO - 1: 296 + INFO - 2: 39 + INFO - Quant dimension: 0 + INFO - Scale[0] = 0.110316 + INFO - ZeroPoint[0] = -11 + INFO - Model OUTPUT tensors: + INFO - tensor type is INT8 + INFO - tensor occupies 4292 bytes with dimensions + INFO - 0: 1 + INFO - 1: 1 + INFO - 2: 148 + INFO - 3: 29 + INFO - Quant dimension: 0 + INFO - Scale[0] = 0.003906 + INFO - ZeroPoint[0] = -128 + INFO - Activation buffer (a.k.a tensor arena) size used: 783168 + INFO - Number of operators: 1 + INFO - Operator 0: ethos-u ``` 5. “List” menu option prints a list of pair audio clip indexes - the original filenames embedded in the application: ```log - [INFO] List of Files: - [INFO] 0 => anotherdoor.wav - [INFO] 1 => anotherengineer.wav - [INFO] 2 => itellyou.wav - [INFO] 3 => testingroutine.wav + INFO - List of Files: + INFO - 0 => anotherdoor.wav + INFO - 1 => anotherengineer.wav + INFO - 2 => itellyou.wav + INFO - 3 => testingroutine.wav ``` ### Running Automatic Speech Recognition @@ -488,28 +487,21 @@ Please select the first menu option to execute Automatic Speech Recognition. 
The following example illustrates application output: ```log -[INFO] Running inference on audio clip 0 => anotherdoor.wav -[INFO] Inference 1/2 -[INFO] Profile for pre-processing: - Active NPU cycles: 0 - Idle NPU cycles: 6 - -[INFO] Profile for Inference: - Active NPU cycles: 28924342 - Idle NPU cycles: 824 - -[INFO] Inference 2/2 -[INFO] Profile for pre-processing: - Active NPU cycles: 0 - Idle NPU cycles: 6 - -[INFO] Profile for Inference: - Active NPU cycles: 28924298 - Idle NPU cycles: 868 - -[INFO] Result for inf 0: and he walked immediately out o t -[INFO] Result for inf 1: he aparctment by anoer dor -[INFO] Final result: and he walked immediately out o the aparctment by anoer dor +INFO - Running inference on audio clip 0 => another_door.wav +INFO - Inference 1/2 +INFO - Inference 2/2 +INFO - Final results: +INFO - Total number of inferences: 2 +INFO - For timestamp: 0.000000 (inference #: 0); label: and he walked immediately out of th +INFO - For timestamp: 0.000000 (inference #: 1); label: e apartment by another door +INFO - Complete recognition: and he walked immediately out of the apartment by another door +INFO - Profile for Inference : +INFO - NPU AXI0_RD_DATA_BEAT_RECEIVED cycles: 6564262 +INFO - NPU AXI0_WR_DATA_BEAT_WRITTEN cycles: 928889 +INFO - NPU AXI1_RD_DATA_BEAT_RECEIVED cycles: 841712 +INFO - NPU ACTIVE cycles: 28450696 +INFO - NPU IDLE cycles: 476 +INFO - NPU total cycles: 28451172 ``` It could take several minutes to complete each inference (average time is 5-7 minutes), and on this audio clip multiple @@ -519,9 +511,19 @@ The profiling section of the log shows that for the first inference: - Ethos-U55's PMU report: - - 28,924,298 active cycles: number of NPU cycles that were used for computation + - 28,451,172 total cycles: The number of NPU cycles - - 868 idle cycles: number of cycles for which the NPU was idle + - 28,450,696 active cycles: number of NPU cycles that were used for computation + + - 476 idle cycles: number of cycles for
which the NPU was idle + + - 6,564,262 AXI0 read cycles: The number of cycles the NPU spends to execute AXI0 read transactions. + AXI0 is the bus where Ethos-U55 NPU reads and writes to the computation buffers (activation buf/tensor arenas). + + - 928,889 AXI0 write cycles: The number of cycles the NPU spends to execute AXI0 write transactions. + + - 841,712 AXI1 read cycles: The number of cycles the NPU spends to execute AXI1 read transactions. + AXI1 is the bus where Ethos-U55 NPU reads the model (read only) - For FPGA platforms, CPU cycle count can also be enabled. For FVP, however, CPU cycle counters should not be used as the CPU model is not cycle-approximate or cycle-accurate. diff --git a/docs/use_cases/img_class.md b/docs/use_cases/img_class.md index 7a409f2..b26b746 100644 --- a/docs/use_cases/img_class.md +++ b/docs/use_cases/img_class.md @@ -371,40 +371,39 @@ Choice: 4. “Show NN model info” menu option prints information about model data type, input and output tensor sizes: ```log - [INFO] uTFL version: 2.5.0 - [INFO] Model info: - [INFO] Model INPUT tensors: - [INFO] tensor type is UINT8 - [INFO] tensor occupies 150528 bytes with dimensions - [INFO] 0: 1 - [INFO] 1: 224 - [INFO] 2: 224 - [INFO] 3: 3 - [INFO] Quant dimension: 0 - [INFO] Scale[0] = 0.007812 - [INFO] ZeroPoint[0] = 128 - [INFO] Model OUTPUT tensors: - [INFO] tensor type is UINT8 - [INFO] tensor occupies 1001 bytes with dimensions - [INFO] 0: 1 - [INFO] 1: 1001 - [INFO] Quant dimension: 0 - [INFO] Scale[0] = 0.098893 - [INFO] ZeroPoint[0] = 58 - [INFO] Activation buffer (a.k.a tensor arena) size used: 521760 - [INFO] Number of operators: 1 - [INFO] Operator 0: ethos-u - [INFO] Use of Arm uNPU is enabled + INFO - uTFL version: 2.5.0 + INFO - Model info: + INFO - Model INPUT tensors: + INFO - tensor type is UINT8 + INFO - tensor occupies 150528 bytes with dimensions + INFO - 0: 1 + INFO - 1: 224 + INFO - 2: 224 + INFO - 3: 3 + INFO - Quant dimension: 0 + INFO - Scale[0] = 0.007812 + INFO - 
ZeroPoint[0] = 128 + INFO - Model OUTPUT tensors: + INFO - tensor type is UINT8 + INFO - tensor occupies 1001 bytes with dimensions + INFO - 0: 1 + INFO - 1: 1001 + INFO - Quant dimension: 0 + INFO - Scale[0] = 0.098893 + INFO - ZeroPoint[0] = 58 + INFO - Activation buffer (a.k.a tensor arena) size used: 521760 + INFO - Number of operators: 1 + INFO - Operator 0: ethos-u ``` 5. “List Images” menu option prints a list of pair image indexes - the original filenames embedded in the application: ```log - [INFO] List of Files: - [INFO] 0 => cat.bmp - [INFO] 1 => dog.bmp - [INFO] 2 => kimono.bmp - [INFO] 3 => tiger.bmp + INFO - List of Files: + INFO - 0 => cat.bmp + INFO - 1 => dog.bmp + INFO - 2 => kimono.bmp + INFO - 3 => tiger.bmp ``` ### Running Image Classification @@ -414,16 +413,21 @@ Please select the first menu option to execute Image Classification. The following example illustrates application output for classification: ```log -[INFO] Running inference on image 0 => cat.bmp -[INFO] Profile for Inference: - Active NPU cycles: 7622641 - Idle NPU cycles: 525 - -[INFO] 0) 282 (14.636096) -> tabby, tabby cat -[INFO] 1) 286 (14.537203) -> Egyptian cat -[INFO] 2) 283 (12.757138) -> tiger cat -[INFO] 3) 458 (7.021370) -> bow tie, bow-tie, bowtie -[INFO] 4) 288 (7.021370) -> lynx, catamount +INFO - Running inference on image 0 => cat.bmp +INFO - Final results: +INFO - Total number of inferences: 1 +INFO - 0) 282 (14.636096) -> tabby, tabby cat +INFO - 1) 286 (14.537203) -> Egyptian cat +INFO - 2) 283 (12.757138) -> tiger cat +INFO - 3) 458 (7.021370) -> bow tie, bow-tie, bowtie +INFO - 4) 288 (7.021370) -> lynx, catamount +INFO - Profile for Inference: +INFO - NPU AXI0_RD_DATA_BEAT_RECEIVED cycles: 2489726 +INFO - NPU AXI0_WR_DATA_BEAT_WRITTEN cycles: 1098726 +INFO - NPU AXI1_RD_DATA_BEAT_RECEIVED cycles: 471129 +INFO - NPU ACTIVE cycles: 7489258 +INFO - NPU IDLE cycles: 914 +INFO - NPU total cycles: 7490172 ``` It could take several minutes to complete one inference 
run (average time is 2-3 minutes). @@ -435,9 +439,19 @@ The profiling section of the log shows that for this inference: - Ethos-U55's PMU report: - - 7,622,641 active cycles: number of NPU cycles that were used for computation + - 7,490,172 total cycles: The number of NPU cycles - - 525 idle cycles: number of cycles for which the NPU was idle + - 7,489,258 active cycles: number of NPU cycles that were used for computation + + - 914 idle cycles: number of cycles for which the NPU was idle + + - 2,489,726 AXI0 read cycles: The number of cycles the NPU spends to execute AXI0 read transactions. + AXI0 is the bus where Ethos-U55 NPU reads and writes to the computation buffers (activation buf/tensor arenas). + + - 1,098,726 AXI0 write cycles: The number of cycles the NPU spends to execute AXI0 write transactions. + + - 471,129 AXI1 read cycles: The number of cycles the NPU spends to execute AXI1 read transactions. + AXI1 is the bus where Ethos-U55 NPU reads the model (read only) - For FPGA platforms, CPU cycle count can also be enabled. For FVP, however, CPU cycle counters should not be used as the CPU model is not cycle-approximate or cycle-accurate.
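The relationship between the three headline counters in the image classification report above can be checked with a few lines of C++. This is a minimal sketch and not part of the patch: `NpuCycleReport` and `CountersAreConsistent` are hypothetical names introduced only for illustration, and the `tolerance` parameter is an assumption that allows for the counters being sampled at slightly different instants.

```cpp
#include <cstdint>

// Hypothetical container for the three headline PMU counters printed in the log.
struct NpuCycleReport {
    uint64_t active;  // value printed as "NPU ACTIVE cycles"
    uint64_t idle;    // value printed as "NPU IDLE cycles"
    uint64_t total;   // value printed as "NPU total cycles"
};

// Returns true when active + idle matches total to within `tolerance` cycles.
// The default tolerance of one cycle is an assumption, not a documented figure.
bool CountersAreConsistent(const NpuCycleReport& report, uint64_t tolerance = 1)
{
    const uint64_t sum = report.active + report.idle;
    const uint64_t diff = (sum > report.total) ? (sum - report.total)
                                               : (report.total - sum);
    return diff <= tolerance;
}
```

With the figures from the log above, 7,489,258 active plus 914 idle equals the reported total of 7,490,172, so `CountersAreConsistent({7489258, 914, 7490172}, 0)` holds.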
diff --git a/docs/use_cases/inference_runner.md b/docs/use_cases/inference_runner.md index ffb205e..350c1e8 100644 --- a/docs/use_cases/inference_runner.md +++ b/docs/use_cases/inference_runner.md @@ -278,9 +278,14 @@ After the application has started the inference starts immediately and it output The following example illustrates application output: ```log -[INFO] Profile for Inference: - Active NPU cycles: 26976 - Idle NPU cycles: 196 +INFO - Final results: +INFO - Profile for Inference : +INFO - NPU AXI0_RD_DATA_BEAT_RECEIVED cycles: 9332 +INFO - NPU AXI0_WR_DATA_BEAT_WRITTEN cycles: 3248 +INFO - NPU AXI1_RD_DATA_BEAT_RECEIVED cycles: 2219 +INFO - NPU ACTIVE cycles: 33145 +INFO - NPU IDLE cycles: 1033 +INFO - NPU total cycles: 34178 ``` After running an inference on randomly generated data, the output of the log shows the profiling results that for this @@ -288,9 +293,19 @@ inference: - Ethos-U55's PMU report: - - 26,976 active cycles: number of cycles that were used for computation + - 34,178 total cycles: The number of NPU cycles - - 196 idle cycles: number of cycles for which the NPU was idle + - 33,145 active cycles: number of NPU cycles that were used for computation + + - 1,033 idle cycles: number of cycles for which the NPU was idle + + - 9,332 AXI0 read cycles: The number of cycles the NPU spends to execute AXI0 read transactions. + AXI0 is the bus where Ethos-U55 NPU reads and writes to the computation buffers (activation buf/tensor arenas). + + - 3,248 AXI0 write cycles: The number of cycles the NPU spends to execute AXI0 write transactions. + + - 2,219 AXI1 read cycles: The number of cycles the NPU spends to execute AXI1 read transactions. + AXI1 is the bus where Ethos-U55 NPU reads the model (read only) - For FPGA platforms, CPU cycle count can also be enabled. For FVP, however, CPU cycle counters should not be used as the CPU model is not cycle-approximate or cycle-accurate.
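The active and total counters in the inference runner log above can be combined into a single derived figure. The sketch below computes the fraction of NPU cycles spent on computation. `NpuUtilisation` is a hypothetical helper, not something the patch or the profiler provides, and "utilisation" is a label assumed here for the ratio of active to total cycles rather than a term the documentation defines.

```cpp
#include <cstdint>

// Hypothetical helper: fraction of NPU cycles that were spent on computation,
// i.e. "NPU ACTIVE cycles" divided by "NPU total cycles".
// Returns 0.0 when totalCycles is zero so the caller never divides by zero.
double NpuUtilisation(uint64_t activeCycles, uint64_t totalCycles)
{
    if (totalCycles == 0) {
        return 0.0;
    }
    return static_cast<double>(activeCycles) / static_cast<double>(totalCycles);
}
```

For the log above, `NpuUtilisation(33145, 34178)` comes to roughly 0.97, meaning the NPU was active for about 97% of the counted cycles.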
diff --git a/docs/use_cases/kws.md b/docs/use_cases/kws.md index 316b501..4942744 100644 --- a/docs/use_cases/kws.md +++ b/docs/use_cases/kws.md @@ -405,41 +405,40 @@ Choice: 4. “Show NN model info” menu option prints information about model data type, input and output tensor sizes: ```log - [INFO] uTFL version: 2.5.0 - [INFO] Model info: - [INFO] Model INPUT tensors: - [INFO] tensor type is INT8 - [INFO] tensor occupies 490 bytes with dimensions - [INFO] 0: 1 - [INFO] 1: 1 - [INFO] 2: 49 - [INFO] 3: 10 - [INFO] Quant dimension: 0 - [INFO] Scale[0] = 1.107164 - [INFO] ZeroPoint[0] = 95 - [INFO] Model OUTPUT tensors: - [INFO] tensor type is INT8 - [INFO] tensor occupies 12 bytes with dimensions - [INFO] 0: 1 - [INFO] 1: 12 - [INFO] Quant dimension: 0 - [INFO] Scale[0] = 0.003906 - [INFO] ZeroPoint[0] = -128 - [INFO] Activation buffer (a.k.a tensor arena) size used: 72848 - [INFO] Number of operators: 1 - [INFO] Operator 0: ethos-u - [INFO] Use of Arm uNPU is enabled + INFO - uTFL version: 2.5.0 + INFO - Model info: + INFO - Model INPUT tensors: + INFO - tensor type is INT8 + INFO - tensor occupies 490 bytes with dimensions + INFO - 0: 1 + INFO - 1: 1 + INFO - 2: 49 + INFO - 3: 10 + INFO - Quant dimension: 0 + INFO - Scale[0] = 1.107164 + INFO - ZeroPoint[0] = 95 + INFO - Model OUTPUT tensors: + INFO - tensor type is INT8 + INFO - tensor occupies 12 bytes with dimensions + INFO - 0: 1 + INFO - 1: 12 + INFO - Quant dimension: 0 + INFO - Scale[0] = 0.003906 + INFO - ZeroPoint[0] = -128 + INFO - Activation buffer (a.k.a tensor arena) size used: 72848 + INFO - Number of operators: 1 + INFO - Operator 0: ethos-u ``` 5. 
“List audio clips” menu option prints a list of pair audio indexes - the original filenames embedded in the application: ```log - [INFO] List of Files: - [INFO] 0 => down.wav - [INFO] 1 => rightleftup.wav - [INFO] 2 => yes.wav - [INFO] 3 => yesnogostop.wav + INFO - List of Files: + INFO - 0 => down.wav + INFO - 1 => rightleftup.wav + INFO - 2 => yes.wav + INFO - 3 => yesnogostop.wav ``` ### Running Keyword Spotting @@ -448,15 +447,18 @@ Selecting the first option will run inference on the first file. The following example illustrates application output for classification: -```log -[INFO] Running inference on audio clip 0 => down.wav -[INFO] Inference 1/1 -[INFO] Profile for Inference: - Active NPU cycles: 680400 - Idle NPU cycles: 766 - -[INFO] For timestamp: 0.000000 (inference #: 0); threshold: 0.900000 -[INFO] label @ 0: down, score: 0.996094 +```logINFO - Running inference on audio clip 0 => down.wav +INFO - Inference 1/1 +INFO - Final results: +INFO - Total number of inferences: 1 +INFO - For timestamp: 0.000000 (inference #: 0); label: down, score: 0.996094; threshold: 0.900000 +INFO - Profile for Inference: +INFO - NPU AXI0_RD_DATA_BEAT_RECEIVED cycles: 217385 +INFO - NPU AXI0_WR_DATA_BEAT_WRITTEN cycles: 82607 +INFO - NPU AXI1_RD_DATA_BEAT_RECEIVED cycles: 59608 +INFO - NPU ACTIVE cycles: 680611 +INFO - NPU IDLE cycles: 561 +INFO - NPU total cycles: 681172 ``` Each inference should take less than 30 seconds on most systems running Fast Model. @@ -464,9 +466,19 @@ The profiling section of the log shows that for this inference: - Ethos-U55's PMU report: - - 680,400 active cycles: number of cycles that were used for computation + - 681,172 total cycles: The number of NPU cycles + + - 680,611 active cycles: The number of NPU cycles that were used for computation + + - 561 idle cycles: number of cycles for which the NPU was idle + + - 217,385 AXI0 read cycles: The number of cycles the NPU spends to execute AXI0 read transactions.
+ AXI0 is the bus where Ethos-U55 NPU reads and writes to the computation buffers (activation buf/tensor arenas). + + - 82,607 AXI0 write cycles: The number of cycles the NPU spends to execute AXI0 write transactions. - - 766 idle cycles: number of cycles for which the NPU was idle + - 59,608 AXI1 read cycles: The number of cycles the NPU spends to execute AXI1 read transactions. + AXI1 is the bus where Ethos-U55 NPU reads the model (read only) - For FPGA platforms, CPU cycle count can also be enabled. For FVP, however, CPU cycle counters should not be used as the CPU model is not cycle-approximate or cycle-accurate. diff --git a/docs/use_cases/kws_asr.md b/docs/use_cases/kws_asr.md index e79b887..132a82d 100644 --- a/docs/use_cases/kws_asr.md +++ b/docs/use_cases/kws_asr.md @@ -468,72 +468,72 @@ Choice: 4. “Show NN model info” menu option prints information about model data type, input and output tensor sizes: ```log - [INFO] uTFL version: 2.5.0 - [INFO] Model INPUT tensors: - [INFO] tensor type is INT8 - [INFO] tensor occupies 490 bytes with dimensions - [INFO] 0: 1 - [INFO] 1: 1 - [INFO] 2: 49 - [INFO] 3: 10 - [INFO] Quant dimension: 0 - [INFO] Scale[0] = 1.107164 - [INFO] ZeroPoint[0] = 95 - [INFO] Model OUTPUT tensors: - [INFO] tensor type is INT8 - [INFO] tensor occupies 12 bytes with dimensions - [INFO] 0: 1 - [INFO] 1: 12 - [INFO] Quant dimension: 0 - [INFO] Scale[0] = 0.003906 - [INFO] ZeroPoint[0] = -128 - [INFO] Activation buffer (a.k.a tensor arena) size used: 123616 - [INFO] Number of operators: 16 - [INFO] Operator 0: RESHAPE - [INFO] Operator 1: CONV_2D - [INFO] Operator 2: DEPTHWISE_CONV_2D - [INFO] Operator 3: CONV_2D - [INFO] Operator 4: DEPTHWISE_CONV_2D - [INFO] Operator 5: CONV_2D - [INFO] Operator 6: DEPTHWISE_CONV_2D - [INFO] Operator 7: CONV_2D - [INFO] Operator 8: DEPTHWISE_CONV_2D - [INFO] Operator 9: CONV_2D - [INFO] Operator 10: DEPTHWISE_CONV_2D - [INFO] Operator 11: CONV_2D - [INFO] Operator 12: AVERAGE_POOL_2D - [INFO] Operator 13: RESHAPE
- [INFO] Operator 14: FULLY_CONNECTED - [INFO] Operator 15: SOFTMAX - [INFO] Model INPUT tensors: - [INFO] tensor type is INT8 - [INFO] tensor occupies 11544 bytes with dimensions - [INFO] 0: 1 - [INFO] 1: 296 - [INFO] 2: 39 - [INFO] Quant dimension: 0 - [INFO] Scale[0] = 0.110316 - [INFO] ZeroPoint[0] = -11 - [INFO] Model OUTPUT tensors: - [INFO] tensor type is INT8 - [INFO] tensor occupies 4292 bytes with dimensions - [INFO] 0: 1 - [INFO] 1: 1 - [INFO] 2: 148 - [INFO] 3: 29 - [INFO] Quant dimension: 0 - [INFO] Scale[0] = 0.003906 - [INFO] ZeroPoint[0] = -128 - [INFO] Activation buffer (a.k.a tensor arena) size used: 809808 - [INFO] Number of operators: 1 - [INFO] Operator 0: ethos-u + INFO - uTFL version: 2.5.0 + INFO - Model INPUT tensors: + INFO - tensor type is INT8 + INFO - tensor occupies 490 bytes with dimensions + INFO - 0: 1 + INFO - 1: 1 + INFO - 2: 49 + INFO - 3: 10 + INFO - Quant dimension: 0 + INFO - Scale[0] = 1.107164 + INFO - ZeroPoint[0] = 95 + INFO - Model OUTPUT tensors: + INFO - tensor type is INT8 + INFO - tensor occupies 12 bytes with dimensions + INFO - 0: 1 + INFO - 1: 12 + INFO - Quant dimension: 0 + INFO - Scale[0] = 0.003906 + INFO - ZeroPoint[0] = -128 + INFO - Activation buffer (a.k.a tensor arena) size used: 123616 + INFO - Number of operators: 16 + INFO - Operator 0: RESHAPE + INFO - Operator 1: CONV_2D + INFO - Operator 2: DEPTHWISE_CONV_2D + INFO - Operator 3: CONV_2D + INFO - Operator 4: DEPTHWISE_CONV_2D + INFO - Operator 5: CONV_2D + INFO - Operator 6: DEPTHWISE_CONV_2D + INFO - Operator 7: CONV_2D + INFO - Operator 8: DEPTHWISE_CONV_2D + INFO - Operator 9: CONV_2D + INFO - Operator 10: DEPTHWISE_CONV_2D + INFO - Operator 11: CONV_2D + INFO - Operator 12: AVERAGE_POOL_2D + INFO - Operator 13: RESHAPE + INFO - Operator 14: FULLY_CONNECTED + INFO - Operator 15: SOFTMAX + INFO - Model INPUT tensors: + INFO - tensor type is INT8 + INFO - tensor occupies 11544 bytes with dimensions + INFO - 0: 1 + INFO - 1: 296 + INFO - 2: 39 + INFO 
- Quant dimension: 0 + INFO - Scale[0] = 0.110316 + INFO - ZeroPoint[0] = -11 + INFO - Model OUTPUT tensors: + INFO - tensor type is INT8 + INFO - tensor occupies 4292 bytes with dimensions + INFO - 0: 1 + INFO - 1: 1 + INFO - 2: 148 + INFO - 3: 29 + INFO - Quant dimension: 0 + INFO - Scale[0] = 0.003906 + INFO - ZeroPoint[0] = -128 + INFO - Activation buffer (a.k.a tensor arena) size used: 809808 + INFO - Number of operators: 1 + INFO - Operator 0: ethos-u ``` 5. “List” menu option prints a list of pair ... indexes - the original filenames embedded in the application: ```log - [INFO] List of Files: - [INFO] 0 => yesnogostop.wav + INFO - List of Files: + INFO - 0 => yesnogostop.wav ``` ### Running Keyword Spotting and Automatic Speech Recognition @@ -543,29 +543,31 @@ Please select the first menu option to execute Keyword Spotting and Automatic Sp The following example illustrates application output: ```log -[INFO] KWS audio data window size 16000 -[INFO] Running KWS inference on audio clip 0 => yesnogostop.wav -[INFO] Inference 1/7 -[INFO] Profile for Inference: - Active NPU cycles: 0 - Idle NPU cycles: 6 - -[INFO] For timestamp: 0.000000 (inference #: 0); threshold: 0.900000 -[INFO] label @ 0: yes, score: 0.996094 -[INFO] Keyword spotted -[INFO] Inference 1/2 -[INFO] Profile for Inference: - Active NPU cycles: 28924742 - Idle NPU cycles: 424 - -[INFO] Inference 2/2 -[INFO] Profile for Inference: - Active NPU cycles: 28924740 - Idle NPU cycles: 426 - -[INFO] Result for inf 0: no gow -[INFO] Result for inf 1: stoppe -[INFO] Final result: no gow stoppe +INFO - KWS audio data window size 16000 +INFO - Running KWS inference on audio clip 0 => yesnogostop.wav +INFO - Inference 1/7 +INFO - For timestamp: 0.000000 (inference #: 0); threshold: 0.900000 +INFO - label @ 0: yes, score: 0.996094 +INFO - Profile for Inference: +INFO - NPU AXI0_RD_DATA_BEAT_RECEIVED cycles: 217385 +INFO - NPU AXI0_WR_DATA_BEAT_WRITTEN cycles: 82607 +INFO - NPU AXI1_RD_DATA_BEAT_RECEIVED cycles: 
59608 +INFO - NPU ACTIVE cycles: 680611 +INFO - NPU IDLE cycles: 561 +INFO - NPU total cycles: 681172 +INFO - Keyword spotted +INFO - Inference 1/2 +INFO - Inference 2/2 +INFO - Result for inf 0: no gow +INFO - Result for inf 1: stoppe +INFO - Final result: no gow stoppe +INFO - Profile for Inference: +INFO - NPU AXI0_RD_DATA_BEAT_RECEIVED cycles: 13520864 +INFO - NPU AXI0_WR_DATA_BEAT_WRITTEN cycles: 2841970 +INFO - NPU AXI1_RD_DATA_BEAT_RECEIVED cycles: 2717670 +INFO - NPU ACTIVE cycles: 28909309 +INFO - NPU IDLE cycles: 863 +INFO - NPU total cycles: 28910172 ``` It could take several minutes to complete one inference run (average time is 2-3 minutes). @@ -578,9 +580,19 @@ The profiling section of the log shows that for the ASR inference: - Ethos-U55's PMU report: - - 28,924,740 active cycles: number of cycles that were used for computation + - 28,910,172 total cycles: The number of NPU cycles - - 426 idle cycles: number of cycles for which the NPU was idle + - 28,909,309 active cycles: number of NPU cycles that were used for computation + + - 863 idle cycles: number of cycles for which the NPU was idle + + - 13,520,864 AXI0 read cycles: The number of cycles the NPU spends to execute AXI0 read transactions. + AXI0 is the bus where Ethos-U55 NPU reads and writes to the computation buffers (activation buf/tensor arenas). + + - 2,841,970 AXI0 write cycles: The number of cycles the NPU spends to execute AXI0 write transactions. + + - 2,717,670 AXI1 read cycles: The number of cycles the NPU spends to execute AXI1 read transactions. + AXI1 is the bus where Ethos-U55 NPU reads the model (read only) - For FPGA platforms, CPU cycle count can also be enabled. For FVP, however, CPU cycle counters should not be used as the CPU model is not cycle-approximate or cycle-accurate. -- cgit v1.2.1
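The new `INFO - NPU <counter> cycles: <value>` log lines shown throughout this patch are regular enough to post-process. The following is a minimal sketch of one way to pull the counter name and value out of a single line. `ParseNpuCounterLine` is a hypothetical helper written for illustration and is not part of the repository. It assumes the exact prefix and spacing seen in the example logs and no trailing whitespace after the number.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical parser for one profiling line of the form seen in the logs, e.g.
//   "INFO - NPU ACTIVE cycles: 680611"
// On success, stores the counter name ("ACTIVE") and its value, and returns true.
// Returns false for any line that does not have this shape.
bool ParseNpuCounterLine(const std::string& line, std::string& name, uint64_t& value)
{
    const std::string prefix = "INFO - NPU ";
    const std::string marker = " cycles: ";

    // The line must start with the expected prefix.
    if (line.compare(0, prefix.size(), prefix) != 0) {
        return false;
    }

    // The counter name runs from the end of the prefix to the marker.
    const std::size_t markerPos = line.find(marker, prefix.size());
    if (markerPos == std::string::npos) {
        return false;
    }

    // Everything after the marker must be decimal digits only.
    const std::string digits = line.substr(markerPos + marker.size());
    if (digits.empty() || digits.find_first_not_of("0123456789") != std::string::npos) {
        return false;
    }

    name = line.substr(prefix.size(), markerPos - prefix.size());
    value = static_cast<uint64_t>(std::stoull(digits));
    return true;
}
```

Applied to `INFO - NPU AXI0_RD_DATA_BEAT_RECEIVED cycles: 217385`, this yields the name `AXI0_RD_DATA_BEAT_RECEIVED` and the value 217385, while header lines such as `INFO - Profile for Inference:` are rejected.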