path: root/samples/ObjectDetection/Readme.md
author: Éanna Ó Catháin <eanna.ocathain@arm.com>, 2020-09-14 17:36:49 +0100
committer: Jim Flynn <jim.flynn@arm.com>, 2020-09-14 18:40:01 +0000
commit: 919c14ef132986aa1514b2070ce6d19b5579a6ab
tree: 5c281e02a083768f65871cb861ab9b32ac7d8767 /samples/ObjectDetection/Readme.md
parent: 589e3e81a86c83456580e112978bf7a0ed5f43ac
MLECO-929 Add Object Detection sample application using the public ArmNN C++ API
Change-Id: I14aa1b4b726212cffbefd6687203f93f936fa872
Signed-off-by: Éanna Ó Catháin <eanna.ocathain@arm.com>
Diffstat (limited to 'samples/ObjectDetection/Readme.md')
1 files changed, 453 insertions, 0 deletions
diff --git a/samples/ObjectDetection/Readme.md b/samples/ObjectDetection/Readme.md
new file mode 100644
index 0000000000..835bf3adeb
--- /dev/null
+++ b/samples/ObjectDetection/Readme.md
@@ -0,0 +1,453 @@
+# Object Detection Example
+## Introduction
+This sample code demonstrates object detection using the Arm NN public C++ API. The compiled application can take
+ * a video file
+as input and
+ * save a video file
+ * or output the video stream to a window
+with detections shown as bounding boxes, class labels and confidence scores.
+## Dependencies
+This example utilises OpenCV functions to capture and output video data. The top-level inference API is provided by Arm NN.
+### Arm NN
+The build system of the object detection example does not trigger Arm NN compilation. Before building the application,
+please ensure that the Arm NN libraries and header files are available on your build platform.
+The application executable binary dynamically links with the following Arm NN libraries:
+* libarmnn.so
+* libarmnnTfLiteParser.so
+The build script searches for available Arm NN libraries in the following order:
+1. Inside the custom user directory specified by the `ARMNN_LIB_DIR` CMake option.
+2. Inside the current Arm NN repository, assuming that Arm NN was built following [these instructions](../../BuildGuideCrossCompilation.md).
+3. Inside the default locations for system libraries, assuming Arm NN was installed from deb packages.
+Arm NN header files are searched for in the `include` directory under the parent directory of the libraries that were found, i.e.
+libraries found in `/usr/lib` or `/usr/lib64` and header files in `/usr/include` (or `${ARMNN_LIB_DIR}/include`).
+Please see [find_armnn.cmake](./cmake/find_armnn.cmake) for implementation details.
+### OpenCV
+This application uses [OpenCV (Open Source Computer Vision Library)](https://opencv.org/) for video stream processing.
+Your host platform may have OpenCV available through the Linux package manager. If this is the case, please install it in
+the standard way. If not, our build system has a script to download and cross-compile the required OpenCV modules
+as well as the [FFMPEG](https://ffmpeg.org/) and [x264 encoder](https://www.videolan.org/developers/x264.html) libraries.
+The latter builds OpenCV with limited functionality, and the application will then support only video file input and
+video file output. Displaying video frames in a window requires building OpenCV with GTK and OpenGL support.
+The application executable binary dynamically links with the following OpenCV libraries:
+* libopencv_core.so.4.0.0
+* libopencv_imgproc.so.4.0.0
+* libopencv_imgcodecs.so.4.0.0
+* libopencv_videoio.so.4.0.0
+* libopencv_video.so.4.0.0
+* libopencv_highgui.so.4.0.0
+and transitively depends on:
+* libavcodec.so (FFMPEG)
+* libavformat.so (FFMPEG)
+* libavutil.so (FFMPEG)
+* libswscale.so (FFMPEG)
+* libx264.so (x264)
+The application searches for the above libraries in the following order:
+1. Inside the custom user directory specified by the `OPENCV_LIB_DIR` CMake option.
+2. Inside the default locations for system libraries.
+If no OpenCV libraries are found, the cross-compilation build is extended with x264, ffmpeg and OpenCV compilation steps.
+Note: A native build does not add third-party libraries to the compilation.
+Please see [find_opencv.cmake](./cmake/find_opencv.cmake) for implementation details.
+## Building
+There are two flows for building this application:
+* native build on a host platform,
+* cross-compilation for an Arm-based host platform.
+### Build Options
+* CMAKE_TOOLCHAIN_FILE - choose one of the available cross-compilation toolchain files:
+ * `cmake/aarch64-toolchain.cmake`
+ * `cmake/arm-linux-gnueabihf-toolchain.cmake`
+* ARMNN_LIB_DIR - point to the custom location of the Arm NN libs and headers.
+* OPENCV_LIB_DIR - point to the custom location of the OpenCV libs and headers.
+* BUILD_UNIT_TESTS - set to `1` to build tests. In addition to the main application, the `object_detection_example-tests`
+unit test executable will be created.
+### Native Build
+To build this application on a host platform, first ensure that the required dependencies are installed.
+For example, for Raspberry Pi:
+sudo apt-get update
+sudo apt-get -yq install pkg-config
+sudo apt-get -yq install libgtk2.0-dev zlib1g-dev libjpeg-dev libpng-dev libxvidcore-dev libx264-dev
+sudo apt-get -yq install libavcodec-dev libavformat-dev libswscale-dev
+To build the demo application, create a build directory:
+mkdir build
+cd build
+If you have already installed Arm NN and OpenCV:
+Inside the build directory, run the cmake and make commands:
+cmake ..
+make
+This will build the following in the bin directory:
+* object_detection_example - application executable
+If Arm NN and OpenCV are installed in custom locations, use the `OPENCV_LIB_DIR` and `ARMNN_LIB_DIR` options:
+cmake -DARMNN_LIB_DIR=/path/to/armnn -DOPENCV_LIB_DIR=/path/to/opencv ..
+### Cross-compilation
+This section explains how to cross-compile the application and its dependencies on a Linux x86 machine
+for Arm host platforms.
+You will need a working cross-compilation toolchain supported by your host platform. For Raspberry Pi 3 and 4 with glibc
+runtime version 2.28, the following toolchains were successfully used:
+* https://releases.linaro.org/components/toolchain/binaries/latest-7/aarch64-linux-gnu/
+* https://releases.linaro.org/components/toolchain/binaries/latest-7/arm-linux-gnueabihf/
+Choose aarch64-linux-gnu if the `lscpu` command shows the architecture as aarch64, or arm-linux-gnueabihf if the detected
+architecture is armv7l.
+You can check the glibc runtime version on your host platform by running:
+ldd --version
+On the **build machine**, install the C and C++ cross-compiler toolchains and add them to the PATH variable.
+Install package dependencies:
+sudo apt-get update
+sudo apt-get -yq install pkg-config
+pkg-config is required by the OpenCV build to discover the FFMPEG libraries.
+To build the demo application, create a build directory:
+mkdir build
+cd build
+Inside the build directory, run the cmake and make commands:
+**Arm 32bit**
+cmake -DARMNN_LIB_DIR=<path-to-armnn-libs> -DCMAKE_TOOLCHAIN_FILE=cmake/arm-linux-gnueabihf-toolchain.cmake ..
+**Arm 64bit**
+cmake -DARMNN_LIB_DIR=<path-to-armnn-libs> -DCMAKE_TOOLCHAIN_FILE=cmake/aarch64-toolchain.cmake ..
+After cmake has finished, run `make`. Add the `-j` flag to the make command to run compilation in multiple threads.
+From the build directory, copy the following to the host platform:
+* bin directory - contains object_detection_example executable,
+* lib directory - contains cross-compiled OpenCV, ffmpeg, x264 libraries,
+* Your Arm NN libs used during compilation.
+The full list of libs after cross-compilation to copy on your board:
+## Executing
+Once the application executable is built, it can be executed with the following options:
+* --video-file-path: Path to the video file to run object detection on **[REQUIRED]**
+* --model-file-path: Path to the Object Detection model to use **[REQUIRED]**
+* --label-path: Path to the label set for the provided model file **[REQUIRED]**
+* --model-name: The name of the model being used. Accepted options: SSD_MOBILE | YOLO_V3_TINY **[REQUIRED]**
+* --output-video-file-path: Path to the output video file with detections added in. Defaults to /tmp/output.avi
+ **[OPTIONAL]**
+* --preferred-backends: Takes the preferred backends in preference order, separated by comma.
+ For example: CpuAcc,GpuAcc,CpuRef. Accepted options: [CpuAcc, CpuRef, GpuAcc].
+ Defaults to CpuRef **[OPTIONAL]**
+* --help: Prints all the available options to screen
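The `--preferred-backends` value is a single comma-separated string, so it has to be split into individual backend names before it can be handed to Arm NN. The following is a minimal sketch of that step using only the standard library; the function name `SplitBackends` is hypothetical and is not taken from the sample sources, and the sample's own argument handling may differ.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a comma-separated list such as
// "CpuAcc,GpuAcc,CpuRef" into backend names, keeping the order of preference.
// Empty entries (for example from a trailing comma) are skipped.
std::vector<std::string> SplitBackends(const std::string& csv)
{
    std::vector<std::string> backends;
    std::istringstream stream(csv);
    std::string token;
    while (std::getline(stream, token, ','))
    {
        // Trim surrounding spaces so that "CpuAcc, CpuRef" is accepted too.
        const auto first = token.find_first_not_of(' ');
        if (first == std::string::npos)
        {
            continue; // entry was empty or contained only spaces
        }
        const auto last = token.find_last_not_of(' ');
        backends.push_back(token.substr(first, last - first + 1));
    }
    return backends;
}
```

Each resulting name could then be used to construct an `armnn::BackendId`, since that type can be built from a string.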
+### Object Detection on a supplied video file
+To run object detection on a supplied video file and write the result to a video file:
+LD_LIBRARY_PATH=/path/to/armnn/libs:/path/to/opencv/libs ./object_detection_example --label-path /path/to/labels/file
+ --video-file-path /path/to/video/file --model-file-path /path/to/model/file
+ --model-name [YOLO_V3_TINY | SSD_MOBILE] --output-video-file-path /path/to/output/file
+To run object detection on a supplied video file and display the result in a GUI window:
+LD_LIBRARY_PATH=/path/to/armnn/libs:/path/to/opencv/libs ./object_detection_example --label-path /path/to/labels/file
+ --video-file-path /path/to/video/file --model-file-path /path/to/model/file
+ --model-name [YOLO_V3_TINY | SSD_MOBILE]
+This application has been verified to work against the MobileNet SSD model, which can be downloaded along with its label set from:
+* https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip
+# Application Overview
+This section provides a walkthrough of the application, explaining in detail the steps:
+1. Initialisation
+ 1. Reading from Video Source
+ 2. Preparing Labels and Model Specific Functions
+2. Creating a Network
+ 1. Creating Parser and Importing Graph
+ 2. Optimizing Graph for Compute Device
+ 3. Creating Input and Output Binding Information
+3. Object detection pipeline
+ 1. Pre-processing the Captured Frame
+ 2. Making Input and Output Tensors
+ 3. Executing Inference
+ 4. Postprocessing
+ 5. Decoding and Processing Inference Output
+ 6. Drawing Bounding Boxes
+### Initialisation
+##### Reading from Video Source
+After parsing user arguments, the chosen video file or stream is loaded into an OpenCV `cv::VideoCapture` object.
+We use the [`IFrameReader`](./include/IFrameReader.hpp) interface and its OpenCV-specific implementation
+[`CvVideoFrameReader`](./include/CvVideoFrameReader.hpp) in our main function to capture frames from the source using the
+`ReadFrame()` function.
+The `CvVideoFrameReader` object also provides information about the input video. Using this information and the application
+arguments, we create one of the implementations of the [`IFrameOutput`](./include/IFrameOutput.hpp) interface:
+[`CvVideoFileWriter`](./include/CvVideoFileWriter.hpp) or [`CvWindowOutput`](./include/CvWindowOutput.hpp).
+This object is used at the end of every loop iteration to write the processed frame to an output video file or to a GUI window.
+`CvVideoFileWriter` uses `cv::VideoWriter` with the ffmpeg backend. `CvWindowOutput` makes use of the `cv::imshow()` function.
+See the `GetFrameSourceAndSink` function in [Main.cpp](./src/Main.cpp) for more details.
+##### Preparing Labels and Model Specific Functions
+To interpret the result of running inference on the loaded network, the labels associated with the model
+must be loaded. In the provided example code, the `AssignColourToLabel` function creates a vector of
+label-colour pairs that is ordered according to the object class index at the output node of the model. Each label is
+assigned a randomly generated RGB colour, which makes the classes easy to tell apart when plotting
+the bounding boxes of various detected objects in a frame.
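The label-to-colour assignment described above can be sketched with the standard library alone. This is an illustration, not the sample's `AssignColourToLabel` implementation: the names `Colour`, `LabelColour` and `AssignRandomColours` and the fixed default seed are assumptions made for this example.

```cpp
#include <array>
#include <cstdint>
#include <random>
#include <string>
#include <utility>
#include <vector>

using Colour = std::array<std::uint8_t, 3>;          // R, G, B
using LabelColour = std::pair<std::string, Colour>;  // hypothetical pair type

// Pairs every label with a pseudo-random colour. The position in the returned
// vector equals the class index, so the labels must be passed in the order
// used by the model output.
std::vector<LabelColour> AssignRandomColours(const std::vector<std::string>& labels,
                                             unsigned int seed = 42)
{
    std::mt19937 generator(seed); // fixed seed: same colours on every run
    std::uniform_int_distribution<int> channel(0, 255);

    std::vector<LabelColour> result;
    result.reserve(labels.size());
    for (const std::string& label : labels)
    {
        Colour colour{};
        for (std::uint8_t& value : colour)
        {
            value = static_cast<std::uint8_t>(channel(generator));
        }
        result.emplace_back(label, colour);
    }
    return result;
}
```

Random generation makes colour collisions unlikely but does not rule them out; a fixed palette would be needed if strictly distinct colours were required.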
+Depending on the model being used, the `CreatePipeline` function returns a specific implementation of the object detection pipeline.
+### Creating a Network
+All operations with Arm NN and networks are encapsulated in the [`ArmnnNetworkExecutor`](./include/ArmnnNetworkExecutor.hpp) class.
+##### Creating Parser and Importing Graph
+The first step with the Arm NN SDK is to import a graph from a file using the appropriate parser.
+The Arm NN SDK provides parsers for reading graphs from a variety of model formats. In our application we specifically
+focus on `.tflite, .pb, .onnx` models.
+Based on the extension of the provided model file, the corresponding parser is created and the network file is loaded with
+the `CreateNetworkFromBinaryFile()` method. The parser handles the creation of the underlying Arm NN graph.
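Choosing a parser from the file extension can be sketched as a small lookup. The enum `ModelFormat` and the function `ModelFormatFromPath` below are hypothetical names introduced for illustration only; they do not appear in the sample and only show the idea of mapping `.tflite`, `.pb` and `.onnx` to a format before a parser is created.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical enumeration of the model formats named in the text.
enum class ModelFormat { TfLite, TensorFlowPb, Onnx };

// Maps the file extension of modelPath to a model format.
// Throws std::invalid_argument for a missing or unsupported extension.
ModelFormat ModelFormatFromPath(const std::string& modelPath)
{
    const auto slash = modelPath.find_last_of("/\\");
    const auto dot = modelPath.find_last_of('.');
    // A dot that belongs to a directory name does not count as an extension.
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash))
    {
        throw std::invalid_argument("Model path has no extension: " + modelPath);
    }

    std::string extension = modelPath.substr(dot + 1);
    std::transform(extension.begin(), extension.end(), extension.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    if (extension == "tflite") { return ModelFormat::TfLite; }
    if (extension == "pb")     { return ModelFormat::TensorFlowPb; }
    if (extension == "onnx")   { return ModelFormat::Onnx; }
    throw std::invalid_argument("Unsupported model format: ." + extension);
}
```

A caller would switch on the returned value and create the matching Arm NN parser.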
+The current example accepts TFLite format model files, so we use `ITfLiteParser`:
+#include "armnnTfLiteParser/ITfLiteParser.hpp"
+armnnTfLiteParser::ITfLiteParserPtr parser = armnnTfLiteParser::ITfLiteParser::Create();
+armnn::INetworkPtr network = parser->CreateNetworkFromBinaryFile(modelPath.c_str());
+##### Optimizing Graph for Compute Device
+Arm NN supports optimized execution on multiple CPU and GPU devices. Prior to executing a graph, we must select the
+appropriate device context. We do this by creating a runtime context with default options using `IRuntime::Create()`.
+For example:
+#include "armnn/ArmNN.hpp"
+auto runtime = armnn::IRuntime::Create(armnn::IRuntime::CreationOptions());
+We can optimize the imported graph by specifying a list of backends in order of preference, which allows
+backend-specific optimizations to be applied. The backends are identified by a string unique to the backend,
+for example `CpuAcc, GpuAcc, CpuRef`.
+For example:
+std::vector<armnn::BackendId> backends{"CpuAcc", "GpuAcc", "CpuRef"};
+Internally and transparently, Arm NN splits the graph into subgraphs based on backends, calls an optimize-subgraphs
+function on each of them and, if possible, substitutes the corresponding subgraph in the original graph with
+its optimized version.
+Using the `Optimize()` function we optimize the graph for inference and load the optimized network onto the compute
+device with `LoadNetwork()`. This function creates the backend-specific workloads
+for the layers and a backend specific workload factory which is called to create the workloads.
+For example:
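+armnn::IOptimizedNetworkPtr optNet = armnn::Optimize(*network,
+ backends,
+ runtime->GetDeviceSpec(),
+ armnn::OptimizerOptions());
+// LoadNetwork writes the identifier of the loaded network into networkId
+armnn::NetworkId networkId = 0;
+std::string errorMessage;
+if (runtime->LoadNetwork(networkId, std::move(optNet), errorMessage) != armnn::Status::Success)
+{
+ std::cerr << errorMessage << std::endl;
+}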
+##### Creating Input and Output Binding Information
+Parsers can also be used to extract the input information for the network. By calling `GetSubgraphInputTensorNames`
+we extract all the input names and, with `GetNetworkInputBindingInfo`, bind the input points of the graph.
+For example:
+std::vector<std::string> inputNames = parser->GetSubgraphInputTensorNames(0);
+auto inputBindingInfo = parser->GetNetworkInputBindingInfo(0, inputNames[0]);
+The input binding information contains all the essential information about the input. It is a pair consisting of
+an integer identifier for the bindable layer (input or output) and the tensor info (data type, quantization information,
+number of dimensions, total number of elements).
+Similarly, we can get the output binding information for an output layer by using the parser to retrieve output
+tensor names and calling `GetNetworkOutputBindingInfo()`.
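The quantization information in the tensor info matters when reading quantized outputs: a stored integer `q` represents the real value `scale * (q - offset)`. In Arm NN the two parameters can be read from a `TensorInfo` with `GetQuantizationScale()` and `GetQuantizationOffset()`. The sketch below shows only the arithmetic, with a hypothetical helper name `Dequantize` and parameters passed in by hand rather than read from a real binding.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: converts quantized uint8 values to floats using
// real = scale * (quantized - offset).
std::vector<float> Dequantize(const std::vector<std::uint8_t>& quantized,
                              float scale,
                              std::int32_t offset)
{
    std::vector<float> result;
    result.reserve(quantized.size());
    for (const std::uint8_t value : quantized)
    {
        // Widen to a signed type first so that the subtraction cannot wrap.
        const std::int32_t shifted = static_cast<std::int32_t>(value) - offset;
        result.push_back(scale * static_cast<float>(shifted));
    }
    return result;
}
```

For instance, with an illustrative scale of `0.5` and offset of `128`, the stored value `128` maps to `0.0`.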
+### Object detection pipeline
+The generic object detection pipeline has three steps: data pre-processing, inference, and decoding of the inference results
+in the post-processing step.
+See [`ObjDetectionPipeline`](./include/NetworkPipeline.hpp) and implementations for [`MobileNetSSDv1`](./include/NetworkPipeline.hpp)
+and [`YoloV3Tiny`](./include/NetworkPipeline.hpp) for more details.
+#### Pre-processing the Captured Frame
+Each frame captured from the source is read as a `cv::Mat` in BGR format, but the channels are swapped to RGB in the frame reader.
+cv::Mat processed;
+objectDetectionPipeline->PreProcessing(frame, processed);
+The pre-processing step consists of resizing the frame to the required resolution, padding it and converting the data type
+to match the model input layer.
+For example, the SSD MobileNet V1 model used in our example takes as input a tensor with shape `[1, 300, 300, 3]` and
+data type `uint8`.
+The pre-processing step returns a `cv::Mat` object containing data ready for inference.
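The geometry behind "resize, then pad" can be worked out without OpenCV. The sketch below assumes that the aspect ratio is preserved and that the padding is added on the right and bottom edges; both are assumptions for illustration and the sample's own resizing code may place the padding differently. The struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical result type: size of the scaled frame and the padding needed
// to fill the model input.
struct ResizeWithPadGeometry
{
    int scaledWidth;
    int scaledHeight;
    int padRight;
    int padBottom;
};

// Computes how a frame must be scaled to fit inside the target size while
// keeping its aspect ratio, and how much padding fills the remainder.
ResizeWithPadGeometry ComputeResizeWithPad(int frameWidth, int frameHeight,
                                           int targetWidth, int targetHeight)
{
    // The smaller of the two ratios guarantees that both sides fit.
    const double scale = std::min(static_cast<double>(targetWidth) / frameWidth,
                                  static_cast<double>(targetHeight) / frameHeight);

    // Round to the nearest pixel and guard against overshooting the target.
    const int scaledWidth =
        std::min(targetWidth, static_cast<int>(std::lround(frameWidth * scale)));
    const int scaledHeight =
        std::min(targetHeight, static_cast<int>(std::lround(frameHeight * scale)));

    return {scaledWidth, scaledHeight,
            targetWidth - scaledWidth, targetHeight - scaledHeight};
}
```

With a 1280x720 frame and the 300x300 input mentioned above, the frame is scaled to 300x169 and 131 rows of padding fill the rest.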
+#### Executing Inference
+od::InferenceResults results;
+objectDetectionPipeline->Inference(processed, results);
+The inference step calls the `ArmnnNetworkExecutor::Run` method, which prepares the input tensors and executes inference.
+A compute device performs inference for the loaded network using the `EnqueueWorkload()` function of the runtime context.
+For example:
+//const void* inputData = ...;
+//outputTensors were pre-allocated before
+armnn::InputTensors inputTensors = {{ inputBindingInfo.first,armnn::ConstTensor(inputBindingInfo.second, inputData)}};
+runtime->EnqueueWorkload(0, inputTensors, outputTensors);
+We allocate memory for output data once and map it to output tensor objects. After successful inference, we read data
+from the pre-allocated output data buffer. See [`ArmnnNetworkExecutor::ArmnnNetworkExecutor`](./src/ArmnnNetworkExecutor.cpp)
+and [`ArmnnNetworkExecutor::Run`](./src/ArmnnNetworkExecutor.cpp) for more details.
+#### Postprocessing
+##### Decoding and Processing Inference Output
+The output from inference must be decoded to obtain information about detected objects in the frame. In the examples
+there are implementations for two networks but you may also implement your own network decoding solution here.
+For SSD MobileNet V1 models, we decode the results to obtain the bounding box positions, classification index,
+confidence and number of detections in the input frame.
+See [`SSDResultDecoder`](./include/SSDResultDecoder.hpp) for more details.
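As an illustration of the SSD decoding step, the sketch below converts one box into pixel coordinates. It assumes that the model emits normalised boxes in `[y_min, x_min, y_max, x_max]` order, as the TensorFlow Lite detection post-processing op does; the function name `ToPixelBox` is hypothetical and the sample's `SSDResultDecoder` may handle this differently.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// Hypothetical helper: converts a normalised [y_min, x_min, y_max, x_max] box
// into pixel coordinates ordered as [x_min, y_min, x_max, y_max].
std::array<int, 4> ToPixelBox(const std::array<float, 4>& normalisedBox,
                              int frameWidth, int frameHeight)
{
    // Clamp to [0, 1] so that boxes never extend beyond the frame.
    auto toPixel = [](float value, int size)
    {
        const float clamped = std::min(1.0f, std::max(0.0f, value));
        return static_cast<int>(std::lround(clamped * static_cast<float>(size)));
    };

    return {toPixel(normalisedBox[1], frameWidth),    // x_min
            toPixel(normalisedBox[0], frameHeight),   // y_min
            toPixel(normalisedBox[3], frameWidth),    // x_max
            toPixel(normalisedBox[2], frameHeight)};  // y_max
}
```

Clamping is a design choice made here for drawing convenience; a decoder could also keep out-of-frame coordinates and clip later.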
+For YOLO V3 Tiny models, we decode the output and perform non-maximum suppression to filter out any weak detections
+below a confidence threshold and any redundant bounding boxes above an intersection-over-union threshold.
+See [`YoloResultDecoder`](./include/YoloResultDecoder.hpp) for more details.
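Non-maximum suppression can be sketched in a few lines. The version below is a generic, class-agnostic greedy variant written for illustration, not a copy of `YoloResultDecoder`; the `Box` struct and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical box type: corner coordinates plus a confidence score.
struct Box
{
    float xMin, yMin, xMax, yMax;
    float score;
};

// Intersection-over-union of two axis-aligned boxes, in the range [0, 1].
float IntersectionOverUnion(const Box& a, const Box& b)
{
    const float width  = std::max(0.0f, std::min(a.xMax, b.xMax) - std::max(a.xMin, b.xMin));
    const float height = std::max(0.0f, std::min(a.yMax, b.yMax) - std::max(a.yMin, b.yMin));
    const float intersection = width * height;
    const float areaA = (a.xMax - a.xMin) * (a.yMax - a.yMin);
    const float areaB = (b.xMax - b.xMin) * (b.yMax - b.yMin);
    const float unionArea = areaA + areaB - intersection;
    return unionArea > 0.0f ? intersection / unionArea : 0.0f;
}

// Greedy NMS: drops boxes below scoreThreshold, then keeps the highest-scoring
// box and removes every remaining box that overlaps a kept box by more than
// iouThreshold. Returns the indices of the kept boxes, best score first.
std::vector<std::size_t> NonMaxSuppression(const std::vector<Box>& boxes,
                                           float scoreThreshold,
                                           float iouThreshold)
{
    std::vector<std::size_t> order;
    for (std::size_t i = 0; i < boxes.size(); ++i)
    {
        if (boxes[i].score >= scoreThreshold)
        {
            order.push_back(i);
        }
    }
    std::stable_sort(order.begin(), order.end(),
                     [&boxes](std::size_t l, std::size_t r) { return boxes[l].score > boxes[r].score; });

    std::vector<std::size_t> kept;
    for (const std::size_t candidate : order)
    {
        const bool suppressed = std::any_of(kept.begin(), kept.end(),
            [&](std::size_t k) { return IntersectionOverUnion(boxes[candidate], boxes[k]) > iouThreshold; });
        if (!suppressed)
        {
            kept.push_back(candidate);
        }
    }
    return kept;
}
```

Lowering the IoU threshold removes more overlapping boxes, while raising the confidence threshold removes more weak detections, which is why both values are worth tuning.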
+We encourage you to experiment with the threshold values for confidence and intersection-over-union (IoU)
+to achieve the best visual results.
+The detection results are always returned as a vector of [`DetectedObject`](./include/DetectedObject.hpp),
+with the box positions list containing bounding box coordinates in the form `[x_min, y_min, x_max, y_max]`.
+#### Drawing Bounding Boxes
+The post-processing step accepts a callback function that is invoked when the decoding has finished. We use it
+to draw detections on the initial frame.
+With the obtained detections and the [`AddInferenceOutputToFrame`](./src/ImageUtils.cpp) function, we can draw bounding boxes around
+detected objects and add the associated label and confidence score.
+//results - inference output
+objectDetectionPipeline->PostProcessing(results, [&frame, &labels](od::DetectedObjects detects) -> void {
+ AddInferenceOutputToFrame(detects, *frame, labels);
+ });
+The processed frames are written to a file or displayed in a separate window.
\ No newline at end of file