# Object Detection Example

## Introduction
This sample demonstrates object detection using the Arm NN public C++ API. The compiled application can take

 * a video file

as input and

 * save a video file
 * or output the video stream to a window

with detections shown as bounding boxes, class labels and confidence scores.

## Dependencies

This example uses OpenCV functions to capture and output video data. The top-level inference API is provided by the
Arm NN library.

### Arm NN

The object detection example build system does not trigger Arm NN compilation. Before building the application,
please ensure that Arm NN libraries and header files are available on your build platform.
The application executable binary dynamically links with the following Arm NN libraries:
* libarmnn.so
* libarmnnTfLiteParser.so

The build script searches for available Arm NN libraries in the following order:
1. Inside the custom user directory specified by the ARMNN_LIB_DIR cmake option.
2. Inside the current Arm NN repository, assuming that Arm NN was built following [these instructions](../../BuildGuideCrossCompilation.md).
3. Inside the default locations for system libraries, assuming Arm NN was installed from deb packages.

Arm NN header files are searched for in an `include` directory next to the found libraries, i.e. for
libraries found in `/usr/lib` or `/usr/lib64`, the header files are expected in `/usr/include` (or `${ARMNN_LIB_DIR}/include`).

Please see [find_armnn.cmake](./cmake/find_armnn.cmake) for implementation details.

### OpenCV

This application uses [OpenCV (Open Source Computer Vision Library)](https://opencv.org/) for video stream processing.
Your host platform may have OpenCV available through its Linux package manager; if so, please install it that way.
If not, our build system has a script to download and cross-compile the required OpenCV modules
as well as the [FFMPEG](https://ffmpeg.org/) and [x264 encoder](https://www.videolan.org/developers/x264.html) libraries.
This route builds limited OpenCV functionality, and the application will then support only video file input and
video file output. Displaying video frames in a window requires building OpenCV with GTK and OpenGL support.

The application executable binary dynamically links with the following OpenCV libraries:
* libopencv_core.so.4.0.0
* libopencv_imgproc.so.4.0.0
* libopencv_imgcodecs.so.4.0.0
* libopencv_videoio.so.4.0.0
* libopencv_video.so.4.0.0
* libopencv_highgui.so.4.0.0

and transitively depends on:
* libavcodec.so (FFMPEG)
* libavformat.so (FFMPEG)
* libavutil.so (FFMPEG)
* libswscale.so (FFMPEG)
* libx264.so (x264)

The application searches for the above libraries in the following order:
1. Inside the custom user directory specified by the OPENCV_LIB_DIR cmake option.
2. Inside the default locations for system libraries.

If no OpenCV libraries are found, the cross-compilation build is extended with x264, ffmpeg and OpenCV compilation steps.

Note: the native build does not add third-party libraries to the compilation.

Please see [find_opencv.cmake](./cmake/find_opencv.cmake) for implementation details.

## Building
There are two flows for building this application:
* native build on a host platform,
* cross-compilation for an Arm-based host platform.
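Whichever flow you choose, it can help to first check which of the required libraries are already visible to the
dynamic linker on the build platform. A minimal sanity check (the grep pattern below is only an example):
```commandline
ldconfig -p | grep -E 'libarmnn|libopencv_core'
```
If the Arm NN or OpenCV libraries are missing from the output, point the build at their locations with the
`ARMNN_LIB_DIR` and `OPENCV_LIB_DIR` options described below.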
### Build Options

* CMAKE_TOOLCHAIN_FILE - choose one of the available cross-compilation toolchain files:
  * `cmake/aarch64-toolchain.cmake`
  * `cmake/arm-linux-gnueabihf-toolchain.cmake`
* ARMNN_LIB_DIR - points to the custom location of the Arm NN libs and headers.
* OPENCV_LIB_DIR - points to the custom location of the OpenCV libs and headers.
* BUILD_UNIT_TESTS - set to `1` to build tests. In addition to the main application, an `object_detection_example-tests`
unit test executable will be created.

### Native Build
To build this application on a host platform, first ensure that the required dependencies are installed.
For example, for Raspberry Pi:
```commandline
sudo apt-get update
sudo apt-get -yq install pkg-config
sudo apt-get -yq install libgtk2.0-dev zlib1g-dev libjpeg-dev libpng-dev libxvidcore-dev libx264-dev
sudo apt-get -yq install libavcodec-dev libavformat-dev libswscale-dev
```

To build the demo application, create a build directory:
```commandline
mkdir build
cd build
```
If you have already installed Arm NN and OpenCV, run the cmake and make commands inside the build directory:
```commandline
cmake ..
make
```
This will build the following in the `bin` directory:
* object_detection_example - the application executable

If Arm NN and OpenCV are in custom locations, use the `OPENCV_LIB_DIR` and `ARMNN_LIB_DIR` options:
```commandline
cmake -DARMNN_LIB_DIR=/path/to/armnn -DOPENCV_LIB_DIR=/path/to/opencv ..
make
```

### Cross-compilation

This section explains how to cross-compile the application and its dependencies on a Linux x86 machine
for Arm host platforms.

You will require a working cross-compilation toolchain supported by your host platform. For Raspberry Pi 3 and 4 with glibc
runtime version 2.28, the following toolchains were successfully used:
* https://releases.linaro.org/components/toolchain/binaries/latest-7/aarch64-linux-gnu/
* https://releases.linaro.org/components/toolchain/binaries/latest-7/arm-linux-gnueabihf/

Choose aarch64-linux-gnu if the `lscpu` command shows the architecture as aarch64, or arm-linux-gnueabihf if the detected
architecture is armv7l.

You can check the glibc runtime version on your host platform by running:
```
ldd --version
```
On the **build machine**, install the C and C++ cross-compiler toolchains and add them to the PATH variable.

Install the package dependencies:
```commandline
sudo apt-get update
sudo apt-get -yq install pkg-config
```
pkg-config is required by the OpenCV build to discover the FFMPEG libs.

To build the demo application, create a build directory:
```commandline
mkdir build
cd build
```
Inside the build directory, run the cmake and make commands:

**Arm 32bit**
```commandline
cmake -DARMNN_LIB_DIR=<path-to-armnn-libs> -DCMAKE_TOOLCHAIN_FILE=cmake/arm-linux-gnueabihf-toolchain.cmake ..
make
```
**Arm 64bit**
```commandline
cmake -DARMNN_LIB_DIR=<path-to-armnn-libs> -DCMAKE_TOOLCHAIN_FILE=cmake/aarch64-toolchain.cmake ..
make
```

Add the `-j` flag to the make command to run the compilation in multiple threads.

From the build directory, copy the following to the host platform (e.g. with `scp`, as in the sketch below):
* `bin` directory - contains the object_detection_example executable,
* `lib` directory - contains the cross-compiled OpenCV, ffmpeg and x264 libraries,
* the Arm NN libs used during compilation.
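As a minimal sketch (the board address and destination path below are placeholders), the artifacts can be copied with
`scp` from the build directory:
```commandline
scp -r bin lib user@<board-ip>:/home/user/object_detection
```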
The full list of libs to copy to your board after cross-compilation:
```
libarmnn.so
libarmnn.so.22
libarmnn.so.23.0
libarmnnTfLiteParser.so
libarmnnTfLiteParser.so.22.0
libavcodec.so
libavcodec.so.58
libavcodec.so.58.54.100
libavdevice.so
libavdevice.so.58
libavdevice.so.58.8.100
libavfilter.so
libavfilter.so.7
libavfilter.so.7.57.100
libavformat.so
libavformat.so.58
libavformat.so.58.29.100
libavutil.so
libavutil.so.56
libavutil.so.56.31.100
libopencv_core.so
libopencv_core.so.4.0
libopencv_core.so.4.0.0
libopencv_highgui.so
libopencv_highgui.so.4.0
libopencv_highgui.so.4.0.0
libopencv_imgcodecs.so
libopencv_imgcodecs.so.4.0
libopencv_imgcodecs.so.4.0.0
libopencv_imgproc.so
libopencv_imgproc.so.4.0
libopencv_imgproc.so.4.0.0
libopencv_video.so
libopencv_video.so.4.0
libopencv_video.so.4.0.0
libopencv_videoio.so
libopencv_videoio.so.4.0
libopencv_videoio.so.4.0.0
libpostproc.so
libpostproc.so.55
libpostproc.so.55.5.100
libswresample.a
libswresample.so
libswresample.so.3
libswresample.so.3.5.100
libswscale.so
libswscale.so.5
libswscale.so.5.5.100
libx264.so
libx264.so.160
```
## Executing

Once the application executable is built, it can be executed with the following options:
* --video-file-path: Path to the video file to run object detection on **[REQUIRED]**
* --model-file-path: Path to the Object Detection model to use **[REQUIRED]**
* --label-path: Path to the label set for the provided model file **[REQUIRED]**
* --model-name: The name of the model being used. Accepted options: SSD_MOBILE | YOLO_V3_TINY **[REQUIRED]**
* --output-video-file-path: Path to the output video file with detections added in. Defaults to /tmp/output.avi
  **[OPTIONAL]**
* --preferred-backends: Takes the preferred backends in preference order, separated by commas.
  For example: CpuAcc,GpuAcc,CpuRef. Accepted options: [CpuAcc, CpuRef, GpuAcc].
  Defaults to CpuRef **[OPTIONAL]**
* --help: Prints all the available options to screen

### Object Detection on a supplied video file

To run object detection on a supplied video file and output the result to a video file:
```commandline
LD_LIBRARY_PATH=/path/to/armnn/libs:/path/to/opencv/libs ./object_detection_example --label-path /path/to/labels/file
 --video-file-path /path/to/video/file --model-file-path /path/to/model/file
 --model-name [YOLO_V3_TINY | SSD_MOBILE] --output-video-file-path /path/to/output/file
```

To run object detection on a supplied video file and display the result in a GUI window:
```commandline
LD_LIBRARY_PATH=/path/to/armnn/libs:/path/to/opencv/libs ./object_detection_example --label-path /path/to/labels/file
 --video-file-path /path/to/video/file --model-file-path /path/to/model/file
 --model-name [YOLO_V3_TINY | SSD_MOBILE]
```

This application has been verified to work against the MobileNet SSD model, which can be downloaded along with its label set from:
* https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip

---

# Application Overview
This section provides a walkthrough of the application, explaining in detail the steps:
1. Initialisation
    1. Reading from Video Source
    2. Preparing Labels and Model Specific Functions
2. Creating a Network
    1. Creating Parser and Importing Graph
    2. Optimizing Graph for Compute Device
    3. Creating Input and Output Binding Information
3. Object detection pipeline
    1. Pre-processing the Captured Frame
    2. Making Input and Output Tensors
    3. Executing Inference
    4. Postprocessing
    5. Decoding and Processing Inference Output
    6. Drawing Bounding Boxes


### Initialisation

##### Reading from Video Source
After parsing user arguments, the chosen video file or stream is loaded into an OpenCV `cv::VideoCapture` object.
We use the [`IFrameReader`](./include/IFrameReader.hpp) interface and its OpenCV-specific implementation
[`CvVideoFrameReader`](./include/CvVideoFrameReader.hpp) in our main function to capture frames from the source using the
`ReadFrame()` function.

The `CvVideoFrameReader` object also provides information about the input video. Using this information and the application
arguments, we create one of the implementations of the [`IFrameOutput`](./include/IFrameOutput.hpp) interface:
[`CvVideoFileWriter`](./include/CvVideoFileWriter.hpp) or [`CvWindowOutput`](./include/CvWindowOutput.hpp).
This object is used at the end of every loop to write the processed frame to an output video file or GUI
window.
`CvVideoFileWriter` uses `cv::VideoWriter` with the ffmpeg backend. `CvWindowOutput` makes use of the `cv::imshow()` function.

See the `GetFrameSourceAndSink` function in [Main.cpp](./src/Main.cpp) for more details.

##### Preparing Labels and Model Specific Functions
In order to interpret the result of running inference on the loaded network, it is necessary to load the labels
associated with the model. In the provided example code, the `AssignColourToLabel` function creates a vector of
label/colour pairs, ordered according to the object class index at the output node of the model. Each label is assigned
a randomly generated RGB colour. This ensures that each class has a unique colour, which proves helpful when plotting
the bounding boxes of the various detected objects in a frame.

Depending on the model being used, the `CreatePipeline` function returns a specific implementation of the object detection
pipeline.

### Creating a Network

All operations with Arm NN and networks are encapsulated in the [`ArmnnNetworkExecutor`](./include/ArmnnNetworkExecutor.hpp)
class.

##### Creating Parser and Importing Graph
The first step with the Arm NN SDK is to import a graph from file by using the appropriate parser.

The Arm NN SDK provides parsers for reading graphs from a variety of model formats. In our application we specifically
focus on `.tflite, .pb, .onnx` models.

Based on the extension of the provided model file, the corresponding parser is created and the network file loaded with
the `CreateNetworkFromBinaryFile()` method. The parser handles the creation of the underlying Arm NN graph.

The current example accepts tflite format model files, so we use `ITfLiteParser`:
```c++
#include "armnnTfLiteParser/ITfLiteParser.hpp"

armnnTfLiteParser::ITfLiteParserPtr parser = armnnTfLiteParser::ITfLiteParser::Create();
armnn::INetworkPtr network = parser->CreateNetworkFromBinaryFile(modelPath.c_str());
```

##### Optimizing Graph for Compute Device
Arm NN supports optimized execution on multiple CPU and GPU devices. Prior to executing a graph, we must select the
appropriate device context. We do this by creating a runtime context with default options via `IRuntime::Create()`.

For example:
```c++
#include "armnn/ArmNN.hpp"

auto runtime = armnn::IRuntime::Create(armnn::IRuntime::CreationOptions());
```

We can optimize the imported graph by specifying a list of backends in order of preference and implementing
backend-specific optimizations.
The backends are identified by a string unique to the backend,
for example `CpuAcc, GpuAcc, CpuRef`.

For example:
```c++
std::vector<armnn::BackendId> backends{"CpuAcc", "GpuAcc", "CpuRef"};
```

Internally and transparently, Arm NN splits the graph into subgraphs based on the backends, calls an optimize-subgraphs
function on each of them and, if possible, substitutes the corresponding subgraph in the original graph with
its optimized version.

Using the `Optimize()` function we optimize the graph for inference and load the optimized network onto the compute
device with `LoadNetwork()`. This function creates the backend-specific workloads
for the layers and a backend-specific workload factory which is called to create the workloads.

For example:
```c++
armnn::IOptimizedNetworkPtr optNet = Optimize(*network,
                                              backends,
                                              runtime->GetDeviceSpec(),
                                              armnn::OptimizerOptions());
armnn::NetworkId networkId = 0;
std::string errorMessage;
runtime->LoadNetwork(networkId, std::move(optNet), errorMessage);
std::cerr << errorMessage << std::endl;
```

##### Creating Input and Output Binding Information
Parsers can also be used to extract the input information for the network. By calling `GetSubgraphInputTensorNames`
we extract all the input names and, with `GetNetworkInputBindingInfo`, bind the input points of the graph.
For example:
```c++
std::vector<std::string> inputNames = parser->GetSubgraphInputTensorNames(0);
auto inputBindingInfo = parser->GetNetworkInputBindingInfo(0, inputNames[0]);
```
The input binding information contains all the essential information about the input. It is a tuple consisting of
integer identifiers for bindable layers (inputs, outputs) and the tensor info (data type, quantization information,
number of dimensions, total number of elements).

Similarly, we can get the output binding information for an output layer by using the parser to retrieve output
tensor names and calling `GetNetworkOutputBindingInfo()`.

### Object detection pipeline

The generic object detection pipeline has 3 steps: data pre-processing, inference, and decoding the inference results
in the post-processing step.

See [`ObjDetectionPipeline`](./include/NetworkPipeline.hpp) and the implementations for [`MobileNetSSDv1`](./include/NetworkPipeline.hpp)
and [`YoloV3Tiny`](./include/NetworkPipeline.hpp) for more details.

#### Pre-processing the Captured Frame
Each frame captured from the source is read as a `cv::Mat` in BGR format, but its channels are swapped to RGB in the
frame reader code.

```c++
cv::Mat processed;
...
objectDetectionPipeline->PreProcessing(frame, processed);
```

The pre-processing step consists of resizing the frame to the required resolution, padding, and converting the data type
to match the model input layer.
For example, the SSD MobileNet V1 model used in our example takes as input a tensor of shape `[1, 300, 300, 3]` and
data type `uint8`.

The pre-processing step returns a `cv::Mat` object containing the data ready for inference.

#### Executing Inference
```c++
od::InferenceResults results;
...
objectDetectionPipeline->Inference(processed, results);
```
The inference step calls the `ArmnnNetworkExecutor::Run` method, which prepares the input tensors and executes inference.
A compute device performs inference for the loaded network using the `EnqueueWorkload()` function of the runtime context.
For example:
```c++
// const void* inputData = ...;
// outputTensors were pre-allocated before

armnn::InputTensors inputTensors = {{ inputBindingInfo.first, armnn::ConstTensor(inputBindingInfo.second, inputData) }};
runtime->EnqueueWorkload(networkId, inputTensors, outputTensors);
```
We allocate memory for the output data once and map it to output tensor objects. After successful inference, we read the
data from the pre-allocated output data buffer. See [`ArmnnNetworkExecutor::ArmnnNetworkExecutor`](./src/ArmnnNetworkExecutor.cpp)
and [`ArmnnNetworkExecutor::Run`](./src/ArmnnNetworkExecutor.cpp) for more details.

#### Postprocessing

##### Decoding and Processing Inference Output
The output from inference must be decoded to obtain information about the detected objects in the frame. The example
contains implementations for two networks, but you may also implement your own network decoding solution here.

For SSD MobileNet V1 models, we decode the results to obtain the bounding box positions, classification index,
confidence and number of detections in the input frame.
See [`SSDResultDecoder`](./include/SSDResultDecoder.hpp) for more details.

For YOLO V3 Tiny models, we decode the output and perform non-maximum suppression to filter out any weak detections
below a confidence threshold and any redundant bounding boxes above an intersection-over-union threshold.
See [`YoloResultDecoder`](./include/YoloResultDecoder.hpp) for more details.

You are encouraged to experiment with the confidence and intersection-over-union (IoU) threshold values
to achieve the best visual results.

The detection results are always returned as a vector of [`DetectedObject`](./include/DetectedObject.hpp),
with the box positions list containing bounding box coordinates in the form `[x_min, y_min, x_max, y_max]`.

#### Drawing Bounding Boxes
The post-processing step accepts a callback function that is invoked when decoding has finished. We use it
to draw detections on the initial frame.
With the obtained detections, and using the [`AddInferenceOutputToFrame`](./src/ImageUtils.cpp) function, we draw
bounding boxes around the detected objects and add the associated label and confidence score.
```c++
// results - inference output
objectDetectionPipeline->PostProcessing(results, [&frame, &labels](od::DetectedObjects detects) -> void {
        AddInferenceOutputToFrame(detects, *frame, labels);
    });
```
The processed frames are written to a file or displayed in a separate window.
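Putting the above steps together, the per-frame loop has roughly the following shape. This is a simplified sketch, not
the actual implementation: the `reader`, `sink` and `WriteFrame` names are illustrative placeholders, so please refer
to [Main.cpp](./src/Main.cpp) for the real control flow:
```c++
// Simplified sketch of the frame loop (names are illustrative; see src/Main.cpp).
od::InferenceResults results;
cv::Mat processed;
std::shared_ptr<cv::Mat> frame = reader->ReadFrame();  // reader: an IFrameReader implementation

while (frame && !frame->empty())
{
    // 1. Resize/pad/convert the frame to match the model input layer.
    objectDetectionPipeline->PreProcessing(*frame, processed);

    // 2. Run inference via ArmnnNetworkExecutor::Run.
    objectDetectionPipeline->Inference(processed, results);

    // 3. Decode detections and draw them onto the original frame.
    //    labels: the label/colour pairs created by AssignColourToLabel.
    objectDetectionPipeline->PostProcessing(results,
        [&frame, &labels](od::DetectedObjects detects) -> void {
            AddInferenceOutputToFrame(detects, *frame, labels);
        });

    // 4. Write the annotated frame to the video file or GUI window.
    //    sink: an IFrameOutput implementation; WriteFrame is a placeholder name.
    sink->WriteFrame(frame);
    frame = reader->ReadFrame();
}
```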