Blog: Real-time video annotation with a deep learning model
Deep learning models make real-time annotation of a video stream possible, although it’s a challenging task that calls for an efficient mechanism and careful system design. Here, I’ll demonstrate one way to make this work. The proposed method labels each frame of the video, and the labels can be computed with any heavy deep learning model. As an example, I use a facial age estimation model to annotate each frame of a real-time video stream with the detected faces (bounding boxes) and the estimated ages of the people in them.
The inference cost of the facial detection and age estimation models can be higher than the rate at which a webcam generates frames. With the help of visual tracking algorithms such as Kernelized Correlation Filter (KCF), TLD, and MedianFlow, we can hide the latency between feeding a video frame into a deep model and receiving the inference result the model generates. We run the visual tracking algorithm and the deep model inference simultaneously, so each video frame is annotated with bounding box information generated either by visual tracking or by the deep model. When the bounding box generated by the visual tracking algorithm drifts from the actual location, we use the latest result from the deep learning model to re-initialize the tracker’s bounding box. This method combines the annotations from the visual tracking algorithm and the deep learning model. By doing this we resolve the latency issue for real-time video annotation, and the framework can be applied to any deep learning video annotation model… Well, more specifically, by “any model” here I mean any model with a high computational cost, such as video-based object detection, video segmentation, and so on.
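To make the pattern concrete, here is a minimal Python sketch of the idea: a background thread runs the (slow) deep model while the main loop annotates every frame with a lightweight tracker, re-initializing the tracker whenever a fresh model result arrives. The `slow_model_inference` function and `SimpleTracker` class are hypothetical stand-ins of my own (the real post would use an actual detector/age model and an OpenCV tracker such as KCF); only the threading structure is the point here.

```python
import queue
import threading
import time

def slow_model_inference(frame_id):
    """Hypothetical deep model: returns a bounding box after a long delay."""
    time.sleep(0.05)  # inference cost is much larger than the per-frame budget
    return {"frame": frame_id, "box": (10 + frame_id, 20, 50, 50)}

class SimpleTracker:
    """Stand-in for a visual tracker (KCF/TLD/MedianFlow)."""
    def __init__(self):
        self.box = None
    def init(self, box):
        # Re-initialize from a fresh deep-model result.
        self.box = box
    def update(self):
        # Propagate the last box; a real tracker would use image content.
        if self.box is not None:
            x, y, w, h = self.box
            self.box = (x + 1, y, w, h)
        return self.box

def annotate_stream(num_frames=30):
    results = queue.Queue()
    tracker = SimpleTracker()
    model_busy = threading.Event()
    annotations = []

    for frame_id in range(num_frames):
        # 1. Send the current frame to the deep model if it is idle.
        if not model_busy.is_set():
            model_busy.set()
            def run(fid=frame_id):
                results.put(slow_model_inference(fid))
                model_busy.clear()
            threading.Thread(target=run, daemon=True).start()

        # 2. If a fresh model result has arrived, correct the tracker.
        try:
            fresh = results.get_nowait()
            tracker.init(fresh["box"])
        except queue.Empty:
            pass

        # 3. Annotate this frame with the tracker's (propagated) box.
        annotations.append((frame_id, tracker.update()))
        time.sleep(0.01)  # simulated frame interval from the webcam

    return annotations

if __name__ == "__main__":
    print(annotate_stream()[-1])
```

The first few frames may carry no box (the model has not returned yet); after that, every frame gets an annotation immediately from the tracker, and the deep model silently corrects it a few frames later.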
You can read the full blog post on IBM Developer.
Originally published at https://developer.ibm.com.