Blog: Reduce video processing by an additional 75% and retain high detection rates
In today’s blog we share with you how we achieve significant data reductions in the video stream without compromising the detection power of the car’s perception stack.
With the advent of smarter cars that make more and more decisions independently, cars are equipped with a whole gamut of sensors. Naturally, this leads to an explosion of data generated in the car and brings new challenges with it. To handle these large amounts of data, manufacturers employ a two-pronged approach. For safety-critical features (e.g. braking or swerving), data analysis and decisions are mostly made at the edge (i.e. in the car). On the other hand, activities such as routing, navigation, mapping and model improvement can be done in the cloud, where more computational resources are available. Both types of use cases struggle with large amounts of data, for different reasons. At the edge, the main limit is the available computational resources; for processing in the cloud, the bottleneck is data transmission via mobile networks.
We at Teraki enable both computation in the car and computation in the cloud by reducing the data footprint without compromising the data’s predictive power. The problem with traditional video compression approaches, such as the H.264, H.265 or Motion JPEG codecs, is that they are optimized for human perception but have a detrimental effect on a machine learning-based perception stack. The reason is that traditional compression solutions attenuate high spatial frequencies, to which humans are not very sensitive but computer vision systems generally are.
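To make the frequency argument concrete, here is a toy transform coder in Python. It is not H.264 and not Teraki's codec; it only mimics the general idea of quantizing high-frequency coefficients more coarsely than low-frequency ones. The function names, the 8-sample signals and the step sizes are all assumptions chosen for illustration.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        coeffs.append(scale * sum(
            v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(x)))
    return coeffs

def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    return [
        sum(math.sqrt((1 if k == 0 else 2) / n) * v
            * math.cos(math.pi * (i + 0.5) * k / n) for k, v in enumerate(c))
        for i in range(n)
    ]

def toy_codec(x, base_step=16):
    """Quantize coefficient k with a step that grows with frequency, then decode."""
    quantized = [round(v / (base_step * (k + 1))) * base_step * (k + 1)
                 for k, v in enumerate(dct(x))]
    return idct(quantized)

def contrast(x):
    """Difference between the brightest and darkest sample."""
    return max(x) - min(x)

fine_texture = [100, 140] * 4                # rapid alternation: high spatial frequency
smooth_ramp = [100 + 5 * i for i in range(8)]  # slow gradient: low spatial frequency

print("texture contrast:", contrast(fine_texture), "->", contrast(toy_codec(fine_texture)))
print("ramp contrast:   ", contrast(smooth_ramp), "->", contrast(toy_codec(smooth_ramp)))
```

In this sketch the slow gradient keeps most of its contrast while the fine texture is flattened almost completely. A human viewer may hardly notice the difference, but a detector that relies on fine edges and textures loses exactly the information it needs.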
With the rise of autonomous driving and the internet of things (IoT), the amount of streamed video data is increasing. For real-time transmission, connected 1080p and 4K video cameras require data rates much larger than those needed for time-series data.
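A back-of-the-envelope calculation shows the gap. The sketch below computes uncompressed video data rates, assuming 30 frames per second and 24 bits per pixel, and compares them with a hypothetical time-series feed of 100 signals sampled at 100 Hz with 32-bit values. All of these figures are assumptions for illustration, not measurements.

```python
def raw_bitrate_mbit_s(width, height, fps=30, bits_per_pixel=24):
    """Uncompressed video data rate in Mbit/s."""
    return width * height * bits_per_pixel * fps / 1e6

full_hd = raw_bitrate_mbit_s(1920, 1080)   # about 1493 Mbit/s
ultra_hd = raw_bitrate_mbit_s(3840, 2160)  # about 5972 Mbit/s

# Hypothetical time-series feed: 100 signals x 100 Hz x 32 bit
timeseries = 100 * 100 * 32 / 1e6          # 0.32 Mbit/s

print(f"1080p raw:   {full_hd:8.1f} Mbit/s")
print(f"4K raw:      {ultra_hd:8.1f} Mbit/s")
print(f"time series: {timeseries:8.2f} Mbit/s")
```

Even after a conventional codec has reduced the raw stream by a large factor, video remains far heavier than typical time-series telemetry.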
On the other hand, data transmission is limited by the available bandwidth and comes at a financial cost. Besides, high data volumes can lead to high latencies when facing a bandwidth bottleneck. For this reason, applications relying on near real-time video transmission are virtually impossible. Currently, low-latency data transmission is restricted to strongly compressed video streams and hence limited to low quality. Moreover, the loss in video quality is not only perceptually undesirable; the resulting compression artefacts also interfere with video analysis algorithms such as detection neural networks. To make matters worse, existing compression solutions distribute the quality loss uniformly over the image, so artefacts also appear in the most important parts of the video stream.
In this blog we share with you how Teraki enables customers to significantly reduce video data at the edge, whilst keeping the highest detection accuracies of the car’s perception stack. Our solution is region of interest based smart video compression. In a nutshell, we compress irrelevant regions of the incoming video stream more strongly than relevant regions.
We segment the incoming video stream into regions of interest (RoI) and regions outside the RoI.
In the figure below, we highlighted the RoI in red and regions outside the RoI in blue. This segmentation enables smart video compression by using different strengths of compression for different regions of the video stream.
With this segmentation, we can use a (lossless) encoding in the RoI and a lossy encoding that yields large data rate savings outside the RoI. An example of this can be seen in the picture above.
One can clearly see the effect of the compression on the sky and the lossless transmission of the RoI. Note that the RoI can also be defined using speed, relative speed, distance (near/far), semantic classification, etc. Teraki technology allows for mixing multiple RoI definitions.
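The idea of region-dependent compression can be sketched in a few lines of Python. This is a simplified stand-in, not Teraki's implementation: pixels inside the RoI are kept untouched, pixels outside are coarsely quantized, and the size of the zlib-compressed bytes serves as a rough proxy for the data rate. The frame content, the RoI position, the quantization step and the function names are all assumptions for illustration.

```python
import random
import zlib

def compress_outside_roi(frame, roi_mask, step=64):
    """Keep RoI pixels untouched; coarsely quantize everything else."""
    return [
        [v if keep else (v // step) * step + step // 2
         for v, keep in zip(row, mask_row)]
        for row, mask_row in zip(frame, roi_mask)
    ]

def coded_size(frame):
    """Rough proxy for the data rate: size of the zlib-compressed pixel bytes."""
    return len(zlib.compress(bytes(v for row in frame for v in row)))

# Synthetic 64x64 grayscale frame filled with noise (worst case for a codec)
rng = random.Random(0)
size = 64
frame = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]

# Hypothetical RoI: a 16x16 block in the centre of the frame
roi_mask = [[24 <= y < 40 and 24 <= x < 40 for x in range(size)]
            for y in range(size)]

reduced = compress_outside_roi(frame, roi_mask)
print("coded size, uniform quality:", coded_size(frame), "bytes")
print("coded size, RoI based:      ", coded_size(reduced), "bytes")
```

A production encoder would more likely steer the quantization per block inside the codec rather than modify pixels beforehand, but the trade-off is the same: bits are spent where the perception stack needs them.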
Our video segmentation and compression stack is built with lightweight processing stages that are designed to run embedded on (automotive) hardware that typically has low computational and storage capacity. By ‘virtually upgrading’ the performance of existing hardware via software, Teraki takes away the need for customers to buy expensive GPUs.
Teraki’s codec can also incorporate the user’s prior knowledge about the location of the RoI, which could be acquired using different sensors and algorithms already available to the user. In the case of an autonomous driving application, the information might be available to the user from sensor fusion, e.g. radar, LIDAR, reference maps, additional cameras, or thanks to other computer vision methods. Should this be the case, we provide a convenient interface for integrating the user’s prior knowledge with the RoI based compression.
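As an illustration of how such prior knowledge could be combined, the sketch below turns bounding boxes from different sources into pixel masks and merges them into a single RoI. It does not show Teraki's actual interface; the function names, the box format `(x0, y0, x1, y1)` and the example boxes are hypothetical.

```python
def boxes_to_mask(boxes, width, height):
    """Rasterize (x0, y0, x1, y1) boxes into a boolean mask, clipped to the frame."""
    mask = [[False] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = True
    return mask

def merge_masks(*masks):
    """A pixel belongs to the RoI if any source marks it as relevant."""
    return [[any(cells) for cells in zip(*rows)] for rows in zip(*masks)]

width, height = 8, 6
radar_boxes = [(1, 1, 3, 3)]    # hypothetical object reported by radar
map_boxes = [(6, 4, 10, 8)]     # hypothetical map region, partly outside the frame

roi = merge_masks(
    boxes_to_mask(radar_boxes, width, height),
    boxes_to_mask(map_boxes, width, height),
)
for row in roi:
    print("".join("#" if cell else "." for cell in row))
```

Taking the union is the conservative choice: a region is only compressed strongly if no source considers it relevant.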
Teraki’s compression algorithms are designed to minimize the transmission and storage bottleneck at a low computational cost. We propose Region of Interest (RoI) and Frame of Interest based video compression, resulting in up to 75% data rate savings on top of the existing H.264 codec without any additional quality loss in important parts of the video stream. This is achieved at the encoder level only, and the algorithm is compatible with existing H.264 video decoders. These high additional data rate savings have no or very limited impact on the accuracy of existing state-of-the-art detection neural networks.
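To put the 75% figure into perspective, here is a small worked example. The 8 Mbit/s baseline for an H.264 stream is a hypothetical number chosen for illustration, and 75% is the upper bound quoted above, not a guaranteed result.

```python
def rate_after_savings(baseline_mbit_s, savings_fraction):
    """Data rate that remains after an additional relative saving."""
    return baseline_mbit_s * (1 - savings_fraction)

def gigabytes_per_hour(rate_mbit_s):
    """Data volume of one hour of streaming at the given rate."""
    return rate_mbit_s * 3600 / 8 / 1000

h264_rate = 8.0                                     # hypothetical H.264 stream, Mbit/s
reduced_rate = rate_after_savings(h264_rate, 0.75)  # 2.0 Mbit/s

print(f"H.264 only:      {h264_rate} Mbit/s, {gigabytes_per_hour(h264_rate):.1f} GB/h")
print(f"with RoI coding: {reduced_rate} Mbit/s, {gigabytes_per_hour(reduced_rate):.1f} GB/h")
```

Under these assumptions, one camera would produce roughly a quarter of the data per hour, which matters both for mobile transmission cost and for storage.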
How do we benchmark these results? We will cover that in one of our upcoming blogs.