Comparing VVC, HEVC and AV1 using Objective and Subjective Assessments

Fan Zhang, Angeliki Katsenou, Mariana Afonso, Goce Dimitrov and David Bull

ABSTRACT

In this paper, the performance of three state-of-the-art video codecs, the High Efficiency Video Coding (HEVC) Test Model (HM), AOMedia Video 1 (AV1) and the Versatile Video Coding Test Model (VTM), is evaluated using both objective and subjective quality assessments. Nine source sequences were carefully selected to offer both diversity and representativeness, and different resolution versions were encoded by all three codecs at pre-defined target bitrates. The compression efficiency of the three codecs is evaluated using two commonly used objective quality metrics, PSNR and VMAF. The subjective quality of their reconstructed content is also evaluated through psychophysical experiments. Furthermore, HEVC and AV1 are compared within a dynamic optimization framework (convex hull rate-distortion optimization) across resolutions and over a wider bitrate range, using both objective and subjective evaluations. Finally, the computational complexities of the three tested codecs are compared. The subjective assessments indicate that, for the tested versions, there is no significant difference between AV1 and HM, while the tested VTM version shows significant enhancements. The selected source sequences, compressed video content and associated subjective data are available online, offering a resource for compression performance evaluation and objective video quality assessment.
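To make the convex hull rate-distortion optimization concrete, the sketch below shows the core hull-selection step in Python. It is a minimal illustration, not the evaluation code used in the paper: the (bitrate, quality) pairs are invented placeholders, pooled over candidate resolutions of one source sequence, and the upper convex hull yields the encode that should serve each rate.

    # Minimal sketch of convex-hull rate-quality selection across resolutions.
    # Not the paper's implementation; all numbers below are illustrative.

    def convex_hull_rd(points):
        """Return the (bitrate, quality) points on the upper convex hull.

        points: (bitrate_kbps, quality) pairs pooled over all candidate
        resolutions of the same source sequence.
        """
        pts = sorted(points)                  # by bitrate, then quality
        hull = []
        for p in pts:
            # drop points that fall on or below the chord to the new point
            while len(hull) >= 2:
                (x1, y1), (x2, y2) = hull[-2], hull[-1]
                if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                    hull.pop()
                else:
                    break
            hull.append(p)
        return hull

    # Hypothetical encodes: (kbps, VMAF) at 540p, 1080p and 2160p
    encodes = [(500, 62), (1000, 74), (2000, 83),
               (1000, 70), (2000, 85), (4000, 92),
               (2000, 80), (4000, 90), (8000, 96)]
    print(convex_hull_rd(encodes))   # the per-sequence rate-quality envelope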

Parts of this work have been presented at the IEEE International Conference on Image Processing (ICIP) 2019 in Taipei and at the Alliance for Open Media (AOM) Symposium 2019 in San Francisco.

SOURCE SEQUENCES

DATABASE

[DOWNLOAD] subjective data.

[DOWNLOAD] all videos from University of Bristol Research Data Storage Facility.

If this content has been mentioned in a research publication, please give credit to the University of Bristol by referencing the following papers:

[1] A. V. Katsenou, F. Zhang, M. Afonso and D. R. Bull, “A Subjective Comparison of AV1 and HEVC for Adaptive Video Streaming,” 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 2019, pp. 4145-4149.

[2] F. Zhang, A. V. Katsenou, M. Afonso, G. Dimitrov and D. R. Bull, “Comparing VVC, HEVC and AV1 using Objective and Subjective Assessments”, arXiv:2003.10282 [eess.IV], 2020.

Computational cameras

A novel close-to-sensor computational camera has been designed and developed at the ViLab. ROIs can be captured and processed at 1000 fps; the concurrent processing enables low-latency sensor control and flexible image processing. With 9-DoF motion sensing and a low size, weight and power form factor, it is ideally suited to robotics and UAV applications. The modular design allows multiple configurations and output options, easing the development of embedded applications. General-purpose outputs can interface directly with external devices such as servos and motors, while Ethernet offers a conventional image output capability. A binocular system can be configured with self-driven pan/tilt positioning, as an autonomous verging system or as a standard stereo pair. More information can be found here: xcamflyer.

Terrain analysis for biped locomotion

Numerous scenarios exist where it is necessary or advantageous to classify surface material at a distance from a moving forward-facing camera. Examples include the use of image-based sensors for assessing and predicting terrain type in association with the control or navigation of autonomous vehicles. In many real scenarios, the upcoming terrain may not be flat; it may be oblique, and vehicles may need to change speed and gear to ensure safe and clean motion.

Blur-robust texture features

Videos captured with moving cameras, particularly those attached to biped robots, often exhibit blur due to incorrect focus or slow shutter speed. Blurring effects generally alter the spatial and frequency characteristics of the content and this may reduce the performance of a classifier. Robust texture features are therefore developed to deal with this problem. [Matlab Code]
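As a rough illustration of this idea, the Python sketch below extracts simple magnitude statistics with the open-source dtcwt package. The published features use the undecimated DT-CWT and local magnitude binary patterns; this simplified version uses the standard decimated transform and level-normalised subband energies, so it should be read as a flavour of the approach rather than a reimplementation.

    # Simplified wavelet-magnitude texture features (illustrative only).
    # Assumes the open-source `dtcwt` Python package; the published method
    # instead uses the undecimated DT-CWT with local magnitude binary patterns.
    import numpy as np
    import dtcwt

    def dtcwt_texture_features(image, nlevels=4):
        """Mean subband magnitudes per level/orientation, normalised overall.

        image: 2-D greyscale array. Returns nlevels * 6 features (the DT-CWT
        has six directional subbands per level). Normalising across levels
        reduces sensitivity to blur, which mostly reshapes the distribution
        of energy over scales.
        """
        pyramid = dtcwt.Transform2d().forward(image.astype(float), nlevels=nlevels)
        feats = []
        for hp in pyramid.highpasses:             # one complex array per level
            mags = np.abs(hp)                     # shape (H_l, W_l, 6)
            feats.append(mags.mean(axis=(0, 1)))  # mean magnitude per orientation
        feats = np.concatenate(feats)
        return feats / (feats.sum() + 1e-12)      # overall-energy normalisation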

Terrain classification from body-mounted cameras during human locomotion

A novel algorithm for terrain-type classification based on monocular video captured from the viewpoint of human locomotion is introduced. A texture-based algorithm classifies the path ahead into multiple groups that can be used to support terrain classification. Gait is taken into account in two ways. Firstly, it drives key frame selection: when regions with homogeneous texture characteristics are updated, the frequency variations of the textured surface are analysed and used to adaptively define filter coefficients. Secondly, it is incorporated in the parameter estimation process, where probabilities of path consistency are employed to improve terrain-type estimation (a toy sketch of this update is given below) [Matlab Code]. The figures below show the proposed terrain classification process for tracked regions, together with an example result. [PDF]

Label 1 (green), Label 2 (red) and Label 3 (blue) correspond to areas classified as hard surfaces, soft surfaces and unwalkable areas, respectively. The size of each circle indicates the classification probability: larger circles imply higher confidence.
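A toy version of the path-consistency update is sketched below. It is not the published algorithm: it simply treats the three labels above as states of a first-order temporal model, where `consistency` (a made-up parameter) is the assumed probability that the terrain type is unchanged between key frames.

    # Toy sketch of path-consistency smoothing over terrain labels
    # [hard, soft, unwalkable]; illustrative, not the published algorithm.
    import numpy as np

    def update_terrain_posterior(prev_post, frame_probs, consistency=0.8):
        """One temporal filtering step over terrain-label probabilities.

        prev_post:   posterior from the previous key frame.
        frame_probs: per-frame classifier probabilities for the new key frame.
        consistency: assumed probability the terrain type has not changed.
        """
        n = len(prev_post)
        # transition model: stay put with prob `consistency`, else switch uniformly
        trans = np.full((n, n), (1.0 - consistency) / (n - 1))
        np.fill_diagonal(trans, consistency)
        prior = trans.T @ prev_post      # predicted label distribution
        post = prior * frame_probs       # fuse with the current observation
        return post / post.sum()

    # Example: a momentary misclassification is damped by path consistency
    post = np.array([0.6, 0.3, 0.1])
    print(update_terrain_posterior(post, np.array([0.4, 0.5, 0.1])))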

Planar orientation estimation by texture

The gradient of a road or terrain influences the appropriate speed and power of a vehicle traversing it. Gradient prediction is therefore necessary if autonomous vehicles are to optimise their locomotion. A novel texture-based method for estimating the orientation of planar surfaces, under the basic assumption of texture homogeneity, has been developed for scenarios where only a single image source exists, including cases where the region of interest is too far away for depth estimation techniques to be employed.
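The texture-gradient principle behind this can be illustrated with a short sketch: under the homogeneity assumption, the dominant spatial frequency of a slanted textured plane increases with distance, so the rate of change of local frequency across image rows carries the tilt. The code below is a deliberately crude single-image illustration, not the published complex-wavelet method.

    # Crude single-image illustration of the texture-gradient principle;
    # the published method is based on complex wavelets, not row-wise FFTs.
    import numpy as np

    def row_frequency_profile(image):
        """Dominant horizontal frequency of each row (cycles/pixel)."""
        rows = image.astype(float)
        rows = rows - rows.mean(axis=1, keepdims=True)
        spectra = np.abs(np.fft.rfft(rows, axis=1))
        spectra[:, 0] = 0.0                       # suppress any residual DC
        freqs = np.fft.rfftfreq(image.shape[1])
        return freqs[np.argmax(spectra, axis=1)]

    def estimate_tilt_rate(image):
        """Slope of log-frequency vs row index: near zero for a
        fronto-parallel plane, larger in magnitude for stronger slant."""
        f = row_frequency_profile(image)
        valid = f > 0
        x = np.arange(len(f))[valid]
        return np.polyfit(x, np.log(f[valid]), 1)[0]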

References

  • Terrain classification from body-mounted cameras during human locomotion. N. Anantrasirichai, J. Burn and D. Bull. IEEE Transactions on Cybernetics. [PDF] [Matlab Code]
  • Projective image restoration using sparsity regularization. N. Anantrasirichai, J. Burn and D. Bull. ICIP 2013. [PDF] [Matlab Code]
  • Robust texture features for blurred images using undecimated dual-tree complex wavelets. N. Anantrasirichai, J. Burn and D. Bull. ICIP 2014. [PDF] [Matlab Code]
  • Orientation estimation for planar textured surfaces based on complex wavelets. N. Anantrasirichai, J. Burn and D. Bull. ICIP 2014. [PDF]
  • Robust texture features based on undecimated dual-tree complex wavelets and local magnitude binary patterns. N. Anantrasirichai, J. Burn and D. Bull. ICIP 2015. [PDF]

Mitigating the effects of atmospheric turbulence on surveillance imagery

Various types of atmospheric distortion can influence the visual quality of video signals during acquisition. Typical distortions include fog or haze, which reduce contrast, and atmospheric turbulence caused by temperature variations or aerosols. Temperature variations change the interference pattern of the refracted light, producing unclear, unsharp and wavering images of objects, which makes the acquired imagery difficult to interpret.

This project introduced a novel method for mitigating the effects of atmospheric distortion on observed images, particularly airborne turbulence, which can severely degrade a region of interest (ROI). In order to recover accurate detail from objects behind the distorting layer, a simple and efficient frame selection method is proposed to pick informative ROIs from only good-quality frames. We solve the space-variant distortion problem using region-based fusion based on the Dual-Tree Complex Wavelet Transform (DT-CWT). We also propose an object alignment method for pre-processing the ROI, since it can exhibit significant offsets and distortions between frames. Simple haze removal is used as the final step. We refer to this algorithm as CLEAR (Complex waveLEt fusion for Atmospheric tuRbulence); for code, please contact me. [PDF] [VIDEOS]
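Two of these ingredients lend themselves to a compact sketch: quality-based frame selection and wavelet-domain fusion. The Python below, assuming the open-source dtcwt package, scores frames by gradient energy and fuses the selected ones by keeping the largest-magnitude DT-CWT coefficient across frames; the region-based fusion rules, object alignment and haze removal of the full CLEAR pipeline are omitted.

    # Minimal sketch of two CLEAR ingredients (frame selection + DT-CWT
    # fusion), assuming the open-source `dtcwt` package. The published
    # method adds region-based rules, ROI alignment and haze removal.
    import numpy as np
    import dtcwt

    def select_sharpest(frames, k=10):
        """Pick the k frames with the highest mean gradient magnitude."""
        def sharpness(f):
            gy, gx = np.gradient(f.astype(float))
            return np.hypot(gy, gx).mean()
        best = np.argsort([sharpness(f) for f in frames])[-k:]
        return [frames[i] for i in best]

    def fuse_dtcwt(frames, nlevels=4):
        """Fuse frames by max-magnitude selection in the DT-CWT domain."""
        t = dtcwt.Transform2d()
        pyramids = [t.forward(f.astype(float), nlevels=nlevels) for f in frames]
        fused_hp = []
        for lvl in range(nlevels):
            stack = np.stack([p.highpasses[lvl] for p in pyramids])  # (N,H,W,6)
            idx = np.argmax(np.abs(stack), axis=0)                   # (H,W,6)
            fused_hp.append(np.take_along_axis(stack, idx[None], axis=0)[0])
        lowpass = np.mean([p.lowpass for p in pyramids], axis=0)
        return t.inverse(dtcwt.Pyramid(lowpass, tuple(fused_hp)))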

Atmospherically distorted videos of a static scene

Mirage (256×256 pixels, 50 frames). Left: distorted sequence. Right: restored image. Download PNG


Download other distorted sequences and references [here].

Atmospherically distorted videos of a moving object

Left: Distorted video. Right: Restored video. Download PNG

References

  • Atmospheric turbulence mitigation using complex wavelet-based fusion. N. Anantrasirichai, A. Achim, N. Kingsbury and D. Bull. IEEE Transactions on Image Processing. [PDF] [Sequences] [Code: please contact me]
  • Mitigating the effects of atmospheric distortion using DT-CWT fusion. N. Anantrasirichai, A. Achim, D. Bull and N. Kingsbury. In Proceedings of the IEEE International Conference on Image Processing (ICIP 2012). [PDF] [BibTeX]
  • Mitigating the effects of atmospheric distortion on video imagery: A review. University of Bristol, 2011. [PDF]
  • Mitigating the effects of atmospheric distortion. University of Bristol, 2012. [PDF]

High Frame Rate Video

As the demand for higher quality and more immersive video content increases, the need to extend the current video parameter space of spatial resolutions and display sizes to include, among other things, a wider colour gamut, higher dynamic range and higher frame rates becomes ever greater. The use of increased frame rates can provide a more realistic portrayal of a scene through a reduction in motion blur, while also minimising temporal aliasing and the associated visual artefacts.

The BVI-HFR video database is the first publicly available high frame rate video database, and contains 22 unique HD video sequences at frame rates up to 120 Hz. Sample frames from some of the video sequences can be seen below:

[Sample frames from the BVI-HFR sequences: sparkler, hamster, catch, flowers, bobblehead, cyclist]

Subjective evaluations by 51 participants on the sequences in the BVI-HFR video database have shown a clear relationship between frame rate and perceived quality (MOS), although we do see an effect of diminishing returns; a toy illustration of this trend is sketched below. The results also show a degree of content dependency: for example, the benefits of higher frame rates are more likely to be observed in video sequences with high motion speeds (e.g. a moving camera).
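The numbers in this sketch are invented placeholders rather than the BVI-HFR results; a logarithmic model simply captures quality gains that shrink as the frame rate doubles.

    # Toy illustration of diminishing returns with frame rate; the MOS
    # values are invented placeholders, not the BVI-HFR measurements.
    import numpy as np

    frame_rates = np.array([15, 30, 60, 120])   # Hz
    mos = np.array([2.1, 3.2, 4.0, 4.4])        # hypothetical mean opinion scores

    # fit MOS ~= a + b * log(frame_rate): roughly equal gains per doubling,
    # i.e. ever smaller gains per extra Hz
    b, a = np.polyfit(np.log(frame_rates), mos, 1)
    for fr in frame_rates:
        print(f"{fr:>4} Hz -> predicted MOS {a + b * np.log(fr):.2f}")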

 

Publications

A study of subjective video quality at various frame rates, A. Mackin, F. Zhang and D. Bull, 2015 IEEE International Conference on Image Processing (ICIP), 2015.

What’s on TV: A Large-Scale Quantitative Characterisation of Modern Broadcast Video Content

Video databases, used for benchmarking and evaluating the performance of new video technologies, should represent the full breadth of consumer video content. The parameterisation of video databases using low-level features has proven to be an effective way of quantifying the diversity within a database. However, without a comprehensive understanding of the importance and relative frequency of these features in the content people actually consume, the utility of such information is limited. In collaboration with the BBC, the “What’s on TV” project is a large-scale analysis of the low-level features that exist in contemporary broadcast video. The project aims to establish an efficient set of features that can be used to characterise the spatial and temporal variation in modern consumer content. The meaning and relative significance of this feature set, together with the shape of their frequency distributions, represent highly valuable information for researchers wanting to model the diversity of modern consumer content in representative video databases.
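For a flavour of such low-level features, the sketch below computes two widely used descriptors, spatial information (SI) and temporal information (TI), as defined in ITU-T Rec. P.910. The project's feature set is broader; this example only shows the general style of measurement.

    # Spatial information (SI) and temporal information (TI) per ITU-T
    # Rec. P.910; two common low-level descriptors of the kind studied here.
    import numpy as np
    from scipy import ndimage

    def si_ti(frames):
        """frames: iterable of 2-D greyscale (luma) arrays in display order."""
        si_vals, ti_vals = [], []
        prev = None
        for f in frames:
            f = f.astype(float)
            # SI: spatial std-dev of the Sobel-filtered frame
            sob = np.hypot(ndimage.sobel(f, axis=0), ndimage.sobel(f, axis=1))
            si_vals.append(sob.std())
            # TI: spatial std-dev of the inter-frame difference
            if prev is not None:
                ti_vals.append((f - prev).std())
            prev = f
        return max(si_vals), (max(ti_vals) if ti_vals else 0.0)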

Publications:

Felix Mercer Moss, Fan Zhang, Roland Baddeley and David Bull, What’s on TV: A large-scale quantitative characterisation of modern broadcast video content, ICIP 2016.
