Adaptive Resolution Intra Coding

Delivering high-resolution video in restricted-bandwidth scenarios can be challenging. Part of the reason for this is the high bitrate requirement of the intra-coded Instantaneous Decoding Refresh (IDR) pictures featured in all video coding standards. Frequent coding of IDR frames is essential for error resilience, as it prevents error propagation. However, because each IDR frame consumes a large portion of the available bitrate, subsequent frames must be compressed more heavily, which reduces their quality. This work investigates new adaptive resolution intra coding methods for improving the rate-distortion performance of the video codec.
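The idea can be illustrated with a minimal sketch: downsample the IDR frame before intra coding so that it costs fewer bits, then upsample the reconstruction back to full resolution. Below, a box filter, nearest-neighbour upsampling and uniform quantisation stand in for the real resampling filters and the HEVC intra codec; these are hypothetical simplifications for illustration, not the published method.

```python
import numpy as np

def downsample(frame, factor=2):
    """Box-filter downsampling: average each factor x factor block."""
    h, w = frame.shape
    return frame[:h - h % factor, :w - w % factor] \
        .reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(frame, factor=2):
    """Nearest-neighbour upsampling back to the original grid."""
    return np.repeat(np.repeat(frame, factor, axis=0), factor, axis=1)

def code_idr_resampled(frame, factor=2, q_step=8.0):
    """Code an IDR frame at reduced resolution (uniform quantisation is a
    stand-in for the intra codec), then upsample the reconstruction."""
    low = downsample(frame, factor)
    recon_low = np.round(low / q_step) * q_step  # fewer samples -> fewer bits
    return upsample(recon_low, factor)

frame = np.tile(np.arange(16, dtype=float), (16, 1))  # toy luma ramp frame
recon = code_idr_resampled(frame)
print(recon.shape)  # same spatial dimensions as the input
```

The bit savings come from quantising a quarter as many samples; the cost is the resampling loss visible in the reconstructed frame, which is exactly the trade-off the adaptive method must manage.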

 

B. Hosking, D. Agrafiotis, and D. Bull, “Spatial resampling of IDR frames for low bitrate video coding with HEVC,” IS&T/SPIE Electronic Imaging, San Francisco, Feb 2015.

B. Hosking, D. Agrafiotis, D. Bull, and N. Easton, “An Adaptive Resolution Rate Control Method for Intra Coding in HEVC,” IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, March 2016.

 


 

Effect of resampled coding of the IDR frame on the quality of a reconstructed B frame in a Group of Pictures (similar bitrate): left, resampled coding; middle, original; right, standard coding.

Visual Attention Based Video Compression

Accurate prediction of the viewer’s gaze location in a video frame has the potential to improve bit allocation, rate control, error resilience and quality evaluation in video compression. In complex contexts, such as broadcast football video, the potential reward is even higher, given that compression and transmission of this type of content are challenging. We have developed a gaze location (visual attention) prediction system for high definition broadcast football video. The system employs Bayesian integration of bottom-up features and context-specific top-down cues. The context is classified into different categories through shot classification, allowing our model to pre-learn the task pertinence of each object category and build the top-down prior map automatically.
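A minimal sketch of the Bayesian integration step, assuming the gaze probability is proportional to the product of a bottom-up saliency map (treated as a likelihood) and a context-specific top-down prior map. The maps and values below are illustrative toys, not outputs of the published system.

```python
import numpy as np

def fuse_saliency(bottom_up, top_down, eps=1e-8):
    """Bayesian-style fusion: pointwise product of likelihood (bottom-up
    saliency) and prior (top-down map), normalised to a probability map."""
    posterior = bottom_up * top_down + eps
    return posterior / posterior.sum()

# Toy 4x4 maps (hypothetical values for illustration only).
bottom_up = np.array([[0.1, 0.2, 0.1, 0.0],
                      [0.2, 0.9, 0.3, 0.1],
                      [0.1, 0.3, 0.2, 0.1],
                      [0.0, 0.1, 0.1, 0.0]])
top_down = np.array([[0.0, 0.1, 0.1, 0.0],
                     [0.1, 0.8, 0.7, 0.1],
                     [0.1, 0.7, 0.6, 0.1],
                     [0.0, 0.1, 0.1, 0.0]])

gaze_map = fuse_saliency(bottom_up, top_down)
# Predicted gaze location = the maximum of the fused map.
row, col = np.unravel_index(np.argmax(gaze_map), gaze_map.shape)
print(row, col)  # -> 1 1
```

In the full system the prior is built automatically per shot class from the pre-learned task pertinence of each object category, rather than being fixed as here.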

Q. Cheng, D. Agrafiotis, A. M. Achim, and D. R. Bull, “Gaze Location Prediction for Broadcast Football Video,” IEEE Transactions on Image Processing, vol. 22, no. 12, pp. 4918–4929, 2013.

 


Perceptual Quality Metrics (PVM)

RESEARCHERS

Dr. Fan (Aaron) Zhang

INVESTIGATOR

Prof. David Bull, Dr. Dimitris Agrafiotis and Dr. Roland Baddeley

DATES

2012-2015

FUNDING

ORSAS and EPSRC

SOURCE CODE 

PVM Matlab code: Download.

INTRODUCTION

It is known that the human visual system (HVS) employs independent processes (distortion detection and artefact perception, often referred to as near-threshold and supra-threshold distortion perception) to assess video quality at different distortion levels. Visual masking effects also play an important role in video distortion perception, especially within spatial and temporal textures.

Algorithmic diagram for PVM.
It is well known that small differences in textured content can be tolerated by the HVS. In this work, we employ the dual-tree complex wavelet transform (DT-CWT) in conjunction with motion analysis to characterise this tolerance within spatial and temporal textures. The DT-CWT has been found to be particularly powerful in this context due to its shift invariance and orientation selectivity properties. In highly distorted, compressed video content, blurring is one of the most commonly occurring artefacts. This is detected in our approach by comparing high frequency subband coefficients from the reference and distorted frames, also facilitated by the DT-CWT. This is motion-weighted in order to simulate the tolerance of the HVS to blurring in content with high temporal activity. Inspired by the previous work of Chandler and Hemami and of Larson and Chandler, thresholded differences (defined as noticeable distortion) and blurring artefacts are non-linearly combined using a modified geometric mean model, in which the proportion of each component is adaptively tuned. The performance of the proposed video metric is assessed and validated using the VQEG FRTV Phase I and the LIVE video databases, and shows clear improvements in correlation with subjective scores over existing metrics such as PSNR, SSIM, VIF, VSNR, VQM and MOVIE, and in many cases over STMAD.
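The pooling step can be sketched as a weighted (modified) geometric mean of the two artefact terms. The adaptive weighting below is a hypothetical choice for illustration; it is not the exact tuning used in PVM.

```python
def pvm_style_pool(noticeable, blur, alpha):
    """Modified geometric mean of two artefact terms; alpha sets the
    proportion of each component (hypothetical form, not the exact
    PVM equation)."""
    return (noticeable ** alpha) * (blur ** (1.0 - alpha))

# Hypothetical per-frame artefact scores.
noticeable = 0.4   # thresholded DT-CWT difference term
blur = 0.2         # motion-weighted high-frequency loss term
# Illustrative adaptive weighting: each term's share of the total.
alpha = noticeable / (noticeable + blur)
score = pvm_style_pool(noticeable, blur, alpha)
print(round(score, 3))
```

A weighted geometric mean always lies between its two inputs, so the pooled score stays on the same scale as the individual artefact terms while letting the dominant artefact drive the result.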

RESULTS

Figure: Scatter plots of subjective DMOS versus different video metrics on the VQEG database.
Figure: Scatter plots of subjective DMOS versus different video metrics on the LIVE video database.

REFERENCE

  1. A Perception-based Hybrid Model for Video Quality Assessment. F. Zhang and D. Bull, IEEE T-CSVT, June 2016.
  2. Quality Assessment Methods for Perceptual Video Compression. F. Zhang and D. Bull, ICIP, Melbourne, Australia, September 2013.

 

Parametric Video Coding

RESEARCHERS

Dr. Fan (Aaron) Zhang

INVESTIGATOR

Prof. David Bull, Dr. Dimitris Agrafiotis and Dr. Roland Baddeley

DATES

2008-2015

FUNDING

ORSAS and EPSRC

INTRODUCTION

In most cases, the aim of video compression is to provide good subjective quality rather than simply to produce pictures that are as numerically similar to the originals as possible. On this basis, it is possible to conceive of a compression scheme in which an analysis/synthesis framework is employed rather than the conventional energy-minimisation approach. If such a scheme were practical, it could offer lower bitrates through reduced residual and motion vector coding, using a parametric approach to describe texture warping and/or synthesis.


Instead of encoding whole images or prediction residuals after translational motion estimation, our algorithm employs a perspective motion model to warp static textures and utilises texture synthesis to create dynamic textures. Texture regions are segmented using features derived from the complex wavelet transform and further classified according to their spatial and temporal characteristics. Moreover, a compatible artefact-based video metric (AVM) is proposed with which to evaluate the quality of the reconstructed video. This is also employed in-loop to prevent warping and synthesis artefacts. The proposed algorithm has been integrated into an H.264 video coding framework. The results show significant bitrate savings, of up to 60% compared with H.264 at the same objective quality (based on AVM) and subjective scores.
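The perspective motion model amounts to warping a static texture with a 3×3 homography. A minimal sketch, using inverse mapping and nearest-neighbour sampling (a simplification of the coder's actual interpolation):

```python
import numpy as np

def warp_perspective(src, H):
    """Warp a texture with a 3x3 homography H by inverse-mapping each
    destination pixel into the source (nearest-neighbour sampling)."""
    h, w = src.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel().astype(float), ys.ravel().astype(float),
                    np.ones(h * w)])
    sx, sy, sw = np.linalg.inv(H) @ pts       # source coords, homogeneous
    sx = np.round(sx / sw).astype(int)
    sy = np.round(sy / sw).astype(int)
    out = np.zeros_like(src)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out[ys.ravel()[valid], xs.ravel()[valid]] = src[sy[valid], sx[valid]]
    return out

texture = np.arange(16.0).reshape(4, 4)
identity = np.eye(3)
assert np.array_equal(warp_perspective(texture, identity), texture)
# A pure one-pixel translation in x, expressed as a homography.
shift = np.array([[1.0, 0, 1], [0, 1, 0], [0, 0, 1]])
print(warp_perspective(texture, shift)[0])  # row 0 shifted right by one
```

Because the eight homography parameters replace a dense motion-vector field and residual for the region, warped static textures can be described very cheaply; the in-loop quality metric then decides whether the warp is acceptable.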

RESULTS

 

 

REFERENCE

  1. Perception-oriented Video Coding based on Image Analysis and Completion: a Review. P. Ndjiki-Nya, D. Doshkov, H. Kaprykowsky, F. Zhang, D. Bull, T. Wiegand, Signal Processing: Image Communication, July 2012.
  2. A Parametric Framework For Video Compression Using Region-based Texture Models. F. Zhang and D. Bull, IEEE J-STSP, November 2011.

Marie Skłodowska-Curie Actions : PROVISION

Creating a ‘Visually’ Better Tomorrow

PROVISION is a network of leading academic and industrial organisations in Europe, comprising international researchers working on the problems facing today’s video coding technologies. The ultimate goal is to make noteworthy technical advances and improvements to the existing state-of-the-art techniques for compressing video material.

The project aims not only to enhance broadcast and on-demand video material, but also to produce a new generation of scientists equipped with the research and soft skills needed by industry, academia and society at large. In line with the principles laid down by the Marie Skłodowska-Curie actions of the European Commission, PROVISION is a great example of an ensemble of researchers with varied geographical and academic backgrounds channelling their joint effort towards creating a technologically, or more specifically a ‘visually’, better tomorrow.

PROVISION website, PROVISION Facebook page

Efficient image and video algorithms & architectures

The group is involved in a broad range of activities related to image and video coding at various bit rates, ranging from sub-20 kb/s to broadcast rates, including High Definition.

We are currently conducting research in the following topics:

  • Parametric Coding – a paradigm for next generation video coding
  • Modelling and coding for 3G-HDTV and beyond – preserving production values and increasing immersivity (through resolution and dynamic range)
  • Scalable Video Coding – a paradigm for codec based congestion management
  • Distributed video coding – shifting the complexity from the encoder(s) to the decoder(s)
  • Complexity reduction for HDTV and post-processing
  • Biologically and neurally inspired media capture/coding algorithms and architectures
  • Architectures and sampling approaches for persistent surveillance – analysis using spatio-temporal volumes
  • Eye tracking and saliency as a basis for context specific systems
  • Quality assessment methods and metrics

Early work in the Group developed the concept of Primitive Operator Signal Processing, which enabled the realisation of high performance, multiplier-free filter banks. This led to collaboration with Sony, enabling the first ASIC implementation of a sub-band video compression system for professional use. In EPSRC project GR/K25892 (architectural optimisation of video systems), world leading complexity results were achieved for wavelet and non-linear filterbank implementations.
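The principle behind primitive operator processing is that the constant multiplications in a filter can be realised with only shifts and additions. A small sketch with a hypothetical 3-tap filter (the taps and decomposition below are illustrative, not from the Group's designs):

```python
def shift_add_mul(x, terms):
    """Multiplier-free constant multiplication: a sum of signed powers
    of two, e.g. 7*x = (x << 3) - x."""
    return sum(sign * (x << s) for s, sign in terms)

# 7 decomposed as 8 - 1: one shift and one subtraction.
SEVEN = [(3, +1), (0, -1)]

def fir3(samples):
    """Toy multiplier-free FIR with integer taps (1, 7, 1):
    y[n] = x[n-2] + 7*x[n-1] + x[n] (hypothetical filter)."""
    out = []
    for n in range(2, len(samples)):
        y = samples[n - 2] + shift_add_mul(samples[n - 1], SEVEN) + samples[n]
        out.append(y)
    return out

print(fir3([1, 2, 3, 4]))  # -> [18, 27]
```

In hardware, each signed power-of-two term costs only wiring and an adder, which is what made high-performance multiplier-free filter banks feasible in the ASIC implementations described above.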

International interest has been stimulated by our work on Matching Pursuits (MP) video coding which preserves the superior quality of MP for displaced-frame difference coding while offering dramatic complexity savings and efficient dictionaries. The Group has also demonstrated that long-term prediction is viable for real-time video coding; its simplex minimisation method offers up to 2dB improvement over single-frame methods with comparable complexity.

Following the Group’s success in reduced-complexity multiple-reference-frame motion estimation, interpolation-free sub-pixel motion estimation techniques were produced in ROAM4G (UIC 3CResearch) offering improvements up to 60% over competing methods. Also in ROAM4G, a novel mode-refinement algorithm was invented for video transcoding which reduces the complexity over full-search by up to 90%. Both works have generated patents which have been licensed to ProVision and STMicroelectronics respectively. Significant work on H.264 optimisation has been conducted in both ROAM4G and the EU FP6 WCAM project.

In 2002, region-of-interest coding was successfully extended to sign language (DTI). Using eyetracking to reveal viewing patterns, foveation models provided bit-rate reductions of 25 to 40% with no loss in perceived quality. This has led to a research programme with the BBC on sign language video coding for broadcasting.

In collaboration with the Metropolitan Police, VIGELANT (EPSRC 2003) produced a novel joint optimisation for rapidly deploying wireless-video camera systems incorporating both multi-view and radio-propagation constraints. With Heriot-Watt and BT, the Group has developed novel multi-view video algorithms which, for the first time, optimise the trade-off between compression and view synthesis (EPSRC).

Methods of synthesising high-throughput video signal processing systems which jointly optimise algorithm performance and implementation complexity have been developed using genetic algorithms. Using a constrained architectural style, results have been obtained for 2D filters, wavelet filterbanks and transforms such as the DCT. In 2005, innovative work conducted in the Group led to the development of the X-MatchPROvw lossless data compressor (BTG patent assignment), which at the time was the fastest in its class.