Future Generation Video Coding
High Efficiency Video Coding (HEVC)
HEVC was designed to address essentially all existing applications of H.264/MPEG-4 AVC, with a particular focus on two key issues: increased video resolution and increased use of parallel processing architectures. Figure 1 shows the structure of an HEVC video encoder. The HEVC video coding layer uses the same "hybrid" approach of inter-/intra-picture prediction and 2D transform coding that is used in all modern video coding standards.
Figure 1. Typical HEVC video encoder.
HEVC defines three block units: a) a coding unit (CU), which is the root of the transform quadtree and carries the prediction mode (INTER/SKIP/INTRA); b) a prediction unit (PU), used for the mode decision, including motion estimation and rate-distortion optimization; and c) a transform unit (TU), used for transform coding and entropy coding. A frame is first divided into a sequence of non-overlapping largest coding units, each called a Coding Tree Unit (CTU). Figures 2 and 3 show a high-resolution 2560 x 1080 frame divided into CTUs in HEVC and into macroblocks in H.264/AVC, respectively. To compress high-resolution video content effectively, HEVC employs a larger block size than H.264/AVC. Each CTU consists of luma and chroma Coding Tree Blocks (CTBs) and can be recursively divided into smaller CUs using flexible quadtree partitioning.
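The recursive quadtree partitioning of a CTU can be sketched as follows. This is a minimal illustration, not an encoder implementation: the names `split_ctu`, `should_split`, and `toy_rule` are hypothetical, the 64x64 CTU and 8x8 minimum CU size are assumed settings, and a real encoder would decide each split by rate-distortion cost rather than by the toy rule used here.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square coding unit


def split_ctu(x: int, y: int, size: int, min_size: int,
              should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively split a square block into four quadrants (a quadtree).

    Returns the leaf blocks, i.e. the CUs that are actually coded.
    """
    # Stop at the minimum CU size or when the decision rule declines to split.
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    cus: List[Block] = []
    # Visit the four quadrants in raster order.
    for dy in (0, half):
        for dx in (0, half):
            cus.extend(split_ctu(x + dx, y + dy, half, min_size, should_split))
    return cus


def toy_rule(x: int, y: int, size: int) -> bool:
    """Hypothetical split rule: only refine the quadrant at the CTU origin."""
    return x == 0 and y == 0


# Assumed sizes: one 64x64 CTU, CUs no smaller than 8x8.
cus = split_ctu(0, 0, 64, 8, toy_rule)
for cu in cus:
    print(cu)
```

With this toy rule the CTU ends up as a mix of 8x8, 16x16, and 32x32 CUs that together tile the full 64x64 area, which is the flexibility the quadtree provides: detailed regions get small blocks while flat regions keep large ones.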
Figure 2. Frame divided into CTUs in HEVC at 2560x1080 resolution.
Figure 3. Frame divided into macroblocks in H.264/AVC at 2560x1080 resolution.
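To make the difference in block size concrete, the number of blocks needed to cover a 2560x1080 frame can be computed directly. The sketch below assumes a 64x64 CTU (the text does not state which CTU size the figure uses) against the 16x16 macroblock of H.264/AVC; `blocks_per_frame` is a hypothetical helper name.

```python
def blocks_per_frame(width: int, height: int, block_size: int):
    """Return (columns, rows, total) of square blocks covering the frame.

    Partial blocks at the right and bottom edges count as whole blocks,
    hence the ceiling division.
    """
    cols = (width + block_size - 1) // block_size
    rows = (height + block_size - 1) // block_size
    return cols, rows, cols * rows


ctu_grid = blocks_per_frame(2560, 1080, 64)  # assumed 64x64 CTU
mb_grid = blocks_per_frame(2560, 1080, 16)   # 16x16 macroblock

print("CTUs:       ", ctu_grid)  # (40, 17, 680)
print("Macroblocks:", mb_grid)   # (160, 68, 10880)
```

Under these assumptions the frame needs 680 CTUs but 10880 macroblocks, a factor of 16 fewer top-level blocks.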
The recursive quadtree block structure of HEVC greatly increases computational complexity, while larger CTU sizes increase coding efficiency. Figure 4 compares HEVC and H.264/AVC in bit-rate and complexity. HEVC has 3 times the computational complexity of H.264/AVC, while its average bit-rate is about 50% lower. After the HEVC standard was completed, the JCT-VC began developing extensions, including range extensions, scalable coding, multi-view coding, and 3D video.
Figure 4. Performance comparison between the HEVC and H.264/AVC.
The extensions developed by the JCT-VC are the following:
- Range extensions (supporting enhanced video formats).
- Scalable coding extensions.
- Multi-view extensions.
- 3D video extensions.
HEVC is designed to cover a broad range of applications for video content, including but not limited to the following:
- Broadcast (cable TV on optical networks/copper, satellite, terrestrial, etc.)
- Content production and distribution
- Digital cinema
- Home cinema
- Internet streaming, download and play
- Medical imaging
- Mobile streaming, broadcast and communications
- Real-time conversational services (videoconferencing, videophone, telepresence, etc.)
- Remote video surveillance
- Storage media (optical disks, digital video tape recorder, etc.)
- Wireless display
Research areas on the HEVC standard in our lab
To address the computational complexity problem mentioned above and to support enhanced quality, our lab works on fast video coding algorithms for real-time video encoding systems and on quality improvement algorithms:
- Mode decision
- Fast intra mode decision
- Fast inter mode decision
- Motion estimation
- Fast pattern search
- Quality improvement
3D Video Coding Algorithms
3D Video (3DV) is a standard that targets a variety of 3D displays. It is the first phase of FTV (free-viewpoint TV), a new framework that includes a coded representation for multiview video and depth information to support the generation of high-quality intermediate views at the receiver. This enables free-viewpoint functionality and view generation for auto-multiscopic displays.
Figure 1 shows an example of an FTV system that transmits multiview video with depth information. The content may be produced in a number of ways, e.g., with a multi-camera setup, depth cameras, or 2D/3D conversion processes. At the receiver, depth-image-based rendering can be performed to project the signal to various types of displays.
Figure 1. Example of an FTV system.
Here, 3D video means video for 3D displays. The displays in focus present N views (e.g., N = 9) simultaneously to the user (see Figure 2). For efficiency reasons, only a smaller number K of views (K = 1, ..., 3) is transmitted, together with depth data for those K views. At the receiver side, the N views to be displayed are generated from the K transmitted views with depth by depth-image-based rendering (DIBR). This is illustrated in Figure 2.
Figure 2. Example of generating 9 output views (N = 9) out of 3 input views with depth (K = 3).
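The generation of N output views from K transmitted views with depth can be sketched as below. This is a simplified illustration under stated assumptions, not the rendering method of the standard: it assumes rectified, parallel cameras along one baseline, so DIBR reduces to a horizontal pixel shift with disparity = focal length x camera shift / depth. The function names, the even spacing of virtual views, and all numeric values are hypothetical. Hole filling and blending of several reference views are omitted.

```python
from typing import List, Optional, Sequence


def virtual_positions(cam_positions: Sequence[float], n_views: int) -> List[float]:
    """Spread n_views evenly over the baseline span of the transmitted cameras."""
    lo, hi = min(cam_positions), max(cam_positions)
    if n_views == 1:
        return [(lo + hi) / 2]
    step = (hi - lo) / (n_views - 1)
    return [lo + i * step for i in range(n_views)]


def nearest_camera(cam_positions: Sequence[float], target: float) -> int:
    """Index of the transmitted camera closest to a virtual view position."""
    return min(range(len(cam_positions)),
               key=lambda i: abs(cam_positions[i] - target))


def warp_row(colors: Sequence[str], depths: Sequence[float],
             focal_px: float, shift: float) -> List[Optional[str]]:
    """Forward-warp one image row to a camera moved by `shift` along the baseline.

    Near pixels (small depth) move further than far pixels. A z-buffer keeps
    the nearest pixel when several land on the same target; positions that
    receive no pixel stay None (disocclusion holes).
    """
    width = len(colors)
    out: List[Optional[str]] = [None] * width
    zbuf = [float("inf")] * width
    for x in range(width):
        disparity = focal_px * shift / depths[x]
        tx = x - round(disparity)
        if 0 <= tx < width and depths[x] < zbuf[tx]:
            out[tx] = colors[x]
            zbuf[tx] = depths[x]
    return out


# K = 3 transmitted cameras (assumed positions), N = 9 displayed views.
cams = [0.0, 1.0, 2.0]
views = virtual_positions(cams, 9)
refs = [nearest_camera(cams, v) for v in views]

# One toy row: a near object (depth 2) in front of a far background (depth 10).
row = warp_row(["a", "b", "c", "d", "e", "f"],
               [10, 10, 2, 2, 10, 10], focal_px=10, shift=0.4)
print(views)
print(refs)
print(row)
```

In the toy row the near object shifts by two pixels and covers background pixels, while the positions it vacated become holes. Filling such holes, typically from a second reference view, is one reason more than one view with depth is transmitted.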
This application scenario imposes specific constraints, such as narrow-angle acquisition (< 20 degrees). Also, for cost reasons, there should be no need for geometric rectification at the receiver side: if any rectification is needed at all, it should already be performed on the input views at the encoder side.
Some multiview displays are based, for example, on an LCD screen with a sheet of transparent lenses in front of it. This sheet sends a different view to each eye, so a person sees two different views and thus gets a stereoscopic viewing experience. The stereoscopic capabilities of these multiview displays are limited by the resolution of the LCD screen (currently 1920 x 1080). For example, for a 9-view system where the cone of 9 views is 10 degrees (cone angle, CA), objects are limited to +/-10% (object range, OR) of the screen width when appearing in front of or behind the screen. Both OR and CA will improve over time (determined by economics) as the number of pixels of the LCD screen goes up.
Figure 3. Example of lenticular auto-stereoscopic display requiring 9 views (N = 9)
Other types of stereo displays are now also appearing in large numbers on the market. The ability to generate output views at arbitrary positions at the receiver is attractive even in the case of N = 2 (i.e., a simple stereo display). If, for example, the material has been produced for a large cinema theater, direct use of that stereo signal (2 fixed views) on relatively small home 3D displays will yield a very different stereoscopic viewing experience (e.g., a strongly reduced depth effect). With a 3DV signal as illustrated in Figure 3, a new stereo pair can be generated that is optimized for the given 3D display.