Image and Video Coding: Video Coding Extensions

Feb 17, 2024

SLIDE 1

Image and Video Coding: Video Coding Extensions

SLIDE 2

Screen Content Coding

[Figure: example of sensor-captured video content and of screen content video]

Screen Content Video
Increasingly important for a number of applications (e.g., online meetings)
Screen content video sequences have different properties than sensor-captured video sequences
Coding efficiency can be improved by dedicated coding tools / coding modes

Heiko Schwarz (Freie Universität Berlin) — Image and Video Coding: Video Coding Extensions 2 / 34

SLIDE 3

Screen Content Coding / Coding Tools

Transform Skip Mode

[Figure: transform coding chain: DCT-II → quantization → dequantization → IDCT-II]

Transform Coding Efficiency for Screen Content
Less energy compaction than for typical sensor-captured content
Strong quantization can result in disturbing artifacts

Transform Skip Mode
Coding mode for which no transform is carried out (indicated by a special flag)
Direct quantization of residual samples
Can be combined with dedicated entropy coding for quantization indexes
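The reduced energy compaction can be illustrated with a small sketch. It assumes a hypothetical 8-sample residual containing a single sharp edge and a hypothetical quantization step size; it is not taken from any codec implementation. The orthonormal DCT-II spreads the isolated sample over many coefficients, while direct quantization (transform skip) keeps a single non-zero index.

```python
import math

def dct2(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out

def quantize(values, step):
    """Uniform scalar quantization to integer quantization indexes."""
    return [int(round(v / step)) for v in values]

# Hypothetical residual: one isolated sample, as it may occur next to text edges.
residual = [0, 0, 0, 20, 0, 0, 0, 0]
step = 4.0  # hypothetical quantization step size

indexes_dct = quantize(dct2(residual), step)  # regular transform path
indexes_skip = quantize(residual, step)       # transform skip path

nonzero_dct = sum(1 for q in indexes_dct if q != 0)
nonzero_skip = sum(1 for q in indexes_skip if q != 0)
print("non-zero indexes with DCT-II:", nonzero_dct)
print("non-zero indexes with transform skip:", nonzero_skip)
```

For smooth residuals the comparison typically goes the other way, which is why transform skip is a selectable mode rather than a replacement for the transform.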


SLIDE 4

Screen Content Coding / Coding Tools

Block Differential Pulse Code Modulation (BDPCM)

[Figure: block diagram with quantization and prediction stages]

vertical BDPCM: $\hat{q}[x, y] = q[x, y-1]$
horizontal BDPCM: $\hat{q}[x, y] = q[x-1, y]$
no BDPCM: $\hat{q}[x, y] = 0$

Exploit Dependencies in Transform Skip Mode
Quantization indexes are not directly transmitted by entropy coding
Two additional modes for prediction of quantization indexes (inside block):
  Horizontal prediction (first column is not predicted)
  Vertical prediction (first row is not predicted)
Entropy coding of prediction errors $\Delta q[x, y] = q[x, y] - \hat{q}[x, y]$
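The prediction of quantization indexes can be sketched as follows. This is a minimal illustration of the formulas on this slide, not codec source code; the block contents and the function names are assumptions, and blocks are stored as `q[y][x]`.

```python
def _predict(q, x, y, mode):
    """Predictor for q[x, y]; the first column/row is not predicted (predictor 0)."""
    if mode == "horizontal" and x > 0:
        return q[y][x - 1]
    if mode == "vertical" and y > 0:
        return q[y - 1][x]
    return 0

def bdpcm_encode(q, mode):
    """Return prediction errors delta[y][x] = q[x, y] - q_hat[x, y]."""
    return [[q[y][x] - _predict(q, x, y, mode) for x in range(len(q[0]))]
            for y in range(len(q))]

def bdpcm_decode(delta, mode):
    """Invert bdpcm_encode by accumulating along the prediction direction."""
    h, w = len(delta), len(delta[0])
    q = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            q[y][x] = delta[y][x] + _predict(q, x, y, mode)
    return q

# Hypothetical block of quantization indexes with strong vertical correlation.
q = [[3, 0, -2, 5],
     [3, 0, -2, 5],
     [3, 0, -2, 5],
     [3, 1, -2, 5]]

delta_v = bdpcm_encode(q, "vertical")
print(delta_v)
```

With vertical prediction, only the first row and the single deviating index remain non-zero, which is what makes the subsequent entropy coding cheaper.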


SLIDE 5

Screen Content Coding / Coding Tools

Intra Block Copy

[Figure: restrictions in VVC: valid and invalid 64×64 reference regions relative to the current block (curr)]

“Motion-compensated” prediction inside a picture with integer-sample accurate motion vectors
To reduce memory access complexity, VVC includes restrictions of permitted motion vectors
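A minimal sketch of intra block copy is given below. The validity check is a deliberately simplified assumption (the reference block must lie inside the picture and either fully above the current block or fully to its left without extending below it); it does not reproduce the 64×64 region rules of VVC, and all names are hypothetical.

```python
def is_valid_block_vector(cx, cy, w, h, bvx, bvy, pic_w, pic_h):
    """Simplified check that the reference block is already reconstructed."""
    rx, ry = cx + bvx, cy + bvy
    inside = rx >= 0 and ry >= 0 and rx + w <= pic_w and ry + h <= pic_h
    above = ry + h <= cy                       # fully above the current block
    left = rx + w <= cx and ry + h <= cy + h   # fully left, not below
    return inside and (above or left)

def ibc_predict(recon, cx, cy, w, h, bvx, bvy):
    """Copy the prediction block from the same picture (integer-sample vector)."""
    return [[recon[cy + bvy + j][cx + bvx + i] for i in range(w)]
            for j in range(h)]

# Hypothetical picture with a pattern that repeats every 4 samples horizontally.
picture = [[1, 2, 3, 4, 1, 2, 3, 4] for _ in range(4)]
cx, cy, w, h = 4, 0, 4, 4       # current block
bvx, bvy = -4, 0                # block vector pointing to the left neighbor

valid = is_valid_block_vector(cx, cy, w, h, bvx, bvy, 8, 4)
prediction = ibc_predict(picture, cx, cy, w, h, bvx, bvy)
residual = [[picture[cy + j][cx + i] - prediction[j][i] for i in range(w)]
            for j in range(h)]
print(valid, residual)
```

Repeated glyphs and icons in screen content are the typical case in which such a copy yields an all-zero residual.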


SLIDE 6

Screen Content Coding / Coding Tools

Palette Mode

[Figure: palette with entries 1 to 4 (color vectors x0 y0 z0 to x3 y3 z3 for the components G|Y, B|Cb, R|Cr) and an escape entry; map of palette indexes for a block, with the examples "index = 1, run = 5" and "copy above, run = 7"]

Alternative Coding Mode: Palette Mode
Quantized color vectors are represented by palette indexes
  Palette for current block is predictively coded referring to preceding palettes
  Palette can include an escape symbol for representing less likely values
Palette indexes are coded using horizontal or vertical scanning, using two coding modes
  1. Index mode: Transmit palette index and run length (≥ 0)
  2. Copy mode: Index is copied from top (hor. scan) or left (ver. scan), transmit run length (≥ 0)
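The two coding modes can be sketched for a horizontal (raster) scan as follows. This is an illustrative sketch under stated assumptions: the run length counts the additional samples after the first one (so it is ≥ 0), the encoder greedily picks the longer run, and palette prediction, escape samples and entropy coding are left out. Function names are hypothetical.

```python
def encode_palette_indexes(idx, width):
    """idx: palette indexes of a block in raster order; returns a symbol list."""
    symbols, pos, n = [], 0, len(idx)
    while pos < n:
        # Number of samples that equal the sample directly above.
        copy_len = 0
        while (pos + copy_len < n and pos + copy_len >= width
               and idx[pos + copy_len] == idx[pos + copy_len - width]):
            copy_len += 1
        # Number of consecutive samples with the same index.
        index_len = 1
        while pos + index_len < n and idx[pos + index_len] == idx[pos]:
            index_len += 1
        if copy_len > index_len:
            symbols.append(("copy", copy_len - 1))
            pos += copy_len
        else:
            symbols.append(("index", idx[pos], index_len - 1))
            pos += index_len
    return symbols

def decode_palette_indexes(symbols, width):
    idx = []
    for sym in symbols:
        if sym[0] == "index":
            idx.extend([sym[1]] * (sym[2] + 1))
        else:
            for _ in range(sym[1] + 1):
                idx.append(idx[len(idx) - width])  # copy from the row above
    return idx

# Hypothetical 3x8 index map.
block = [0, 3, 3, 1, 1, 2, 2, 2,
         0, 3, 3, 1, 1, 2, 2, 2,
         0, 0, 0, 1, 1, 2, 2, 2]
symbols = encode_palette_indexes(block, width=8)
decoded = decode_palette_indexes(symbols, width=8)
print(symbols)
```

Rows that repeat the row above collapse into a single copy symbol, which is the main source of savings for flat or vertically structured screen content.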


SLIDE 7

Screen Content Coding / Coding Efficiency

Coding Efficiency Example: “Desktop” (1920 × 1080)

[Plot: PSNR [dB] over bit rate [Mbits/s]; curves: VVC without screen content tools, VVC with additional screen content tools]


SLIDE 8

Screen Content Coding / Coding Efficiency

Subjective Comparison: “Desktop” (Crop of Top-Left Region)

[Images: VVC without SCC tools @ 1 Mbit/s; VVC with SCC tools @ 1 Mbit/s]


SLIDE 9

Screen Content Coding / Coding Efficiency

Coding Efficiency Impact of Screen Content Coding Tools (Example: VVC)

average bit-rate savings   intra only   random access   low delay
ChineseEditing             38 %         36 %            32 %
Console                    66 %         52 %            48 %
Desktop                    67 %         61 %            57 %
FlyingGraphics             41 %         18 %            14 %
SlideEditing               47 %         44 %            36 %
SlideShow                  20 %         16 %            10 %
average                    46 %         38 %            33 %

Average Bit Rate Savings
Bit-rate savings based on PSNR as quality measure
Averages over reasonable quality range
Screen content tools provide large gains for many sequences
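The slides do not state how the average savings were computed. One common approach is a Bjøntegaard-style measure that averages the difference of the logarithmic bit rates over the overlapping PSNR range. The sketch below is a simplified variant that uses piecewise-linear interpolation, and the rate-distortion points in it are hypothetical, not measurements.

```python
import math

def _log_rate_at(points, psnr):
    """Linearly interpolate log(rate) at a given PSNR; points sorted by PSNR."""
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        if p0 <= psnr <= p1:
            t = (psnr - p0) / (p1 - p0)
            return (1 - t) * math.log(r0) + t * math.log(r1)
    raise ValueError("PSNR outside of the curve")

def average_rate_saving(anchor, test, steps=1000):
    """Average bit-rate saving of `test` relative to `anchor` (0.5 means 50 %)."""
    anchor = sorted(anchor, key=lambda rp: rp[1])
    test = sorted(test, key=lambda rp: rp[1])
    lo = max(anchor[0][1], test[0][1])    # overlapping quality range
    hi = min(anchor[-1][1], test[-1][1])
    total = 0.0
    for i in range(steps):
        psnr = lo + (i + 0.5) * (hi - lo) / steps
        total += _log_rate_at(test, psnr) - _log_rate_at(anchor, psnr)
    return 1.0 - math.exp(total / steps)

# Hypothetical (bit rate in Mbit/s, PSNR in dB) points.
anchor_curve = [(1.0, 30.0), (2.0, 35.0), (4.0, 40.0)]
test_curve = [(0.5, 30.0), (1.0, 35.0), (2.0, 40.0)]
saving = average_rate_saving(anchor_curve, test_curve)
print(f"average bit-rate saving: {100 * saving:.1f} %")
```

Averaging in the logarithmic domain makes a halving of the bit rate count equally at low and high rates.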


SLIDE 10

Scalable Video Coding / Types of Scalability

Scalable Video Coding

[Figure: original (1080p, 60 Hz) → video encoder → scalable bitstream; video decoders reconstruct 1080p, 60 Hz at 10 Mbit/s; 1080p, 60 Hz at 5 Mbit/s; or 720p, 30 Hz at 1.5 Mbit/s]

Scalable Bitstream
Includes multiple coded versions of a video sequence
Representations must be extractable by simple discarding of packets
Decoder or middlebox can extract representation suitable for application requirements
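Extraction by discarding packets can be sketched as a simple filter. The packet structure with a layer identifier and a temporal identifier is an assumption made for illustration; the field names are hypothetical and do not follow any specific bitstream syntax.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    layer_id: int      # hypothetical: 0 = base layer, higher = enhancement layers
    temporal_id: int   # hypothetical: 0 = lowest frame rate
    payload: bytes

def extract(bitstream, max_layer_id, max_temporal_id):
    """Keep only the packets needed for the requested representation."""
    return [p for p in bitstream
            if p.layer_id <= max_layer_id and p.temporal_id <= max_temporal_id]

# Hypothetical scalable bitstream: 2 layers, 2 temporal levels, 4 pictures.
bitstream = [Packet(layer, tid, b"")
             for tid in (0, 1, 0, 1)
             for layer in (0, 1)]

full = extract(bitstream, max_layer_id=1, max_temporal_id=1)
base_low_rate = extract(bitstream, max_layer_id=0, max_temporal_id=0)
print(len(full), len(base_low_rate))
```

No re-encoding is involved: a middlebox only needs to read the identifiers in the packet headers.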


SLIDE 11

Scalable Video Coding / Types of Scalability

Types of Scalability

Temporal Scalability
  Scalable bitstream contains representations with different frame rates
Spatial Scalability
  Scalable bitstream contains representations with different spatial resolutions
Quality Scalability
  Scalable bitstream contains representations with different bit rates (but same resolution)
Combined Scalability
  Combination of two or more of the above types


SLIDE 12

Scalable Video Coding / Temporal Scalability

Temporal Scalability

[Figure: hierarchical B picture coding structure with base layer pictures and additional enhancement layer pictures]

Coding Structures for Temporal Scalability
Requirement: Enhancement layer pictures are not used for prediction of base layer pictures
Hierarchical B pictures are well suited and provide very high coding efficiency
Very small loss in coding efficiency relative to best possible single layer coding
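The requirement can be checked on a dyadic hierarchical structure. The sketch assumes a group of 8 pictures, temporal levels derived from the picture order count (POC), and each B picture referencing its two nearest pictures of lower levels; these choices and the function names are assumptions, not the exact structure of the figure.

```python
def temporal_id(poc, gop_size=8):
    """Temporal level of a picture in a dyadic hierarchy (gop_size = power of 2)."""
    levels = gop_size.bit_length() - 1          # log2(gop_size)
    pos = poc % gop_size
    if pos == 0:
        return 0                                # key picture
    trailing_zeros = (pos & -pos).bit_length() - 1
    return levels - trailing_zeros

def reference_pocs(poc, gop_size=8):
    """Reference pictures under the assumed hierarchical prediction structure."""
    tid = temporal_id(poc, gop_size)
    if tid == 0:
        return [poc - gop_size] if poc >= gop_size else []
    step = gop_size >> tid
    return [poc - step, poc + step]

levels = [temporal_id(poc) for poc in range(9)]
print(levels)

# Discarding the highest temporal level halves the frame rate.
half_rate = [poc for poc in range(16) if temporal_id(poc) <= 2]
print(half_rate)
```

Because every reference has a lower temporal level than the picture using it, dropping the highest levels never removes a picture that a remaining picture depends on.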


SLIDE 13

Scalable Video Coding / Quality and Spatial Scalability

Quality / SNR Scalability

[Figure: two-layer coding structure with hierarchical B pictures in the base layer and the enhancement layer; each enhancement layer picture is predicted from the co-located base layer picture]

Inter-Layer Prediction
Add co-located base layer picture to reference list of enhancement layer picture
Base layer data are exploited by sample prediction and motion prediction
Improves coding efficiency relative to independent coding of both layers (simulcast)


SLIDE 14

Scalable Video Coding / Quality and Spatial Scalability

Spatial Scalability

[Figure: two-layer coding structure with hierarchical B pictures; each base layer picture passes through an upsampler before it is used for predicting the co-located enhancement layer picture]

Inter-Layer Prediction with Upsampling
Add upsampled co-located base layer picture to reference list of enhancement layer picture
Use information coded in base layer for improving coding efficiency relative to simulcast
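The construction of the enhancement layer reference list can be sketched as follows. Nearest-neighbor upsampling is used only to keep the example short (practical codecs use interpolation filters), and appending the inter-layer reference at the end of the list is an assumption; all names are hypothetical.

```python
def upsample_nearest(picture, factor):
    """Upsample a 2-D sample array by an integer factor (nearest neighbor)."""
    return [[row[x // factor] for x in range(len(row) * factor)]
            for row in picture
            for _ in range(factor)]

def build_reference_list(temporal_refs, base_layer_picture, factor):
    """Temporal references of the same layer plus the inter-layer reference."""
    if factor > 1:
        inter_layer_ref = upsample_nearest(base_layer_picture, factor)
    else:
        inter_layer_ref = base_layer_picture   # quality scalability: same size
    return list(temporal_refs) + [inter_layer_ref]

# Hypothetical 2x2 base layer picture and one 4x4 temporal reference.
base_picture = [[1, 2],
                [3, 4]]
temporal_ref = [[0] * 4 for _ in range(4)]
ref_list = build_reference_list([temporal_ref], base_picture, factor=2)
print(ref_list[-1])
```

With `factor=1` the same function covers quality scalability, where the base layer picture is added without resampling.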


SLIDE 15

Scalable Video Coding / Quality and Spatial Scalability

Multi-Layer and Combined Scalability

[Figure: coding structure with three layers (layer 0, layer 1, layer 2); layer 0 contains fewer pictures (lower frame rate) than layers 1 and 2]

Multiple quality and/or spatial enhancement layers are possible

Coding efficiency for top layer decreases with number of supported layers
Decoding complexity for top layer increases with number of supported layers

Temporal scalability can be straightforwardly combined with quality/spatial scalability


SLIDE 16

Multiview and 3D Video Coding / Stereo and Multiview Coding

3D Cinema / Home Cinema: Stereo Video

[Figure: display with positive parallax]

Why Glasses?
Need to project a different image to each eye
Glasses control what each eye sees
Need to transmit video with two images per time instance


SLIDE 17

Multiview and 3D Video Coding / Stereo and Multiview Coding

Stereo Video Example

Similarities between left and right picture for same time instance
Can be exploited by technique similar to motion-compensated prediction


SLIDE 18

Multiview and 3D Video Coding / Stereo and Multiview Coding

Multi-view Coding with Disparity-Compensated Prediction

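[Figure: coding structure with left view (primary) and right view (secondary); pictures of the secondary view are additionally predicted from the primary view picture of the same time instance]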

Multiview Coding with Disparity-Compensated Prediction
Add reconstructed picture of primary view to reference lists for secondary view (same time instance)
Only change required is construction of reference picture lists
Straightforward extension to more than 2 views


SLIDE 19

Multiview and 3D Video Coding / 3D Coding

Autostereoscopic Displays

[ J. Geng, Three-dimensional display technologies, 2013 ]

Need to provide very large number of views (> 50)
Problem for video acquisition and transmission


SLIDE 20

Multiview and 3D Video Coding / 3D Coding

Disparity and Object Distance

[Figure: stereo camera geometry with object distance z, baseline b, focal length f, and disparity d]

disparity $d = \frac{f \cdot b}{z}$
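The relation between disparity and object distance can be evaluated directly. The numbers below are hypothetical; the focal length is given in samples so that the disparity is obtained in samples as well.

```python
def disparity(focal_length, baseline, distance):
    """d = f * b / z; units: f in samples, b and z in the same length unit."""
    return focal_length * baseline / distance

def distance_from_disparity(focal_length, baseline, d):
    """Inverse relation z = f * b / d."""
    return focal_length * baseline / d

f = 1000.0   # hypothetical focal length in samples
b = 0.1      # hypothetical baseline in meters

d_near = disparity(f, b, 2.0)    # object at 2 m
d_far = disparity(f, b, 20.0)    # object at 20 m
print(d_near, d_far)
```

Near objects produce large disparities and far objects small ones, so a disparity map carries the same information as a depth map once f and b are known.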


SLIDE 21

Multiview and 3D Video Coding / 3D Coding

Depth Maps and Rendering of Virtual View

Depth Image Based Rendering
Depth maps provide information about object distance for each sample in a picture
Virtual views can be generated at receiver side by depth image based rendering using
  One or multiple views (preferably multiple views due to occlusions)
  Associated depth maps
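A one-dimensional sketch of depth image based rendering is shown below. It forward-warps a single row of samples to a virtual camera located at a fraction `alpha` of the baseline, using the disparity d = f · b / z. The sign convention of the shift, the rounding to integer positions and the sample values are assumptions; hole filling is omitted.

```python
def render_virtual_row(texture, depth, focal_length, baseline, alpha):
    """Forward-warp one row; None marks holes (disocclusions)."""
    width = len(texture)
    out = [None] * width
    out_depth = [float("inf")] * width
    for x in range(width):
        shift = int(round(alpha * focal_length * baseline / depth[x]))
        xv = x - shift                      # assumed shift direction
        # Depth test: the sample closest to the camera wins.
        if 0 <= xv < width and depth[x] < out_depth[xv]:
            out[xv] = texture[x]
            out_depth[xv] = depth[x]
    return out

# Hypothetical row: background at distance 4, foreground object at distance 2.
texture = [10, 20, 30, 40, 50, 60]
depth = [4.0, 4.0, 2.0, 2.0, 4.0, 4.0]
virtual = render_virtual_row(texture, depth, focal_length=4.0, baseline=1.0,
                             alpha=1.0)
print(virtual)
```

The hole behind the foreground object is background that the single input view never saw, which is why rendering from multiple views is preferable.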


SLIDE 22

Multiview and 3D Video Coding / 3D Coding

3D Video Coding: Transmission of Multiple Views with Depth Maps

Conventional multiview coding (with disparity-compensated prediction) for textures and depth maps
Potential improvements:
  Dedicated coding tools for depth map coding (depth maps are characterized by sharp edges and little detail)
  Exploitation of texture data for depth coding (or vice versa)


SLIDE 23

Virtual Reality: 360 Degree Video

Virtual Reality (VR) / 360° Video

[Figure: processing chain: omnidirectional capture → panorama stitching → frame packing → video encoder → video decoder → viewport rendering (controlled by head movement); video encoder and video decoder form conventional video coding]

Virtual Reality: Coding of 360° Video
Panorama stitching: Combine multiple videos into a single 360° panoramic video
Frame packing: Project 3D representation into conventional video frames
  Requires suitable projection formats
Video coding: Conventional coding of 2D video frames (needs very large resolution!)
  Coding efficiency depends on chosen projection format
Viewport rendering: Rendering of viewport (e.g., 75° viewing angle) given projection format
  Considering head movement in real time


SLIDE 24

Virtual Reality: 360 Degree Video / Image Stitching

Panorama Stitching

[ source: Wikipedia ]

Transform images into common coordinate system (compositing surface, projection format)
Seamless blending of overlapping parts
Stitching issues: Parallax, lens distortion, motion in scene, camera calibration, exposure, ...


SLIDE 25

Virtual Reality: 360 Degree Video / Image Stitching

Example: Possible Stitching Artifacts


SLIDE 26

Virtual Reality: 360 Degree Video / Projection Formats

Projection onto 2D Video Frames

Representation of 360° Video
Video samples for a dense set of latitude angles θ ∈ [−π/2; π/2] and longitude angles φ ∈ [−π; π]
Represent as 2D arrays of samples

Projection Formats
Use a virtual object in 3D space (e.g., sphere, cube)
Project captured video samples on surface of object
Pack surface samples into 2D array (video frame)

Impact of Projection Formats
Chosen format impacts quality of viewport rendering
Chosen format impacts efficiency of video coding

[Figure: point P on the unit sphere with latitude θ and longitude φ in the coordinate system X, Y, Z]

$X = \cos(\theta) \cdot \cos(\phi)$
$Y = \sin(\theta)$
$Z = -\cos(\theta) \cdot \sin(\phi)$
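The mapping between the angles and the point P on the unit sphere, together with its inverse, can be written down directly from the equations of this slide. The inverse (using `asin` and `atan2`) is derived here and is not part of the slide; the function names are hypothetical.

```python
import math

def sphere_to_xyz(theta, phi):
    """Latitude theta and longitude phi (radians) to a point on the unit sphere."""
    x = math.cos(theta) * math.cos(phi)
    y = math.sin(theta)
    z = -math.cos(theta) * math.sin(phi)
    return x, y, z

def xyz_to_sphere(x, y, z):
    """Inverse mapping; the direction (x, y, z) does not need to be normalized."""
    norm = math.sqrt(x * x + y * y + z * z)
    theta = math.asin(max(-1.0, min(1.0, y / norm)))
    phi = math.atan2(-z, x)
    return theta, phi

front = sphere_to_xyz(0.0, 0.0)
top = sphere_to_xyz(math.pi / 2, 0.0)
theta, phi = xyz_to_sphere(*sphere_to_xyz(0.3, -2.0))
print(front, top, theta, phi)
```

The inverse direction is the one needed for viewport rendering, where a viewing direction has to be converted back into a position in the projected picture.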


SLIDE 27

Virtual Reality: 360 Degree Video / Projection Formats

360° Projection Formats: Equirectangular Projection (ERP)

Project surface of sphere into a rectangular picture: x = αφ and y = αθ (α specifies resolution)
Non-uniform sampling, strong geometric distortions (in particular at the poles)
Camera and object motion difficult to represent in coded video
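The equirectangular mapping and its non-uniform sampling can be sketched as follows. The offsets that move the origin to the top-left corner of the picture, and the choice of α from the picture width, are assumptions added to obtain non-negative picture coordinates; the slide only gives x = αφ and y = αθ.

```python
import math

def erp_position(theta, phi, width):
    """Picture position for latitude theta and longitude phi (radians)."""
    alpha = width / (2 * math.pi)          # samples per radian
    x = alpha * (phi + math.pi)            # assumed: phi = -pi at the left border
    y = alpha * (math.pi / 2 - theta)      # assumed: north pole at the top
    return x, y

def horizontal_oversampling(theta):
    """Each row has the same number of samples, but the circle of latitude
    theta is shorter by cos(theta) than the equator."""
    return 1.0 / math.cos(theta)

center = erp_position(0.0, 0.0, width=4096)
factor_60 = horizontal_oversampling(math.radians(60))
print(center, factor_60)
```

At 60° latitude every sample on the sphere is represented about twice as densely as at the equator, and the factor grows without bound toward the poles.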


SLIDE 28

Virtual Reality: 360 Degree Video / Projection Formats

360° Projection Formats: Segmented Sphere Projection (SSP)

[Figure: frame layout with the equatorial segment and two circles for the top and bottom parts of the sphere]

Latitude angles θ ∈ [−π/4; π/4] are projected as in equirectangular projection
Top and bottom parts of sphere surface are represented as additional circles
Reduced geometric distortion for pole regions, still similar problems as in equirectangular projection


SLIDE 29

Virtual Reality: 360 Degree Video / Projection Formats

360° Projection Formats: Octahedron Projection (OHP)

8 triangular faces of regular octahedron are arranged in rectangular picture
Small geometric distortion inside faces
Complicated motion across face boundaries
Diagonal borders at top and bottom are unsuitable for video coding


SLIDE 30

Virtual Reality: 360 Degree Video / Projection Formats

360° Projection Formats: Truncated Square Pyramid Projection (TSP)

Project 6 faces of truncated square pyramid into rectangular picture
Front face has same resolution as combination of remaining 5 faces
No borders in composed picture, typical geometric motion artifacts at face boundaries


SLIDE 31

Virtual Reality: 360 Degree Video / Projection Formats

360° Projection Formats: Cubemap Projection (CMP)

[Figure: cubemap frame packing with the faces left, front, right and bottom, back, top]

Project 6 faces of cube into rectangular picture (two connected regions of 3 faces)
Multiple versions that differ in projection geometry (slightly distorted faces)
Small geometric distortion inside faces, typical geometric motion artifacts at face boundaries


SLIDE 32

Virtual Reality: 360 Degree Video / Viewport Rendering

Attenuation of Seam Artifacts in Viewport Rendered from Decoded Video

[Figure: guard bands for equirectangular projection; guard bands for cubemap projection (faces left, front, right, bottom, back, top)]

Modified Projection Formats: Guard Bands
Extend faces of 3D body (regions at boundaries are included multiple times)
Reduction of coding artifacts at seam boundaries
Additional samples for interpolation filters and seam blending


SLIDE 33

Virtual Reality: 360 Degree Video / Viewport Rendering

Dynamic Viewport Rendering

[Figure: viewport in the coordinate system X, Y, Z and the corresponding regions in the cubemap faces left, front, right, bottom, back, top]

Rendering of Viewport
  1. Map viewport coordinates to world coordinates XYZ
  2. Rotate XYZ according to head movement
  3. Map XYZ to coordinates of projection format
  4. Generate viewport sample by interpolation and blending
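The rendering steps can be sketched for an equirectangular source picture. Several simplifications are assumptions of this sketch: a pinhole viewport looking along +X, head movement reduced to a yaw rotation about the Y axis, the ERP layout with φ = −π at the left border and the north pole at the top, and nearest-neighbor sampling in place of interpolation and blending. All names are hypothetical.

```python
import math

def viewport_to_erp(i, j, vp_w, vp_h, fov, yaw, erp_w):
    # Step 1: viewport sample (i, j) to a direction in world coordinates.
    half = math.tan(fov / 2)
    u = (2 * (i + 0.5) / vp_w - 1) * half
    v = (2 * (j + 0.5) / vp_h - 1) * half * vp_h / vp_w
    x, y, z = 1.0, -v, -u
    # Step 2: rotate about the Y axis (positive yaw increases the longitude).
    xr = x * math.cos(yaw) + z * math.sin(yaw)
    zr = z * math.cos(yaw) - x * math.sin(yaw)
    # Step 3: direction to latitude/longitude, then to ERP picture coordinates.
    norm = math.sqrt(xr * xr + y * y + zr * zr)
    theta = math.asin(y / norm)
    phi = math.atan2(-zr, xr)
    alpha = erp_w / (2 * math.pi)
    return alpha * (phi + math.pi), alpha * (math.pi / 2 - theta)

def render_viewport(erp, vp_w, vp_h, fov, yaw):
    erp_h, erp_w = len(erp), len(erp[0])
    out = []
    for j in range(vp_h):
        row = []
        for i in range(vp_w):
            ex, ey = viewport_to_erp(i, j, vp_w, vp_h, fov, yaw, erp_w)
            # Step 4 (simplified): nearest sample instead of interpolation.
            sx = min(int(ex), erp_w - 1)
            sy = min(max(int(ey), 0), erp_h - 1)
            row.append(erp[sy][sx])
        out.append(row)
    return out

# Hypothetical 4x8 ERP picture whose sample value equals the column index.
erp_picture = [list(range(8)) for _ in range(4)]
pos_front = viewport_to_erp(0, 0, 1, 1, math.radians(75), 0.0, 8)
view = render_viewport(erp_picture, 1, 1, math.radians(75), math.pi / 8)
print(pos_front, view)
```

Since only the rotation in step 2 changes with head movement, the per-sample directions of step 1 can be precomputed once per viewport size.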


SLIDE 34

Summary

Summary of Lecture

Screen Content Coding
  Dedicated coding tools: Transform skip, BDPCM, intra block copy, palette mode
  Improved coding efficiency for typical screen content pictures and videos
Scalable Video Coding
  Hierarchical B pictures suitable for providing temporal scalability
  Quality and spatial scalability: Layered coding with inter-layer prediction
Multiview and 3D Video Coding
  Multiview coding with inter-view prediction (similar to quality scalability)
  3D video: Multiview coding of texture views and associated depth maps
Virtual Reality: 360° Video Coding
  Projection of 360° video into conventional video pictures
  Conventional coding of resulting video pictures: Efficiency depends on projection format
