
Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction
Somdyuti Paul, Andrey Norkin and Alan C. Bovik
AOM Symposium 2019, October 21, 2019
VP9 Partition Prediction Using H-FCN

Outline
1. Introduction
2. Related Work
3. Overview of Approach
4. Database Creation
5. H-FCN Model Architecture
6. H-FCN Training
7. Prediction Performance
8. Inconsistency Correction
9. Visualizing Superblock Partitions
10. Encoding Performance
11. Concluding Remarks
12. References

Introduction
In VP9, 64 × 64 superblocks are partitioned recursively, possibly down to 4 × 4 blocks, at four hierarchical levels.
The rate-distortion optimization (RDO) based partition decision is a slow process owing to the combinatorial complexity of the partition search space.
Figure 1: Hierarchical superblock partition at four levels.
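
To give a feel for the size of the search space, the following minimal sketch counts the distinct partition trees of a square block. It assumes, as in the VP9 format, that every block larger than 4 × 4 chooses among four partition types (none, horizontal, vertical, split) and that only "split" recurses into four half-size sub-blocks. The function name count_partitions is illustrative and not taken from the slides.

```python
from functools import lru_cache

# Assumption: three partition types terminate the recursion (none,
# horizontal, vertical); the fourth type ("split") recurses into four
# half-size sub-blocks.
NON_SPLIT_TYPES = 3


@lru_cache(maxsize=None)
def count_partitions(block_size: int) -> int:
    """Count the distinct partition trees of a block_size x block_size block."""
    if block_size == 4:
        # 4 x 4 is the smallest block, so it cannot be partitioned further.
        return 1
    # Either stop with a non-split type, or split and partition each of the
    # four sub-blocks independently.
    return NON_SPLIT_TYPES + count_partitions(block_size // 2) ** 4


for size in (8, 16, 32, 64):
    print(size, count_partitions(size))
```

Under these assumptions the count grows as a fourth power at each level, which is why an exhaustive RDO search over a 64 × 64 superblock is impractical without pruning.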

Related Work
Several machine learning (ML) based approaches with custom feature design have attempted to reduce the computational overhead of the partition search in HEVC [1], VP9 [2] and VVC [3].
Fewer works use deep learning based methods to solve the problem for HEVC [4, 5, 6].
A parallel convolutional neural network architecture was employed in [4] to achieve a speedup of 61.8% for a 2.25% increase in BD-rate in the intra mode of HEVC.
A multi-stage ML framework was used to sequentially make block partition decisions in [2], achieving a speedup of 60.1% over the speed 0 setting of the VP9 encoder with a 0.07% increase in BD-rate.

Overview of Approach
Our approach involves a bottom-up block merge prediction using a hierarchical fully convolutional neural network (H-FCN) [7].
Figure 2: VP9 partition prediction approach.
Implementation available at https://github.com/Somdyuti2/H-FCN.git

Database Creation
Content Selection
The content for our database comprises 89 movies and 17 television episodes, selected from video sources in the Netflix catalog.
Each video was encoded at three different resolutions (1080p, 720p and 540p) using the reference VP9 encoder from the libvpx package.
The contents were encoded in VP9 Profile 0, using speed level 1 and the good quality configuration.

Database Creation
Partition Tree Representation
A concise description of the partition tree was required for effective learning.
The partition tree was represented in the form of a set of four matrices.
Figure 3: Matrix representation of the four level partition tree.
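
The four-matrix representation can be sketched as below. The slide does not give the matrix sizes; this sketch assumes one matrix per level with one entry per block at that level (1 × 1, 2 × 2, 4 × 4 and 8 × 8 for block sizes 64, 32, 16 and 8), which is consistent with the 85 outputs and K = 4 classes quoted on the training slide. The names LEVEL_BLOCK_SIZES and empty_partition_tree are hypothetical.

```python
K = 4  # number of partition classes per entry (from the training slide)

# Assumption: the block size handled at each of the four levels.
LEVEL_BLOCK_SIZES = (64, 32, 16, 8)
SUPERBLOCK_SIZE = 64


def empty_partition_tree(fill=0):
    """Return four square matrices, one per level, filled with class `fill`."""
    tree = []
    for block_size in LEVEL_BLOCK_SIZES:
        n = SUPERBLOCK_SIZE // block_size  # blocks per row at this level
        tree.append([[fill] * n for _ in range(n)])
    return tree


tree = empty_partition_tree()
shapes = [(len(m), len(m[0])) for m in tree]
num_outputs = sum(rows * cols for rows, cols in shapes)
print(shapes, num_outputs)
```

With these sizes the total number of entries is 1 + 4 + 16 + 64 = 85, matching the range of the output index q in the loss.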

Database Creation
The reference VP9 decoder from the libvpx package was modified to extract the superblock partition trees and the corresponding quantization parameter (QP) values from the encoded bitstreams.
The raw pixel data for each superblock was obtained by extracting the luma channels of non-overlapping 64 × 64 blocks from the source videos downsampled to the encode resolution.
Our database encompasses internal QP values in the range 8-105.

Table 1: Summary of VP9 intra-mode superblock partition database (M = movies, E = television episodes)

Database     Contents           % of CGI content   # of samples
Training     62 (M) + 12 (E)    12.16              11 990 384
Validation   27 (M) + 5 (E)     12.50              4 698 195
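
The block extraction step can be illustrated with a minimal sketch, assuming the luma plane is available as a list of pixel rows. The slides do not say how frame edges that are not a multiple of 64 are handled; this sketch simply keeps complete blocks only, which is an assumption. The function name extract_superblocks is hypothetical.

```python
SB = 64  # superblock size in pixels


def extract_superblocks(luma, sb=SB):
    """Split a luma plane (list of rows) into non-overlapping sb x sb blocks.

    Assumption: partial blocks at the right and bottom edges are skipped.
    """
    height, width = len(luma), len(luma[0])
    blocks = []
    for top in range(0, height - sb + 1, sb):
        for left in range(0, width - sb + 1, sb):
            blocks.append([row[left:left + sb] for row in luma[top:top + sb]])
    return blocks


# Synthetic 130 x 200 luma plane, used only to exercise the function.
frame = [[(x + y) % 256 for x in range(200)] for y in range(130)]
blocks = extract_superblocks(frame)
print(len(blocks), len(blocks[0]), len(blocks[0][0]))
```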

H-FCN Model Architecture
Figure 4: Architecture of H-FCN model having 26 336 parameters and 54 610 FLOPs.

H-FCN Training
Categorical cross entropy loss:
L_q(w) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{K} y_{i,j} \log\left(p^{q}_{i,j}(w)\right), \quad q = 1, \dots, 85 \quad (N = 128,\ K = 4)
Figure 5: H-FCN loss with training progress.
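
As a worked illustration of the loss for a single output q, the sketch below averages the categorical cross entropy over a batch of N one-hot labels and K-class probability vectors. The function name and the eps clamp, which guards against log(0), are assumptions added for the example and are not taken from the slides.

```python
import math


def categorical_cross_entropy(labels, probs, eps=1e-12):
    """Mean categorical cross entropy over a batch.

    labels: N one-hot vectors of length K.
    probs:  N predicted probability vectors of length K.
    eps:    assumed lower clamp on probabilities to avoid log(0).
    """
    n = len(labels)
    total = 0.0
    for y_i, p_i in zip(labels, probs):
        for y_ij, p_ij in zip(y_i, p_i):
            if y_ij:
                total += y_ij * math.log(max(p_ij, eps))
    return -total / n


# Toy batch with N = 2 and K = 4 (the slides use N = 128).
labels = [[1, 0, 0, 0], [0, 0, 1, 0]]
probs = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
loss = categorical_cross_entropy(labels, probs)
print(loss)
```

Only the probability assigned to the true class of each sample contributes, so here the loss equals the mean of -log(0.7) and -log(0.25).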

Prediction Performance
The prediction accuracy at each level was evaluated on 10^5 randomly drawn samples from the training and validation sets.

Table 2: Prediction accuracy of H-FCN model

Level #   Training (%)   Validation (%)
0         89.42          90.27
1         84.42          83.47
2         86.07          85.13
3         91.73          91.18

Inconsistency Correction
At each level, the model predictions are made independently of all other levels.
Possible inconsistencies between the predictions of any two levels are corrected by a top-down approach.
Figure 6: Top-down inconsistency correction.
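
The slide does not spell out the correction rule, so the following is only one plausible sketch of a top-down pass: the coarser level takes priority, and any finer-level prediction that sits under a block not predicted as "split" is discarded. The class index SPLIT = 3, the use of None for discarded entries, and the function name correct_top_down are all assumptions made for this illustration.

```python
SPLIT = 3  # hypothetical class index meaning "split into four sub-blocks"


def correct_top_down(tree):
    """Return a copy of `tree` (four matrices, coarse to fine) in which
    predictions under a non-split parent are replaced by None."""
    fixed = [[row[:] for row in level] for level in tree]
    for lvl in range(len(fixed) - 1):
        for r, row in enumerate(fixed[lvl]):
            for c, ptype in enumerate(row):
                # A discarded parent (None) also fails this test, so the
                # invalidation propagates down through all finer levels.
                if ptype != SPLIT:
                    for dr in (0, 1):
                        for dc in (0, 1):
                            fixed[lvl + 1][2 * r + dr][2 * c + dc] = None
    return fixed


# Toy predictions: the superblock splits, but only its top-left 32 x 32
# block splits again, so most finer-level predictions are inconsistent.
tree = [
    [[SPLIT]],
    [[SPLIT, 0], [1, 2]],
    [[SPLIT] * 4 for _ in range(4)],
    [[0] * 8 for _ in range(8)],
]
fixed = correct_top_down(tree)
print(fixed[2])
```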

Visualizing Superblock Partitions
Figure 7: Superblock partitions predicted by the trained H-FCN model compared with ground truth, for (a) QP=25, (b) QP=36, (c) QP=42 and (d) QP=63.

Encoding Performance
The trained model was integrated with the reference VP9 encoder using the TensorFlow C API.
The predicted partitions were ordered to form a preorder traversal of the partition tree, and subsequently used to replace the RDO based partition decision in a recursive fashion.
The encoding performance was evaluated on 30 test sequences at 3 resolutions in terms of both BD-rate and speedup (∆T).

Table 3: Encoding performance with respect to RDO baseline

Resolution   ∆T (%)   BD-rate (%)
1080p        67.5     1.70
720p         72.2     1.75
540p         69.5     1.68
Overall      69.7     1.71
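
The preorder ordering mentioned above can be sketched as follows, assuming the partition tree is stored as four per-level matrices (coarse to fine) and that a hypothetical class index SPLIT = 3 marks a block whose four children must be visited. The child indexing (2r + dr, 2c + dc) and the raster visiting order are assumptions, not details given in the slides.

```python
SPLIT = 3  # hypothetical class index meaning "split into four sub-blocks"


def preorder(tree, level=0, r=0, c=0):
    """List partition types in preorder: a block first, then its children."""
    ptype = tree[level][r][c]
    order = [ptype]
    # The finest level has no matrix below it, so recursion stops there.
    if ptype == SPLIT and level + 1 < len(tree):
        for dr in (0, 1):
            for dc in (0, 1):
                order += preorder(tree, level + 1, 2 * r + dr, 2 * c + dc)
    return order


tree = [
    [[SPLIT]],
    [[0, SPLIT], [1, 2]],
    [[0, 0, 0, 1], [0, 0, 2, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    [[0] * 8 for _ in range(8)],
]
print(preorder(tree))  # [3, 0, 3, 0, 1, 2, 0, 1, 2]
```

An encoder that recurses over blocks in the same order can then consume this list one entry at a time instead of running the RDO search.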

Encoding Performance
Comparison with Speed Level 4 of Reference Encoder
The speedup and BD-rate of our approach were also compared with speed level 4 of the reference VP9 encoder, the highest recommended speed level for the baseline configuration.

Table 4: Comparison of speedup versus BD-rate tradeoff of our approach with VP9 speed level 4

             ∆T (%)             BD-rate (%)
Resolution   Speed 4   H-FCN    Speed 4   H-FCN
1080p        62.0      67.5     2.95      1.70
720p         68.2      72.2     4.12      1.75
540p         65.9      69.5     2.38      1.69
Overall      65.4      69.7     3.15      1.71
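
The arithmetic behind the Overall row can be checked with a short sketch. It assumes the overall figures are unweighted means over the three resolutions, which the slides do not state but which reproduces the tabulated values after rounding.

```python
# Per-resolution values copied from Table 4 (1080p, 720p, 540p).
speedup = {"Speed 4": [62.0, 68.2, 65.9], "H-FCN": [67.5, 72.2, 69.5]}
bd_rate = {"Speed 4": [2.95, 4.12, 2.38], "H-FCN": [1.70, 1.75, 1.69]}


def mean(values):
    """Unweighted arithmetic mean (assumed aggregation for the Overall row)."""
    return sum(values) / len(values)


overall_speedup = {k: round(mean(v), 1) for k, v in speedup.items()}
overall_bd_rate = {k: round(mean(v), 2) for k, v in bd_rate.items()}

# Differences between the two methods, in percentage points.
speedup_gain = round(overall_speedup["H-FCN"] - overall_speedup["Speed 4"], 1)
bd_rate_saving = round(overall_bd_rate["Speed 4"] - overall_bd_rate["H-FCN"], 2)

print(overall_speedup, overall_bd_rate, speedup_gain, bd_rate_saving)
```

The two differences are gaps between percentages, so they are best read as percentage points.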

Encoding Performance
Comparison with Speed Level 4 of Reference Encoder
The benefit offered by our approach in terms of speedup persists across the range of QP values used to learn the H-FCN model.
Figure 8: Speedup achieved by H-FCN and RDO at speed 4 relative to baseline.

Concluding Remarks
Our H-FCN based partition prediction approach achieved a 69.7% speedup on average at the expense of a 1.71% increase in BD-rate.
It achieves a speedup 4.3 percentage points higher than speed level 4 of the reference encoder, while incurring a BD-rate penalty that is 1.44 percentage points smaller.
Further benefits can possibly be derived by extending the approach to the AV1 codec.

References
[1] D. Ruiz-Coll, V. Adzic, G. Fernandez-Escribano, H. Kalva, J. Martinez, and P. Cuenca, “Fast partitioning algorithm for HEVC intra frame coding using machine learning,” in Proc. IEEE Int. Conf. Image Process., pp. 4112–4116, 2014.
[2] H. Su, C. Tsai, Y. Wang, and Y. Xu, “Machine learning accelerated partition search for video encoding,” in Proc. IEEE Int. Conf. Image Process., pp. 2661–2665, 2019.
[3] T. Amestoy, A. Mercat, W. Hamidouche, D. Menard, and C. Bergeron, “Tunable VVC frame partitioning based on lightweight machine learning,” IEEE Trans. Image Process., 2019.
[4] M. Xu, T. Li, Z. Wang, X. Deng, R. Yang, and Z. Guan, “Reducing complexity of HEVC: A deep learning approach,” IEEE Trans. Image Process., vol. 27, pp. 5044–5059, Oct. 2018.

References
[5] Z. Liu, X. Yu, Y. Gao, S. Chen, X. Ji, and D. Wang, “CU partition mode decision for HEVC hardwired intra encoder using convolution neural network,” IEEE Trans. Image Process., vol. 25, pp. 5088–5103, Nov. 2016.
[6] K. Kim and W. Ro, “Fast CU depth decision for HEVC using neural networks,” IEEE Trans. Circuits Syst. Video Technol., vol. 29, pp. 1462–1473, May 2018.
[7] S. Paul, A. Norkin, and A. Bovik, “Speeding up VP9 intra encoder with hierarchical deep learning based partition prediction,” arXiv preprint arXiv:1906.06476, 2019.
