
H.264/AVC Standard

History. Objectives: 50% bit rate savings compared to MPEG-2; high-quality video at both low and high bit rates (64 kbps to 240 Mbps); network-friendly, with better error resilience.


  1. Intra-prediction. Motivation: intra frames are natural images, so they exhibit strong spatial correlation. Macroblocks in intra-coded frames are predicted from previously coded blocks above and/or to the left of the current block. The macroblock may be divided into sixteen 4x4 sub-blocks, which are predicted in cascading fashion. There are 9 modes for 4x4 blocks and 4 modes for 16x16 blocks.

  2. Intra-prediction (cont'd). Directional spatial prediction (9 types for luma, 4 for chroma). E.g., Mode 3, diagonal down/right prediction: a, f, k, p are predicted by (A + 2Q + I + 2) >> 2. [Figure: encoder block diagram with the intra-frame prediction block highlighted; the 4x4 block samples a-p with neighboring samples A-H above, I-P to the left, and Q at the top-left corner; prediction directions numbered 0-8.]

  3. Luma 4x4 Intra Modes

  4. Luma 4x4 Intra Modes: d = round(B/4 + C/2 + D/4)

  5. Luma 16x16 Intra Modes

  6. Intra4x4 Prediction Example: Vertical

  7. Intra4x4 Prediction Example: Horizontal

  8. Intra4x4 Prediction Example: DC

  9. Intra4x4 Prediction Example: Diagonal Down-Right
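The four example modes named on the slides above can be sketched in a few lines. This is a minimal illustration, not the reference implementation: the function name `intra4x4_predict` and the string mode names are hypothetical, and the sample labels follow the slides (A-D above, I-L to the left, Q at the top-left corner). It assumes all neighbors are available.

```python
def intra4x4_predict(mode, above, left, corner):
    """Return a 4x4 prediction as a list of rows.

    above  -- samples A, B, C, D (row above the block)
    left   -- samples I, J, K, L (column left of the block)
    corner -- sample Q (above-left of the block)
    """
    if mode == "vertical":
        # Every row copies the samples directly above the block.
        return [list(above[:4]) for _ in range(4)]
    if mode == "horizontal":
        # Every row is filled with the sample to its left.
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":
        # Rounded mean of the eight neighboring samples.
        dc = (sum(above[:4]) + sum(left[:4]) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    if mode == "diagonal_down_right":
        # Neighbors laid out along the border: L K J I Q A B C D
        border = list(reversed(left[:4])) + [corner] + list(above[:4])
        pred = []
        for y in range(4):
            row = []
            for x in range(4):
                i = 4 + x - y  # border sample on this pixel's diagonal
                row.append((border[i - 1] + 2 * border[i] + border[i + 1] + 2) >> 2)
            pred.append(row)
        return pred
    raise ValueError("unknown mode: %r" % (mode,))
```

On the main diagonal this gives (I + 2Q + A + 2) >> 2 for a, f, k, p, and at the top-right pixel it gives (B + 2C + D + 2) >> 2, which is the rounded form of d = round(B/4 + C/2 + D/4).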

  10. Optimal Intra4x4 Mode Selection. Select the mode with the best R-D tradeoff. Full search method: divide each MB into sixteen 4x4 blocks; for each 4x4 block, for each of the nine Intra_4x4 prediction modes, predict the current 4x4 block with the current mode.

  11. Intra_16x16 Prediction (4 modes). Predicts the entire 16x16 block; suitable for smooth areas. Four modes: 0: Vertical; 1: Horizontal; 2: DC; 3: Plane.

  12. Optimal Intra16x16 Mode Selection. Full search method: for each Intra_16x16 prediction mode, (a) get the prediction of the current MB; (b) find the prediction residual; (c) perform a 2D 4-point Hadamard transform on each 4x4 block; (d) extract the DC coefficients from the sixteen 4x4 blocks and apply the 2D 4-point Hadamard transform again to the resulting 4x4 DC array; (e) cost estimation: sum the absolute values of all the Hadamard transform coefficients. The mode with the smallest cost is the best Intra_16x16 prediction mode for this MB. Decision between Intra_4x4 and Intra_16x16: compare the costs of the Intra_4x4 mode and the Intra_16x16 mode to find the best mode.
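The cost estimation described above can be sketched as follows. This is an unscaled illustration under stated assumptions: the helper names (`hadamard4x4`, `satd4x4`, `intra16x16_cost`) are hypothetical, and any normalization or scaling a real encoder applies to the sums is omitted.

```python
# 4-point Hadamard matrix (symmetric, so H4 equals its transpose).
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def hadamard4x4(block):
    """2D 4-point Hadamard transform: H4 * block * H4."""
    return matmul(matmul(H4, block), H4)


def satd4x4(residual):
    """Sum of absolute Hadamard coefficients of one 4x4 residual block."""
    return sum(abs(c) for row in hadamard4x4(residual) for c in row)


def intra16x16_cost(residual):
    """Cost of a 16x16 residual: AC coefficients of each 4x4 block,
    plus the coefficients of a second Hadamard pass over the 4x4 DC array."""
    dc = [[0] * 4 for _ in range(4)]
    cost = 0
    for by in range(4):
        for bx in range(4):
            block = [row[4 * bx:4 * bx + 4] for row in residual[4 * by:4 * by + 4]]
            coeffs = hadamard4x4(block)
            dc[by][bx] = coeffs[0][0]
            cost += sum(abs(c) for row in coeffs for c in row) - abs(coeffs[0][0])
    cost += sum(abs(c) for row in hadamard4x4(dc) for c in row)
    return cost
```

The mode with the smallest `intra16x16_cost` over the four candidate predictions would be selected.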

  13. Motion Estimation (ME). For each block, find the best match in the previous frame (the reference frame). Upper-left corner of the block being encoded: (x0, y0). Upper-left corner of the matched block in the reference frame: (x1, y1). The motion vector (dx, dy) is the offset between the two blocks: (dx, dy) = (x1 - x0, y1 - y0), i.e. (x0, y0) + (dx, dy) = (x1, y1). The motion vector needs to be sent to the decoder.

  14. Motion Compensation (MC). Given the reference frame and the motion vector, a prediction of the current frame can be obtained. Prediction error: the difference between the current frame and the prediction. The prediction error is coded by DCT, quantization, and entropy coding.

  15. GOP, I, P, and B Frames. GOP: group of pictures (frames). I frames (key frames): intra-coded frames, coded as still images; they can be decoded directly, are used at the head of a GOP or at scene changes, and also improve error resilience. P frames (inter-coded frames): prediction-based coding, based on previous frames.

  16. GOP, I, P, and B Frames. B frames: bi-directionally interpolated prediction frames, predicted from both the previous frame and the next frame; more flexibility gives better prediction. Encoding order: 1 4 2 3 7 5 6. Decoding order: 1 4 2 3 7 5 6. Display order: 1 2 3 4 5 6 7. B frames need more buffers, and buffer manipulation to display frames in the correct order.
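The reordering above can be reproduced with a short sketch. It assumes a frame-type pattern of I B B P B B P in display order (the slide gives only the frame numbers, so the pattern is an assumption) and that each B frame waits for the next I or P frame. The function name `coding_order` is hypothetical.

```python
def coding_order(frame_types):
    """Map a display-order string such as 'IBBPBBP' to the 1-based
    display indices in encoding/decoding order."""
    order = []
    pending_b = []  # B frames waiting for their future reference frame
    for index, frame_type in enumerate(frame_types, start=1):
        if frame_type == "B":
            pending_b.append(index)
        else:
            # An I or P frame is coded first, then the B frames that precede it.
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    # Assumption: trailing B frames with no later reference are coded last.
    return order + pending_b
```

For `"IBBPBBP"` this returns `[1, 4, 2, 3, 7, 5, 6]`, matching the encoding and decoding order listed on the slide.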

  17. Block Matching Algorithms for ME. Each frame is split into 16x16-pel blocks (MBs), and motion estimation is done for each macroblock. Search window (maximum movement) w: typically 8, 16 or 32. A cost is defined for finding the best match for each block in the previous frame: Mean Absolute Error (MAE) or Sum of Absolute Differences (SAD); Mean Square Error (MSE); Sum of Squared Errors (SSE). The motion vector (MV) is calculated between the current block and its counterpart in the previous frame; the macroblock difference is then calculated and sent.

  18. Cost Function. The best match is found by minimizing the SAD (Sum of Absolute Differences) function, computed as $SAD(s, c(m)) = \sum_{x=1,\,y=1}^{16,\,16} \left| s[x, y] - c[x - m_x, y - m_y] \right|$, where s is the original video signal and c is the coded video signal.
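As a minimal sketch of the SAD formula above: the function below follows the slide's sign convention (the reference sample is read at `[x - m_x, y - m_y]`) but uses 0-based indices and frames stored as lists of rows. The name `sad` and its parameters are illustrative assumptions.

```python
def sad(s, c, x0, y0, mx, my, size=16):
    """Sum of absolute differences between the size x size block of s whose
    upper-left corner is (x0, y0) and the block of c displaced by (mx, my)."""
    height, width = len(c), len(c[0])
    if not (0 <= x0 - mx and x0 - mx + size <= width
            and 0 <= y0 - my and y0 - my + size <= height):
        raise IndexError("displaced block falls outside the reference frame")
    total = 0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            total += abs(s[y][x] - c[y - my][x - mx])
    return total
```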

  19. Motion Estimation in H.264. What is new? Variable block size motion estimation: can yield 15% bit rate savings. Multiple reference frame motion estimation: 5-20% bit rate savings. Sub-pixel motion estimation: 20% bit rate savings over integer ME.

  20. Search Window. The search window (in the previous frame) is the rectangle at the same coordinates as the current block in the current frame, extended by w pixels in each direction. [Figure: a p x q block centered in a (p + 2w) x (q + 2w) search window.]

  21. Cost Function. The best match is found by minimizing the SAD (Sum of Absolute Differences) function, computed as $SAD(s, c(m)) = \sum_{x=1,\,y=1}^{16,\,16} \left| s[x, y] - c[x - m_x, y - m_y] \right|$, where s is the original video signal and c is the coded video signal.

  22. Full Search Method. All candidates within the search window are examined: (2w+1)^2 positions. Advantage: good accuracy; it finds the best match. Disadvantage: a large amount of computation, (2w+1)^2 matches with a 16x16 MAE for each match, which is impractical for real-time applications. To avoid this complexity, the number of search points has to be reduced by using fast block matching algorithms.
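A full search over the (2w+1)^2 candidates might look like the sketch below. It uses the motion vector convention from the Motion Estimation slide, (x1, y1) = (x0, y0) + (dx, dy), and skips candidates that fall outside the reference frame. The names `block_sad` and `full_search` are hypothetical.

```python
def block_sad(cur, ref, x0, y0, dx, dy, n):
    """SAD between the n x n block of cur at (x0, y0) and the block of ref
    at (x0 + dx, y0 + dy)."""
    return sum(abs(cur[y0 + j][x0 + i] - ref[y0 + dy + j][x0 + dx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, x0, y0, w, n=16):
    """Examine every offset in [-w, w] x [-w, w]; return (cost, dx, dy)
    of the best match, or None if no candidate fits inside ref."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            x1, y1 = x0 + dx, y0 + dy
            if x1 < 0 or y1 < 0 or x1 + n > width or y1 + n > height:
                continue  # candidate block is not fully inside the frame
            cost = block_sad(cur, ref, x0, y0, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

With w = 16 this evaluates up to 33 x 33 = 1089 block comparisons per macroblock, which is the cost the fast algorithms on the following slides try to avoid.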

  23. Initial Search Point Prediction. A median predictor is used to define the initial search point: the median of the motion vectors of three spatially adjacent blocks, left (A), top (B) and top-right (C), or top-left (D), of the current block E: $mv\_pred = (pred\_x, pred\_y) = \mathrm{median}(mv\_A, mv\_B, mv\_C)$. If C does not exist, then C = D. If B and C do not exist, prediction = MV_A. If A and C do not exist, prediction = MV_B. If A and B do not exist, prediction = MV_C. Otherwise, prediction = median(MV_A, MV_B, MV_C). [Figure: neighbors D, B, C above and A to the left of the current block E.]
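The availability rules above can be sketched as a component-wise median. In this illustration `None` marks a neighbor that does not exist. Two behaviors are assumptions not stated on the slide: a single missing neighbor is treated as (0, 0) inside the median, and (0, 0) is returned when no neighbor exists. The function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(mv_a, mv_b, mv_c, mv_d=None):
    """Median motion vector predictor.

    Each argument is an (x, y) tuple, or None if that neighbor is unavailable.
    """
    if mv_c is None:
        mv_c = mv_d  # "If C does not exist, then C = D"
    present = [mv for mv in (mv_a, mv_b, mv_c) if mv is not None]
    if len(present) == 1:
        return present[0]  # only one neighbor exists: use it directly
    # Assumption: any remaining missing neighbor counts as a zero vector.
    a, b, c = (mv if mv is not None else (0, 0) for mv in (mv_a, mv_b, mv_c))
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))
```

Note that the median is taken separately for the x and y components, so the result need not equal any one of the three input vectors.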

  24. 2-D Logarithmic Search (TDL). Examine the central point and its four surrounding points, at distance w/2 from the center, and find the best match. If the best match is not at the center, examine three new points centered on the previous best. Halve the distance and continue until the distance is 1; then use all 9 matches, find the best, and stop. The maximum number of search points here is $2 + 7 \log_2 w$. [Figure: search points labeled by step, 1 to 3.]

  25. Three Step Search (TSS). 1. Check nine search points. 2. The step size is reduced by half after each step. 3. At the end of the search the step size is one pel. The algorithm is repeated 3 times and examines 25 points. Number of search points: $1 + 8 \log_2 w$. Advantage: simple and regular structure, good for hardware implementation. Disadvantage: the uniformly allocated checking points make it inefficient for small motion. [Figure: search points labeled by step, 1 to 3.]
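A minimal sketch of the three steps, assuming a caller-supplied `cost(dx, dy)` callback (hypothetical; for example a SAD evaluated at that offset) and an initial step size of 4, which yields the three steps 4, 2, 1:

```python
def three_step_search(cost, step=4):
    """Return (dx, dy, points_checked) for the best offset found.

    cost -- callable taking an offset (dx, dy) and returning a number
    step -- initial step size; it is halved after every step down to 1
    """
    cx, cy = 0, 0
    best = cost(0, 0)
    checked = 1
    while step >= 1:
        bx, by = cx, cy
        # Check the eight neighbors of the current center at this step size.
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue  # the center was already evaluated
                value = cost(cx + dx, cy + dy)
                checked += 1
                if value < best:
                    best, bx, by = value, cx + dx, cy + dy
        cx, cy = bx, by
        step //= 2
    return cx, cy, checked
```

With the default step size the sketch evaluates 1 + 3 x 8 = 25 points, in line with the count on the slide.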

  26. Diamond Search (DS). Experimental results show that 53% to 98% of motion vectors are enclosed in a circular area with a radius of 2 pels, centered on the position of zero motion. The block displacement of real-world video sequences is mainly in the horizontal and vertical directions. DS therefore places its search points within the circle with a radius of 2 pels, and it outperforms the TSS algorithm.

  27. DS Algorithm. Step 1: the 9 checking points of the large diamond search pattern (LDSP) are tested. If the minimum point is located at the center position, go to Step 2; otherwise repeat this step centered on the best point. Step 2: switch the search pattern from LDSP to the small diamond search pattern (SDSP). The minimum point found in this step is the best match. [Figure: LDSP and SDSP.]

  28. DS Algorithm. (b) LDSP -> LDSP when the minimum is at one of the corner points. (c) LDSP -> LDSP when the minimum is along an edge of the diamond. (d) LDSP -> SDSP when the minimum is at the center of the search pattern.
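The two-pattern procedure can be sketched as follows, again assuming a hypothetical `cost(dx, dy)` callback. The `max_iters` guard is an added safety assumption, and this sketch re-evaluates overlapping points rather than reusing earlier results.

```python
# Large diamond: center, four points at distance 2, four diagonal points.
LDSP = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2),
        (1, 1), (1, -1), (-1, 1), (-1, -1)]
# Small diamond: center and its four direct neighbors.
SDSP = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def diamond_search(cost, start=(0, 0), max_iters=64):
    """Return the (dx, dy) offset found by LDSP steps followed by one SDSP step."""
    cx, cy = start
    for _ in range(max_iters):
        # min() keeps the first minimum, so ties favor the center (listed first).
        best = min(LDSP, key=lambda d: cost(cx + d[0], cy + d[1]))
        if best == (0, 0):
            break  # minimum at the center: switch to the small pattern
        cx, cy = cx + best[0], cy + best[1]
    best = min(SDSP, key=lambda d: cost(cx + d[0], cy + d[1]))
    return cx + best[0], cy + best[1]
```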

  29. H.264 ME Algorithm (UMHexagonS). 1) Initial search point prediction. 2) Unsymmetrical-cross search. 3) Uneven multi-hexagonal-grid search. 4) Extended hexagonal-based search. Note that the ME algorithm is not a mandatory part of the standard; only the ME implemented in the reference software is described here.

  30. Initial Search Point Prediction. A median predictor is used to define the initial search point: the median of the motion vectors of three spatially adjacent blocks, left (A), top (B) and top-right (C), or top-left (D), of the current block E: $mv\_pred = (pred\_x, pred\_y) = \mathrm{median}(mv\_A, mv\_B, mv\_C)$. [Figure: neighbors D, B, C above and A to the left of the current block E.]

  31. Unsymmetrical-Cross Search. Based on experimental results, movement in the horizontal direction is much heavier than in the vertical direction. The distance between search points is chosen to be 2. The minimum-cost MV is chosen as the search center of the next search step.

  32. Uneven Multi-Hexagonal-Grid Search

  33. Extended Hexagonal-Based Search. When the previous optimum MV lies in the outer concentric area, the search result has relatively low accuracy, so the motion vector is refined by the extended hexagonal-based search method.

  34. Motion Estimation in H.264. One of the main H.264 enhancement features is its motion estimation algorithm. What is new? Variable block size motion estimation: can yield 15% bit rate savings. Multiple reference frame motion estimation: 5-20% bit rate savings. Sub-pixel motion estimation: 20% bit rate savings over integer ME.

  35. Variable Block Size ME. A 16x16 macroblock may contain more than one object; in other words, the size of moving/stationary objects is variable. The objects may move in different directions, so one motion vector is not enough to describe the movement of all objects: with a single MV, some part of the content is described well and the other part gives a large error. The solution is a variable block size: a macroblock with more detail is coded using a smaller block size. H.264 defines 7 block sizes for partitioning.

  36. Variable Block Size ME (cont'd). [Figure: encoder block diagram with the motion estimation block highlighted. Macroblock partition types: 16x16, 16x8, 8x16, 8x8. 8x8 sub-partition types: 8x8, 8x4, 4x8, 4x4.]

  37. Partitions of MB

  38. Variable Block Size ME (cont'd). An inter MB can be partitioned into smaller regions for ME, with up to 16 MVs. MVs are differentially encoded. Deciding the best mode needs a lot of optimization effort: SAD + λ(Q) R. Mode decision uses R-D optimization with the Lagrangian method, and is also an active research area.

  39. Variable Block Size ME: Example (T=1, T=2)

  40. Variable Block Size ME: Example (T=1, T=2)

  41. Variable Block Size ME: Example (T=1, T=2)

  42. Multiple Reference Frames ME. In previous standards, up to 2 reference frames were used for ME. Here, up to five different reference frames can be selected, resulting in better subjective video quality and more efficient coding of the video sequence. This might also help make the H.264 bit stream error resilient.

  43. Multiple Reference Frames ME. In H.263, the reference frame for prediction is always the previous frame. In MPEG and H.26L, some frames are predicted from both the previous and the next frames (bi-prediction). In H.264, up to 16 frames may be used as reference: the encoder and decoder maintain synchronized buffers of available (previously decoded) frames. This results in better subjective video quality and more efficient coding of the video sequence, and might help make the H.264 bit stream error resilient.

  44. Multiple Reference Frames ME. [Figure: encoder block diagram highlighting multiple reference frames for motion compensation.]

  45. Subpixel Motion Estimation. When an object moves by a sub-pixel amount, integer-pixel ME cannot describe it, so sub-pixel ME is defined. H.263 uses only half-pixel accuracy and MPEG-4 uses quarter-pixel accuracy, a gain of 1.5-2 dB across the board over ½-pixel. H.264 uses a higher spatial precision for ME, up to eighth-pixel accuracy.

  46. Example: b = round[(E - 5F + 20G + 20H - 5I + J)/32]

  47. Example (cont'd): a = round[(G + b)/2]
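The two interpolation formulas above translate directly to integer arithmetic. A minimal sketch, assuming 8-bit samples (hence the clip to 0..255) and implementing `round` as "add half, then shift"; the function names are illustrative:

```python
def clip255(value):
    """Clamp a value to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel(e, f, g, h, i, j):
    """Half-sample b between G and H using the 6-tap filter
    (1, -5, 20, 20, -5, 1) / 32, with rounding."""
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(g, b):
    """Quarter-sample a: rounded average of integer sample G and half-sample b."""
    return (g + b + 1) >> 1
```

The clip matters because the filter has negative taps: a sharp edge such as samples 0, 0, 255, 255, 0, 0 would otherwise give 319.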

  48. Chroma Motion Vector

  49. H.264 Cost Function. The best match is found by minimizing the cost function $J(m, \lambda_{motion}) = SAD(s, c(m)) + \lambda_{motion} \cdot R(m - p)$, where $m = (m_x, m_y)^T$ is the motion vector, $p = (p_x, p_y)^T$ is the predicted motion vector, $\lambda_{motion}$ is the Lagrange multiplier, and $R(m - p)$ represents the bits used to encode the motion information. The SAD (Sum of Absolute Differences) is computed as $SAD(s, c(m)) = \sum_{x=1,\,y=1}^{B,\,B} \left| s[x, y] - c[x - m_x, y - m_y] \right|$, where B = 16, 8 or 4, s is the original video signal and c is the coded video signal.
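To make the rate term concrete, the sketch below estimates R(m - p) as the length of a signed Exp-Golomb code for each component of the motion vector difference. That choice of rate model is an assumption made for illustration, since the slide does not say how R is computed; the function names are hypothetical.

```python
def se_golomb_bits(value):
    """Bit length of the signed Exp-Golomb code for an integer."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return 2 * (code_num + 1).bit_length() - 1


def mv_rate(m, p):
    """Estimated bits R(m - p) for coding motion vector m given predictor p."""
    return se_golomb_bits(m[0] - p[0]) + se_golomb_bits(m[1] - p[1])


def motion_cost(sad_value, m, p, lam):
    """J(m, lambda_motion) = SAD + lambda_motion * R(m - p)."""
    return sad_value + lam * mv_rate(m, p)
```

A candidate equal to the predictor costs only 2 bits of rate, so with a large multiplier the search is biased toward vectors close to the prediction.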

  50. MB Modes. A MB can select one of these modes: Intra_16x16; Intra_8x8 (not allowed in Baseline); Intra_4x4; I_PCM, which enables an encoder to transmit the values of the image samples directly (without prediction or transformation); Inter_16x16; Inter_16x8; Inter_8x16; Inter_8x8; SKIP.

  51. P_SKIP Type. For this type, neither a quantized prediction error signal nor a motion vector or reference index parameter is transmitted. The reference picture is the one located at index 0 in the multi-picture buffer. The motion vector is predicted from the motion vector predictor. It is used for large areas with no change or constant motion. Its size is 16x16.

  52. Mode Decision Method in H.264/AVC. Calculate the RDCost for each intra mode. Calculate the RDCost for SKIP mode. For each inter mode (16x16, 16x8, 8x16 and 8x8): for each block in the current mode, do ME in a search area and select the point that minimizes $J(m, \lambda_{motion}) = SAD(s, c(m)) + \lambda_{motion} \cdot R(m - p)$; then calculate the RDCost using RDCost = Distortion + λ × Rate. Note that computing Rate requires transform, quantization and entropy coding, and computing Distortion requires transform, quantization, inverse quantization and inverse transform. From the calculated RDCosts (RDCost_Intra_16x16, RDCost_Intra_4x4, RDCost_I_PCM, RDCost_SKIP, RDCost_Inter_16x16, RDCost_Inter_16x8, RDCost_Inter_8x16 and RDCost_Inter_8x8), select the smallest one as the best mode.
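The final selection step can be sketched as picking the minimum of Distortion + λ × Rate over the candidate modes. The distortion and rate figures in the usage example are made up for illustration, and the function names are hypothetical.

```python
def rd_cost(distortion, rate, lam):
    """RDCost = Distortion + lambda * Rate."""
    return distortion + lam * rate


def best_mode(candidates, lam):
    """Pick the mode with the smallest RD cost.

    candidates -- dict mapping a mode name to (distortion, rate_in_bits)
    """
    def cost_of(mode):
        distortion, rate = candidates[mode]
        return rd_cost(distortion, rate, lam)
    return min(candidates, key=cost_of)


# Illustrative numbers only: SKIP is cheap but inaccurate, Intra_4x4 is
# accurate but expensive in bits.
example = {"SKIP": (900, 1), "Inter_16x16": (400, 40), "Intra_4x4": (150, 220)}
```

With these numbers, `best_mode(example, 5)` selects `Inter_16x16`, while a larger multiplier such as 20 pushes the decision to `SKIP`, which shows how λ trades distortion against rate.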

  53. Slice. Each frame can be coded in one or more slices, each containing anywhere from one macroblock (16x16) to all the macroblocks in the frame (1 slice per picture). The number of macroblocks per slice need not be constant within a picture. Because inter-dependency between coded slices is minimal, propagation of errors can be limited.

  54. Slice Coding. Slices can have different shapes and sizes. Slices do not have to be consecutive in raster scan order. Each slice is self-contained: it can be decoded without knowing the data of other slices. Useful for error resilience and concealment, and for parallel processing.

  55. Slice Type. Each slice can be coded as one of 5 types. I slice: all MBs are coded using intra mode. P slice: a MB can be coded in intra mode, or in inter mode with at most one prediction signal per block. B slice: in addition to the modes of a P slice, some MBs can also be predicted using two prediction signals per block. SP slice (switching-P): facilitates switching between different video streams. SI slice (switching-I): uses only intra prediction.

  56. Slice Modes in H.264

  57. Slice Syntax

  58. Slice Syntax. A macroblock contains coded data corresponding to a 16x16 sample region of a video frame: 16x16 for luma and 8x8 for each of Cr and Cb.

  59. Slices. The H.264 encoder groups MBs into a slice whose size is less than or equal to the maximum transmission unit (MTU). Slices are decoded independently. Prediction across slice boundaries is forbidden, to prevent error propagation through intra-frame prediction.

  60. Arbitrary Slice Order (ASO). The Baseline Profile allows the slices to be decoded in arbitrary order. This permits, for example, reducing the decoding delay when NAL units are delivered out of order. Application example: reducing end-to-end transmission delay in real-time applications.

  61. Flexible Macroblock Ordering (FMO). With FMO, slices are no longer required to consist of neighboring macroblocks. This provides efficient methods for error concealment on error-prone channels. The objective of FMO is to scatter possible errors across the whole frame as evenly as possible, to avoid error accumulation in a limited region.

  62. Slice Group. A slice group is a subset of the macroblocks and may contain one or more slices. In FMO, the frame is divided into several slice groups. Each macroblock can be assigned freely to a slice group using a MAP function.

  63. MAP Function
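As one illustration of a MAP function, the sketch below assigns macroblocks to slice groups in a checkerboard-like pattern. The function name `checkerboard_map` and the `(x + y) % groups` rule are a hypothetical example chosen for simplicity, not a mapping taken from the slides.

```python
def checkerboard_map(mb_cols, mb_rows, groups=2):
    """Return a 2D list giving the slice group of each macroblock.

    With groups=2, every macroblock's left, right, upper and lower neighbors
    belong to the other slice group.
    """
    return [[(x + y) % groups for x in range(mb_cols)] for y in range(mb_rows)]
```

If the packets carrying one slice group are lost, every missing macroblock still has correctly decoded neighbors in the other group, which is what makes concealment by interpolation possible.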

  64. Redundant Coded Picture. The encoder sends a duplicate of part or all of a coded picture. In normal operation, the decoder reconstructs the frame from 'primary' (non-redundant) pictures and discards any redundant pictures. However, if a primary coded picture is damaged (e.g. due to a transmission error), the decoder may replace the damaged area with decoded data from a redundant picture, if available.

  65. MB Prediction Types. Intra: the MB is predicted from neighboring blocks of the same frame. Intra prediction is performed on 16x16, 4x4 and (in the FRExt profiles) 8x8 blocks. Inter: the MB is predicted from regions in previous (or next) frames, using motion estimation.

  66. MB Syntax Element

  67. Transformation and Quantization in H.264

  68. Transformation. H.264 uses three transforms: a Hadamard transform for the 4x4 array of luma DC coefficients; a Hadamard transform for the 2x2 array of chroma DC coefficients; a DCT-based transform for all other 4x4 blocks in the residual data.

  69. Transformation [1]

  70. Transformation. Fundamental differences between the H.264 transform and the DCT: it is an integer transform; it is possible to ensure zero mismatch between encoder and decoder; it can be implemented using only additions and shifts; a scaling multiplication is integrated into the quantizer; it can be carried out using 16-bit integer arithmetic.
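The "additions and shifts only" property can be seen in a small sketch of the 4x4 forward core transform Y = Cf X Cf^T. The matrix `CF` below is the commonly published H.264 core matrix, the scaling that the slide says is folded into the quantizer is left out, and the function names are illustrative.

```python
# Forward core transform matrix (integer approximation of the 4x4 DCT).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def butterfly(x0, x1, x2, x3):
    """Apply CF to one 4-sample vector using only adds, subtracts and shifts."""
    s0, s1 = x0 + x3, x1 + x2
    d0, d1 = x0 - x3, x1 - x2
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


def forward4x4(block):
    """Compute CF * block * CF^T by transforming the rows, then the columns."""
    rows = [butterfly(*row) for row in block]
    cols = [butterfly(*col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

Because every operation is an exact integer operation, encoder and decoder can reproduce identical results, which is the zero-mismatch property listed above.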

  71. Transformation: DCT [1]

  72. Transformation: DCT Approximation [1]

  73. Transformation: 4x4 Hadamard Transform; 2x2 Hadamard Transform [1]
