TKT-2431 SoC design



  1. TKT-2431 SoC design: Introduction to exercises (SoC design / Fall 2011)

  2. Exercises
     - Assistants:
       - Antti Alhonen antti.alhonen@tut.fi
       - Jussi Raasakka jussi.raasakka@tut.fi
       - (Otto Esko otto.esko@tut.fi)
     - In the project work, a simplified H.263 video encoder is implemented on the Altera DE2 FPGA Development and Education board
     - The project work consists of a set of exercises
       - After successfully finishing each exercise, one should have a working H.263 video encoder
     - Exercises: Mon 14-16, Tue 14-16, Wed 16-18 (TC417)
       - Assistance is not available at any other time
     - All needed software is installed on the PCs of the class and can be used whenever the class is not reserved for other courses

  3. Exercises cont.
     - Attending the exercise hours is voluntary
       - The following assignment is introduced
       - Tools and algorithms are introduced
       - Hints are given
       - Questions are answered
     - Completing each of the exercises is mandatory
       - The returns have to be in time
       - The returns have to be accepted
     - Exercise work is carried out in groups of 1-2 students
       - Groups of 2 persons are preferred

  4. Exercises cont.
     - The exercise work consists of several phases and sub-tasks
       - Receiving and understanding the system requirements
       - Writing a system specification
       - Software implementation of the encoder
       - Functional verification on PC workstation
       - Migrating the SW implementation onto FPGA
       - Verification and performance profiling for pure SW implementation
       - HW/SW partitioning and hardware acceleration
       - Verification and performance profiling for accelerated implementation
       - Documentation

  5. Exercises cont.
     - Completed exercise work is valid for three successive exams
     - Points from the exercise work
       - You can gain points from some of the exercises
       - See exercise pages for more detail
       - Bonus point criteria will be explained during the first exercises
     - http://www.tkt.cs.tut.fi/kurssit/2431

  6. Exercise 1 / Part 1: Introduction to topic

  7. Topic of the work
     - A simplified H.263 video encoder on the DE2 FPGA Education and Development board
     - The system design flow
       - The requirements for the video encoder are introduced
       - A functional specification is written
       - A software implementation of the video encoder algorithm is written in ANSI C and verified on a PC workstation
       - An initial hardware architecture containing a single Nios II softcore CPU and the necessary peripherals is synthesized for the FPGA
       - The software version is migrated to the Nios II processor on the FPGA
       - The design is partitioned into software and hardware according to the profiling results of the software implementation
       - The DCT algorithm is accelerated with dedicated logic
       - The accelerated system is implemented and verified on the FPGA
       - Performance analysis is also carried out for the accelerated system and compared with the pure software implementation

  8. H.263
     - The basics of H.263 video encoding are explained during the following exercises
     - Students are encouraged to get familiar with video encoding algorithms in general before they start the project
       - H.263 has a lot in common with algorithms like JPEG and MPEG-2
     - A very simplified version of the H.263 video encoder (resembling motion JPEG) is used
       - Only INTRA coding is used (i.e. prediction between subsequent frames is not applied)
       - The algorithms used are DCT (Discrete Cosine Transform), quantization, RLE (Run-Length Encoding), and VLC coding

  9. Software
     - Altera Quartus II v7.2
       - System development front-end
       - Schematic editing
       - FPGA synthesis
       - SOPC Builder for building Avalon/Nios based systems
       - Integrated logic analyzer
     - Nios II IDE
       - Software development environment for the Nios II processor
       - Part of the Nios II development kit
     - Mentor Graphics ModelSim
       - Simulating own VHDL blocks/designs
     - ffplay
       - Video player
     - tmndec
       - H.263 decoder
     - nios2-terminal
       - Terminal software for reading from the JTAG UART

  10. Hardware
      - Altera DE2 Development and Education Board
      - Cyclone II 2C35 FPGA
        - 33,216 logic elements
        - 483,840 bits of embedded RAM
        - 35 embedded multipliers
        - 4 PLLs
        - 475 user I/O pins (at maximum)
      - External memory devices
        - 4 MB Flash
        - 512 KB SRAM
        - 8 MB SDRAM
      - RS-232 serial port
        - Used for communication between the PC and the Nios II processor
      - USB Blaster port
        - Used for programming the FPGA (memory contents and HW configuration)
      - In addition, the board contains the following peripherals (not so relevant for the project)
        - Ethernet MAC/PHY device
        - 4x user push-buttons, 18x toggle switches
        - 18x red user LEDs, 9x green user LEDs
        - 8x dual 7-segment display
        - 2x expansion headers (40 user I/O pins / header)
        - SD flash connector header
        - 50 MHz and 27 MHz oscillators

  11. Exercise returns
      - Exercises are returned as follows:
        - The return for an exercise has to be made by e-mail before 23:59 on the next week's Sunday
        - Return your exercises to tkt2431@cs.tut.fi
        - All the required documents have to be in either PDF or plain text format
        - The subject of the e-mail has the following form: SOCD_Ex<exercise_number>_G<group_number> where
          - <exercise_number> is the number of the exercise in question and
          - <group_number> is the number of your group.

  12. Bonus points
      - Three main exercise returns are rated
        - Excellent: 1 bonus point for the exam
          - The returned document is very good and/or the returned source code works correctly and is well done
        - Accepted: no bonus
          - The returned document or code is acceptable
        - Rejected: no bonus, the return has to be corrected
          - Use common sense: Do not return rubbish!
      - All the exercises have to be accepted
      - Exercise points for the exam can be obtained:
        - 1 point can be obtained from each of the exercises 2, 5, 12
        - The encoder achieves the given frame rate criteria (2p if > 75 fps, 1p if > 50 fps)
        - The 2 most optimized encoders are awarded extra points
      - Bonus exercise: Dual Nios II encoder implementation
        - Up to 3 bonus points can be achieved

  13. Exercise 1, Part 2: Introduction to algorithms

  14. Requirements for Video Transmission
      - Communication delay
        - More important in video conferencing applications than in file-based streaming applications
        - Should be as low as possible (< 250 ms, even 150 ms)
        - Should be kept as constant as possible
          - Avoiding a burst of frames followed by a still image
          - Buffering
      - Frame rate
        - Affects the perceived smoothness of motion
        - A video stream under 10 fps is perceived as a "fast slide show"
      - Image resolution
        - The data size of a raw image is directly proportional to it
        - Depends on the application
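To make the link between resolution, frame rate and raw data size concrete, here is a minimal C sketch. It assumes 8-bit YUV 4:2:0 sampling (each chroma plane subsampled by two in both directions) and uses QCIF (176 x 144) as the example resolution. Neither assumption is stated on the slide, and the function names are illustrative only.

```c
#include <stddef.h>

/* Bytes in one raw 8-bit YUV 4:2:0 frame (assumed format):
 * one full-resolution luma plane plus two chroma planes,
 * each subsampled by 2 horizontally and vertically. */
size_t raw_frame_bytes(size_t width, size_t height)
{
    size_t luma = width * height;
    size_t chroma = (width / 2) * (height / 2);
    return luma + 2 * chroma;
}

/* Raw (uncompressed) data rate in bytes per second. */
size_t raw_bytes_per_second(size_t width, size_t height, size_t fps)
{
    return raw_frame_bytes(width, height) * fps;
}
```

Under these assumptions, `raw_frame_bytes(176, 144)` gives 38016 bytes, so even a 10 fps stream needs about 380 kB/s before compression.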

  15. Introduction to H.263 Standard
      - May 1996, ITU-T recommendation v1
      - Block-based (macroblock size is 16 pixels by 16 lines)
      - Motion estimation for temporal redundancy reduction
        - Same objects are likely to be present in adjacent frames
        - Half pixel accurate motion vectors
      - DCT for spatial redundancy reduction
        - 8 x 8 blocks
        - Adjacent pixel values have only a little difference
      - Quantization (lossy)
        - Control of compression ratio
      - RLE and Huffman as entropy coding algorithms
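The way quantization trades accuracy for compression can be sketched with a plain uniform quantizer. This is an illustrative simplification, not the exact quantizer of the H.263 recommendation; the parameter `step` and both function names are hypothetical.

```c
#include <stdlib.h>  /* abs */

/* Uniform quantizer: map a DCT coefficient to an integer level.
 * `step` is a hypothetical step size; a larger step gives coarser
 * levels, more zeros, and therefore a higher compression ratio. */
int quantize(int coef, int step)
{
    int level = abs(coef) / step;      /* magnitude, truncated */
    return coef < 0 ? -level : level;  /* restore the sign */
}

/* Inverse quantizer as a decoder would apply it. */
int dequantize(int level, int step)
{
    return level * step;
}
```

With `step` set to 8, a coefficient of 37 becomes level 4 and is reconstructed as 32; the difference of 5 is the information that is lost, and a coefficient of 3 is quantized to zero.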

  16. Block Diagram of H.263 Encoder
      [Figure: encoder block diagram. Forward path: pre-processing, prediction error computation, DCT, quantization (Q), entropy coding (Huffman, VLC), bits out. Feedback path: inverse quantization (Q^-1), IDCT, motion compensation and motion estimation on previous reconstructed pictures (the same image as the decoder observes), producing a motion vector v(u,v) that is 1/2 pixel accurate (interpolation).]
      - In Intra mode, MBs are coded directly
      - No need to send zeros in an 8x8 block to the decoder

  17. Discrete Cosine Transform (DCT)
      - Assumption: adjacent pixels differ only a little from each other
        - Thus, data in the frequency domain is easier to compress
      - Spatial domain compression
        - Pixels are grouped into blocks and the blocks are then transformed into the frequency domain
        - Essential information is then in a more compact form
          - Important DCT coefficients are in the upper-left corner, that is, at low frequencies
      - Compression is achieved by discarding the less important information of the transformed block
        - Quantization of coefficients
      - DCT itself is a lossless transform
        - Limited accuracy of the coefficients, however, leads to some loss of information

  18. Entropy Encoding
      - After quantization, the quantized coefficients are compressed in a lossless manner using entropy encoding
      - Run-length coding
        - Low-amplitude coefficients are likely to be quantized to zero
        - Arrange successive quantized non-zero coefficients into combinations of (LAST, RUN, LEVEL)
          - LAST = whether this is the final non-zero coefficient in the block
          - RUN = number of preceding zeros
          - LEVEL = sign and magnitude of the non-zero coefficient
        - Coefficients are processed in zig-zag order
          - This is because runs of zeros are most likely located at the higher frequencies
      - Huffman coding (variable length coding)
        - After RLE, the (LAST, RUN, LEVEL) combinations are encoded based on their statistical characteristics
          - Shorter codewords for symbols which occur with high probability
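The zig-zag scan and the (LAST, RUN, LEVEL) grouping can be sketched in C as follows. The struct and function names are hypothetical, and the sketch treats all 64 coefficients uniformly; a standard-conforming H.263 encoder codes the intra DC coefficient separately and then maps each event to a VLC codeword, which is not shown here.

```c
/* Zig-zag scan order for an 8x8 block stored row by row:
 * zigzag[i] is the raster index of the i-th coefficient in the scan. */
static const int zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* One run-length event. */
struct rle_event {
    int last;   /* 1 if this is the final non-zero coefficient */
    int run;    /* number of zeros preceding this coefficient  */
    int level;  /* signed value of the non-zero coefficient    */
};

/* Convert a quantized block into (LAST, RUN, LEVEL) events.
 * `events` must have room for 64 entries. Returns the event count;
 * trailing zeros after the last non-zero coefficient produce no event. */
int rle_encode(const int block[64], struct rle_event events[64])
{
    int count = 0;
    int run = 0;
    int i;

    for (i = 0; i < 64; i++) {
        int level = block[zigzag[i]];
        if (level == 0) {
            run++;
            continue;
        }
        events[count].last = 0;
        events[count].run = run;
        events[count].level = level;
        count++;
        run = 0;
    }
    if (count > 0) {
        events[count - 1].last = 1;  /* mark the final non-zero coefficient */
    }
    return count;
}
```

For example, a block whose only non-zero values are 7 at raster index 0, 4 at index 1, 1 at index 8 and -3 at index 2 yields four events: (0, 0, 7), (0, 0, 4), (0, 0, 1) and (1, 2, -3). The remaining 58 zeros are never transmitted.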
