Towards Low Partition Overhead in Image Decomposition

Ju Wang, University of Florida, SCI (2003)


  1. Towards Low Partition Overhead in Image Decomposition SCI (2003) Ju Wang, University of Florida

  2. Motivation and Goals
     • Image decomposition is required in parallelizing many image processing algorithms.
     • Different decomposition solutions can affect the performance gain of the parallelism.
     • The decomposition scheme determines the additional overhead associated with inter-processor communication, due to
       – local dependency of image pixels
       – communication delay in cluster and distributed-memory systems
     • This overhead should be minimized, especially for real-time image (video) processing such as parallelized video encoding.
     • Goal: develop image decomposition algorithms that result in low partition overhead, and demonstrate one particular application in MPEG-2 video decoding.

  3. Model of the Parallelized Image Algorithm
     • The image processing algorithms considered here are decomposable.
     • Parallel processing is achieved by executing multiple identical threads.
     • Each thread works on an assigned area of the original image.
     • There is no dependency among parallel threads.
     • The results of the individual threads are merged at a master node.
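
The execution model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-pixel inversion in `invert_strip`, the horizontal-strip split, and the function names are hypothetical choices, not taken from the slides. Each worker runs the same function on its own strip, and the master concatenates the results in order.

```python
from concurrent.futures import ThreadPoolExecutor

def invert_strip(strip):
    """Identical per-thread work: invert every pixel of one strip (no cross-strip dependency)."""
    return [[255 - p for p in row] for row in strip]

def parallel_invert(image, n_workers):
    """Split the image into horizontal strips, process them in parallel, merge at the master."""
    h = len(image)
    bounds = [(k * h // n_workers, (k + 1) * h // n_workers) for k in range(n_workers)]
    strips = [image[a:b] for a, b in bounds]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(invert_strip, strips))  # map preserves strip order
    # Master node: concatenate the processed strips back into one image.
    return [row for strip in results for row in strip]

image = [[(x + y) % 256 for x in range(6)] for y in range(5)]
merged = parallel_invert(image, 3)
```

A per-pixel operation like this one has no partition overhead at all; the overhead discussed in the following slides appears once each output pixel depends on neighboring input pixels.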

  4. Image Decomposition Problem Description
     Notations:
     • pixel set $I$: the set of all pixels of a rectangular image with width $w$ and height $h$
     • local dependency $f : I \to 2^I$, which maps a pixel in $I$ to a subset of $I$. $f$ is determined by a specific image processing algorithm.
     • $\{P_1, P_2, \ldots, P_N\}$ is an $N$-way disjoint partition of image $I$, where $P_k \subset I$ is the $k$th part
     Definition of Decomposition Overhead:
     $$PO(P, N, f) = \sum_{k=1}^{N} \left| \bigcup_{i \in P_k} \{\, j : j \in f(i) \text{ and } j \notin P_k \,\} \right|$$
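
A minimal sketch of the overhead definition, assuming it counts, for every part, the distinct pixels outside the part that are referenced by pixels inside it. The 3x3 neighborhood used as the dependency `f`, the 4x4 image, and all function names are illustrative assumptions.

```python
def neighborhood(pixel, w, h, radius=1):
    """Hypothetical dependency f: the (2*radius+1)^2 window around a pixel, clipped to the image."""
    x, y = pixel
    return {(u, v)
            for u in range(max(0, x - radius), min(w, x + radius + 1))
            for v in range(max(0, y - radius), min(h, y + radius + 1))}

def partition_overhead(parts, f):
    """Sum over all parts of the number of distinct external pixels the part depends on."""
    total = 0
    for part in parts:
        referenced = set()
        for i in part:
            referenced |= f(i)
        total += len(referenced - part)  # keep only pixels outside the part
    return total

W, H = 4, 4
left = {(x, y) for x in range(2) for y in range(H)}
right = {(x, y) for x in range(2, W) for y in range(H)}
po = partition_overhead([left, right], lambda p: neighborhood(p, W, H))
print(po)  # prints 8: each half needs the 4-pixel column just across the cut
```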

  5. Determining Communication Overhead
     The amount of additional image data that needs to be accessed by other processing threads is determined by
     • the pattern of local dependence determined by the image processing algorithm.
       – In an image smoothing filter, the size of the kernel mask determines the neighborhood surrounding the target pixel.
     • the partition method used. A good partition can limit the partition overhead to a reasonable range, while an ill-partitioned image might require each thread to access the whole image. Such a bad partition can be constructed by zig-zag scanning the image and assigning pixels to different threads in round-robin order.
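
The contrast between a good and a bad partition can be checked on a toy case. The sketch below assumes an 8x8 image, four threads, a 3x3 kernel, and a row-by-row snake scan as the "zig-zag" order; these choices and the function names are illustrative, not taken from the slides. It counts how many pixels each thread has to touch (its own pixels plus everything they depend on).

```python
def neighborhood(x, y, w, h, radius=1):
    """3x3 window (for radius=1) around (x, y), clipped to the image."""
    return {(u, v)
            for u in range(max(0, x - radius), min(w, x + radius + 1))
            for v in range(max(0, y - radius), min(h, y + radius + 1))}

def touched(part, w, h):
    """All pixels a thread must read: its own pixels plus their neighborhoods."""
    out = set()
    for (x, y) in part:
        out |= neighborhood(x, y, w, h)
    return out

def round_robin_parts(w, h, n):
    """Snake-scan the image and deal pixels to n threads in round-robin order."""
    parts = [set() for _ in range(n)]
    index = 0
    for y in range(h):
        xs = range(w) if y % 2 == 0 else range(w - 1, -1, -1)
        for x in xs:
            parts[index % n].add((x, y))
            index += 1
    return parts

def quadrant_parts(w, h):
    """Four compact blocks, one per thread."""
    return [{(x, y) for x in range(x0, x0 + w // 2) for y in range(y0, y0 + h // 2)}
            for y0 in (0, h // 2) for x0 in (0, w // 2)]

W = H = 8
rr = [len(touched(p, W, H)) for p in round_robin_parts(W, H, 4)]
quad = [len(touched(p, W, H)) for p in quadrant_parts(W, H)]
print(rr, quad)  # prints [64, 64, 64, 64] [25, 25, 25, 25]
```

With the round-robin assignment every thread ends up reading all 64 pixels, while each quadrant only needs its own 16 pixels plus a one-pixel border.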

  6. Decomposition Overhead Example
     [Figure: an image partitioned into four parts labeled A, B, C, and D]
     • Total length of the internal partition edges
     • Diameter of the neighboring area
     • The problem is reduced to the search for the partition with the shortest partition circumference.

  7. More Partition Examples
     [Figure: four example 4-way partitions, panels (a) to (d), each with parts labeled A, B, C, and D]

  8. More Partition Examples
     [Figure: (a) another 4-piece partition; (b) the reference area of piece A]

  9. Lower Bound Analysis
     Fact: for a given area, the shape with the shortest circumference is a circle.
     Assume that the image is a perfect square with edge length $w$.
     • The image is to be divided into $N$ disjoint pieces of equal area, so the area of each part must be $w^2/N$.
     • The radius of the circles in the ideal partition is $r = \sqrt{w^2/(N\pi)}$.
     • Such a partition must be the optimal one, with the shortest overall circumference.
     • The circumference of each circle is $c = 2\pi r = 2\sqrt{\pi w^2/N}$.
     • The total circumference of the $N$ circles becomes $T_c = N c = 2\sqrt{\pi N w^2} = 2w\sqrt{\pi N}$.
     • Subtract the external circumference of the original image, which is $c_e = 4w$, and halve the remainder, since each internal edge is shared by two parts.
     • $c_i(w, N) = \dfrac{T_c - c_e}{2} = w\sqrt{\pi N} - 2w$
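
As a sanity check on the derivation above, here is the bound worked through for $N = 4$, with all lengths expressed in units of $w$. The choice of $N = 4$ is only an example; the numbers are rounded to four decimal places.

```latex
\begin{align*}
r &= \sqrt{\frac{w^2}{4\pi}} \approx 0.2821\,w \\
c &= 2\pi r \approx 1.7725\,w \\
T_c &= 4c = 2w\sqrt{4\pi} \approx 7.0898\,w \\
c_i(w, 4) &= \frac{T_c - c_e}{2} = \frac{7.0898\,w - 4w}{2} \approx 1.5449\,w
\end{align*}
```

Circles cannot actually tile a square, so this value is a bound that no real partition reaches rather than an achievable layout.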

  10. Horizontal-Vertical Partition and Lower Bound Results
      [Figure: plot comparing the pseudo-ideal partition and the HV partition; horizontal axis 0 to 70, vertical axis 0 to 5000]
      • Partition overhead grows as the number of partitions increases.
      • The H-V partition is comparable to the ideal partition in small-scale parallel settings.
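
The comparison can be reproduced numerically with a short sketch. It assumes the pseudo-ideal curve is the circle-based bound $w\sqrt{\pi N} - 2w$ and that an HV partition is a regular grid of `rows` by `cols` cells, whose internal edge length is the total length of the grid cuts. The function names and the image size `w = 480` are illustrative; the values are not the ones plotted on the slide.

```python
import math

def ideal_internal_length(w, n):
    """Circle-based bound on internal edge length for a w-by-w image split into n parts."""
    return w * (math.sqrt(math.pi * n) - 2)

def hv_internal_length(w, h, rows, cols):
    """Internal edge length of a rows-by-cols grid: (cols-1) vertical and (rows-1) horizontal cuts."""
    return (cols - 1) * h + (rows - 1) * w

w = 480
for k in (2, 4, 8):
    n = k * k
    ideal = ideal_internal_length(w, n)
    hv = hv_internal_length(w, w, k, k)
    print(f"N={n:2d}  ideal={ideal:8.1f}  HV={hv:8.1f}  ratio={hv / ideal:.2f}")
```

For a square grid the HV length is $2(\sqrt{N} - 1)\,w$, so both curves grow with $\sqrt{N}$ and the HV partition always stays above the bound.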

  11. Problem Formulation
      • Quadratic assignment problem:
        $$\min \sum_{i,j \in \{1 \cdots N\}} \; \sum_{k,h \in \{1 \cdots K\}} X_{i,k} \, X_{j,h} \, d_{k,h} \, r_{i,j}$$
      • $X_{i,j} = 1$ when macroblock $i$ is assigned to the $j$th processor.
      • $X_{i,j} \in \{0, 1\}$
      • $\sum_i X_{i,j} = \dfrac{H \cdot V}{K}$
      • $\sum_j X_{i,j} = 1$
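
To make the objective concrete, the sketch below evaluates it for two assignments of a 2x2 grid of macroblocks to two processors. The slide does not define `d` and `r`; here they are assumed to be the communication cost between processors and the dependency weight between macroblocks, and the balance constraint is read with the total macroblock count equal to H·V. All values and names are illustrative.

```python
def qap_cost(X, d, r):
    """Objective: sum over i, j, k, h of X[i][k] * X[j][h] * d[k][h] * r[i][j]."""
    n, k = len(X), len(X[0])
    return sum(X[i][a] * X[j][b] * d[a][b] * r[i][j]
               for i in range(n) for j in range(n)
               for a in range(k) for b in range(k))

def is_feasible(X):
    """Each macroblock on exactly one processor, and every processor gets n/k macroblocks."""
    n, k = len(X), len(X[0])
    one_each = all(sum(row) == 1 for row in X)
    balanced = all(sum(X[i][j] for i in range(n)) * k == n for j in range(k))
    return one_each and balanced

# Macroblocks 0,1 form the top row and 2,3 the bottom row of a 2x2 grid.
r = [[0, 1, 1, 0],   # assumed: r[i][j] = 1 when macroblocks i and j are 4-neighbors
     [1, 0, 0, 1],
     [1, 0, 0, 1],
     [0, 1, 1, 0]]
d = [[0, 1],         # assumed: unit cost between different processors, zero within one
     [1, 0]]

by_rows = [[1, 0], [1, 0], [0, 1], [0, 1]]        # top row -> processor 0, bottom row -> 1
checkerboard = [[1, 0], [0, 1], [0, 1], [1, 0]]   # diagonal pairs share a processor

print(qap_cost(by_rows, d, r), qap_cost(checkerboard, d, r))  # prints 4 8
```

Both assignments satisfy the constraints, but the checkerboard separates every pair of neighboring macroblocks and so doubles the cost.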

  12. Heuristic Partition Algorithm
      • Divide and conquer.
      • The image area is divided into two parts each time.
      • Always try the shortest dimension when dividing an image part.
      • Each division is balanced when determining the areas of the two sub-images.

  13. Partition Example with Proposed Heuristic Algorithm
      [Figure: recursive calls partition(r0,7), partition(r1,4), partition(r2,3), partition(r3,2), partition(r4,2), partition(r5,2)]
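
One possible reading of the divide-and-conquer heuristic is recursive bisection, sketched below. Several details are assumptions rather than statements from the slides: the part count is split as ceil(n/2) and the remainder (which reproduces the 7 into 4 and 3 call sequence above), each cut runs parallel to the shorter side so that its length equals the shortest dimension, and the two areas are made proportional to the number of parts each side must still produce.

```python
def partition(rect, n):
    """Recursively bisect rect = (x, y, w, h) into n rectangles of roughly equal area."""
    x, y, w, h = rect
    if n == 1:
        return [rect]
    n1 = (n + 1) // 2          # assumed split of the part count: ceil(n/2) and the rest
    n2 = n - n1
    if w >= h:
        # Cut across the longer (horizontal) side; the cut length is h, the shorter dimension.
        w1 = round(w * n1 / n)  # area share proportional to the parts each side must yield
        first, second = (x, y, w1, h), (x + w1, y, w - w1, h)
    else:
        h1 = round(h * n1 / n)
        first, second = (x, y, w, h1), (x, y + h1, w, h - h1)
    return partition(first, n1) + partition(second, n2)

parts = partition((0, 0, 720, 480), 7)
for p in parts:
    print(p)
```

The sketch works on pixel coordinates and does not snap cuts to macroblock boundaries, which a real MPEG-2 decoder would likely need; very small rectangles with large `n` could also produce empty parts.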

  14. Performance Results: Reference Data in Data Partition Algorithms
      [Figure: amount of reference data per picture (Mbits) versus number of partitions, for horizontal partition, vertical partition, HV partition, and Quick partition; picture size 720*480, search window size = 32 pixels]

  15. Conclusion
      • Discussed the challenge of image decomposition in parallel image processing.
      • Provided an analysis of the lower bound of partition overhead.
      • Proposed a heuristic algorithm that can produce good partitions with low decomposition overhead.
      • Our experiments with this algorithm in a parallel MPEG-2 video decoder show positive preliminary results.
      Thank You!
