Reconfigurable Computing
Applications
Chapter 9
Prof. Dr.-Ing. Jürgen Teich
Lehrstuhl für Hardware-Software-Co-Design

Overview
- In the past, FPGAs have mostly been used for:
  - Rapid prototyping
  - Infrequently reconfigured systems
  - Hardware implementations, sometimes specific to the FPGA architecture
- The most important application areas are:
  - Searching (text, genetic databases, etc.)
  - Image processing
  - Mechanical control
  - Etc.

Searching - pattern matching
- Pattern matching is the basis of search engines
- The purpose is to find (and count) the occurrences of a given pattern in a given text
- Useful in:
  - Dictionaries
  - Document collection indexing
  - Document filtering and classification
  - Spam avoidance
  - Content surveillance

Searching - pattern matching - sliding windows
- Sliding windows (Cockshott & Foulk)
  - Keywords are kept in registers, one character (byte) per register
  - A set of comparators is used, one comparator per byte
  - The hit signal is set whenever the text segment matches the corresponding word
- Advantage:
  - Old patterns are easy to replace
- Drawbacks:
  - Not flexible: the register length is fixed
  - Redundancy: words with the same prefix use more comparators than necessary

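The register-and-comparator scheme above can be modelled in software. The following is only a behavioural sketch, not the original hardware design: the function name `sliding_window_hits`, the `width` parameter, and the sample keywords are assumptions made for illustration.

```python
def sliding_window_hits(text: bytes, keywords: list[bytes], width: int = 8):
    """Behavioural model of a sliding-window matcher (hypothetical helper).

    `width` is the fixed register length; longer keywords do not fit,
    which mirrors the inflexibility noted on the slide.
    """
    for kw in keywords:
        if not 0 < len(kw) <= width:
            raise ValueError("keyword must fit into the fixed-length register")

    window = bytearray(width)  # text shift register, one byte per stage
    hits = []
    for pos, byte in enumerate(text):
        window = window[1:] + bytes([byte])  # shift in the next character
        for kw in keywords:
            tail = window[-len(kw):]
            # One comparator per keyword byte; the hit signal is the AND of all.
            # The position check ignores the zero-filled start of the register.
            if pos + 1 >= len(kw) and all(w == k for w, k in zip(tail, kw)):
                hits.append((pos - len(kw) + 1, kw))
    return hits


print(sliding_window_hits(b"content and contest", [b"conte", b"test"]))
```

Each keyword here gets its own full set of byte comparators, so two keywords that share a prefix duplicate the comparators for that prefix.
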
Searching - pattern matching - sliding windows
- Avoid redundancy
  - Use only one comparator for characters common to different words
- Data folding (Foulk)
  - Fold the data into the circuit
  - Consider the bit representation of each character
  - Generate a comparator circuit for each character in the words to be searched for
[Figure: a generic 8-bit comparator and the folded 01001110-comparator]

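As a rough illustration of data folding, a comparator specialised to one constant reduces to an AND over the input bits, with an inverter wherever the constant has a 0. The sketch below assumes 8-bit characters; `folded_comparator` is a hypothetical name, and the constant 01001110 is the one shown in the figure label.

```python
def folded_comparator(constant: int, width: int = 8):
    """Return a comparator with `constant` folded in (hypothetical helper).

    The input is assumed to be a `width`-bit value.
    """
    const_bits = [(constant >> i) & 1 for i in range(width)]

    def compare(value: int) -> bool:
        result = 1
        for i, const_bit in enumerate(const_bits):
            bit = (value >> i) & 1
            # A 1 in the constant passes the input bit through,
            # a 0 in the constant inverts it before the AND.
            result &= bit if const_bit else 1 - bit
        return bool(result)

    return compare


is_match = folded_comparator(0b01001110)
print(is_match(0b01001110), is_match(0b01001111))
```

No register is needed to hold the keyword character, since the character is encoded in the wiring of the comparator itself.
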
Searching - pattern matching - FSM-based
- FSM-based pattern matcher
  - Every regular grammar can be recognized by an FSM
  - In pattern matching, the target words define the regular grammar
  - The target words are compiled into the automaton
  - Each word defines a unique path from the start state to an end state
  - When scanning a text, the automaton changes its state as characters appear
  - Reaching a final state corresponds to the appearance of a word
  - Redundancy is avoided by implementing common prefixes only once
[Figure: FSM recognizer and corresponding state transition table for the word "conte"]

Searching - pattern matching - FSM-based
- FSM-based pattern matcher
  - RAM implementation
    - One RAM or ROM for storing the state transition table
    - One state register
    - One character register
    - A hit detector
    - The input character and the state register are used to determine the next state
    - The hit detector checks whether the current state is equal to a hit state and sets a hit for the corresponding word
  - Advantage:
    - Simple to implement
  - Drawback:
    - Expensive in terms of flip-flops
[Figure: RAM/ROM implementation of the word recognizer, with character stream, character register, state register, RAM/ROM producing the next state, and hit detector]

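A table-driven recognizer of this kind can be sketched in a few lines. The table construction below uses a KMP-style automaton, which is a choice made for this example and not necessarily how the table on the slide was derived; `build_table` and `ram_fsm_scan` are hypothetical names.

```python
def build_table(word: str) -> list[dict[str, int]]:
    """State transition table for one word (KMP-style DFA, an assumption).

    table[state][char] is the next state; characters that do not occur
    in the word are not stored and implicitly lead back to state 0.
    """
    if not word:
        raise ValueError("word must not be empty")
    n = len(word)
    alphabet = set(word)
    table = [dict.fromkeys(alphabet, 0) for _ in range(n + 1)]
    table[0][word[0]] = 1
    fallback = 0  # state reached by the longest proper border so far
    for state in range(1, n + 1):
        table[state] = dict(table[fallback])      # mismatch: behave like fallback
        if state < n:
            table[state][word[state]] = state + 1  # match: advance
            fallback = table[fallback][word[state]]
    return table


def ram_fsm_scan(text: str, table: list[dict[str, int]]) -> list[int]:
    """Return the text positions at which the word ends."""
    hit_state = len(table) - 1
    state = 0                                # state register
    hits = []
    for pos, ch in enumerate(text):          # ch plays the character register
        state = table[state].get(ch, 0)      # RAM/ROM lookup: (state, char)
        if state == hit_state:               # hit detector
            hits.append(pos)
    return hits


print(ram_fsm_scan("a contest of content", build_table("conte")))
```

In hardware, the lookup `table[state].get(ch, 0)` corresponds to addressing the memory with the concatenation of the state register and the character register.
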
Searching - pattern matching - FSM-based
- FSM-based pattern matcher
  - One-hot implementation
    - Each state is encoded in one flip-flop
    - The D-input of a flip-flop is the AND of the output of the previous flip-flop and the result of the comparator
    - The comparator is character-specific
    - Only n flip-flops are used to implement a word of length n
  - Advantages:
    - Low cost
    - Reflects the structure of the grammar
  - Drawback: character-specific comparators
    - Not easy to build
    - Redundancy in the comparators
[Figure: one-hot implementation for the characters c, o, n, t, e]

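The one-hot scheme can be simulated cycle by cycle. The sketch below assumes that the first flip-flop is driven by its comparator alone, i.e. the start state is always active; that detail and the name `one_hot_scan` are assumptions, not taken from the slide.

```python
def one_hot_scan(text: str, word: str) -> list[int]:
    """Cycle-level model of a one-hot word recognizer (hypothetical helper)."""
    if not word:
        raise ValueError("word must not be empty")
    ff = [0] * len(word)  # one flip-flop per character of the word
    hits = []
    for pos, ch in enumerate(text):
        # Character-specific comparators, one per position in the word.
        cmp = [int(ch == c) for c in word]
        # All flip-flops are clocked together: every D-input is computed
        # from the outputs of the previous cycle.
        ff = [cmp[0]] + [ff[i - 1] & cmp[i] for i in range(1, len(word))]
        if ff[-1]:  # the last flip-flop is the hit signal
            hits.append(pos)
    return hits


print(one_hot_scan("a contest of content", "conte"))
```

Because several flip-flops can be set at once, the chain tracks overlapping partial matches without any extra fallback logic.
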
Searching - pattern matching - FSM-based
- FSM-based pattern matcher
  - Exploiting common prefixes
    - For words with a common prefix, only one common starting path is used; its length equals the length of the common prefix
    - Redundancy of comparators can be avoided by implementing only one comparator per character; the result of the comparison is then provided to all gates that use it
[Figure: words with a common prefix and the corresponding FSM]

Searching - pattern matching - FSM-based
- FSM-based pattern matcher
  - Optimized architecture
    - Implement the common prefix only once
    - Redundancy of comparators is removed: each character in the set is represented in a position vector, with pos(i) = 1 iff character i is detected
[Figures: block diagram of the optimal pattern matcher; detailed structure of the optimal pattern matcher]

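One way to picture the optimized architecture is a trie of one-hot flip-flops fed by a shared position vector. The sketch below is a behavioural model under that reading; the function names and the word list `conte`, `conti`, `cost` are hypothetical and not taken from the figures.

```python
def build_shared_trie(words: list[str]):
    """Merge common prefixes: one node (flip-flop) per distinct prefix."""
    nodes = [(-1, "")]      # nodes[k] = (parent index, character); 0 is the start
    children = {}           # (parent index, character) -> node index
    word_end = {}           # node index -> word recognized at that node
    for word in words:
        cur = 0
        for ch in word:
            if (cur, ch) not in children:
                children[(cur, ch)] = len(nodes)
                nodes.append((cur, ch))
            cur = children[(cur, ch)]
        word_end[cur] = word
    return nodes, word_end


def shared_prefix_scan(text: str, words: list[str]):
    nodes, word_end = build_shared_trie(words)
    charset = sorted({ch for word in words for ch in word})
    ff = [1] + [0] * (len(nodes) - 1)   # start node is always active
    hits = []
    for pos, ch in enumerate(text):
        # Position vector: exactly one comparator per distinct character.
        posvec = {c: int(ch == c) for c in charset}
        # Each flip-flop ANDs its parent's output with the shared comparator bit.
        ff = [1] + [ff[parent] & posvec[c] for parent, c in nodes[1:]]
        hits.extend((pos, word) for idx, word in word_end.items() if ff[idx])
    return hits


words = ["conte", "conti", "cost"]
print(len(build_shared_trie(words)[0]) - 1)   # flip-flops after prefix sharing
print(shared_prefix_scan("conti cost conte", words))
```

For this word list the shared structure needs 8 flip-flops and 7 comparators, compared with 14 of each when every word gets its own one-hot chain.
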
Searching - pattern matching - use of reconfiguration
- FSM-based pattern matcher
  - Use of reconfiguration
    - Replace the character comparators
    - Replace the FSM for a set of words
[Figure: reconfiguration with a new character comparator and a new set of words]

Signal processing - distributed arithmetic - Motivation
- Signal processing applications (FFT, convolution, filter algorithms) are characterized by MAC-intensive (multiply-accumulate) computations
- Signal processing functions are usually implemented on:
  - Special processors
  - DSPs
  - ASICs
- FPGAs provide the advantage of reconfigurability, but
  - MAC-intensive applications are expensive
  - However, for MAC computations involving one constant vector, FPGAs are one of the best alternatives to DSPs

Signal processing - distributed arithmetic - Basics
Solution of the following equation:
$$Z = A \cdot X = \sum_{i=1}^{n} A_i X_i$$
where $A$ is a constant row vector and $X$ is a column vector.
With the binary representation of $X_i$:
$$X_i = \sum_{j=0}^{W} X_{ij} \, 2^j$$
we obtain
$$Z = A \cdot X = \sum_{i=1}^{n} A_i \sum_{j=0}^{W} X_{ij} \, 2^j = \sum_{j=0}^{W} 2^j \left( \sum_{i=1}^{n} A_i X_{ij} \right)$$
$Z = \sum_{j} 2^j \left( \sum_{i} A_i X_{ij} \right)$ is the classical form of distributed arithmetic.
- Because the $A_i$ are constant, there exist $2^n$ possible values for $\sum_{i} A_i X_{ij}$
- We can pre-compute the possible values, store them in a LUT (DALUT), and retrieve them on demand at run-time
- FPGA advantage: the computation is memory-based (use of LUTs)

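The pre-computation can be sketched as follows, assuming unsigned inputs and integer coefficients. The names `build_dalut` and `da_dot` and the `width` parameter (the number of bits per input, i.e. W + 1) are hypothetical.

```python
def build_dalut(coeffs: list[int]) -> list[int]:
    """Pre-compute all 2^n values of sum_i A_i * X_ij.

    Bit i of the address selects coefficient coeffs[i].
    """
    n = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << n)
    ]


def da_dot(coeffs: list[int], xs: list[int], width: int) -> int:
    """Dot product via the DALUT; each x must satisfy 0 <= x < 2**width."""
    lut = build_dalut(coeffs)
    z = 0
    for j in range(width):
        # Gather bit j of every input to form the LUT address.
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> j) & 1) << i
        z += lut[addr] * (1 << j)   # weight the partial sum by 2^j
    return z


coeffs, xs = [3, -5, 7], [2, 3, 1]
print(da_dot(coeffs, xs, width=2), sum(a * x for a, x in zip(coeffs, xs)))
```

No multiplier appears in the loop: each step is one table lookup, one shift, and one addition, which is why the scheme suits LUT-based FPGAs. The table grows as 2^n, so it only pays off for small n or when the input vector is partitioned.
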
Signal processing - distributed arithmetic - Basics
To better understand, we expand the DA equation:
$$\begin{aligned}
Z = {} & \left[ X_{10} A_1 + X_{20} A_2 + \dots + X_{(n-1)0} A_{n-1} + X_{n0} A_n \right] 2^0 \\
+ {} & \left[ X_{11} A_1 + X_{21} A_2 + \dots + X_{(n-1)1} A_{n-1} + X_{n1} A_n \right] 2^1 \\
& \vdots \\
+ {} & \left[ X_{1W} A_1 + X_{2W} A_2 + \dots + X_{(n-1)W} A_{n-1} + X_{nW} A_n \right] 2^W
\end{aligned}$$
- The bits of the variables are used to address the memory and retrieve the required values in a bit-serial way
- The DA datapath implementation is straightforward

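A small worked instance may help. The values below are hypothetical, chosen only for illustration: $n = 2$, $A = (3, 5)$, $X = (2, 3)$, and two bits per input ($W = 1$).

```latex
% Hypothetical values: A_1 = 3, A_2 = 5, X_1 = 2 = (10)_2, X_2 = 3 = (11)_2
% Bits: X_{10} = 0, X_{11} = 1, X_{20} = 1, X_{21} = 1
\begin{aligned}
Z &= \left[ X_{10} A_1 + X_{20} A_2 \right] 2^0
   + \left[ X_{11} A_1 + X_{21} A_2 \right] 2^1 \\
  &= \left[ 0 \cdot 3 + 1 \cdot 5 \right] \cdot 1
   + \left[ 1 \cdot 3 + 1 \cdot 5 \right] \cdot 2 \\
  &= 5 + 16 = 21 = 3 \cdot 2 + 5 \cdot 3
\end{aligned}
```

Each bracket is one of the four possible LUT entries $0$, $A_1$, $A_2$, $A_1 + A_2$, selected by the bit pair $(X_{1j}, X_{2j})$.
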
Signal processing - distributed arithmetic - Datapath
[Figure: bit-serial DA datapath. The inputs $X_1 = X_{1W} X_{1(W-1)} \dots X_{11} X_{10}$ through $X_n = X_{nW} X_{n(W-1)} \dots X_{n1} X_{n0}$ are applied in parallel, one bit per input at a time (parallel bit-serial input), and form the DA-LUT address. The DA-LUT entries, in address order, are $0$, $A_1$, $A_2$, $A_1 + A_2$, $A_3$, $A_3 + A_1$, $A_3 + A_2$, $A_3 + A_2 + A_1$, $A_4$, ... The LUT output passes through a j-shift and an add/subtract (+/-) accumulator that produces $Z$.]

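A behavioural model of such a datapath is sketched below. It assumes that the +/- control of the accumulator serves the usual two's-complement purpose, namely subtracting the partial sum that belongs to the sign bit; the slide residue does not state this, so treat it as an assumption. `da_bit_serial` and `width` (bits per input, W + 1) are hypothetical names.

```python
def da_bit_serial(coeffs: list[int], xs: list[int], width: int) -> int:
    """Bit-serial DA model for signed inputs (two's complement assumed).

    Each x must satisfy -2**(width-1) <= x < 2**(width-1).
    """
    n = len(coeffs)
    # DA-LUT: bit i of the address selects coeffs[i].
    lut = [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << n)
    ]
    mask = (1 << width) - 1
    regs = [x & mask for x in xs]   # input registers in two's complement

    acc = 0
    for j in range(width):          # one bit position per clock cycle
        addr = 0
        for i, reg in enumerate(regs):
            addr |= ((reg >> j) & 1) << i
        partial = lut[addr] * (1 << j)   # j-shift
        if j == width - 1:
            acc -= partial          # sign bit has weight -2^(width-1)
        else:
            acc += partial
    return acc


coeffs, xs = [3, -5, 7], [-2, 3, -4]
print(da_bit_serial(coeffs, xs, width=4), sum(a * x for a, x in zip(coeffs, xs)))
```

The model takes `width` cycles per result regardless of n, which is the basic trade-off of the bit-serial form: little logic, but latency proportional to the word length.
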
Signal processing - distributed arithmetic - Datapath
[Figure: k-parallel DA datapath. The inputs $X_1 = X_{1W} \dots X_{10}$ through $X_n = X_{nW} \dots X_{n0}$ feed DA-LUT 1 to DA-LUT k, each followed by its own accumulator (ACC 1 to ACC k); an adder tree combines the accumulator outputs into $Z$.]

Signal processing - distributed arithmetic - Example
- Recursive convolution for the time-domain simulation of optical multimode intra-system interconnects
- Recursive formula to be implemented on 3 intervals:
$$y(t_n) = f_0 \cdot y(t_{n-1}) + f_4 \cdot x_0 - f_5 \cdot x_1 + f_{24} \cdot x_2 + f_{53} \cdot x_3$$
[Figures: Virtex 2000E implementation on the Celoxica RC1000-PP board; comparison of different implementations]