Design and Implementation of an FPGA-Based Scalable Pipelined Associative SIMD Processor Array with Specialized Variations for Sequence Comparison and MSIMD Operation

Hong Wang
Department of Computer Science
Kent State University
Nov 3rd, 2006
Dissertation Defense

Table of Contents
• Chapter 1 – Introduction
• Chapter 2 – First Prototypes of an Associative Computing (ASC) Processor
• Chapter 3 – A Scalable Pipelined ASC Processor With Reconfigurable PE Interconnection Network
• Chapter 4 – A Specialized ASC Processor with Reconfigurable 2D Mesh for Solving the Longest Common Subsequence (LCS) Problem
• Chapter 5 – An ASC Processor to Support Multiple Instruction Stream Associative Computing (MASC)
• Chapter 6 – Conclusions and Future Work

Associative Computing
• Associative computing is particularly well suited to processing records of data in a tabular format.
• As illustrated, each Processing Element (PE) of the SIMD associative computing array can store a record of this tabular data in its memory.

                                     Search       STEP1        STEP2
PE     Student Name     ID   Grade   Mask  RSPD   Mask  RSPD   Mask  RSPD
PE0    John Smith       07   66      0     0      0     0      0     0
PE1    Gary Heath       05   95      1     1      1     0      0     0
PE2    Peter Smith      11   87      0     0      0     0      0     0
PE3    John Smith       04   78      0     0      0     0      0     0
PE4    Tarry Stanley    02   100     1     1      0     1      1     0
PE5    Will Hanson      01   84      0     0      0     0      0     0
PE6    Jane Antony      06   64      0     0      0     0      0     0
PE7    Mark Bloggs      13   88      0     0      0     0      0     0
PE8    Gill Pister      09   75      0     0      0     0      0     0
PE9    Min Lee          10   83      0     0      0     0      0     0
PE10   Goby Carmen      03   83      0     0      0     0      0     0
PE11   Gillian Roger    08   26      0     0      0     0      0     0

Implementing Associative Computing in the ASC Processor
• Associative search: the Control Unit broadcasts the search key to all PEs, and each PE compares it with its local memory. PEs in which the search succeeds are designated responders; they set their Responder bit and the top of their Mask Stack to '1'.
• Process the responders sequentially: the STEP instruction uses the Responder Resolution Unit and the Mask Stack to process responding PEs one by one.
• Searching for the maximum/minimum value in a field uses the Falkoff algorithm, which processes bit slices from left to right.
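The search, STEP and maximum-search operations described above can be sketched in software. This is a minimal model, not the processor's actual logic: the `PE` class, the function names and the search key `grade >= 95` are assumptions made for illustration (the slides do not state the key; this one happens to select the same two responders as the table), and the Python loops stand in for comparisons that the hardware performs in all PEs at once.

```python
from dataclasses import dataclass, field


@dataclass
class PE:
    """One processing element holding a single student record (hypothetical model)."""
    name: str
    student_id: str
    grade: int
    responder: int = 0
    mask_stack: list = field(default_factory=lambda: [1])


# Records copied from the table on the slide, PE0 to PE11.
RECORDS = [
    ("John Smith", "07", 66), ("Gary Heath", "05", 95),
    ("Peter Smith", "11", 87), ("John Smith", "04", 78),
    ("Tarry Stanley", "02", 100), ("Will Hanson", "01", 84),
    ("Jane Antony", "06", 64), ("Mark Bloggs", "13", 88),
    ("Gill Pister", "09", 75), ("Min Lee", "10", 83),
    ("Goby Carmen", "03", 83), ("Gillian Roger", "08", 26),
]


def make_array():
    """Build a fresh PE array, one record per PE."""
    return [PE(*record) for record in RECORDS]


def search(pes, predicate):
    """Broadcast a key: every PE compares locally and records the result."""
    for pe in pes:  # sequential stand-in for a parallel compare
        hit = 1 if predicate(pe) else 0
        pe.responder = hit
        pe.mask_stack.append(hit)  # top of Mask Stack is 1 only for responders


def step(pes):
    """Select the first remaining responder; return its index, or None."""
    for pe in pes:
        pe.mask_stack[-1] = 0
    for index, pe in enumerate(pes):
        if pe.responder:
            pe.responder = 0        # this responder has now been handled
            pe.mask_stack[-1] = 1   # only the selected PE stays enabled
            return index
    return None


def falkoff_max(values, width=8):
    """Indices holding the maximum, found by scanning bit slices MSB first."""
    candidates = [True] * len(values)
    for bit in range(width - 1, -1, -1):
        ones = [c and ((v >> bit) & 1) == 1 for c, v in zip(candidates, values)]
        if any(ones):  # keep only candidates with a 1 in this slice
            candidates = ones
    return [i for i, c in enumerate(candidates) if c]
```

Calling `search(pes, lambda pe: pe.grade >= 95)` followed by two calls to `step(pes)` walks through the Search, STEP1 and STEP2 columns of the table. The maximum search needs one pass per bit of the field, independent of the number of PEs, which is why it suits a SIMD array.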
Database Processing
• In the following slides I present some applications of our processor
• Relational Database Processing:
  • Intersection, Union, Cartesian Product and Join are basic operations in Database processing. Using associative Search and STEP operations, we can achieve much faster processing time: O(|B|)

Intersection and Union (both examples use the same two relations; one tuple per PE, one step per tuple of Relation B)

Relation A
PE     Student ID   Class
PE7    04           239
PE8    11           111
PE9    07           239
PE10   07           124
PE11   05           124
PE12   04           111

Relation B
Step   PE     Student ID   Class
1      PE13   05           111
2      PE14   04           111
3      PE15   07           124
4      PE16   11           124

Image Processing (Edge Detection Using Convolution)
[Figure: a 6×6 binary input image (rows of 0 1 1 1 1 0 bordered by rows of zeros), a 3×3 weight mask whose rows are each -1 0 1, and the output image (rows of 0 1 0 0 1 0 bordered by rows of zeros).]

String Matching
[Figure: eight snapshots of a string-matching run. Each associative PE (APE) 1 to 5 holds one text character in text$ (@, A, B, A, A) together with the parallel fields counter$ and match$; the Associative Control Unit (CU) holds patt_counter, patt_length = 2 and patt_string = AB. "R" marks the responding PEs in each snapshot, and patt_counter advances from 0 to 2 across the snapshots.]
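The intersection and union described on the Database Processing slide can be sketched as follows. This is an illustrative model under stated assumptions, not the dissertation's implementation: `associative_search` and `intersect_and_union` are hypothetical names, and the list comprehension stands in for a comparison that every PE holding a tuple of Relation A would perform simultaneously. The relations are the ones shown on the slide.

```python
# Tuples are (student_id, class_id), copied from the slide.
RELATION_A = [("04", "239"), ("11", "111"), ("07", "239"),
              ("07", "124"), ("05", "124"), ("04", "111")]
RELATION_B = [("05", "111"), ("04", "111"), ("07", "124"), ("11", "124")]


def associative_search(relation, key):
    """One broadcast: every PE holding a tuple reports whether it matches key."""
    return [row == key for row in relation]  # models a single parallel compare


def intersect_and_union(a, b):
    """Process B one tuple per step; each step costs one associative search of A."""
    intersection = []
    union = list(a)
    steps = 0
    for key in b:  # sequential part: one STEP per tuple of B
        steps += 1
        if any(associative_search(a, key)):
            intersection.append(key)  # tuple is in both relations
        else:
            union.append(key)         # tuple is only in B
    return intersection, union, steps
```

Because each tuple of B needs one broadcast search regardless of how many tuples A holds, the number of steps equals |B|, which matches the O(|B|) figure on the slide. A purely sequential nested-loop comparison would instead need on the order of |A|·|B| comparisons.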
Scalable ASC (Associative Computing) Processor
[Figure: block diagram showing the Control Unit, Instruction Bus, Data Bus From Control Unit, Common Registers, Responder Resolution Unit, Network, and the PE Array of "PE and Memory" blocks.]

Implementing 1-D and 2-D PE Interconnection Network
• The network is implemented as a large 8×N-bit-wide NWIN register (where N is the number of PEs), an 8×N-bit NWOUT register and supporting circuitry
• Data enters the network through the NWIN register, which stores data for PE j in bits from 8j to 8j+7, and then that data is routed to the proper place in the NWOUT register
[Figure: NWIN Register and NWOUT Register, each with slots PE0, PE1, PE2, ..., PE(n-3), PE(n-2), PE(n-1), linked under a Control Signal.]

Implementing 1-D and 2-D PE Interconnection Network
• This version of ASC processor supports both a 1-D and 2-D PE interconnection network for those applications that require a network
[Figure: 1-D and 2-D PE interconnection network layouts.]
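The NWIN/NWOUT layout described above, with the byte for PE j held in bits 8j to 8j+7, can be illustrated with a small bit-level sketch. The helper names `pack`, `unpack` and `route_1d` are hypothetical, and the routing rule shown (each PE's byte moves to a neighbor `shift` positions away, with zeros entering at the open end) is an assumed example of a 1-D transfer; the slides do not specify the actual routing circuitry.

```python
def pack(values):
    """Build an 8*N-bit register image: PE j's byte occupies bits 8j..8j+7."""
    register = 0
    for j, value in enumerate(values):
        register |= (value & 0xFF) << (8 * j)
    return register


def unpack(register, n_pes):
    """Read back the byte held for each of the n_pes PEs."""
    return [(register >> (8 * j)) & 0xFF for j in range(n_pes)]


def route_1d(nwin, n_pes, shift=1):
    """Return the NWOUT image after moving PE j's byte to slot j + shift.

    Bytes shifted past either end are dropped and vacated slots read as zero
    (an assumed boundary rule, chosen only to keep the sketch simple).
    """
    width_mask = (1 << (8 * n_pes)) - 1
    if shift >= 0:
        return (nwin << (8 * shift)) & width_mask
    return nwin >> (8 * -shift)
```

For example, `unpack(route_1d(pack([1, 2, 3, 4]), 4, 1), 4)` moves every byte one PE up the array. Under an assumed row-major layout of a 2-D mesh, the same shift by the number of columns would move data between vertically adjacent PEs, while horizontal moves would additionally need the row boundaries masked.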
Pipelined ASC Processor with Reconfigurable Interconnection Network
• I have implemented a scalable pipelined SIMD Associative (ASC) Processor using Altera FPGAs
  • Field Programmable Gate Arrays (FPGAs) are typically used for designs and can be thought of as programmable hardware
• Five single-clock-cycle pipeline stages are split between the SIMD Control Unit (CU) and the PEs
  • In the Control Unit
    • Instruction Fetch (IF)
    • Part of Instruction Decode (ID)
  • In the Scalar PE (SPE) and in each Parallel PE (PPE)
    • Rest of Instruction Decode (ID)
    • Execute (EX)
    • Memory Access (MEM)
    • Data Write Back (WB)

ASC Processor's Pipelined Architecture
[Figure: pipeline block diagram. The Control Unit (CU) contains the Instruction Memory, IF/ID Latch, Decoder, Immediate Data and Broadcast Register; the Sequential PE (SPE) and the Parallel PE (PPE) Array contain the Register File, ID/EX Latch, EX/MEM Latch, Data Memory and MEM/WB Latch.]

Processing Element (PE)
[Figure: PE datapath with Mask, Comparator, Register File, MUX, Data Switch, Data Memory, and the ID/EX, EX/MEM and MEM/WB Latches.]
• The Comparator implements associative search: it pushes '1' onto the top of the mask stack for responders and '0' otherwise
• A '0' on top of the mask stack disables the ID/EX Latch

Pipelined ASC Processor's Performance
• Our pipelined ASC Processor has been implemented on an Altera APEX20KC1000 FPGA with 70 8-bit PEs
• Other 8-bit processor cores implemented on this FPGA / speed grade have clock speeds ranging from 30 to 106 MHz, typically 60-68 MHz
• Our pipelined ASC Processor has a clock speed of 56.4 MHz, comparable with these other processors
• With the 5-stage pipeline, our ASC Processor can approach a peak performance of 300 MHz
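The effect of the five single-clock-cycle stages (IF, ID, EX, MEM, WB) on instruction timing can be illustrated with a short sketch. It assumes an ideal pipeline with no stalls or hazards, which the slides neither claim nor rule out, and it ignores the split of the ID stage between the Control Unit and the PEs. All function names are hypothetical, and the sketch is not an attempt to reproduce the performance figures quoted above.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def pipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles to finish n instructions when a new one can start every cycle."""
    if n_instructions <= 0:
        return 0
    return n_stages + n_instructions - 1  # fill the pipe once, then one per cycle


def unpipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles if each instruction had to pass through all stages before the next starts."""
    return n_stages * max(n_instructions, 0)


def occupancy(n_instructions):
    """For each cycle, map each busy stage to the index of the instruction in it."""
    table = [dict() for _ in range(pipelined_cycles(n_instructions))]
    for instr in range(n_instructions):
        for offset, stage in enumerate(STAGES):
            table[instr + offset][stage] = instr
    return table


def instructions_per_second(n_instructions, clock_hz):
    """Average completion rate for a run of n instructions at the given clock."""
    if n_instructions <= 0:
        return 0.0
    return n_instructions * clock_hz / pipelined_cycles(n_instructions)
```

With these assumptions, 100 instructions take 104 cycles pipelined versus 500 cycles unpipelined, so for long instruction sequences the completion rate approaches one instruction per clock cycle.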