Reconfigurable Computing
Reconfigurable Architectures
Chapter 3.2
Prof. Dr.-Ing. Jürgen Teich
Lehrstuhl für Hardware-Software-Co-Design

Coarse-Grained Reconfigurable Devices

Recall:
1. Brief historical development (Estrin Fix-Plus and Rammig machine)
2. Programmable logic
   1. PALs and PLAs
   2. CPLDs
   3. FPGAs
      1. Technology
      2. Architecture by means of examples
         1. Actel
         2. Xilinx
         3. Altera

Once again: General purpose vs. special purpose
- With LUTs as function generators, FPGAs can be seen as general-purpose devices.
- Like any general-purpose device, they are flexible but inefficient.
- Flexible, because any n-variable Boolean function can be implemented using an n-input LUT.
- Inefficient, because complex functions must be implemented in many LUTs at different locations. The LUTs are connected through the routing matrix, which increases the signal delays.
- A LUT implementation is usually slower than direct wiring.

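The flexibility claim can be made concrete with a minimal sketch, assuming a LUT is modeled as a plain truth table of 2^n configuration bits indexed by the input vector. The class `LUT` and its methods are hypothetical names chosen for illustration; they do not correspond to any vendor tool or device primitive.

```python
from itertools import product


class LUT:
    """Hypothetical model of an n-input look-up table (illustration only)."""

    def __init__(self, n_inputs, config_bits):
        if len(config_bits) != 2 ** n_inputs:
            raise ValueError("an n-input LUT needs exactly 2**n configuration bits")
        self.n_inputs = n_inputs
        self.config_bits = list(config_bits)

    @classmethod
    def from_function(cls, n_inputs, func):
        # Enumerate all input combinations; the first input is the most
        # significant index bit, matching the lookup in __call__.
        bits = [int(bool(func(*inputs)))
                for inputs in product((0, 1), repeat=n_inputs)]
        return cls(n_inputs, bits)

    def __call__(self, *inputs):
        # The inputs only select one stored bit; no logic is evaluated here.
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return self.config_bits[index]


# The same 3-input structure implements different functions purely by
# loading different configuration bits.
xor3 = LUT.from_function(3, lambda a, b, c: a ^ b ^ c)
majority = LUT.from_function(3, lambda a, b, c: (a & b) | (a & c) | (b & c))
```

Changing the function means rewriting the eight stored bits only, which is the sense in which the LUT is general purpose.
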
Once again: General purpose vs. special purpose
Example: Implement the function F = ABD + AC D + A BC using 2-input LUTs.
- LUTs are grouped in logic blocks (LBs), with two 2-input LUTs per LB.
- Connections inside an LB are efficient (direct).
- Connections between LBs are slow (they go through the connection matrix).
[Figure: LBs built from 2-input LUTs with inputs A, B, C, and D, linked through the connection matrix to produce F]

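The example can be sketched in code under one loud assumption: every literal in F is taken as uncomplemented, i.e. F = A·B·D + A·C·D + A·B·C, because the extracted formula does not show which variables, if any, are complemented. The helper `lut2` and the grouping into three logic blocks are hypothetical and show only one possible mapping, not necessarily the one on the original slide and not necessarily a minimal one.

```python
from itertools import product


def lut2(truth_table):
    """Return a 2-input LUT; truth_table lists the outputs for inputs 00, 01, 10, 11."""
    def lut(x, y):
        return truth_table[(x << 1) | y]
    return lut


AND2 = lut2([0, 0, 0, 1])
OR2 = lut2([0, 1, 1, 1])


def f_reference(a, b, c, d):
    # Assumed reading of the slide formula: all literals uncomplemented.
    return (a & b & d) | (a & c & d) | (a & b & c)


def f_mapped(a, b, c, d):
    # F = A * (B*D + C*D + B*C), split into 2-input LUTs.
    # Logic block 1: two LUTs
    t1 = AND2(b, d)
    t2 = AND2(c, d)
    # Logic block 2: two LUTs (t1 and t2 arrive over the connection matrix)
    t3 = AND2(b, c)
    t4 = OR2(t1, t2)
    # Logic block 3: two LUTs (t3 and t4 are local, the output leaves as F)
    t5 = OR2(t4, t3)
    return AND2(a, t5)


# Exhaustive check over all 16 input combinations.
mismatches = [v for v in product((0, 1), repeat=4)
              if f_reference(*v) != f_mapped(*v)]
```

In this mapping a four-variable function occupies six LUTs in three LBs, and the intermediate signals t1 and t2 must cross the connection matrix, which is the inefficiency the slide points out.
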
Once again: General purpose vs. special purpose
Idea: Implement frequently used blocks as hard-core modules in the device.
[Figure: the LUT-based implementation of F through the connection matrix, next to a single hard-core module with inputs A, B, C, D and output F]

Coarse-grained reconfigurable devices
- Overcome the inefficiency of FPGAs by providing efficiently implemented coarse-grained functional units (adders, multipliers, integrators, etc.).
- Advantage: very efficient in terms of speed (basic operators need no connections over connection matrices).
- Advantage: direct wiring instead of LUT implementation.
- A coarse-grained device is usually an array of identical programmable processing elements (PEs), each capable of executing a few operations such as addition and multiplication.
- Depending on the manufacturer, the functional units communicate via buses or are connected directly through programmable routing matrices.

Coarse-grained reconfigurable devices
- Memory exists between and inside the PEs.
- Several other functional units are provided, depending on the manufacturer.
- A PE is usually a tiny 8-bit, 16-bit, or 32-bit ALU that can be configured to execute only one operation during a given period (until the next configuration).
- Communication among the PEs can be either packet oriented (on buses) or point-to-point (using crossbar switches).
- Since each vendor has its own implementation approach, the devices are studied by means of a few examples: PACT XPP, Quicksilver ACM, NEC DRP, picoChip, and IPflex DAP/DNA.

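The behaviour of a PE described above, one operation per configuration period, can be sketched as follows. This is a minimal model under stated assumptions: the class `ProcessingElement`, its operation set, and the default width are hypothetical and do not describe any of the listed devices.

```python
import operator


class ProcessingElement:
    """Hypothetical coarse-grained PE: a small ALU fixed to one operation at a time."""

    # Assumed operation set, chosen for illustration only.
    OPERATIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

    def __init__(self, width_bits=16):
        self.mask = (1 << width_bits) - 1  # results wrap to the data-path width
        self.operation = None

    def configure(self, name):
        # Reconfiguration selects the single operation the PE performs
        # until the next call to configure().
        self.operation = self.OPERATIONS[name]

    def execute(self, a, b):
        if self.operation is None:
            raise RuntimeError("PE has not been configured")
        return self.operation(a, b) & self.mask


pe = ProcessingElement(width_bits=8)
pe.configure("add")
sum_result = pe.execute(200, 100)   # 300 wraps to 44 on an 8-bit data path
pe.configure("mul")
product_result = pe.execute(20, 20)  # 400 wraps to 144 on an 8-bit data path
```

Unlike a LUT, the PE stores only an operation selector, not a truth table, so its configuration is a handful of bits per element.
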
The PACT XPP - Overall structure
XPP (Extreme Processing Platform) is a hierarchical structure consisting of:
- An array of Processing Array Elements (PAEs) grouped into a Processing Array (PA)
- PAC (Processing Array Cluster) = Processing Array (PA) + Configuration Manager (CM)
- A hierarchical configuration tree
  - Local CMs manage the configuration at the PA level
  - The local CMs access the local configuration memory, while the supervisor CM (SCM) accesses external memory and supervises the whole configuration process on the device

The PACT XPP - The Processing Array Elements
The overall structure further includes:
- A communication network
- Memory elements beside the PACs
- A set of I/Os
The PAE: two types exist
- The ALU PAE
- The RAM PAE
The ALU PAE:
- Contains an ALU that can be configured to perform basic operations
- The Back Register (BREG) provides routing channels for data and events from bottom to top
- The Forward Register (FREG) provides routing channels from top to bottom
[Figure: The ALU PAE]

The PACT XPP - The Processing Array Elements
- The DataFlow Register (DF-REG) can be used at the object outputs for buffering data.
- Input registers can be preloaded with configuration data.
The RAM PAE:
1. Differs from the ALU PAE only in its function: instead of an ALU, a RAM PAE contains a dual-ported RAM.
2. Useful for data storage.
3. Data is written or read once an address has been received at the RAM inputs.
4. The BREG, FREG, and DF-REG of the RAM PAE have the same function as in the ALU PAE.
[Figure: The RAM PAE]

The PACT XPP - Routing
Routing in the PACT XPP:
- Two independent networks
  - One for data transmission
  - The other for event transmission
- A configuration bus exists besides the data and event networks (very little information is available about the configuration bus)
- All objects can be connected to horizontal routing channels using switch objects
- Vertical routing channels are provided by the BREGs and FREGs
  - BREGs route from bottom to top
  - FREGs route from top to bottom
[Figure: vertical and horizontal routing channels]

The PACT XPP - Interface
- Interfaces are available inside the chip
- The number and type of interfaces vary from device to device
- On the XPP42-A1: 6 internal interfaces consisting of:
  - 4 identical general-purpose on-chip I/O interfaces (bottom left, upper left, upper right, and bottom right)
  - One configuration manager (not shown in the picture)
  - One JTAG (Joint Test Action Group, "IEEE Standard 1149.1") boundary-scan interface for testing purposes
[Figure: Interfaces]

The PACT XPP - Interface
The I/O interfaces can operate independently of each other. Two operation modes exist:
- The RAM mode
- The streaming mode
RAM mode:
- Each port can access external static RAM (SRAM).
- Control signals for the SRAM transactions are available.
- No additional logic is required.

The PACT XPP - Interface
Streaming mode:
1. Used for high-speed streaming of data to and from the device
2. Each I/O element provides two bidirectional ports for data streaming
3. Handshake signals synchronize the data packets with the external port

The Quicksilver ACM - Architecture
Structure: fractal-like
- Hierarchical groups of four nodes with full communication among the nodes
- 4 lower-level nodes are grouped into one higher-level node
- The lowest level consists of 4 heterogeneous processing nodes
- The nodes are connected through a Matrix Interconnect Network (MIN)
- A system controller
- Various I/O

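The grouping of four nodes per level can be illustrated with a small sketch. It assumes, purely for illustration, that the lowest-level nodes are numbered consecutively so that each group of four shares the same index after integer division by 4; this numbering and the function names `nodes_at_depth` and `common_level` are hypothetical and are not taken from the ACM documentation.

```python
def nodes_at_depth(levels):
    """Number of lowest-level nodes in a hierarchy with the given number of levels."""
    return 4 ** levels


def common_level(node_a, node_b):
    """Lowest hierarchy level whose group contains both nodes (0 means same node)."""
    level = 0
    while node_a != node_b:
        # Moving one level up merges four neighbouring indices into one group.
        node_a //= 4
        node_b //= 4
        level += 1
    return level


same_group = common_level(0, 3)       # both nodes sit in the first group of four
neighbour_group = common_level(0, 4)  # the nodes meet one level higher
```

Under this assumed numbering, traffic between nodes of the same group stays at the lowest level, and only traffic between different groups has to climb the hierarchy.
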
The Quicksilver ACM - The processing node
An ACM processing node consists of:
- An algorithmic engine, which is unique to each node type and defines the operation the node performs.
- The node memory, for data storage at the node level.
- A node wrapper, which is common to all nodes and hides the complexity of the heterogeneous architecture.

The Quicksilver ACM - The processing node
Four types of nodes exist:
- The Programmable Scalar Node (PSN) provides a standard 32-bit RISC architecture with 32-bit general-purpose registers
- The Adaptive Execution Node (AXN) provides variable-size MAC and ALU operations
- The Domain Bit Manipulation (DBM) node provides bit manipulation and byte-oriented operations
- The External Memory Controller node provides DDR RAM, SRAM, memory random access, and DMA control interfaces
[Figure: ACM PSN node]

The Quicksilver ACM - The processing node
[Figures: ACM DBM node; ACM AXN node]

The Quicksilver ACM - The processing node
The node wrapper envelops the algorithmic engine and presents an identical interface to neighbouring nodes. It features:
1. A MIN interface to support communication among nodes via the MIN
2. A hardware task manager for task management at the node level
3. A DMA engine
4. Dedicated I/O circuitry
5. Memory controllers
6. Data distributors and aggregators
[Figure: The ACM node wrapper]

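The role of the node wrapper, one interface in front of different engines, can be sketched as follows. This is a minimal illustration under stated assumptions: the class `NodeWrapper`, its `receive` method, and the two stand-in engines are hypothetical and do not reproduce the behaviour of real PSN or DBM nodes.

```python
class NodeWrapper:
    """Hypothetical wrapper giving every node the same interface to its neighbours."""

    def __init__(self, name, engine):
        self.name = name
        self.engine = engine  # algorithmic engine: any callable, unique per node type
        self.memory = []      # stand-in for the node memory

    def receive(self, payload):
        # Neighbours always call receive(); what happens inside depends on the engine.
        result = self.engine(payload)
        self.memory.append(result)
        return result


# Two heterogeneous nodes with arbitrary stand-in engines.
scalar_node = NodeWrapper("PSN", lambda x: x + 1)
bit_node = NodeWrapper("DBM", lambda x: x ^ 0xFF)

# The caller treats both nodes identically.
results = [node.receive(0x0F) for node in (scalar_node, bit_node)]
```

The caller never inspects the node type, which mirrors how the wrapper hides the heterogeneity of the engines from the interconnect.
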