Blackfin Processor Architecture
Instructor: Prof. Andy Wu
ACCESS IC LAB, Graduate Institute of Electronics Engineering, NTU
Outline: Introduction; Blackfin Processor; Product Highlights
Berkeley incorporated a Reduced Instruction Set Computer (RISC) architecture. It has the following key features:
- A fixed (32-bit) instruction size with few formats
  - CISC processors typically had variable-length instruction sets with many formats
- A load-store architecture, where instructions that process data operate only on registers and are separate from instructions that access memory
  - CISC processors typically allowed values in memory to be used as operands in data processing instructions
- A large register bank of thirty-two 32-bit registers, all of which could be used for any purpose, to allow the load-store architecture to operate efficiently
  - CISC register sets were getting larger, but none was this large and most had different registers for different purposes
- Hard-wired instruction decode logic
  - CISC processors used large microcode ROMs to decode their instructions
- Pipelined execution
  - CISC processors allowed little, if any, overlap between consecutive instructions (though they do now)
- Single-cycle execution
  - CISC processors typically took many clock cycles to complete a single instruction
Von Neumann architecture
- Single memory space for program and data
- Shared global bus
Harvard architecture
- Separate program and data memory spaces
- Usually refers to separate program and data buses
- The program bus can be used for coefficient loading for the MAC
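To illustrate why a second bus helps a multiply-accumulate (MAC) unit, the sketch below computes one FIR filter output. Each loop iteration needs two operands, a coefficient and a data sample; with separate buses, both could be fetched in the same cycle. The function name `fir_output` and the plain Python integers are illustrative assumptions, not part of the slides.

```python
def fir_output(coefficients, samples):
    """Accumulate coefficient * sample products, one MAC per tap."""
    acc = 0
    for c, x in zip(coefficients, samples):
        # Two operand fetches per tap: c (coefficient) and x (data sample).
        acc += c * x
    return acc
```

On a single shared bus the two fetches per tap would have to be serialized; the loop body itself does not change.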
Outline: Introduction; Blackfin Processor; Product Highlights
- Made by Analog Devices Corporation
- A new breed of embedded media processor designed specifically for today's embedded audio, video and communication applications
- Combines a 32-bit RISC-like instruction set with dual 16-bit multiply-accumulate (MAC) signal processing functionality
- Performs equally well in signal processing and control processing applications, in many cases eliminating the need for separate heterogeneous processors
- Two 16-bit MACs, two 40-bit ALUs, four 8-bit video ALUs
- Support for 8/16/32-bit integer and 16/32-bit fractional data types
- Concurrent fetch of one instruction and two unique data elements
- Two loop counters that allow for nested zero-overhead looping
- A modified Harvard architecture in combination with a hierarchical memory
- Arbitrary bit and bit-field manipulation, insertion and extraction
- Two data address generator (DAG) units with circular and bit-reversed addressing
  - The data address generator contains two 32-bit address ALUs and an address register file
  - The address register file consists of six 32-bit general-purpose pointer registers and four 32-bit circular buffer addressing registers
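The two addressing modes named above can be sketched generically as follows. This is a minimal illustration of the ideas, not a model of the DAG hardware: the function names and parameters (`base`, `length`, `modifier`, `bits`) are assumptions, and the circular update assumes the step is no larger than the buffer length.

```python
def circular_next(index, modifier, base, length):
    """Advance index by modifier and wrap it into [base, base + length).

    Assumes abs(modifier) <= length, so one correction is enough.
    """
    index += modifier
    if index >= base + length:
        index -= length
    elif index < base:
        index += length
    return index


def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of value (FFT-style reordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result
```

Circular addressing keeps a pointer inside a fixed buffer without an explicit bounds check in the inner loop, which suits delay lines and FIR filters. Bit-reversed ordering yields the index sequence 0, 4, 2, 6, 1, 5, 3, 7 for an 8-point FFT.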
- Unified 4 GB memory space
- Mixed 16/32-bit instruction encoding for best code density
- Memory protection for support of OS operation
Three modes of operation
- User mode
  - User mode has restricted access to a subset of system resources, thus providing a protected software environment
  - User mode is considered the domain of application programs
- Supervisor mode and Emulation mode
  - Supervisor mode and Emulation mode have unrestricted access to the core resources
  - Supervisor mode and Emulation mode are usually reserved for the kernel code of an operating system
Blackfin Architecture Support (Single Cycle)
The following operations can be processed in parallel in one clock cycle:
- Execution of a single instruction operating on both MACs or ALUs
- Execution of two 32-bit data moves (2 reads, or 1 read and 1 write)
- Execution of two pointer updates
- Execution of hardware loop updates
Blackfin Processor Compute Unit
BF533 Memory Access
Under the right conditions, 4 memory accesses at the same time:
- 64-bit instruction fetch, 2 x 32-bit data loads, 32-bit data store
- PLUS up to 2 ALU (32-bit) and 2 MAC (16-bit) operations at the same time
- PLUS background DMA activity
Compute Unit Architecture
Register File
Data register syntax (8 x 32 bit, or 16 x 16 bit)
- R0, R1, etc. refer to 32-bit registers
- R0.L refers to the low 16 bits of the 32-bit R0 register
- R0.H refers to the high 16 bits of the R0 register
Accumulator syntax (2 x 40 bit)
- A0.L => low 16 bits
- A0.H => next 16 bits
- A0.W => least significant 32-bit word
- A0.X => most significant 8-bit extension
SHARC: 16 32-bit data registers, integer and float. There is a pair of SHARC accumulator registers too.
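The register naming above maps onto simple bit fields. The following sketch shows which bits each suffix selects, treating a data register as an unsigned 32-bit value and an accumulator as an unsigned 40-bit value. The helper names are illustrative assumptions, not Blackfin syntax.

```python
def low_half(r):
    """R0.L: bits 15..0 of a 32-bit data register."""
    return r & 0xFFFF


def high_half(r):
    """R0.H: bits 31..16 of a 32-bit data register."""
    return (r >> 16) & 0xFFFF


def split_accumulator(a):
    """Split a 40-bit accumulator into the fields named on the slide."""
    a &= (1 << 40) - 1                 # keep only the 40 accumulator bits
    return {
        "L": a & 0xFFFF,               # A0.L: low 16 bits
        "H": (a >> 16) & 0xFFFF,       # A0.H: next 16 bits
        "W": a & 0xFFFFFFFF,           # A0.W: least significant 32-bit word
        "X": (a >> 32) & 0xFF,         # A0.X: most significant 8-bit extension
    }
```

For example, a register holding 0x12345678 has a high half of 0x1234 and a low half of 0x5678.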
Blackfin / SHARC / 68K comparison

Blackfin:  R0 = R1 + R2;
SHARC:     R0 = R1 + R2;
68K:       MOVE.L R2, R0
           ADD.L R1, R0

Blackfin:  R0.L = R1.L + R2.H;
68K:       MOVE.W R2, R0
           ADD.W R1, R0

Blackfin:  R0 = R1 +|- R2;
           Means R0.L = R1.L - R2.L in parallel with R0.H = R1.H + R2.H
SHARC:     Closest: R0 = R1 + R2, R4 = R1 - R2;
68K:       MOVE.L R2, R0
           ASR.L #16, R0
           MOVE.L R1, R3
           ASR.L #16, R3
           ADD.W R3, R0
           ASL.L #16, R0
           MOVE.W R2, R0
           ADD.W R1, R0
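The dual 16-bit form `R0 = R1 +|- R2` can be emulated on packed 32-bit values to make its meaning concrete: the high halves are added while the low halves are subtracted. This sketch assumes plain wrap-around arithmetic modulo 2^16 and does not model saturation or any instruction options; the function name is hypothetical.

```python
def add_sub_halves(r1, r2):
    """Emulate R0 = R1 +|- R2 on packed 16-bit halves (wrap-around only)."""
    hi = (((r1 >> 16) & 0xFFFF) + ((r2 >> 16) & 0xFFFF)) & 0xFFFF  # R0.H = R1.H + R2.H
    lo = ((r1 & 0xFFFF) - (r2 & 0xFFFF)) & 0xFFFF                  # R0.L = R1.L - R2.L
    return (hi << 16) | lo
```

With R1 = 0x00050003 and R2 = 0x00020001, the result is 0x00070002: 5 + 2 in the high half and 3 - 1 in the low half.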
- A & B registers must stay on the same side of the | for both instructions
- For dual and quad 16-bit operations, the (CO) option causes the destination registers to cross
- Multiplies are signed fractional by default
- The signed fractional multiply result is automatically left-shifted by 1 bit
- Signed fractional multiply != signed integer multiply
- Rounding is available on fractional multiplies and, as a special option, on integer multiplies
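The 1-bit left shift follows from the number format. Assuming the 16-bit operands are in signed 1.15 fractional format, their raw product has two sign bits (2.30 format), and shifting left by one yields a 1.31 result. The sketch below contrasts this with an integer multiply of the same bit patterns. The helper names are assumptions, and saturation (needed for the corner case 0x8000 * 0x8000) is deliberately not modelled.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mult_fractional(a, b):
    """Signed fractional multiply: 1.15 * 1.15 -> 1.31 (product shifted left by 1)."""
    return (to_signed16(a) * to_signed16(b)) << 1


def mult_integer(a, b):
    """Signed integer multiply of the same bit patterns (no shift)."""
    return to_signed16(a) * to_signed16(b)
```

For example, 0x4000 represents 0.5 in 1.15 format. The fractional product is 0x20000000, which is 0.25 in 1.31 format, while the integer product of the same inputs is 0x10000000.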
Rounding (two cases): 0x8000 is added to the 32-bit multiplier result, or to the accumulator value, before a 16-bit value is extracted to the destination register
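Adding 0x8000 before taking the upper 16 bits rounds to nearest rather than truncating, because 0x8000 is exactly half of the weight of the least significant retained bit. A minimal sketch, assuming a non-negative 32-bit input and no saturation handling (the function name is hypothetical):

```python
def round_to_16(value32):
    """Add 0x8000, then keep bits 31..16 as the 16-bit result."""
    return ((value32 + 0x8000) >> 16) & 0xFFFF
```

A value of 0x12348000 rounds up to 0x1235, while 0x12347FFF stays at 0x1234.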
- When extracting a 16-bit fractional value from an accumulator, the high 16 bits are taken
- Where in the destination register the value goes depends on which accumulator it is extracted from
- When extracting a 16-bit integer value from an accumulator, the low 16 bits are taken
- Where in the destination register the 16-bit value goes depends on which accumulator it is extracted from
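The two extraction rules can be compared side by side. The sketch below works on the 32-bit word of an accumulator and ignores rounding and saturation. The `deposit` helper encodes an assumption that is not stated on the slides: that a value extracted from A1 lands in the high half of the destination register and a value from A0 lands in the low half. All names are illustrative.

```python
def extract_fractional(acc):
    """Fractional extraction: take the high 16 bits of the 32-bit word."""
    return (acc >> 16) & 0xFFFF


def extract_integer(acc):
    """Integer extraction: take the low 16 bits of the 32-bit word."""
    return acc & 0xFFFF


def deposit(dest, value16, accumulator):
    """Place a 16-bit value into one half of a 32-bit destination register.

    Assumption (not from the slides): A1 -> high half, A0 -> low half.
    """
    if accumulator == "A1":
        return (dest & 0x0000FFFF) | ((value16 & 0xFFFF) << 16)
    return (dest & 0xFFFF0000) | (value16 & 0xFFFF)
```

For an accumulator word of 0x12345678, fractional extraction gives 0x1234 and integer extraction gives 0x5678; the other half of the destination register is left unchanged by `deposit`.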
- In general there are 16-bit and 32-bit versions of the arithmetic instructions
- Most of the 32-bit instructions can be executed in parallel with 2 x 16-bit memory/index operations
  - Exceptions are DIVS, DIVQ and MULTIPLY with 32-bit operands
- || means parallel
Examples:
    A1 = R2.L * R1.L, A0 = R2.H * R1.H || R2.H = W[I2++] || [I3++] = R3;
    R2 = R2 +|+ R4, R4 = R2 -|- R4 || I0 += M0 || R1 = [I0];
Blackfin Processor Memory Architecture
- A single, unified 4 GB address space using 32-bit addresses
- The L1 memory system is the primary, highest-performance memory available to the core and is faster than the L2 memory system
- The L2 memory system is off-chip and has longer access latencies
Blackfin Processor Peripherals
- Parallel Peripheral Interface (PPI)
- Serial Ports (SPORTs)
- Serial Peripheral Interface (SPI)
- General-purpose timers
- Universal Asynchronous Receiver Transmitter (UART)
- Real-Time Clock (RTC)
- Watchdog timer
- General-purpose I/O (programmable flags)