Introduction to ARM(7)
● ARM Limited, founded 1990
● Acorn Computers Ltd. (started in 1983)
● Apple Computer
● 32-bit RISC
● 75% of embedded 32-bit RISC CPUs
● Mobile phones, calculators, iPod, DS, GBA, ...
  ● ARM720T (ARM7TDMI) 60 MIPS @ 59.8 MHz
  ● Low-power design
ARM(7) Architecture
● Load/Store architecture (+ Multiple)
● 15 32-bit general purpose registers (+ PC)
● Fixed 32-bit instruction length (+ 16-bit Thumb extension)
● Not pure RISC (?!)
  ● Some instructions take more than 1 cycle
  ● No load/branch delay slots
ARM(7) Architecture (cont.)
● 2-level interrupt priority (FIQ, IRQ)
  ● Extended with VIC (Philips), AIC (Atmel)
● Switched register bank (shadow registers)
● 7 processor modes
  ● user, sys, svc, fiq, irq, und, abt
  ● user and sys share all registers
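The register banking described above can be sketched as a small lookup model. This is only an illustration: the names `BANKED`, `physical_register` and `count_general_registers` are made up for this sketch, and the model assumes the usual ARM7 arrangement in which fiq has private copies of r8-r14 and the other exception modes have private copies of r13 and r14.

```python
# Hypothetical model of the ARM7 register banks (names are illustrative only).
MODES = ["user", "sys", "svc", "abt", "und", "irq", "fiq"]

# Registers for which each mode has its own private (banked) copy.
BANKED = {
    "user": [],
    "sys": [],                          # sys shares every register with user
    "svc": [13, 14],
    "abt": [13, 14],
    "und": [13, 14],
    "irq": [13, 14],
    "fiq": [8, 9, 10, 11, 12, 13, 14],  # fiq also banks r8-r12
}


def physical_register(mode, n):
    """Return a label for the physical register behind rN in the given mode."""
    if n in BANKED[mode]:
        return f"r{n}_{mode}"
    return f"r{n}"


def count_general_registers():
    """Count the distinct physical registers reachable as r0-r15 in any mode."""
    return len({physical_register(m, n) for m in MODES for n in range(16)})
```

In this model an IRQ handler gets a fresh r13 and r14 but must save r0-r12 itself, while a FIQ handler can use r8-r12 without saving anything.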
ARM(7) Registers
ARM(7) Registers (cont.)
● General purpose registers (r0-r12)
● Stack register (r13)
● Link register (r14)
  ● Upon exception r14_<mode> holds PC+8
  ● Upon software interrupt r14_<mode> holds PC+4
● PC (r15)
  ● PC is special (pipeline is exposed)
  ● Reading PC returns the current instruction address + 8 in ARM mode, + 4 in Thumb mode (!)
ARM(7) Registers (cont.)
ARM(7) Registers (cont.)
● Current Program Status Register (CPSR)
● Saved Program Status Register (SPSR_xxx)
● N, Z, C, and V flags
● Interrupt disable flags (F for FIQ, I for IRQ; set means masked)
● Processor mode (User, System etc.)
● Thumb/ARM mode
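The CPSR fields listed above can be picked apart with a few shifts and masks. The sketch below assumes the standard ARM7 layout (N, Z, C, V in bits 31-28, I in bit 7, F in bit 6, T in bit 5, mode in bits 4-0); the function name `decode_cpsr` and the dictionary keys are invented for illustration.

```python
# Mode field encodings (bits 4-0 of the CPSR).
MODE_NAMES = {
    0b10000: "user",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "svc",
    0b10111: "abt",
    0b11011: "und",
    0b11111: "sys",
}


def decode_cpsr(value):
    """Split a 32-bit CPSR/SPSR value into its named fields."""
    return {
        "N": (value >> 31) & 1,
        "Z": (value >> 30) & 1,
        "C": (value >> 29) & 1,
        "V": (value >> 28) & 1,
        "irq_disabled": bool((value >> 7) & 1),  # I bit: 1 masks IRQ
        "fiq_disabled": bool((value >> 6) & 1),  # F bit: 1 masks FIQ
        "thumb": bool((value >> 5) & 1),         # T bit: 1 means Thumb state
        "mode": MODE_NAMES.get(value & 0x1F, "reserved"),
    }
```

For example, `decode_cpsr(0x600000D3)` describes supervisor mode in ARM state with both interrupts masked and the Z and C flags set.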
ARM(7) Instruction Set
● ARM instruction set
  ● 32-bit length
  ● All registers accessible
  ● Load/Store multiple
● Thumb instruction set (subset)
  ● 16-bit length
  ● Low registers accessible (r0-r7)
  ● Reduces code size by approx. 30%
  ● Full-speed execution with half the memory bandwidth
ARM(7) Instruction Set (cont.)
● Data processing instruction in 1 cycle
  ● mov r0, r1
  ● add r0, r1, r2
● Branch (and link) instructions in 3 cycles
  ● b/bl label (24-bit offset, +/- 32 Mbytes)
  ● mov lr, pc
    mov pc, r0
  ● bx r0
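The 24-bit branch offset and its +/- 32 Mbyte reach can be worked through in a short sketch. It assumes the standard ARM-mode encoding of an unconditional `b`/`bl` (condition field "always", word offset relative to the instruction address + 8); the function name `encode_branch` is made up for this example.

```python
def encode_branch(instr_addr, target, link=False):
    """Encode an unconditional ARM-mode b (or bl) as a 32-bit word."""
    # The offset is relative to the PC value seen at execute: address + 8.
    offset = target - (instr_addr + 8)
    if offset % 4:
        raise ValueError("target must be word aligned")
    words = offset >> 2  # the field counts words, not bytes
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("target outside the +/- 32 Mbyte range")
    opcode = 0xEB000000 if link else 0xEA000000
    return opcode | (words & 0x00FFFFFF)
```

Because the field holds a signed 24-bit word count, the byte reach is 2^23 words times 4 bytes, which is where the 32 Mbyte figure comes from. A branch to itself encodes an offset of -2 words.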
ARM(7) Instruction Set (cont.)
● Data store instruction in 2 cycles
  ● str r0, [r1]
  ● Store multiple is 3+n, n is number of registers
  ● stmia r13!, {r0-r7,lr}
● Data load instruction in 3 cycles
  ● When PC is destination, 5 cycles
  ● Load multiple is 4+n, n is number of registers
  ● Load multiple with PC as destination is 6+n
  ● ldmdb r13!, {r0-r7,pc}
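To get a feel for what these figures mean for a function call, the cycle counts quoted on these slides can be put into a small calculator. The sketch takes the slide figures at face value (a core's technical reference manual is the place to check exact counts) and assumes that n counts every register in the list, including lr or pc. The function name `cycles` and the instruction kind labels are invented for this example.

```python
def cycles(kind, n_regs=0, pc_dest=False):
    """Cycle count for one instruction, using the figures quoted on the slides."""
    if kind == "data":      # mov, add, ...
        return 1
    if kind == "branch":    # b, bl
        return 3
    if kind == "str":
        return 2
    if kind == "stm":
        return 3 + n_regs
    if kind == "ldr":
        return 5 if pc_dest else 3
    if kind == "ldm":
        return (6 if pc_dest else 4) + n_regs
    raise ValueError(f"unknown instruction kind: {kind}")


# Cost of saving and restoring r0-r7 plus the return address (9 registers).
prologue = cycles("stm", n_regs=9)                # stmia r13!, {r0-r7,lr}
epilogue = cycles("ldm", n_regs=9, pc_dest=True)  # ldmdb r13!, {r0-r7,pc}
```

Under those assumptions the save/restore pair costs more than two dozen cycles, which is why short leaf functions usually avoid touching the stack.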
ARM(7) Instruction Set (cont.)
● Requires correct alignment
  ● Byte at any address
  ● Half-word aligned at 2 bytes
  ● Word aligned at 4 bytes
● Software interrupt
  ● swi #imm_24 (ARM mode)
  ● swi #imm_8 (Thumb mode)
  ● r14 is not PC+8, it's PC+4 (!!!)
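The alignment rules on this slide amount to a divisibility check on the address. A minimal sketch, with `ACCESS_SIZE`, `is_aligned` and `align_down` as illustrative names:

```python
# Access sizes in bytes for the three data widths.
ACCESS_SIZE = {"byte": 1, "halfword": 2, "word": 4}


def is_aligned(address, kind):
    """True if the address is a legal location for this access width."""
    return address % ACCESS_SIZE[kind] == 0


def align_down(address, kind):
    """Round the address down to the nearest legal location for this width."""
    return address & ~(ACCESS_SIZE[kind] - 1)
```

For instance, address 0x1002 is fine for a byte or half-word access but not for a word access; the nearest word-aligned address below it is 0x1000.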
ARM(7) Instruction Set (cont.)
● Move to and from CPSR, SPSR
  ● mrs r0, CPSR
  ● msr SPSR, r0
  ● msr CPSR_f, r0
● Co-Processor instructions
  ● stc/ldc cp0, cr0, [r0]
  ● mrc/mcr cp0, op, r0, cr0, cr1
  ● cdp cp0, op, cr0, cr1
ARM(7) Instruction Set (cont.)
● All instructions are conditional
● Compensate for lack of branch prediction
● if (r0 == r1) r2 = r3[0]; else r2 = r3[1];
    cmp   r0, r1
    ldreq r2, [r3]
    ldrne r2, [r3, #4]
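How the condition suffixes pick one of the two loads can be modelled on a host machine. The sketch below assumes the standard flag semantics of `cmp` (a 32-bit subtraction that only sets N, Z, C, V) and the usual meaning of the `eq`, `ne`, `lt` and `ge` suffixes; `cmp_flags`, `CONDITIONS` and `select` are names made up for this example.

```python
MASK = 0xFFFFFFFF


def cmp_flags(a, b):
    """Flags produced by cmp a, b, i.e. the 32-bit subtraction a - b."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(a >= b),                        # carry set means no borrow
        "V": ((a ^ b) & (a ^ result)) >> 31,     # signed overflow
    }


# A few condition suffixes expressed as tests on the flags.
CONDITIONS = {
    "eq": lambda f: f["Z"] == 1,
    "ne": lambda f: f["Z"] == 0,
    "lt": lambda f: f["N"] != f["V"],
    "ge": lambda f: f["N"] == f["V"],
}


def select(r0, r1, r3):
    """Model of: cmp r0, r1 / ldreq r2, [r3] / ldrne r2, [r3, #4]."""
    flags = cmp_flags(r0, r1)
    r2 = None
    if CONDITIONS["eq"](flags):   # ldreq executes only when Z is set
        r2 = r3[0]
    if CONDITIONS["ne"](flags):   # ldrne executes only when Z is clear
        r2 = r3[1]
    return r2
```

Both load instructions pass through the pipeline, but only the one whose condition holds has any effect, so no branch is needed.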
ARM(7) Instruction Set (cont.)
● Expressive syntax, very powerful indexing
● General shifting operation on one operand
    cmp   r2, #TBL_SIZE
    ldrlt r0, [r1, r2, lsl #2]
    movlt lr, pc
    movlt pc, r0
    ldmfd r13!, {r4-r7,pc}^
● Extendable instruction set
  ● Extendable via co-processors
  ● Emulate co-processors via und (undefined) mode
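The shifted index in `[r1, r2, lsl #2]` computes the address of a word-sized table entry in a single instruction. A rough host-side model of the bounds check plus load follows; `lsl`, `effective_address` and `lookup` are illustrative names, and memory is represented as a plain dictionary from address to value.

```python
MASK = 0xFFFFFFFF


def lsl(value, amount):
    """32-bit logical shift left."""
    return (value << amount) & MASK


def effective_address(base, index, shift):
    """Address formed by [base, index, lsl #shift]."""
    return (base + lsl(index, shift)) & MASK


def lookup(memory, base, index, table_size):
    """Model of: cmp r2, #TBL_SIZE / ldrlt r0, [r1, r2, lsl #2]."""
    if index < table_size:                    # the lt condition
        return memory[effective_address(base, index, 2)]
    return None                               # load skipped, r0 unchanged


# A three-entry table of handler addresses starting at 0x1000.
table = {0x1000: 0x8000, 0x1004: 0x8100, 0x1008: 0x8200}
```

Shifting the index left by 2 multiplies it by the 4-byte entry size, so entry 2 of a table at 0x1000 lives at 0x1008.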
ARM(7) Save Context
    // Get user mode sp
    stmfd sp!, {r0}
    stmdb sp, {sp}^
    nop
    ldmdb sp, {r0}
    // Store return address
    sub   lr, lr, #4
    stmfd r0!, {lr}
    // Start using lr
    mov   lr, r0
    ldmfd sp!, {r0}
    // Save user mode regs
    stmfd lr, {r0-r14}^
    nop
    sub   lr, lr, #60
    // Save SPSR
    mrs   r0, SPSR
    stmfd lr!, {r0}
    // Save stack pointer
    ldr   r0, =tt_current
    ldr   r0, [r0]
    str   lr, [r0]
ARM(7) Restore Context
    // Load sp
    ldr   r0, =tt_current
    ldr   r0, [r0]
    ldr   lr, [r0]
    // Restore SPSR
    ldmfd lr!, {r0}
    msr   SPSR, r0
    // Restore user mode regs
    ldmfd lr, {r0-r14}^
    nop
    add   lr, lr, #60
    // Return from interrupt
    ldmfd lr, {pc}^
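As far as the save and restore sequences can be read, they build a frame on the user stack with the SPSR at the lowest address, then the fifteen user registers (the 60 bytes stepped over by the `#60` adjustments), then the return address. The offsets below are a reconstruction from those listings, not a documented layout, and the names `FRAME_FIELDS`, `FRAME_OFFSETS` and `FRAME_SIZE` are made up for this sketch.

```python
WORD = 4  # bytes per saved register

# Frame contents from the lowest address upwards, as read from the listings.
FRAME_FIELDS = ["spsr"] + [f"r{n}" for n in range(15)] + ["return_address"]

# Byte offset of each field from the saved stack pointer in tt_current.
FRAME_OFFSETS = {name: i * WORD for i, name in enumerate(FRAME_FIELDS)}

# Total size of one saved context in bytes.
FRAME_SIZE = len(FRAME_FIELDS) * WORD
```

With this layout the pointer stored through `tt_current` addresses the SPSR slot, which is why the restore sequence pops the SPSR first and loads the return address last.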
MIPS Context Save
    // Save user regs.
    subu $29, $29, 116
    sw   $1, 0($29)
    .
    .
    .
    sw   $31, 108($29)
    // Save return address
    mfc0 $26, $14
    nop
    sw   $26, 112($29)
    // Save sp
    la   $1, tt_current
    lw   $1, 0($1)
    nop
    sw   $29, 0($1)
    nop
MIPS Context Restore
    // Load sp
    la   $29, tt_current
    lw   $29, 0($29)
    nop
    lw   $29, 0($29)
    nop
    // Restore user regs
    lw   $1, 0($29)
    .
    .
    .
    lw   $31, 108($29)
    // Restore return addr
    lw   $26, 112($29)
    addu $29, $29, 116
    // Return from interrupt
    jr   $26
    rfe
ARM(7) Pipeline
● 3-stage pipeline
  ● Fetch
  ● Decode
  ● Execute
● No branch/load delay slots
  ● Pipeline is stalled/utilized
● Simple, no register forwarding etc.
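The three stages can be traced cycle by cycle for straight-line code, which also shows why a read of the PC sees an address two instructions ahead. This is a simplified model that ignores stalls and branches; `pipeline_trace` and its `width` parameter (4 bytes per ARM instruction, 2 per Thumb instruction) are names chosen for this sketch.

```python
def pipeline_trace(start, count, width=4):
    """Return (cycle, fetch, decode, execute) addresses for `count`
    sequential instructions, with None for an empty stage."""
    rows = []
    for cycle in range(count + 2):  # two extra cycles to drain the pipeline
        fetch = start + cycle * width if cycle < count else None
        decode = start + (cycle - 1) * width if 1 <= cycle <= count else None
        execute = (start + (cycle - 2) * width
                   if 2 <= cycle <= count + 1 else None)
        rows.append((cycle, fetch, decode, execute))
    return rows


# Print the trace for four ARM instructions starting at 0x8000.
for cycle, fetch, decode, execute in pipeline_trace(0x8000, 4):
    stages = [hex(a) if a is not None else "-" for a in (fetch, decode, execute)]
    print(cycle, *stages)
```

In the cycle where the instruction at 0x8000 executes, the fetch stage is already at 0x8008, which is the value the executing instruction sees when it reads the PC. With `width=2` the same model gives the Thumb offset of 4.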
ARM(7) Pipeline
ARM(7) Pipeline (ADD)
ARM(7) Pipeline (ADD)
ARM(7) Pipeline (STR)
ARM(7) Pipeline (STR1)
ARM(7) Pipeline (STR2)
ARM(7) System Bus
● AMBA (Advanced Microcontroller Bus Arch.)
● ASB (Advanced System Bus)
  ● High performance
  ● System modules (on-chip RAM etc.)
  ● Burst mode data transfers
● APB (Advanced Peripheral Bus)
  ● Simpler, slower (usually at half the speed of ASB)
  ● Usually slave module on ASB
  ● Peripheral devices
    ● UART
    ● Timer
ARM(7) Too Slow?
● ARM9 (ARM920T)
  ● 5-stage pipeline
  ● 16 KiB/16 KiB I-Cache/D-Cache
  ● MMU (TLB)
  ● 200 MIPS @ 180 MHz
● XScale, ARM11, Cortex
  ● 13-stage pipeline (Cortex)
  ● Up to 2000 MIPS at 1 GHz (Cortex-A8)
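The MIPS and clock figures quoted in these slides can be normalised to MIPS per MHz to compare the cores independently of clock speed. The sketch only does the arithmetic on the quoted numbers; `CORES` and `mips_per_mhz` are names invented here.

```python
# (MIPS, MHz) pairs as quoted on the slides.
CORES = {
    "ARM720T": (60, 59.8),
    "ARM920T": (200, 180),
    "Cortex-A8": (2000, 1000),
}


def mips_per_mhz(name):
    """Instructions per clock implied by the quoted MIPS and MHz figures."""
    mips, mhz = CORES[name]
    return mips / mhz


for name in CORES:
    print(f"{name}: {mips_per_mhz(name):.2f} MIPS/MHz")
```

On these figures the gain from ARM7 to ARM9 comes mostly from the higher clock rate, while the Cortex-A8 figure also implies roughly twice the work per cycle.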
ARM(7,9,11), Cortex, XScale
● ARM Limited defines the standard
● Manufactured by third parties
  ● Philips (ARM7/9)
  ● Atmel (ARM7/9)
  ● Cirrus Logic (ARM7/9)
  ● STMicroelectronics (ARM7/9, Cortex)
  ● Actel FPGA (ARM7)
  ● ...
Recommend
More recommend