introduction cs2253

Introduction CS2253

Introduction CS2253. Goal: write a simple C program and understand how the computer actually executes it. Why and what for 2253: this year, we study the ARM7TDMI processor. Levels of abstraction. Last year, we used the fictional


  1. Why 2's Complement?
     ● There is only one representation of 0. (Other representations have -0 and +0.)
     ● To add 2's complement numbers, you use exactly the same steps as unsigned binary.
     ● There is still a “sign bit”: easy to spot negative numbers.
     ● You get one more number (but it's -ve). Range of N-bit 2's complement: $-2^{N-1}$ to $+2^{N-1}-1$.
     2's Complement Tricks
     ● +ve numbers are exactly as in unsigned binary.
     ● Given a 2's complement number X (where X may be -ve, +ve or zero), compute -X using the two's complementation algorithm (“flip and increment”):
     ● Flip all bits (0s become 1s, 1s become 0s).
     ● Add 1, using the unsigned binary algorithm.
     ● Ex: 00101 = +5 in 5-bit 2's complement. Flip to get 11010; 11010 + 1 → 11011, which is -5 in 2's complement.
     ● And Flip(-5) = 00100. 00100 + 1 is back to +5.
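The flip-and-increment steps can be sketched in Java as follows. This is only an illustration: the class and method names are made up for this example, and the width `n` is assumed to be between 1 and 31 so that the mask fits in an `int`.

```java
class TwosComplementDemo {
    // Negate an n-bit two's complement pattern (assumes 1 <= n <= 31).
    static int negate(int bits, int n) {
        int mask = (1 << n) - 1;       // ones in the low n bits
        int flipped = ~bits & mask;    // step 1: flip every bit
        return (flipped + 1) & mask;   // step 2: add 1, discarding any carry out
    }

    public static void main(String[] args) {
        int minusFive = negate(0b00101, 5);
        System.out.println(Integer.toBinaryString(minusFive));  // prints 11011
        System.out.println(negate(minusFive, 5));               // prints 5
    }
}
```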

  2. Converting a 2's complement number X to decimal
     ● Determine whether X is -ve (inspect the sign bit).
     ● If so, use flip-and-increment to compute -X. Pretend you have unsigned binary. Slap a negative sign in front.
     ● If the number is +ve, just treat it like unsigned binary.
     Sign extension
     ● Recall zero-extension is to slap extra leading zeros onto a number.
     ● Eg: 5-bit 2's compl. to 7 bit: 10101 → 0010101. Oops: -11 turns into +21. Zero extension didn't preserve the numeric value.
     ● The sign-extension operation is to slap extra copies of the leading bit onto a number.
     ● +ve numbers are just zero extended.
     ● But for -11: 10101 → 1110101 (stays -11).
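A possible Java sketch of both ideas, using the -11 example from the slide. The helper names are hypothetical; the trick of shifting left and then using Java's `>>` (which copies the sign bit) is one convenient way to sign extend, not the only one.

```java
class SignExtensionDemo {
    // Numeric value of an n-bit two's complement pattern (assumes 1 <= n <= 32).
    static int toDecimal(int bits, int n) {
        int shift = 32 - n;
        return (bits << shift) >> shift;   // >> refills the left with copies of the sign bit
    }

    // Widen a from-bit pattern to a to-bit pattern (assumes from <= to <= 31).
    static int signExtend(int bits, int from, int to) {
        return toDecimal(bits, from) & ((1 << to) - 1);
    }

    public static void main(String[] args) {
        System.out.println(toDecimal(0b10101, 5));                              // prints -11
        System.out.println(Integer.toBinaryString(signExtend(0b10101, 5, 7)));  // prints 1110101
        System.out.println(toDecimal(0b0010101, 7));                            // prints 21 (zero extension went wrong)
    }
}
```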

  3. Overflow for 2's complement
     ● Although the addition algorithm is the same as for fixed-width unsigned, the conditions under which overflow occurs are different.
     ● If A and B are both the same sign (eg, both +ve), then if A+B is the opposite sign, something bad happened (overflow).
     ● Overflow always causes this. And if this does not happen, there is no overflow.
     ● Eg, 1001 + 1001 → 0010 but -7 + -7 isn't +2. Note that -14 cannot be encoded in 4-bit 2's complement.
     Numbering Bits
     ● On paper, we often write bit positions above the actual data bits.
     ● 543210 ← normally in a smaller font than this
       001010    bits 3 and 1 are ones.
     ● Sometimes we like to write bits left to right, and other times, right to left (which is more number-ish). We usually start numbering at zero.
     ● Inside the computer, how we choose to draw on paper is irrelevant.
     ● Computer architecture defines the word size (usually 32 or 64). Usually viewed as the largest fixed-width size that the computer can handle, at maximum speed, with most math operations.
     ● So bit positions would be numbered 0 to 31 for a 32-bit architecture.
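The same-sign rule for overflow can be checked mechanically. The sketch below is an illustration with made-up names, assuming the inputs are n-bit patterns with 1 <= n <= 31.

```java
class OverflowDemo {
    // True if adding two n-bit two's complement patterns overflows.
    static boolean addOverflows(int a, int b, int n) {
        int mask = (1 << n) - 1;
        int signBit = 1 << (n - 1);
        int sum = (a + b) & mask;                            // n-bit result, carry out dropped
        boolean sameSign = ((a ^ b) & signBit) == 0;         // inputs agree in sign
        boolean signChanged = ((a ^ sum) & signBit) != 0;    // result disagrees with them
        return sameSign && signChanged;
    }

    public static void main(String[] args) {
        System.out.println(addOverflows(0b1001, 0b1001, 4));  // prints true  (-7 + -7)
        System.out.println(addOverflows(0b0011, 0b1110, 4));  // prints false (3 + -2)
    }
}
```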

  4. More Arithmetic in 2's complement
     ● Subtract: To calculate A-B, you can use A + (-B). Most CPUs have a subtract operation to do this for you.
     ● Multiplication: easiest in unsigned. (Most CPUs have an instruction.)
     ● D.I.Y. unsigned multiplication is like Grade 3, but your times table is the Boolean AND!! The product of two N-bit numbers may need 2N bits.
     ● For 2's complement, the two inputs' signs determine the product's sign. Eg, -ve * -ve → +ve.
     ● And you can multiply the positive versions of the two numbers. Finally, correct for sign.
     Bit Vectors (aka Bitvectors)
     ● Sometimes we like to view a sequence of bits as an array (of Booleans).
     ● Eg hasOfficeHours[x] for 1 <= x <= 31 says whether I hold office hours on the xth of this month.
     ● And isTuesday[x] says whether the xth is a Tuesday.
     ● So what if you want to find a Tuesday when I hold office hours?

  5. Bitwise Operations for Bit Vectors
     ● Bitwise AND of B1 and B2: bit k of the result is 1 iff bit k of both B1 and B2 is 1.
     ● Java supports bitwise operations on longs and ints:
         int b1 = 6, b2 = 12;    // 0b110, 0b1100
         int result = b1 & b2;   // = 4 or 0b100
     ● Bitwise NOT ( ~ in Java)
     ● Bitwise OR ( | in Java)
     ● Bitwise Exclusive Or ( ^ in Java). Also written “XOR” or “EOR”.
     ● Pretty well every ISA will support these operations directly.
     Find First Set
     ● Some ISAs have a Find First Set instruction. (You've got a bitvector marking the Tuesdays when I have office hours, but now you want to find the first such day.)
     ● Integer.numberOfTrailingZeros() in Java achieves this.
     ● So use Integer.numberOfTrailingZeros(hasOfficeHours & isTuesday)
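As a rough sketch, the office-hours search might look like this in Java. The bit layout is an assumption made here (bit position k stands for day k+1, which matches the later masking slides where 0b1111111111 covers the first 10 days), and the helper names are invented. Note that `Integer.numberOfTrailingZeros(0)` returns 32, so the empty case is handled separately.

```java
class OfficeHoursDemo {
    // Assumed layout: bit (day - 1) is set when the property holds on that day.
    static int dayBit(int day) {
        return 1 << (day - 1);
    }

    // First day of the month that is set in both bitvectors, or -1 if none.
    static int firstDayInBoth(int hasOfficeHours, int isTuesday) {
        int both = hasOfficeHours & isTuesday;
        if (both == 0) {
            return -1;
        }
        return Integer.numberOfTrailingZeros(both) + 1;
    }

    public static void main(String[] args) {
        int isTuesday = dayBit(3) | dayBit(10) | dayBit(17) | dayBit(24) | dayBit(31);
        int hasOfficeHours = dayBit(5) | dayBit(10) | dayBit(12) | dayBit(17);
        System.out.println(firstDayInBoth(hasOfficeHours, isTuesday));  // prints 10
    }
}
```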

  6. Bit Masking
     ● Think about painting and masking tape. You can put a piece of tape on an object, paint it, then peel off the tape. The area under the tape has been protected from painting.
     ● We can do the same when we want to “paint” a bit vector with zeros, except in certain positions.
     ● Eg, I decide to cancel my office hours except for the first 10 days of the month.
     ● Or we can protect positions against painting with ones.
     ● Details next...
     Bit Masking with AND
     ● AND(x,0) = 0 for both Boolean values of x
     ● AND(x,1) = x for both Boolean values of x
     ● Bitwise AND wants to paint bits 0, except where the mask protects (1 protects).
     ● hasOfficeHours & 0b1111111111 is a bitvector that keeps my office hours for the first 10 days (only). Later in the month, all days are painted false.
     ● hasOfficeHours &= 0b1111111111 modifies hasOfficeHours. By analogy to the += operator you may already love.
     ● The value 0b1111111111 is being used as a mask.
     ● Quiz: what does hasOfficeHours & ~0b1111111111 do?

  7. Bit Masking with OR
     ● OR(x,1) = 1 for both Boolean values x
     ● OR(x,0) = x for both Boolean values x
     ● Bitwise OR wants to paint bits with 1s, except where the mask prevents it. A 0 prevents painting.
     ● hasOfficeHours | 0b1111111111 is a bitvector where I have made sure to hold office hours on each of the first 10 days (and left things alone for the rest of the month).
     ● hasOfficeHours |= 0b1111111111 makes it permanent.
     ● Quiz: what would hasOfficeHours |= 0b101 do?
     Bit Masking with EOR (aka XOR)
     ● EOR(0,x) = x for both Boolean values x
     ● EOR(1,x) = NOT(x) for both Boolean values x
     ● Bitwise EOR wants to flip bits in positions that are not protected with a 0 in the mask.
     ● hasOfficeHours ^= 0b111100 inverts my office hour situation for Jan 3-6.
     ● Bit masking with EOR is less common than OR and AND.
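The three masking operations can be tried side by side. The following sketch uses made-up method names and an arbitrary sample bitvector; as in the slides, bit 0 is taken to be the 1st of the month.

```java
class MaskDemo {
    static final int FIRST_TEN_DAYS = 0b1111111111;

    static int keepFirstTen(int v)  { return v & FIRST_TEN_DAYS; }   // AND: zeros everywhere else
    static int forceFirstTen(int v) { return v | FIRST_TEN_DAYS; }   // OR: ones in the first ten
    static int flipDays3To6(int v)  { return v ^ 0b111100; }         // EOR: invert bits 2..5
    static int clearFirstTen(int v) { return v & ~FIRST_TEN_DAYS; }  // AND with an inverted mask

    public static void main(String[] args) {
        int hasOfficeHours = 0b1010_0000_0000_0101;   // sample data: days 1, 3, 14 and 16
        System.out.println(Integer.toBinaryString(keepFirstTen(hasOfficeHours)));   // prints 101
        System.out.println(Integer.toBinaryString(forceFirstTen(hasOfficeHours)));  // prints 1010001111111111
        System.out.println(Integer.toBinaryString(flipDays3To6(hasOfficeHours)));   // prints 1010000000111001
        System.out.println(Integer.toBinaryString(clearFirstTen(hasOfficeHours)));  // prints 1010000000000000
    }
}
```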

  8. Example: Is a Number Odd?
     ● Fact: A number is odd iff its least significant (i.e., rightmost) bit is 1.
     ● Java:
         if ( (myNum & 0b1) == 0b1)
             System.out.println("Very odd number");
     ● Note: from highest to lowest precedence, the operators are ==, &, ^, |, &&, ||. Because == binds tighter than &, the inner parentheses above are required. Even if you don't have to, maybe parenthesizing is a good idea. It's hard to remember weird operators' precedence levels.
     Example: Multiple of 8?
     ● A binary value is a multiple of 8 ($= 2^3$) iff it ends with 000.
     ● Related to the fact that a decimal number is a multiple of 1000 ($= 10^3$) iff it ends with 000.
     ● Java:
         if ( (myVal & 0b111) == 0)
             System.out.println("multiple of 8");
     ● Fact: a more general rule is that the rightmost k bits of X are ($X \bmod 2^k$) (not certain about -ve numbers).
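A small Java sketch of these tests, including a look at negative inputs. The method names are invented for illustration. In Java, the low k bits of a negative `int` agree with `Math.floorMod(x, 1 << k)`, which is always non-negative, rather than with the `%` operator, which can return a negative remainder.

```java
class BitTestDemo {
    static boolean isOdd(int n)         { return (n & 0b1) == 0b1; }
    static boolean isMultipleOf8(int n) { return (n & 0b111) == 0; }

    // Rightmost k bits of x (assumes 1 <= k <= 31).
    static int lowBits(int x, int k)    { return x & ((1 << k) - 1); }

    public static void main(String[] args) {
        System.out.println(isOdd(-3));               // prints true
        System.out.println(isMultipleOf8(-16));      // prints true
        System.out.println(lowBits(-3, 3));          // prints 5
        System.out.println(Math.floorMod(-3, 8));    // prints 5
        System.out.println(-3 % 8);                  // prints -3
    }
}
```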

  9. Bit Shifting
     ● A bunch of operations let the elements of a bitvector play “musical chairs”.
     ● Logical left shift: every bit slides one position left. The old leftmost bit is lost. The new rightmost bit is 0. The Java << operator repeats this to shift the value several positions.
     ● Eg, 0b11 << 4 is the same as 0b110000.
     ● Logical right shift: similar, Java >>> operator.
     Dynamically Generating Masks
     ● Shifts are useful for dynamically generating masks for use with bitwise AND, OR, EOR.
     ● The Hamming weight of a bunch of bits is the number of bits that are 1. (After Richard Hamming, 1915-1998.) Many modern CPUs have a special “population count” instruction to compute Hamming weight. Except for speed, it is not needed:
         int hWeight = 0;   // Hamming weight of int value x
         for (int bitPos = 0; bitPos < 32; ++bitPos) {
             int myMask = 0b1 << bitPos;    // mask with a single 1 at bitPos
             if ( (x & myMask) != 0)        // parentheses needed: != binds tighter than &
                 ++hWeight;
         }

  10. Poor Man's Multiplication
     ● What happens if you take a decimal number and slap 3 zeros on the right? It's the same as multiplying by $10^3$.
     ● Similarly, X << 3 is the same as X * 8. Even works if X is -ve. (Unless X*8 overflows or underflows.)
     ● Poor man's X*10 is (X<<3)+(X<<1), since it equals (8*X + 2*X).
     ● Compilers routinely optimize multiplications by some constants like this, since multiplication is often a harder operation than shifting and adding. Called strength reduction.
     Poor Man's Division
     ● So then, does shifting bits to the right correspond to division by powers of 2?
     ● For unsigned, yes. (Throwing away remainders.)
     ● For 2's complement +ve numbers, yes.
     ● -ve numbers: no. Regular right shift inserts zeros at the leftmost position (the sign bit).
     ● 11111000 → 01111100 means -8 → +124
     ● A modified form, arithmetic right shift, inserts copies of the sign bit at the leftmost. Java operator >> vs >>>.
     ● 11111000 → 11111100 means -8 → -4, as desired.
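These shift tricks are easy to try in Java. The sketch below uses a made-up helper name. The 8-bit slide example is reproduced by writing the pattern 11111000 as a non-negative `int`, since Java's `int` is 32 bits wide.

```java
class ShiftDemo {
    // Poor man's multiplication by 10: 8*x + 2*x.
    static int timesTen(int x) {
        return (x << 3) + (x << 1);
    }

    public static void main(String[] args) {
        System.out.println(timesTen(-7));        // prints -70
        System.out.println(-8 >> 1);             // prints -4  (arithmetic shift keeps the sign)
        System.out.println(0b11111000 >>> 1);    // prints 124 (8-bit pattern for -8, zero shifted in)
        System.out.println(-7 >> 1);             // prints -4  (rounds toward minus infinity)
        System.out.println(-7 / 2);              // prints -3  (Java division rounds toward zero)
    }
}
```

The last two lines show why an arithmetic right shift of a negative number can differ by one from true division.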

  11. Division by Constant, via Multiplication and Right Shifting
     ● Low-end CPUs may not have an integer divide instruction but may have a multiply. Want to divide by a constant y that is not a power of 2.
     ● Mathematically, $x/y = x \cdot (1/y)$.
     ● Multiply 1/y by p/p, for p being some power of 2. Say $p = 2^k$. So $x/y = (x \cdot (p/y)) / p$.
     ● p/y is a constant that you can compute. Division by p is a right shift.
     ● Considerations: effect of truncations and whether the multiplication overflows.
     Example: Divide x by 17
     ● We get to choose p. Say $p = 2^8$.
     ● 256/17 ≈ 15.06 is close to 15.
     ● Compute (x*15) >> 8 to approximate x/17.
     ● Test run for x=175. (175*15)/256 = 10 (throwing away remainder). Good.
     ● Test run for x=170. (170*15)/256 = 9 (because we throw away a fractional part of about .96). Oops! The exact answer is 10.
     ● Can be improved, but will never be perfect. Still, maybe an approximate answer is okay.
     ● Closer approximations by using bigger values of p.
     ● Using 32-bit integers, what is the biggest number we can divide by 17 this way, without getting overflow?
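The divide-by-17 approximation with p = 2^8 is a one-liner. This sketch (hypothetical method name) assumes x is non-negative and small enough that x*15 does not overflow an `int`.

```java
class DivBy17Demo {
    // Approximates x / 17 as (x * 15) >> 8, using 256/17 truncated to 15.
    static int approxDiv17(int x) {
        return (x * 15) >> 8;
    }

    public static void main(String[] args) {
        System.out.println(approxDiv17(175) + " vs " + (175 / 17));  // prints 10 vs 10
        System.out.println(approxDiv17(170) + " vs " + (170 / 17));  // prints 9 vs 10
    }
}
```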

  12. General-Purpose Mult. and Div.
     ● What if you want to multiply and divide by a variable?
     ● Today, most CPUs come with instructions to do this, except maybe the kind in your digital toaster.
     ● But you can always implement * by the Grade-3 shift-and-add algorithm. Or repeated addition.
     ● Division: see how many times you can subtract y from x (in a loop). Or (harder), implement the algorithm you learned in Grade 3.
     More Bit Shuffling
     ● Most CPUs support more exotic ways for bits to play musical chairs. No operator like >> in Java or C for this, though:
     ● Left rotation by 1 position:
       – every bit but the leftmost moves left 1. The leftmost bit circles around and becomes the new rightmost bit.
     ● Left rotation by >1 positions gives the same result as doing multiple left rotations by 1.
     ● A right rotation by 1, or by >1 positions, also exists.
     ● Example: 1010011 right rotated by 1 is 1101001
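Both ideas can be sketched in a few lines of Java. The names below are invented for this example. The rotation helper works on an n-bit field (assumed 2 <= n <= 31) so that it can reproduce the 7-bit slide example; for full 32-bit values, Java also offers the library methods `Integer.rotateLeft` and `Integer.rotateRight`.

```java
class BitShuffleDemo {
    // Grade-3 style shift-and-add multiplication (result is modulo 2^32).
    static int shiftAddMultiply(int a, int b) {
        int product = 0;
        while (b != 0) {
            if ((b & 1) != 0) {
                product += a;    // this "digit" of b is 1, so add the shifted a
            }
            a <<= 1;             // next column is worth twice as much
            b >>>= 1;            // move on to the next bit of b
        }
        return product;
    }

    // Rotate the low n bits of 'bits' right by one position.
    static int rotateRightByOne(int bits, int n) {
        int mask = (1 << n) - 1;
        bits &= mask;
        return ((bits >>> 1) | (bits << (n - 1))) & mask;
    }

    public static void main(String[] args) {
        System.out.println(shiftAddMultiply(13, 11));                                  // prints 143
        System.out.println(Integer.toBinaryString(rotateRightByOne(0b1010011, 7)));    // prints 1101001
    }
}
```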

  13. Hacker's Delight
     ● Henry Warren's book, Hacker's Delight, belongs on the bookshelves of serious low-level programmers.
     ● It is a collection of neat bit tricks collected over the years. It is the source of much of the implementation of Java's class Integer.
     ● Course website has a link to a web page with a similar collection of “bit hacks”.
     ● Despite the title, this book is not about breaking security...it's the older, honourable use of “hacker”.
     Confessions
     ● A simple arithmetic right shift of a negative number is not quite the same as division by a power of 2. (Sometimes you can be off by one; two extra instructions can adjust for this.)
     ● A detailed analysis of the divide-by-a-constant approach (eg the divide-by-17 example) can avoid the small errors. People have worked out approaches for dividing exactly by 3, 5, 7 using multiplications and shifts….
     ● Chapter 10 of Hacker’s Delight is “Integer Division by Constants” and is 72 pages that are quite mathematical. Also the word “magic” appears many times.

  14. Character Data
     ● Characters are encoded into binary. One historical method that is still used is a 7-bit code, American Standard Code for Information Interchange (ASCII).
     ● ASCII contains upper and lower-case letters, punctuation marks and digits that would commonly have been needed for US English data processing needs.
     ● ASCII also encodes other things that control the assumed teletypewriter machine: these control codes include carriage return, line feed, tab, ring the bell, end of file, …
     Control Codes
     ● Control codes are often invisible when printed, and some text editors won't show them. But software (e.g. compilers) can be thrown off by them. Leads to puzzled students sometimes.
     ● A common convention is to discuss control codes by using a letter preceded by ^. Eg, ^C. On many keyboards, pressing the Ctrl key at the same time as the letter can generate a control code.

  15. Backslash Escapes
     ● Many programming languages have backslash escapes to represent some popular control codes.
     ● Eg, '\t', '\n', '\r' in Java and C. You type \t as two characters, but it represents a single character (a tab, ASCII code 9, ^I).
     ● '\123' represents a single byte whose ASCII code is 123 in octal (base 8; more on this later). (In many programming languages.)
     Unicode
     ● Many people (outside the US) found ASCII too limiting, so attempts were first made to extend it to other Western European character sets.
     ● Unicode seeks to represent all current and historical symbols in all cultures and languages. The original idea was that 16 bits was enough. Java uses this early Unicode idea, so char in Java is 16 bits.
     ● Unicode version 9.0 (2016) has >100k characters plus many more symbols. 16 bits is not enough.
     ● Each Unicode character is represented by a numeric code point. The first 128 of them correspond to ASCII, for backwards compatibility. The first $2^{16}$ are the Basic Multilingual Plane.
     ● There are several ways of encoding code points into bytes.

  16. UTF-8, UTF-16 etc
     ● UTF-32 uses 32 bits to store a code point. It is a fixed-width encoding: if I know how many characters I need to store, I know precisely how many bytes it will cost me.
     ● But UTF-32 wastes bytes. Characters outside the Basic Multilingual Plane are rare. Codepoints from 0-127 (“ASCII”) are very common.
     ● UTF-16 represents codes in the BMP with 2 bytes. Weird codes outside need 2 more. Not a fixed-width encoding.
     ● UTF-8 represents ASCII codes in 1 byte, other BMP codepoints with 2 or 3 bytes, and weird codes with 4.
     ● UTF-8 is fully backward compatible with old ASCII files.
     ● In Java, the constructor for an OutputStreamWriter (wrapped around a FileOutputStream) has an optional parameter that can be “UTF-8” or “UTF-16” etc. Otherwise, it uses the platform's default.
     Strings
     ● A string is a sequence of characters. In Java it's represented by a String object, as you know.
     ● In lower-level programming (C, assembler), a string is more likely viewed as a sequence of consecutive memory locations that store the successive characters in the string.
     ● Q: how do you know when the next memory location doesn't store the next character in a string (how to know a string is over)?
     ● Common convention: null-terminated string. A string always ends with a character whose ASCII code is 0. “C-style string”
     ● ARM assembly language: if you want a C-style string, you have to put the null at the end. Fun bugs if you forget.
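The byte counts for the different encodings can be observed from Java. This is a small sketch with invented method names; it uses `StandardCharsets.UTF_16BE` rather than `UTF_16` so that no byte-order mark is counted, and writes the sample characters as `\u` escapes so the source file's own encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    static int utf8Length(String s)  { return s.getBytes(StandardCharsets.UTF_8).length; }
    static int utf16Length(String s) { return s.getBytes(StandardCharsets.UTF_16BE).length; }

    public static void main(String[] args) {
        String[] samples = {
            "A",              // U+0041, ASCII
            "\u00E9",         // U+00E9, e with acute accent (BMP)
            "\u20AC",         // U+20AC, euro sign (BMP)
            "\uD83D\uDE00"    // U+1F600, an emoji outside the BMP
        };
        for (String s : samples) {
            System.out.println(utf8Length(s) + " bytes in UTF-8, " + utf16Length(s) + " in UTF-16");
        }
    }
}
```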

  17. Representing Fractional Values
     ● You can represent fractional values using a fixed-point convention. In decimal and for money (unit dollars), an example would be agreeing to store values as a whole number of cents.
     ● So 2.35 is stored as 235. We have shifted the decimal point by a fixed amount, two positions.
     ● In unsigned binary 101.011 means $101_2$ and a fraction of $0 \cdot 2^{-1} + 1 \cdot 2^{-2} + 1 \cdot 2^{-3}$. I.e., 5.375.
     Fractional Values, 2.
     ● We can store all numbers by shifting the binary point right 3 (for example). So we are measuring everything in eighths. $5.375_{10}$ is then stored as 101011, instead of 101.011.
     ● Can add and subtract fixed-point numbers successfully, as long as each is, for instance, measured in eighths.
     ● But multiplying two numbers given in eighths results in a product that is measured in 64ths. So have to divide by 8 (just shift right 3 positions...).
     ● Advantage: fractions handled using only integer arithmetic.
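A minimal fixed-point sketch in Java, measuring everything in eighths as in the slide. The class and helper names are made up, values are assumed non-negative, and the conversion helpers use `double` only for getting numbers in and out; the arithmetic itself is pure integer.

```java
class FixedPointDemo {
    static final int FRACTION_BITS = 3;   // values are stored as a count of eighths

    static int toFixed(double v)   { return (int) Math.round(v * (1 << FRACTION_BITS)); }
    static double toDouble(int f)  { return f / (double) (1 << FRACTION_BITS); }

    static int add(int a, int b)   { return a + b; }                      // eighths + eighths = eighths
    static int mul(int a, int b)   { return (a * b) >> FRACTION_BITS; }   // 64ths shifted back to eighths

    public static void main(String[] args) {
        int x = toFixed(5.375);
        System.out.println(Integer.toBinaryString(x));      // prints 101011
        System.out.println(toDouble(add(x, toFixed(1.5)))); // prints 6.875
        System.out.println(toDouble(mul(x, x)));            // prints 28.875 (exact product is 28.890625)
    }
}
```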

  18. Floats, Doubles etc.
     ● Scientific processes generate huge and tiny numbers. No single fixed-point shift will suit everything.
     ● Measured values have limited precision: no sense to store the number of meters to Alpha Centauri as an integer.
     ● Floating-point representation is a computer version of the “scientific notation” you learned in school, eg: $3.456 \times 10^{-5}$
     ● +3.456 is the significand or mantissa. We've 4 sig. digits.
     ● Number is normalized to 1 significant digit before the decimal point.
     ● The exponent is -5, and the sign is positive.
     IEEE-754 Standard
     ● IEEE-754 is the standard way to represent a binary floating point value in 16 (half-precision), 32 (single precision), 64 (double precision), 128 (quad precision) or 256 (octuple precision) bits.
     ● 32-bit form available in C & Java as float; 64-bit form as double.
     ● Overall, it's a sign-magnitude scheme.
     ● But the exponent is a signed quantity using the biased approach.

  19. IEEE-754 Floats
     ● 1 sign bit, S. (Bit position 31)
     ● Next, 8 exponent bits with binary value E.
     ● Exponent bias b of 127.
     ● 23 fraction bits fff..fff to represent the significand of 1.ffff..fff. Note the “hidden” or “implicit” leading 1.
     ● Formula for “normal” floats: Value $= (-1)^S \times 1.\text{ffff...fff}_2 \times 2^{E-b}$
     Example
     ● Find the numerical value of a float with bits 0 00111100 00110000000000000000000
     ● Use the formula $(-1)^S \times 1.\text{ffff...fff}_2 \times 2^{E-b}$
     ● S=0. E $= 00111100_2 = 60_{10}$. b=127 (always). ffff...fff = 00110000000000000000000
     ● So: $(-1)^0 \times 1.0011_2 \times 2^{-67} = +1 \times (1 + 3/16) \times 2^{-67}$
     ● It's a small positive number. Calculator for details.
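The decoding formula can be cross-checked against Java's own `Float.intBitsToFloat`. The sketch below is illustrative (the method name is made up) and handles only “normal” floats, as in the slide; zeros, subnormals, infinities and NaNs are deliberately ignored.

```java
class FloatDecodeDemo {
    // Applies (-1)^S * 1.fff...f * 2^(E - 127) to a normal float's bit pattern.
    static double decodeNormal(int bits) {
        int s = bits >>> 31;                 // 1 sign bit
        int e = (bits >>> 23) & 0xFF;        // 8 exponent bits
        int f = bits & 0x7FFFFF;             // 23 fraction bits
        double significand = 1.0 + f / (double) (1 << 23);   // hidden leading 1
        double magnitude = Math.scalb(significand, e - 127); // multiply by 2^(E - b)
        return (s == 0) ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        // S = 0, E = 60, fraction = 0011 followed by 19 zeros
        int bits = (60 << 23) | (0b0011 << 19);
        System.out.println(decodeNormal(bits));
        System.out.println(Float.intBitsToFloat(bits));
    }
}
```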

  20. Example 2: Determine the bits
     ● Determine how to represent -2253.2017.
     ● Helpful facts: 2253 = 2048 + 128 + 64 + 8 + 4 + 1.
     ● $0.2017 \times 2^{24} = 3383964$ + fraction
     ● $3383964_{10} = 1100111010001010011100_2$
     ● Now let's put the pieces together.
     Representable Values
     ● There is an uncountable infinity of real numbers.
     ● There are at most $2^{32}$ different bit patterns for a float. No bit pattern represents more than one real number.
     ● Therefore, there are real numbers that cannot be represented. (Overwhelming majority.)
     ● For any given exponent value, there are only $2^{23}$ different mantissas, 1.000... to 1.111...
     ● No number whose exponent exceeds 255-127.
     ● No number whose exponent < (0 – 127).
     ● (Though in CS3813 you'll learn about subnormal numbers, so I've lied; IEEE-754 is a fair bit hairier than I've presented.)
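One way to check a hand-assembled float is to compare it with `Float.floatToIntBits`. In this sketch the names are invented and the field values come from working the example by hand: 2253 lies between 2^11 and 2^12, so E = 11 + 127; the 23 fraction bits are the 11 bits of 2253 below its leading 1, followed by the first 12 bits of the fractional part (826, since 0.2017 * 2^12 is about 826.16, which rounds down).

```java
class FloatEncodeDemo {
    // Packs the three IEEE-754 single-precision fields into one 32-bit pattern.
    static int assemble(int sign, int biasedExponent, int fraction) {
        return (sign << 31) | (biasedExponent << 23) | (fraction & 0x7FFFFF);
    }

    public static void main(String[] args) {
        int fraction = ((2253 - 2048) << 12) | 826;   // 11 integer-part bits, then 12 fractional bits
        int byHand = assemble(1, 11 + 127, fraction);
        int byJava = Float.floatToIntBits(-2253.2017f);
        System.out.println(Integer.toHexString(byHand));
        System.out.println(Integer.toHexString(byJava));
    }
}
```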

  21. Example
     ● What is the next representable value, after 5/16?
     ● $5/16 = 0.0101_2 \times 2^0 = 1.0100...0_2 \times 2^{-2}$
     ● Now let's reason.
     IEEE-754 Doubles
     ● 64 bits, divided up into
       – 1 sign bit
       – 11 exponent bits, bias 1023
       – 52 fraction bits (53 significand bits, counting the hidden leading 1)
     ● More exponent bits: better range of numbers.
     ● More fraction bits: smaller gaps between representable numbers (higher precision).
     ● Otherwise, like Float.
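For the “next representable value” question, Java can confirm the reasoning. In this sketch (hypothetical helper name), adding 1 to the bit pattern of a positive finite float bumps the last fraction bit, which should agree with the library method `Math.nextUp`. With exponent -2 and 23 fraction bits, the gap is 2^-25.

```java
class NextFloatDemo {
    // Next float above x, for positive finite x below Float.MAX_VALUE.
    static float nextByBits(float x) {
        return Float.intBitsToFloat(Float.floatToIntBits(x) + 1);
    }

    public static void main(String[] args) {
        float x = 5.0f / 16.0f;
        System.out.println(nextByBits(x) == Math.nextUp(x));       // prints true
        System.out.println(Math.ulp(x) == Math.scalb(1.0f, -25));  // prints true
    }
}
```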

  22. Machine Instructions
     ● Another thing that becomes binary: machine instructions.
     ● Typical m/c instruction has
       – an operation code (opcode) that indicates which of the supported operations is desired
       – codes indicating addressing modes that provide the input data (“operands”)
       – code indicating where the result should be put
       – code indicating the conditions under which the instruction should be ignored
     ● An instruction-format specification helps you determine how to assemble these codes into a machine-code instruction.
     ● Chapter 1: To store the constant 8 into a register variable: 0b11100011101000001101000000001000 in ARM m/c code.
     ● We'll study ARM instruction formats later.
     Hexadecimal
     ● A decimal number has about 1/3 as many digits as the corresponding binary number: small base, lots of digits.
     ● Humans do poorly with many-digit numbers.
     ● So for humans, it is handy to work in larger bases. But it's hard to convert base 10 numbers into base 2 numbers.
     ● Base 16, or hexadecimal (hex), is the go-to base for machine-level human programmers.
       – numbers have few digits
       – it's easy to convert to/from binary

  23. Hexadecimal digits
     ● Whereas decimal uses digits 0 to 9, hexadecimal uses 0-9,A,B,C,D,E,F.
       – Digit 7 has value seven, just like decimal
       – Digit A has value ten, B has value eleven, … F has value fifteen
     ● Numbers have a ones place, a sixteens place, a $16^2$s place, a $16^3$s place, etc.
     ● 2F32 means $2 \cdot 16^3 + 15 \cdot 16^2 + 3 \cdot 16^1 + 2$
     ● In many languages, you prefix hex constants with 0x, so
         int fred = 0x2f32;    // works fine in Java.
         int george = 0x100;   // equivalent to george = 256;
     Converting Hex to Binary
     ● Because $16 = 2^4$, each hex digit expands to 4 binary digits.
     ● For 0x9A4
       – the 9 expands to 1001
       – the A expands to 1010
       – the 4 expands to 0100
     ● So 0x9A4 expands to 0b100110100100

  24. Converting Binary to Hex
     ● Binary → Hex is the reverse process.
     ● Only trick: you want the binary number to be the correct length (a multiple of 4 in length).
     ● So zero extend it, if necessary.
     ● Then each group of 4 bits collapses to a hex digit.
     ● 101010 → 0010 1010 → 2A
     ● Rather than count bits and zero extend first, just circle groups of 4 bits starting from the right. If the last group has fewer than 4 bits, it's okay.
     Small Negative Numbers
     ● We usually use unsigned hex to reflect bit patterns, even if they are meant to be 2's complement numbers.
     ● So what does a 32-bit negative number look like, if it is pretty close to 0?
     ● The corresponding bit pattern has a lot of leading ones. When converted to hex, each group of 4 ones turns into an F digit.
     ● So your not-very-negative number has lots of leading F's.
     ● 0xFFFF FFFB is the bit pattern for $-5_{10}$ = 0b111...111011
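Java's `Integer` class can be used to double-check these conversions. The snippet below is just a quick illustration; the class name is made up, and note that `Integer.toHexString` prints lowercase digits and treats the `int` as an unsigned bit pattern.

```java
class HexConversionDemo {
    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(0x9A4));   // prints 100110100100
        System.out.println(Integer.toHexString(0b101010));   // prints 2a
        System.out.println(Integer.parseInt("2F32", 16));    // prints 12082
        System.out.println(Integer.toHexString(-5));         // prints fffffffb
    }
}
```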

  25. Hex Arithmetic
     ● It's sometimes handy to do addition and subtraction of hex numbers without converting to decimal.
     ● (Typically, subtraction when you want to figure out the size of something in memory, and you've got the starting and ending positions.)
     ● Like Grade 3, except your addition/subtraction table is bigger.
       – don't memorize: just use the values of digits
       – you carry and borrow 16, not 10
     Example: Hex Addition
     ● A debugger reports that an item begins in memory at address 0x1234. You know its size is 0x7D. What is the first address after the item?
     ●   1234
       +   7D
     ● 4 is worth 4, D is worth 13. Sum to 17, or 0x11.
     ● Keep the 1, carry the 16 to the next stage.
     ● 3 and 7 sum to 10. But there is a carry, bumping you to 11, or 0xB. No carry to next stage.
     ● So 1234 + 7D = 12B1.

  26. Example: Hex Subtraction
     ●   1203
       - 0F15
     ● Since 3 < 5, borrow from 0x20 (making it 0x1F).
     ● You borrowed 16, so 3 is now worth 16+3=19.
     ● Take away 5, get 14. Hex digit for 14 is E.
     ● F-1 is E (no borrow needed).
     ● 1-F needs a borrow, makes 1 worth 16+1=17.
     ● Take away F (value 15), get 2. (Hex digit is 2.)
     ● You borrowed from the 1, so it's 0. 0-0=0.
     ● You could write down this leading zero, if you wanted....
     ● 1203-0F15 = 02EE
     Octal (base 8)
     ● In bygone days, octal (base 8) was an alternative to hexadecimal.
     ● Conversion to/from binary is by grouping bits into groups of size 3, but otherwise same as hex.
     ● Octal survives in some niches. In a string or a character, a backslash can be followed by 3 octal digits (typically the ASCII code of some otherwise unprintable character).
     ● In Java and C, any digit string that starts with a leading zero is assumed to be octal. Remaining digits must be 0 to 7.
     ● So: int fred = 09; // mysterious compile error
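The hand calculations for hex addition and subtraction, and the octal quirks, can be verified with a few lines of Java. The method names are made up for this sketch.

```java
class HexArithmeticDemo {
    static int firstAddressAfter(int start, int size) { return start + size; }
    static int sizeBetween(int start, int end)        { return end - start; }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(firstAddressAfter(0x1234, 0x7D)));  // prints 12b1
        System.out.println(Integer.toHexString(sizeBetween(0x0F15, 0x1203)));      // prints 2ee
        System.out.println(010);          // prints 8: a leading zero means octal
        System.out.println('\123');       // prints S: octal 123 is ASCII code 83
    }
}
```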

  27. ARM v4T
     CS2253
     Owen Kaser, UNBSJ
     ARM v4T
     ● History of ARM processors
     ● R is for RISC
     ● Registers
     ● Status flags and conditional execution
     ● Memory
     ● Example program

  28. History of ARM v4T
     ● Acorn Computers in the UK, early 1980s.
     ● Designed own CPU for a line of PCs, based on cutting-edge design trends then.
     ● Cutting edge was RISC: Reduced Instruction Set Computers.
     ● ARM was the Acorn RISC Machine.
     ● Circa 1990, retitled Advanced RISC Machine and the design was licensed to other companies to manufacture or add extra components, as part of a System-on-a-Chip.
     ● Like the extra stuff to make an Apple Newton, an iPod, a Nokia phone...
     History of ARM v4T, cont.
     ● The ISA has been added to over the years. ARM v4T dates from the early 1990s.
     ● Actually, v4T has the regular 32-bit ARM ISA and a simpler Thumb ISA, where instructions can be 16 bits long. We ignore Thumb in CS2253.
     ● New versions of the ISA have come out in the meantime (though old ones are still being produced).
     ● ISAs that evolve tend to get ugly, preserving backwards compatibility. There is now a 64-bit ISA that apparently is once again clean. Maybe we can shift 2253 to it in future.

  29. ARM is Popular
     ● ARM variations are the champion in popularity for mobile devices.
     ● By 2002, there were 1.3 billion manufactured.
     ● In just 2012, 8.7 billion were manufactured.
     What's RISC?
     ● The R in ARM stands for Reduced Instruction Set Computer.
       – in contrast to the extremely complicated CPUs of the late 1970s (VAX had an “evaluate polynomial” instruction, for instance). A “CISC” machine has some advantages, in “code density”.
       – complex means expensive to make, and hard to make run fast.
       – RISC tried to simplify ISAs, so implementation can be simple and fast.

  30. RISC Principles
     ● There should be a small number of instructions.
     ● Every instruction should do something very simple, so it can run in 1 clock cycle.
     ● All machine codes should be the same length (32 bits).
     ● There should be relatively few different machine code formats.
     ● Should be a fair number of storage registers, and most operations should involve only them.
     ● Values should be transferred between RAM and registers by explicit Load and Store instructions.
     ARM v4T Components
     ● There are 15 main registers, R0 to R14. Each can store any 32-bit value. R13 and R14 are a tad special.
     ● As a first approximation, a HLL programmer can view them as the only real “variables” you have.
     ● R15 is also called PC (Program Counter) and keeps track of where to fetch instructions.
     ● Due to “pipelining”, when an instruction executes, PC actually stores the address of the instruction that is 8 bytes ahead. Pipelining is an advanced CS3813 topic.

  31. Example Instructions
     ● Add two register values, result in 3rd register.
     ● Exclusive-OR two register values, result in 3rd.
     ● Change the program counter (subtract 16 from it).
     ● Get a halfword from memory, at an address that is 10 more than the current value of R1. Sign extend it and put it in R2. Modify R1 to be increased by 10.
     ● Store the first byte in R1 into memory, address obtained by taking R2 and shifting it left 2, then adding that value to R3.
     In each case, the technical ARM documentation can tell you how the instruction would be encoded into bits.
     ARM Components: CPSR
     ● The Current Program Status Register is a collection of 12 miscellaneous bits.
       – 4 keep track of how recent instructions went (“status flags”)
       – 8 allow you to see and control the processor configuration (“control bits”). We don't need them initially.
     ● Chapter 2 of the textbook tells you about other advanced concepts that aren't needed until the hardest parts of the course, much later.
     ● Please ignore anything about “processor modes” other than User, for now.

  32. Status Flags
     ● Most ISAs (except the MIPS ISAs we often study in CS3813) use status flags.
     ● They help record the outcome of an earlier instruction, so that your program can do different things, depending on what happened earlier.
     ● Flags are N (bit 31 was 1), Z (all bits were 0), V (result oVerflowed), and C (there was a Carry out).
     ● Many instructions have a version that updates the flags and another that doesn't. But some instructions always update the flags.
     Conditional Execution
     ● Most ARM instructions can be made conditional, so they do nothing unless the specified status flags are set.
     ● Example: 64-bit counter.
       – First instruction sets flags while incrementing the low-order 32 bits
       – Second instruction runs conditionally and only increments the high-order 32 bits if the Z flag is set
       – Maybe low-order bits in R1 and high-order in R2
     ● Non-ARM ISAs generally have only a few conditional instructions (the ones that implement IF).

  33. Constants
     ● Many ARM operations can use constants (just like you can add two registers together, you can add a register to a constant, etc.)
     ● ARM constants are weird. Numbers -128 to 255 are okay, as are a few larger numbers.
     ● Allowable larger numbers are those obtained by rotations of 0-255 by an even number of positions, etc. More later.
     Memory
     ● The ARM processor is byte addressed, in that every byte of memory has its own address, starting from address 0.
     ● Addresses are 32 bits long, leading to a maximum of 4GB of memory (at least for a given running program). [But note that some addresses are typically carved out for non-memory.]
     ● Special Load and Store instructions are used to access memory. You can transfer 1, 2 or 4 bytes in one operation.
     ● In ARMv4, a 4-byte transfer must begin at a memory address that is a multiple of 4: the alignment rule. Similarly, 2-byte transfers must begin at an even address.
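The “rotation of 0-255 by an even number of positions” rule can be tested with a short loop. This Java sketch (hypothetical names) checks only that rule; it says nothing about how negative constants such as -128 are handled, which the slides defer to later.

```java
class ArmConstantDemo {
    // True if value equals some 8-bit number rotated right by an even amount within 32 bits.
    static boolean isRotatedByte(int value) {
        for (int rot = 0; rot < 32; rot += 2) {
            // Undo a right rotation by 'rot' and see whether only the low 8 bits remain.
            if ((Integer.rotateLeft(value, rot) & ~0xFF) == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isRotatedByte(255));          // prints true
        System.out.println(isRotatedByte(0x3FC));        // prints true  (0xFF rotated right by 30)
        System.out.println(isRotatedByte(0xFF000000));   // prints true  (0xFF rotated right by 8)
        System.out.println(isRotatedByte(0x101));        // prints false (the ones are 9 bits apart)
        System.out.println(isRotatedByte(0x1FE));        // prints false (would need an odd rotation)
    }
}
```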

  34. Big Endian vs Little Endian
     ● When a 4-byte word is laid out in memory, does the most-significant byte (big end) come first, or the least-significant byte (little end)?
     ● A religious war arose between the two camps.
     ● ARM7TDMI processor can do either, but the default for ARM is usually little-endian.
     ● The issue is only visible if you write a word/halfword into memory and then try to read it back in smaller pieces (eg bytes).
     Example Program
     ● Compute 10+9+8+7+6+5+4+3+2+1
       – Put the constant 0 into R1
       – Put the constant 10 into R2
       – Add R1 and R2, put the result into R1
       – Subtract the constant 1 from R2 and set the status flags
       – If the Z flag is not set, reset the PC to contain the address of the 3rd instruction above.
     ● Each of these instructions can be encoded into machine code, if you are willing to slog through the reference manuals enough.
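Byte order becomes visible as soon as a word is written and read back a byte at a time. The sketch below simulates that with Java's `ByteBuffer`; it is an illustration on the JVM, not ARM code, and the method name is invented.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    // The 4 bytes of 'word' in the order they would sit in memory, lowest address first.
    static byte[] layout(int word, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(word).array();
    }

    public static void main(String[] args) {
        for (byte b : layout(0x12345678, ByteOrder.LITTLE_ENDIAN)) {
            System.out.printf("%02x ", b);   // prints 78 56 34 12
        }
        System.out.println();
        for (byte b : layout(0x12345678, ByteOrder.BIG_ENDIAN)) {
            System.out.printf("%02x ", b);   // prints 12 34 56 78
        }
        System.out.println();
    }
}
```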

  35. Assembly Language
     CS2253
     Owen Kaser, UNBSJ
     Assembly Language
     ● Some insane machine-code programming
     ● Assembly language as an alternative
     ● Assembler directives
     ● Mnemonics for instructions

  36. Machine-Code Programming (or, Why Assemblers Keep Us Sane)
     ● Compute 10+9+8+7+6+5+4+3+2+1
       – Put the constant 0 into R1
       – Put the constant 10 into R2
       – Add R1 and R2, put the result into R1
       – Subtract the constant 1 from R2 and set the status flags
       – If the Z flag is not set, reset the PC to contain the address of the 3rd instruction above.
     ● Let's try to make some machine code.
     Put 0 into R1
     ● There's a Move instruction, or you could subtract a register from itself, or EOR a register with itself, or... let's use Move.
     ● Book Fig 1.12
     ● cond = 1110 means unconditional
     ● S=0 means don't affect status flags
     ● I=1 means constant; opcode = 1101 for Move
     ● Rn = ???? say 0000; Rd = 0001 for R1
     ● bits 8-11: 0000 Rotate RIGHT by 0*2
     ● bits 0-7: 0x00
     ● So machine code is 1110 00 1 1101 0 0000 0001 0000 00000000 = 0xE3A01000

  37. Put 10 into R2
     ● cond = 1110 means unconditional
     ● S=0 means don't affect status flags
     ● I=1 means constant; opcode = 1101 for Move
     ● Rn = ???? say 0000; Rd = 0010 for R2
     ● bits 8-11: 0000 (rotate right by 2*0); bits 0-7: 0x0A
     ● So machine code is 1110 00 1 1101 0 0000 0010 0000 00001010 = 0xE3A0200A
     Add R1 and R2, put result into R1
     ● Same basic machine code format as Move.
     ● cond = 1110 for “always”; I=0 (not constant)
     ● opcode = 0100 for ADD; S=0 (no flag update)
     ● Rn = R1, Rd = R1
     ● shifter_operand = 0x002 for R2 unmolested
     ● 1110 00 0 0100 0 0001 0001 0000 0000 0010 = 0xE0811002
     ● Having fun yet??

  38. Subtract 1 from R2, result into R2
     ● Same basic machine code format as Move.
     ● cond = 1110 for “always”; I=1 (constant)
     ● opcode = 0010 for Subtract; S=1 (yes flag update)
     ● Rn = R2, Rd = R2
     ● shifter_operand = 0x001 for 1 rotated right 0 positions
     ● 1110 00 1 0010 1 0010 0010 0000 0000 0001 = 0xE2522001
     ● Are you REALLY having fun yet??
     Maybe Rinse and Repeat
     ● If the Z flag is not set, we want to go back 2 instructions before this one.
     ● Book Fig 3.2
     ● cond = 0001 means “when Z flag is not set”
     ● L=0 means “don't Link” (Link changes R14)
     ● Signed offset should be -4. The PC is already 2 instructions ahead of this one, and we want to go back 2 more than that.
     ● 0001 101 0 111111111111111111111100 = 0x1AFFFFFC
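The bit-field assembly done by hand on these slides is exactly the kind of job a program can do. Below is a rough Java sketch that packs the fields named in the slides (cond, I, opcode, S, Rn, Rd, shifter operand, and for branches L and a 24-bit signed offset). The method names are invented, and only the two instruction formats used in this example program are covered.

```java
class ArmEncodeDemo {
    // Data-processing format: cond | 00 | I | opcode | S | Rn | Rd | shifter_operand
    static int dataProcessing(int cond, int i, int opcode, int s, int rn, int rd, int shifterOperand) {
        return (cond << 28) | (i << 25) | (opcode << 21) | (s << 20)
             | (rn << 16) | (rd << 12) | (shifterOperand & 0xFFF);
    }

    // Branch format: cond | 101 | L | 24-bit signed offset (counted in instructions)
    static int branch(int cond, int link, int offset) {
        return (cond << 28) | (0b101 << 25) | (link << 24) | (offset & 0xFFFFFF);
    }

    public static void main(String[] args) {
        int always = 0b1110, whenZClear = 0b0001;
        int[] program = {
            dataProcessing(always, 1, 0b1101, 0, 0, 1, 0x000),  // put 0 into R1
            dataProcessing(always, 1, 0b1101, 0, 0, 2, 0x00A),  // put 10 into R2
            dataProcessing(always, 0, 0b0100, 0, 1, 1, 0x002),  // R1 = R1 + R2
            dataProcessing(always, 1, 0b0010, 1, 2, 2, 0x001),  // R2 = R2 - 1, setting flags
            branch(whenZClear, 0, -4)                           // if Z clear, back to the add
        };
        for (int word : program) {
            System.out.println(Integer.toHexString(word));
        }
        // prints e3a01000, e3a0200a, e0811002, e2522001, 1affffc on successive lines
    }
}
```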

  39. How'd you know the cond codes? How'd You Know the Shifter Magic?

  40. An Assembler ● Rather than making you assemble together all the various bit fields that make up a machine instruction, let's make a program do that. ● You are responsible for breaking the problem down into individual instructions, which will be given human friendly names (mnemonics). ● You give these instruction names to the assembler, along with various other directives (aka pseudo-ops) that control how the assembler does its job. ● It is responsible for producing the binary machine code. ● It also produces symbol table information needed by a subsequent linker program, if you write a multi-module program.

  41. Assembly Language The Bad News ● You communicate with the assembler via assembly ● Anyone who creates an assembler gets to define language (mix of mnemonics, directives, etc.) their own assembly language (ignoring ● Assembly language is line-oriented. manufacturer's suggestions). Dialects? ● A line consists of ● Textbook shows code for Keil and Code – an optional label in column 1 Composer Studio. But we use Crossware's – an optional instruction or directive (and any arguments) assembler, which is yet another dialect and it's – an optional comment (after a ; ) hard to find documentation on it. ● Example: ● Textbook talks about “Old ARM format” and “UAL here b here ; create infinite loop. format”. Crossware is a mixture (more old). ● “here” is a label that marks a place ● b is a branch instruction, forces the PC to a new location (here).

  42. Our Program in Assembly Register Names ● r0 to r15 (alias R0 to R15) mymain mov r1,#0 ← mymain is the label mov is the instruction ● SP or sp, aliases for R13 # precedes the constant ● LR or lr, aliases for R14 ; nice comment, eh? mov r2,#10 ; put 10 into r2 (bad comment) ● PC or pc, aliases for R15 myloop add r1, R1, r2 ← case insensitive for reg names ● cpsr or CPSR (the status registers etc) subs r2, r2, #1 ← final s means to affect flags ● spsr or SPSR, apsr or APSR (later) bne myloop ← condition is “ne” (z flag false) sticky b sticky ← so we don't fall out of pgm ● not s0-s3 or a1-a4 (unlike book page 63) end ← directive to assembler: you're done ;don't use “end”; it seems to be buggy in Crossware

  43. Popular Assembler Directives Directive to Set Aside Memory ● Textbook Section 4.4 describes the set of directives ● The SPACE directive tells the assembler to set aside a specified number of bytes of memory. These locations will supported by the Keil assembler and the TI be initialized to 0. assembler. ● Usually have a label, since you need a name to refer to the ● Our Crossware assembler is different than both (but allocated memory. closer to Keil). ● Example ● Let's look at directives to – myarray SPACE 100 – set aside memory space for variables/arrays – myarr2 SPACE 100*4 ←constant expression's ok – define a block of code or data – give a symbolic name to a value ● Later, instructions can load and store things into the chunks of memory by referring to the names used. ● If myarray starts at address 1234, myarr2 starts at 1234+100

  44. Use of SPACE Directives for Memory Variables ● An assembly language programmer uses ● Use DCB to declare an initialized byte variable. SPACE for the same reasons that a Java ● DCW for initialized halfword, DCD for word. programmer uses an array. ● Example myvar1 DCB 50 ← decimal constant myvar2 DCB 'x' ← ASCII code of 'x' myvar3 DCB 0x55 + 3 ← constant expression ● If myvar1 ends up being at address 1234, then myvar2 will be at 1235 and myvar3 at 1236

  45. Alignment ● DCW assumes you want the memory variable to start at a multiple of 2 (“halfword aligned”) ● DCD assumes you want alignment to a multiple of 4. ● To achieve this, assembler will insert padding. ● If you really want to set aside a word without padding, use DCDU. The “U” is for unaligned. ● There's also DCWU.
      Alignment Example
      Aligned version:
          v1 DCB 10
          v2 DCW 20
          v3 DCB 30
          v4 DCD 40
      If v1 is at address 3000, then v2 starts at 3002 (1 byte of padding), v3 is at 3004, v4 starts at 3008 (3 bytes padding)
      Unaligned version:
          v1 DCB 10
          v2 DCWU 20
          v3 DCB 30
          v4 DCDU 40
      If v1 is at 3000, then v2 starts at 3001, v3 is at 3003, v4 starts at 3004 (aligned by luck)
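
The padding arithmetic in the alignment example can be sketched in C. The helpers `align_up` and `place` are hypothetical names for this illustration, and the "location counter" is simply a variable holding the next free address; this is a model of the bookkeeping, not Crossware's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment (must be a power of two). */
uint32_t align_up(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

/* Place one item: pad up to its alignment, return where it starts,
   and advance the location counter past it. */
uint32_t place(uint32_t *location_counter, uint32_t size, uint32_t alignment)
{
    uint32_t start = align_up(*location_counter, alignment);
    *location_counter = start + size;
    return start;
}
```

With the counter starting at 3000, placing sizes 1, 2, 1, 4 with alignments 1, 2, 1, 4 models DCB, DCW, DCB, DCD; using alignment 1 throughout models the DCWU/DCDU variants.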

  46. More Alignment Control DCB with Several Values ● You can use DCB with several comma-separated values ● Several consecutive memory locations are set aside. A label ● Keil assembler has an ALIGN directive that can names the first of them. force alignment to the next word boundary ● Example: foo DCB 1,2,3,4 (inserting 0-3 bytes of padding). ● We can access the location initialized to 3 as “foo+2” ● A quoted string is equivalent to a comma separated list of ASCII values. ● In Crossware, the directive takes a numeric DCB “XY” is same as DCB 'X','Y' or DCB 88,89 argument. So ALIGN 4 (or ALIGN 8) ● DCW and DCD can also take a comma-separated list. ● Common use: make a small initialized table.

  47. DCB: Signed or Unsigned? ● DCB's argument must be in the range -128 to +255. ● -ve values are 2's complement ● +ve values are treated as unsigned ● So DCB -1, 255 is same as DCB 255, 255 ● Similarly DCW's arguments in range -32768 to +65535. ● DCD from -2^31 to +2^32-1
      AREA directive ● In general, an assembly language program can have several blocks of data and several blocks of code. And it can be written in several different source-code files. ● The AREA directive marks the beginning of a new block. You give it a new name and specify its type. – eg AREA fred,code – You can go back to a previous area by using an old name ● A tool called a linker runs after the assembler to put your various sections (and any library routines you need) into a single program. ● Much more on linkers later in the course

  48. AREA Example
                AREA mycode,code
      foo       add R1, R2, R3
                add R4, R5, #10
                AREA mydata, data
      var1      dcb “cs2253”
                AREA mycode        ← continues mycode where it left off
                add R6, R7, R8
      This feature allows us to show our data declarations near the code that uses them (maybe good software engineering), even if the different sections end up being far apart in memory. Memory picture on board...
      Code in Data, Data in Code ● Q: Is this allowed; if so, what does it do?
                AREA mycode, CODE
      starthere add R1, R2, R3
                DCD 0x1234567      ; this line is fishy
                add R2, R3, R4
                AREA mydata, DATA
      var1      DCD 1234
      var2      add R2, R3, R4     ; this line is also fishy
      var3      DCB “hello world”,0

  49. Operators in expressions ● add R4, R5, #10 ↔ add R4, R5, #3+3+3*1+1 ● Both of the above generate the same single machine-code instruction. ● The + and * operators are just requests to the assembler to do a little bit of math when it processes the line. No runtime effect. ● Other operators supported by Crossware are | and & (bitwise OR and AND). Also >> and <<. ● I can't find XOR, mod (unlike Keil and CCS on page 75)
      EQU: Give a Symbolic Name ● The EQU directive is used to give a symbolic name to an expression. Use it to make code easier for humans. ● Example
          fred DCB 20, 200, “Frederick Wu”
          fred_age EQU fred+0
          fred_height EQU fred+1
          fred_name EQU fred+2
      Subsequent instructions can load data from fred_height rather than the more cryptic fred+1. But to the assembler, both loads will be equivalent.

  50. Directives Crossware May Lack A Few Instructions ● Compared to Keil and CCS, our Crossware assembler ● Assembler directives are great, but the main thing does not appear to support some directives. I can't find good in assembly language is to specify instructions (and documentation, so maybe they exist under a different name :( then get the assembler to generate the associated – ENTRY machine codes) – RN ● So far (from the loop example) we know – LTORG, though we do have the “LDR r x ,=” construct (eg – add textbook page 72) – SETS – sub ● Also, the SECTION directive only takes attributes – b CODE and DATA. Not the others in textbook Table 4.3. – mov ● Crossware does support macros and conditional assembly, advanced topics for later in the course.

  51. A Few More Instructions (Table 4.1) Mnemonics ● These are math-ish instructions: ● A mnemonic is “a memory aid”. ● It’s hard to remember the bit pattern associated – RSB – reverse subtract with a machine operation. – ADC, SBC – add/subtract with carry ● As a memory aid, we have human-friendly – RSC – reverse subtract with carry names like ADD, SUB etc. – MVN – move “negative” (a bitwise NOT) ● They are our mnemonics. – AND, ORR, EOR, BIC – bitwise logical operations – MUL, SMULL, UMULL – various * ops – MLA, SMLAL, UMLAL – multiply/accumulate.

  52. From Reference Example: Swapping ● Java swap of v1 and v2: temp = v1; v1 = v2; v2 = temp; ● Naive ARM swap of r1 and r2 mov r3, r1 mov r1, r2 mov r2, r3 ● Clever swap avoids trashing r3 (book p 53): eor r1, r1, r2 eor r2, r1, r2 eor r1, r1, r2 ● Book “Hacker's Delight” is full of this kind of trick.
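
The three-EOR swap can be mirrored in C with the `^=` operator. This is only a sketch of the same idea; `xor_swap` is a name invented here, and the pointer arguments stand in for registers r1 and r2.

```c
#include <assert.h>
#include <stdint.h>

/* Swap two values without a temporary, mirroring the three EORs. */
void xor_swap(uint32_t *a, uint32_t *b)
{
    assert(a != b);  /* XORing a location with itself would zero it */
    *a ^= *b;        /* eor r1, r1, r2 */
    *b ^= *a;        /* eor r2, r1, r2 : now holds the original *a */
    *a ^= *b;        /* eor r1, r1, r2 : now holds the original *b */
}
```

The assertion documents one caveat that registers do not have: two pointers can alias, and then the trick destroys the value.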

  53. Example: 64-Bit Addition Computing Your Grade ● Assume r1 contains the high 32 bits of value X ● Test was out of 80. Prof told you how many and r2 contains the low 32 bits points you lost (put the number into R1). Figure out what your grade out of 80 was: ● Assume r3 contains the high 32 bits of Y and r4 contains the low 32 bits. RSB R2, R1, #80 ● Want result in r5 (high bits) and r6 (low bits) ● Now your grade is in R2. ADDS r6, r2, r4 ; add low words [affect flags] ADC r5, r1, r3 ; add high words
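
Both examples on this slide can be modelled in C. In this sketch the struct `u64_pair` and the function names are invented for illustration; the carry is recomputed by hand because C has no carry flag, which is roughly what ADDS followed by ADC does for you in hardware.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t hi, lo; } u64_pair;  /* hypothetical register pair */

/* 64-bit add from 32-bit halves: ADDS on the low words, ADC on the high. */
u64_pair add64(u64_pair x, u64_pair y)
{
    u64_pair r;
    r.lo = x.lo + y.lo;              /* ADDS r6, r2, r4 */
    uint32_t carry = r.lo < x.lo;    /* carry out of the low-word add */
    r.hi = x.hi + y.hi + carry;      /* ADC  r5, r1, r3 */
    return r;
}

/* Reverse subtract: RSB R2, R1, #80 computes 80 - R1. */
uint32_t grade_out_of_80(uint32_t points_lost)
{
    assert(points_lost <= 80);
    return 80 - points_lost;
}
```

Adding 1 to a value whose low word is 0xFFFFFFFF shows the carry propagating into the high word.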

  54. Constant Operands Why This Weirdness ● Most instructions have register values or ● Studies show that most constants are small. constants as the operands ● (Exception: Load and store instructions – later) ● Among larger constants, bit-masks containing a small chunk of mixed bits are common (surrounded ● All 8-bit constants are okay by zeros) ● As are all constants of the form ● Similar bitmasks that are mostly 1s can be handled RotateRight( v, 2*amt) by using the MVN instruction where v is an 8-bit value and amt from 0 to 15. ● A RISC architecture with 32-bit instructions isn't ● So 0xAB is ok long enough to encode an arbitrary 32-bit constant. So just allow the most common ones. – so is 0xAB0 ( 0xAB with a 28 bit rotate right) ● Assembler complains if you use a constant that – so is 0xB000000A (0xAB with a 4-bit rotate right) cannot fit this weirdness.
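
Whether a constant fits the RotateRight(v, 2*amt) form can be tested by brute force over the 16 possible rotations. The sketch below assumes nothing beyond the rule on the slide; `rotate_right` and `find_rotated_imm` are names made up for this example, and an assembler may well pick a different (equally valid) encoding when several exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value right by amt bit positions (0..31). */
uint32_t rotate_right(uint32_t v, unsigned amt)
{
    assert(amt < 32);
    return amt == 0 ? v : (v >> amt) | (v << (32 - amt));
}

/* Search for an 8-bit v8 and amt in 0..15 with
   value == rotate_right(v8, 2*amt). Returns false if none exists. */
bool find_rotated_imm(uint32_t value, uint32_t *v8, unsigned *amt)
{
    for (unsigned a = 0; a < 16; a++) {
        /* Undo a right rotation by 2*a: rotate right by the complement. */
        uint32_t candidate = rotate_right(value, (32 - 2 * a) % 32);
        if (candidate <= 0xFFu) {
            *v8 = candidate;
            *amt = a;
            return true;
        }
    }
    return false;
}
```

The slide's examples 0xAB, 0xAB0 and 0xB000000A are all accepted, while a constant such as 0x12345678 is rejected.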

  55. Machine Instruction With Constant The Barrel Shifter's Place

  56. Shifted Register Operands ● If the second operand is a register value, the barrel shifter can modify it as it travels down the B bus. ● Barrel shifter is capable of – LSL (logical shift left) – LSR (logical shift right) – ASR (arithmetic shift right) – ROR (rotate right) – RRX (33 bit ROR using carry between MSB and LSB) ● No modification desired? Shift by 0 positions! ● Carry flag is involved (but the new carry value is not necessarily written into the status register)
      ARM Shifts and Rotates

  57. How Much Shifting Machine Encoding (from Ref Man) ● With RRX, it appears the register can only be shifted ● Below, shift field is 00 for LSL, 01 for LSR, 10 by one position. for ASR, 11 for ROR. RRX also 11 with count ● With others, you can shift 0 to 31 positions of 0 (and rotates only one position). – Either as a constant (“immediate”) – Or by the least significant 5 bits of a register ● There are separate machine code formats for these cases. – Bit 4 distinguishes the cases – Bits 5 & 6 say what kind of shift/rotate – Bits 11 to 7 involve which register, or the constant

  58. Example Setting Conditions ● Any of the data-processing instructions so far can ● Machine code to take R1, logical left shift it by 3 positions, result in R2 optionally affect the flags. ● Assembly language: MOV R2, R1, LSL #3 ● At the machine-code level, bit 20 (called S) controls this: ● It’s the “immediate shift” format: S=1 means to set the flags – Bits 27, 26, 25 and 4 are all 0 ● In assembly language, you append an S on the ● Bits 11 to 7 are 00011 (for the #3) mnemonic. ADDS instead of ADD ● Bits 3 to 0 are 0001 (since R1 is being shifted) ● Also, there are some instructions whose sole purpose is ● Bits 5 & 6 are 00 to select the LSL kind of shift to set flags: they don’t change any of R0 to R15. ● Unconditional, bits 31 to 28 are 1110; MOV opcode 1101 ● Compare (CMP, CMN) and Test (TST, TEQ) instructions. ● So: 1110 00 0 1101 0 ???? 0010 00011 00 0 0001 = 0xE1A02181

  59. Sum to a Limit Multiplication ● Let’s add 1+2+3+… until sum exceeds R4 (unsigned) ● The ARM v4 ISA has 6 multiplication instructions. MOV R1,#0 ; The sum ● Does not include “multiply by a constant” MOV R2,#1 LP ADD R1, R1, R2 ● Why several? ADD R2, R2, #1 – Should product be 32 bits or 64 bits? CMP R1, R4 ; computes R1 – R4, sets flags – Are the input values considered signed? BLS LP ; LS = unsigned Lower or Same (CF=0 or Z=1) ; use LE for signed Lesser or Equal
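
The "Sum to a Limit" loop has the shape of a C do...while: the body runs once before the first comparison. A sketch, with `sum_until_exceeds` as an invented name and the limit restricted so the running sum cannot wrap around before it exceeds the limit:

```c
#include <assert.h>
#include <stdint.h>

/* Add 1+2+3+... until the sum exceeds limit (unsigned comparison). */
uint32_t sum_until_exceeds(uint32_t limit)
{
    assert(limit <= 0x7FFFFFFFu);  /* assumption: keeps the sum from wrapping */
    uint32_t sum = 0;              /* MOV R1,#0 */
    uint32_t next = 1;             /* MOV R2,#1 */
    do {
        sum += next;               /* ADD R1, R1, R2 */
        next += 1;                 /* ADD R2, R2, #1 */
    } while (sum <= limit);        /* CMP R1, R4 ; BLS LP */
    return sum;
}
```

Declaring the variables as `uint32_t` is what corresponds to choosing BLS; with `int32_t` the matching branch would be BLE.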

  60. 32-Bit Products 64-Bit Product (Long Multiply) ● Fact: Since the product stored is the low-order ● Results are stored in a pair of registers. ● The “accumulate” version has the product added onto the 64-bit 32 bits of the true product, signed and unsigned value in a pair of registers. variations would give same result. So not ● SMULL – signed long multiply separate instructions. ● UMULL – unsigned long multiply ● MUL instruction: Two registers' values multiplied, ● UMLAL - unsigned long multiply accumulate low-order 32 bits stored in destination register. ● SMLAL – signed long multiply accumulate ● MLA (multiply and accumulate). The low order ● Ex: UMLAL R1, R2, R3, R4 means 32-bits of the product are added to a 3 rd register (R1, R2) ← (R1, R2) + R3*R4 with unsigned math and stored in a 4 th register. – Above, R1 is the least significant 32 bits ● Eg: MLA R4, R1, R2, R3 ; R4 = R1*R2 + R3
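
The multiply variants can be modelled with C's fixed-width types. This sketch uses invented function names `mla` and `umlal`; the pointer arguments stand in for the RdLo/RdHi register pair, with the first one being the least significant half as on the slide.

```c
#include <assert.h>
#include <stdint.h>

/* MLA Rd, Rm, Rs, Rn : Rd = low 32 bits of Rm*Rs, plus Rn. */
uint32_t mla(uint32_t rm, uint32_t rs, uint32_t rn)
{
    return rm * rs + rn;  /* unsigned arithmetic wraps modulo 2^32 */
}

/* UMLAL RdLo, RdHi, Rm, Rs : (RdHi:RdLo) += Rm*Rs, unsigned 64-bit. */
void umlal(uint32_t *rdlo, uint32_t *rdhi, uint32_t rm, uint32_t rs)
{
    assert(rdlo != rdhi);
    uint64_t acc = ((uint64_t)*rdhi << 32) | *rdlo;
    acc += (uint64_t)rm * rs;
    *rdlo = (uint32_t)acc;
    *rdhi = (uint32_t)(acc >> 32);
}
```

Passing 0xFFFFFFFE (the bit pattern of -2) to `mla` with 3 gives the bit pattern of -6, illustrating why the 32-bit product needs no separate signed version.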

  61. Overview ● Loads and Stores ● Memory Maps ● Register-Indirect Addressing ARM Memory ● Post- and Pre-indexed Addressing Owen Kaser, CS2253 Mostly corresponds to book Chapter 5.

  62. 16 Registers is Not Enough Loads and Stores ● So far, the only places discussed for data are the ● Recall that ARM is a “load/store” architecture. Cannot directly do calculations on values in memory. Have to load ARM's CPU registers them into a CPU register to use them as inputs. ● Most interesting programs need more data. ● Similarly, calculations put results into registers. Then you ● We need memory outside the CPU for our bulk can use a store instruction to put them into memory. ● Loads and stores need to specify where in memory things data storage. should go. This will be a numeric “memory address”. ● Also, memory can contain pre-computed tables ● (Memory) addressing modes are small built-in calculations (eg, of trig functions) that are never altered the CPU can do, to compute the memory address. ● For your toaster's software, the machine code ● Simple case: value in, say, R3 is to be used as the address. can be set at the factory. Fancy toaster: you can “flash” your toaster with improved software.

  63. Ex. Memory Map System Memory Maps (extracts from book Table 5.1) ● A system built around an ARM7TDMI processor uses 32-bit Start End Description 0x00000000 0x0003FFFF On-chip flash values as memory addresses. Each address would 0x00040000 0x00FFFFFF reserved correspond to a byte (oops, octet). 0x01000000 0x1FFFFFFF ROM ● The overall “memory address space” ranges from 0 to 0x20000000 0x20007FFF (Static) RAM 0xFFFFFFFF. ….. ● But the overall memory address space is further subdivided 0x4000C000 0x4000CFFF UART 0 (a “serial port”) device (boundaries are often small multiples of powers of 2) ….. ● RAM, ROM, flash, and I/O devices can be given their own 0xE0001000 0xE0001FFF “data watchpoint and trace” (DWT) facility …. subdivisions. 0xE0004000 0xFFFFFFFF reserved ● More on I/O devices later in the course. For now, just realize that some memory addresses accept stores, and some ignore them.

  64. For Simplicity.... Register-Indirect Addressing Mode ● Let's only mess with addresses in a range that ● Let's suppose you want to load the byte at address 0x00005000 into register R3. corresponds to RAM memory. ● 8 bit value into a 32-bit container. If we want the 8-bit ● Then, loads and stores both make sense. value to be zero-extended, use LDRB instruction. ● If you want it sign-extended, use LDRSB. ● Simplest case: a register stores the address of some data you care about. Let's go for R1. ● Assembler: MOV R1, #0x00005000 ;address to R1 LDRB R3, [R1] ; memory value to R3
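
The difference between LDRB and LDRSB is only in how the 8-bit value is widened to 32 bits. A C sketch of the two behaviours, with function names invented here and the sign extension written out by hand rather than relying on a cast:

```c
#include <assert.h>
#include <stdint.h>

/* Like LDRB: fetch a byte and zero-extend it to 32 bits. */
uint32_t load_byte_zero_extended(const uint8_t *addr)
{
    assert(addr != 0);
    return *addr;
}

/* Like LDRSB: fetch a byte and sign-extend it to 32 bits. */
int32_t load_byte_sign_extended(const uint8_t *addr)
{
    assert(addr != 0);
    int32_t v = *addr;
    return (v & 0x80) ? v - 256 : v;  /* copy the sign bit upward */
}
```

A byte holding 0xF5 loads as 245 the first way and as -11 the second way.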

  65. Looping Through Memory Speeding It Up ● Let's suppose you want to wipe clear (to 0) the ● If the area to be cleared is properly aligned contents of all memory locations from (starts on a multiple of 4) and is the right size (a 0x00005000 to 0x00005FFF. multiple of 4) we can clear out 4 consecutive addresses with one STR (store word) ● A loop will work nicely. instruction. MOV R1, #0x00005000 ; starting location ● Recall that a 32-bit word is stored across 4 MOV R2, #0x00006000; when to stop addresses: A, A+1, A+2, A+3. MOV R3, #0 LP STRB R3, [R1] ; wipe clear current location's value ADD R1, R1, #1 ; advance to next location TEQ R1, R2 ; has R1 hit the stopping location? BNE LP ….

  66. Faster Code Even Faster ● The pattern of “use a register to provide a memory MOV R1, #0x00005000 ; starting location address, then update the register in preparation for MOV R2, #0x00006000; when to stop the next loop” is extremely common. MOV R3, #0 ; 4 bytes of zeros LP STR R3, [R1] ; wipe clear current location's value AND the next 3 locations' values ● ARM designers created an addressing mode that ADD R1, R1, #4 ; advance to location of next group of 4 bytes does BOTH of these operations in a single TEQ R1, R2 ; has R1 hit the stopping location? instruction . “post-indexed” BNE LP ● STR R3, [R1], #4 is equivalent to STR R3, [R1] ● Loop runs only ¼ as many times now. ADD R1, R1, #4

  67. Textbook Figure 5.2 Even Faster Code MOV R1, #0x00005000 ; starting location MOV R2, #0x00006000; when to stop MOV R3, #0 ; 4 bytes of zeros LP STR R3, [R1], #4 ; wipe 4, then advance “pointer” R1 (no separate ADD needed) TEQ R1, R2 ; has R1 hit the stopping location? BNE LP
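
Post-indexed stores correspond closely to the C idiom `*p++ = value`. The sketch below uses an invented function name and ordinary C pointers rather than fixed addresses like 0x00005000. Note one difference: the assembly loop always stores at least once, whereas this `while` loop stores nothing when start equals stop.

```c
#include <assert.h>
#include <stdint.h>

/* Clear the words from start up to, but not including, stop. */
void clear_words(uint32_t *start, uint32_t *stop)
{
    assert(start <= stop);
    uint32_t *p = start;
    while (p != stop) {   /* TEQ R1, R2 ; BNE LP */
        *p++ = 0;         /* STR R3, [R1], #4 : store, then advance by 4 bytes */
    }
}
```

Because `p` is a pointer to a 32-bit word, `p++` advances the address by 4 bytes, just as the `#4` does in the post-indexed STR.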

  68. Java Pre- vs Post-Increment ● Can draw a parallel to Java's ++ operators. ● Recall, v = M[p++] in Java – it uses the current version of p to index M – then it increments p. (post-increment) ● Versus v = M[++p] in Java – it first increments p (pre-increment) – then the new value of p is used to index into M
      Post-Indexed Addressing ● In ARM, post-indexed addressing takes a base register. (Should not be R15.) ● Uses that base register's value to go to memory ● Then updates the base register's value by a little computation – adding/subtracting a constant (earlier example) – adding/subtracting a register ● which is allowed to be modified by the barrel shifter ● can be shifted/rotated by a constant amount (for loads and stores, the shift amount cannot come from a register) ● Usefulness of fanciest of these seems doubtful ● LDR R1, [R2], R3, ROR #3 ; is this useful???

  69. Useful? Example ● Java, for an int array M, variable x: j = 0; while (….) { sum += M[j]; j += x;} ● ARM: suppose x in R2, start of M in R1 ● In loop body: LDR R3, [R1], R2, LSL #2
      Pre-Indexed Addressing ● There are two flavours of pre-indexed addressing. Both do a little computation and use the computed effective address to go to memory. In one, the base register is updated. Other flavour does not update. ● In assembly language, the ! symbol means to update the base register. Don't use R15 as the base register with ! ● Ok to use R15, without ! The value of R15 is 8 bytes beyond the start of the current machine code. [Details of why are a bit advanced.]

  70. Pre-indexed Figure (Textbook) ● Instruction is STR r0, [r1, #12] ● Add ! to update r1 when finished: STR r0, [r1, #12]! ; r1 ← 0x20c
      Rationale for the “little computations” ● PC-relative addressing for constants ● Getting a field of an object, given the start of the object. ● Indexing into array of objects, selecting a field (if the object size is a power of two) ● (Selected largely by analyzing what compilers for HLLs would find useful, I think...rather than focussing on assembly language programmers)

  71. Some Pre-indexed Examples ● MOV R1, #0x12345678 fails. Constant is not a rotation of an 8 bit value. ● Instead, initialize a memory location with your constant. Then use PC-relative addressing to load it. LDR R1, myConst ; pseudo-op … 1000 bytes later... myConst DCD 0x12345678 ● The LDR instruction is actually something like LDR R1, [PC, #996] ; PC was already 8 ahead ● 996 is close enough to PC. Must be within 4 kiB.
      Ex: Field Access for an Object ● In HLLs, the fields of an object occupy consecutive memory addresses (possibly with padding) ● Let's suppose that an object starts at 1000. There are two 32-bit fields, then a 16-bit halfword field that we want to load into R2. ● Let's suppose that R1 contains the starting address of the object. ● Use LDRH R2, [R1, #8] ; immediate offset is 8 (Desired field starts 8 bytes later: gotta skip over first two words.) ● (Minor point: LDRH requires an offset within ±255)
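
The #996 in the PC-relative load comes from subtracting the PC value (instruction address plus 8) from the address of the constant. The sketch below assumes that "1000 bytes later" means 1000 bytes lie between the end of the 4-byte LDR and myConst, which is the reading that yields 996; the function name is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Offset to encode in LDR Rd, [PC, #offset] so that the load at
   instr_addr reaches target_addr. PC reads as instr_addr + 8. */
int32_t pc_relative_offset(uint32_t instr_addr, uint32_t target_addr)
{
    int64_t offset = (int64_t)target_addr - ((int64_t)instr_addr + 8);
    assert(offset > -4096 && offset < 4096);  /* 12-bit magnitude limit */
    return (int32_t)offset;
}
```

With the LDR at some address A and myConst at A + 4 + 1000, the result is 996.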

  72. Ex: Array Access No ADR Pseudo-op ● Suppose R1 contains the starting address of an ● The Crossware assembler does not seem to support ADR, which is used to put an address into a register (that you will then use as a base register). For array. instance, summing values in array… ● Suppose the array's elements are 4 bytes each MOV R0, #0 ; accumulate answer ● To load the w th array element, we want address ADR R1, MyArr ; Keil pseudo-op ADR R2, AfterMyArr ; past last valid address R1 + 4*w LP LDR R3, [R1], #4 ● Suppose value w is in R2 ADD R0, R0, R3 TEQ R1, R2 ● LDR R5, [R1, R2 , LSL #2] loads desired value. BNE LP ….. MyArr DCD 34, 23, 56, 78, 12345566, ……... AfterMyArr DCB 0

  73. Instead of ADR LDR As Pseudoinstruction ● Instead of ADR, you should be able to do the following: ● LDR R x , = value works for any 32-bit value (address or MOV R0, #0 ; accumulate answer constant ). LDR R1, =MyArr ● It sets aside space in a “constant pool” , preinitialized to LDR R2, =AfterMyArr ; past last valid address value. This constant pool is (by default) at the end of the current AREA. LP LDR R3, [R1], #4 ● Then it generates machine code for a PC-relative LDR ADD R0, R0, R3 into Rx from this preinitialized location. TEQ R1, R2 ● Like a convenient DCD and LDR R x , [PC, # something ] BNE LP ….. ● See textbook Chapter 6. MyArr DCD 34, 23, 56, 78, 12345566, ……... AfterMyArr DCD 0 ; wasted word, could avoid...

  74. Machine-Code Formats Meaning of Some Bits (Ref Man) LDR/STR/LDRB/STRB ● From reference manual:

  75. Exercise/Example ● Determine machine code for LDR R3, [R1], #4 and for STRB R3, [R1, R2, LSR #5]!
      Load and Store Multiple ● There are instructions LDM and STM that load or store a number of registers. ● With LDM, a bit vector in the machine code indicates which registers to load. They are loaded from consecutive addresses. ● STM works similarly ● They are especially useful in storing things on the runtime stack, and will be looked at when we cover that topic.

  76. Control Structures ● Implementing familiar HLL control structures: – if-then Control Structures – if-then-else CS2253, Owen Kaser – while – do..while ● Omit: switch ● See textbook Chapter 8

  77. Basic Mechanism Nesting ● Essentially, to disrupt the flow of control you need to ● A typical HLL program has nested control structures: if inside of an if , inside of while ... set PC (alias R15) to a new value. ● We'll look at how to replace a HLL control structure (that ● The b command does this might have another control structure within it) by ● But so does any other allowable instruction that writes corresponding assembly language. to R15! ● The inner control structure can be replaced similarly. ● Consider this instruction: ● In the following templates, the first use of newlabel1 … add R15, R15, R3 shl 2 newlabel9 means to generate and use a label that was not already in use. Any subsequent occurrence of, say, Number of instructions skipped ahead depends on R3. newlabel1 means to use that same label.

  78. If Without Else Example ● Replace if (<condition>) { <body> } by ● a1 is in R1, a2 is in R2 ● Translate if (a1 >= a2) { a1++;} code to test the condition (often using CMP) b<opposite of condition> newlabel1 cmp R1, R2 code for body blt xyz0001 ; lt is opposite to >= newlabel1 add R1, R1, #1 ; translation of a1++ xyz0001 ; my new label

  79. ARM Optimization If With Else ● If the body doesn't have nested control statements or Replace if (<condition>) { <body1>} else {<body2>} with other statements that set the flags, can have the following code to test condition code to test the condition b<opposite of condition> newlabel1 code for body1 code for the body, with every instruction conditional. b newlabel2 Eg newlabel1 cmp R1, R2 code for body2 addge R1, R1, #1 ; add made conditional on >= newlabel2

  80. Example ARM Optimization ● if (a1 >= a2) a1++; else a2++; ● Since the bodies are simple, can use predicated [i.e., conditional] instructions: ● Following the template: cmp R1, R2 cmp R1, R2 ; a1 >= a2 ?? addge R1, R1, #1 ; the “then” body blt xyz001 addlt R2, R2, #1 ; the “else” body add R1, R1, #1 b xyz002 ; don't fall into else code ● Look Ma, no labels and no branching. No xyz001 “branch penalty”. add R2, R2, #1 ; the else's body xyz002

  81. While Statement Example ● Recall that a while statement checks the condition before every ● for (i=0; i<j; i+=2) ++k; ← for is just while disguised. iteration, including the first. ● while (<cond>) {<body>} can turn into mov R1, #0 ; say R1 stores I b xyz001 b newlabel1 xyz002 newlabel2 add R3, R3, #1 ; body: say R3 has k code for <body> add R1, R1, #2 ; code for i+=2 newlabel1 xyz001 code for <cond> b<the condition> newlabel2 cmp R1, R2 ; say R2 has j blt xyz002 Other translations are possible, but this is the book's

  82. Counting Down To Zero Do...While Statement ● If you can arrange for your for loops to count Translate do { <body> } while (<cond>); as down from N to zero AND if it is guaranteed to do at least one iteration, better to use code like newlabel1 mov R1, #N ; counting down with R1 code for <body> newlabel1 code to check condition code for the body of the loop b<cond> newlabel1 subs R1, R1, #1 ; set the flags ● Slightly simpler than the while loop bne newlabel1

  83. Nesting Conditional Execution ● Using conditional execution, we can reduce ● Let's do Euclid's algo together: Euclid's code to GCD CMP R0, R1 while (a != b) SUBGT R0, R0, R1 if (a>b) a=a-b; SUBLT R1, R1, R0 else b=b-a; BNE GCD ● Book also shows how to use conditional execution to handle something like if (x==1 || x==5) ++x
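
For comparison with the four-instruction conditional-execution version, here is the same subtraction-based Euclid loop as a C function. It is a sketch only: `gcd_subtract` is an invented name, and it uses unsigned values where the assembly's GT/LT conditions are signed comparisons; for positive inputs the two agree.

```c
#include <assert.h>
#include <stdint.h>

/* Euclid's algorithm by repeated subtraction. */
uint32_t gcd_subtract(uint32_t a, uint32_t b)
{
    assert(a != 0 && b != 0);  /* a zero input would loop forever */
    while (a != b) {           /* CMP R0, R1 ... BNE GCD */
        if (a > b)
            a = a - b;         /* SUBGT R0, R0, R1 */
        else
            b = b - a;         /* SUBLT R1, R1, R0 */
    }
    return a;
}
```

In the ARM version the single CMP serves as both the while test and the if test, which is why no labels inside the loop are needed.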

  84. Contents Assemblers and Linkers ● Review of assembler tasks ● A look at linker tasks CS 2253 ● Assembler implementation Owen Kaser, UNBSJ ● The location counter and symbol table ● Two-pass assembler ● Macros and conditional compilation

  85. Review of Assemblers Linkers ● An assembler takes commands and translates ● The assembler typically generates one “object them into what will be the contents of some code” (.OBJ) file, containing the contents of the areas. various areas. ● Assembler commands can be ● One source code file → one object code file. – directives, such as ● Libraries are also object code files. ● AREA foo, data [ change the area being generated] ● Linker's overall job is to put together the various ● DCB “hello” [ generate some byte contents in current area] areas in all the object files, getting an executable – instructions file that is ready to load into memory and run. ● ADD R1,R2,R3 [generate machine code bits in current area] – labels ● blah …. [ record the current position in the current area as “blah”]
