15-213
“The Class That Gives CMU Its Zip!”

Bits and Bytes
Aug. 29, 2002

Topics
- Why bits?
- Representing information as bits
  - Binary/hexadecimal
  - Byte representations
    - Numbers
    - Characters and strings
    - Instructions
- Bit-level manipulations
  - Boolean algebra
  - Expressing in C

class02.ppt  15-213 F’02

Why Don’t Computers Use Base 10?

Base 10 Number Representation
- That’s why fingers are known as “digits”
- Natural representation for financial transactions
  - A binary floating point number cannot exactly represent $1.20
- Even carries through in scientific notation
  - 1.5213 × 10^4

Implementing Electronically
- Hard to store
  - ENIAC (first electronic computer) used 10 vacuum tubes per digit
- Hard to transmit
  - Need high precision to encode 10 signal levels on a single wire
- Messy to implement digital logic functions
  - Addition, multiplication, etc.

Binary Representations

Base 2 Number Representation
- Represent 15213_10 as 11101101101101_2
- Represent 1.20_10 as 1.0011001100110011[0011]…_2
- Represent 1.5213 × 10^4 as 1.1101101101101_2 × 2^13

Electronic Implementation
- Easy to store with bistable elements
- Reliably transmitted on noisy and inaccurate wires

[Figure: waveform carrying the bits 0, 1, 0, with voltage levels marked at 0.0V, 0.5V, 2.8V, and 3.3V]

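The base 2 conversion above can be checked mechanically. The following is a minimal sketch, assuming a 32-bit unsigned input; `to_binary` is a hypothetical helper name, not something from the lecture.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the binary digits of value into buf, most significant bit first,
 * without leading zeros, and NUL-terminate the result.
 * buf must have room for at least 33 bytes (32 digits plus the NUL).
 * to_binary is a hypothetical name used only for this illustration. */
void to_binary(uint32_t value, char *buf)
{
    char tmp[32];
    size_t n = 0;
    size_t i;

    /* Peel off bits from least significant to most significant. */
    do {
        tmp[n++] = (char)('0' + (value & 1u));
        value >>= 1;
    } while (value != 0);

    /* The digits were collected in reverse order, so copy them back to front. */
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
}
```

Calling `to_binary(15213, buf)` with a 33-byte buffer should leave the 14-digit string shown on the slide in `buf`.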
Byte-Oriented Memory Organization

Programs Refer to Virtual Addresses
- Conceptually a very large array of bytes
- Actually implemented with a hierarchy of different memory types
  - SRAM, DRAM, disk
  - Only allocate for regions actually used by program
- In Unix and Windows NT, address space is private to a particular “process”
  - Program being executed
  - Program can clobber its own data, but not that of others

Compiler + Run-Time System Control Allocation
- Where different program objects should be stored
- Multiple mechanisms: static, stack, and heap
- In any case, all allocation is within a single virtual address space

Encoding Byte Values

Byte = 8 bits
- Binary: 00000000_2 to 11111111_2
- Decimal: 0_10 to 255_10
- Hexadecimal: 00_16 to FF_16
  - Base 16 number representation
  - Use characters ‘0’ to ‘9’ and ‘A’ to ‘F’
  - Write FA1D37B_16 in C as 0xFA1D37B
    - Or 0xfa1d37b

Hex  Decimal  Binary
0    0        0000
1    1        0001
2    2        0010
3    3        0011
4    4        0100
5    5        0101
6    6        0110
7    7        0111
8    8        1000
9    9        1001
A    10       1010
B    11       1011
C    12       1100
D    13       1101
E    14       1110
F    15       1111

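Because each hex digit stands for exactly four bits, a byte always maps to two hex digits. A minimal sketch of that mapping follows; `byte_to_hex` is a hypothetical helper name introduced here for illustration.

```c
/* Convert one byte into its two uppercase hex digits plus a terminating NUL.
 * The high four bits select the first digit, the low four bits the second.
 * byte_to_hex is a hypothetical name, not part of the lecture code. */
void byte_to_hex(unsigned char byte, char out[3])
{
    static const char digits[] = "0123456789ABCDEF";

    out[0] = digits[(byte >> 4) & 0x0F];  /* upper nibble */
    out[1] = digits[byte & 0x0F];         /* lower nibble */
    out[2] = '\0';
}
```

C accepts hex constants in either case, so `0xFA1D37B` and `0xfa1d37b` denote the same value.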
Machine Words

Machine Has “Word Size”
- Nominal size of integer-valued data
  - Including addresses
- Most current machines are 32 bits (4 bytes)
  - Limits addresses to 4GB
  - Becoming too small for memory-intensive applications
- High-end systems are 64 bits (8 bytes)
  - Potentially address ≈ 1.8 × 10^19 bytes
- Machines support multiple data formats
  - Fractions or multiples of word size
  - Always an integral number of bytes

Word-Oriented Memory Organization

Addresses Specify Byte Locations
- Address of first byte in word
- Addresses of successive words differ by 4 (32-bit) or 8 (64-bit)

Byte addresses   32-bit words   64-bit words
0000 to 0003     Addr = 0000    Addr = 0000
0004 to 0007     Addr = 0004
0008 to 0011     Addr = 0008    Addr = 0008
0012 to 0015     Addr = 0012

Data Representations

Sizes of C Objects (in Bytes)

C Data Type    Compaq Alpha   Typical 32-bit   Intel IA32
int            4              4                4
long int       8              4                4
char           1              1                1
short          2              2                2
float          4              4                4
double         8              8                8
long double    8              8                10/12
char *         8              4                4

- char * row applies to any other pointer as well

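The sizes in the table depend on the machine and compiler, so the simplest way to find them for a given platform is to ask `sizeof`. The following sketch uses a hypothetical function name, `print_sizes`; its output will differ from machine to machine, and on current 64-bit systems it generally will not match the 32-bit columns above.

```c
#include <stdio.h>

/* Print the size in bytes of each C type listed in the table.
 * print_sizes is a hypothetical name; the numbers printed depend on
 * the compiler and target, so no particular output is assumed here. */
void print_sizes(void)
{
    printf("int          %zu\n", sizeof(int));
    printf("long int     %zu\n", sizeof(long int));
    printf("char         %zu\n", sizeof(char));
    printf("short        %zu\n", sizeof(short));
    printf("float        %zu\n", sizeof(float));
    printf("double       %zu\n", sizeof(double));
    printf("long double  %zu\n", sizeof(long double));
    printf("char *       %zu\n", sizeof(char *));
}
```

Only `sizeof(char) == 1` is guaranteed by the language; every other row is implementation-defined.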
Byte Ordering Byte Ordering How should bytes within multi-byte word be ordered in How should bytes within multi-byte word be ordered in memory? memory? Conventions Conventions n Sun’s, Mac’s are “Big Endian” machines l Least significant byte has highest address n Alphas, PC’s are “Little Endian” machines l Least significant byte has lowest address 15-213, F’02 – 9 –
Byte Ordering Example

Big Endian
- Least significant byte has highest address

Little Endian
- Least significant byte has lowest address

Example
- Variable x has 4-byte representation 0x01234567
- Address given by &x is 0x100

Address         0x100  0x101  0x102  0x103
Big Endian      01     23     45     67
Little Endian   67     45     23     01

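The two layouts in the example can be produced explicitly with shifts, independent of the host machine's own byte order. This is a sketch under that framing; the function names `store_big_endian`, `store_little_endian`, and `host_is_little_endian` are hypothetical and introduced only for illustration.

```c
#include <stdint.h>

/* Lay out x with the most significant byte at the lowest address. */
void store_big_endian(uint32_t x, unsigned char out[4])
{
    out[0] = (unsigned char)(x >> 24);
    out[1] = (unsigned char)(x >> 16);
    out[2] = (unsigned char)(x >> 8);
    out[3] = (unsigned char)x;
}

/* Lay out x with the least significant byte at the lowest address. */
void store_little_endian(uint32_t x, unsigned char out[4])
{
    out[0] = (unsigned char)x;
    out[1] = (unsigned char)(x >> 8);
    out[2] = (unsigned char)(x >> 16);
    out[3] = (unsigned char)(x >> 24);
}

/* Report the host's own convention by inspecting the first byte of the
 * value 1: it is 1 on a little-endian machine and 0 on a big-endian one. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    return *(unsigned char *)&probe == 1;
}
```

For x = 0x01234567, `store_big_endian` yields 01 23 45 67 and `store_little_endian` yields 67 45 23 01, matching the table.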
Reading Byte-Reversed Listings

Disassembly
- Text representation of binary machine code
- Generated by program that reads the machine code

Example Fragment

Address    Instruction Code        Assembly Rendition
8048365:   5b                      pop    %ebx
8048366:   81 c3 ab 12 00 00       add    $0x12ab,%ebx
804836c:   83 bb 28 00 00 00 00    cmpl   $0x0,0x28(%ebx)

Deciphering Numbers
- Value:             0x12ab
- Pad to 4 bytes:    0x000012ab
- Split into bytes:  00 00 12 ab
- Reverse:           ab 12 00 00

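Reading a listing means running the deciphering steps backwards: take the four bytes as they appear in memory and reassemble the value. A minimal sketch, with `decode_le32` as a hypothetical helper name:

```c
#include <stdint.h>

/* Reassemble a 32-bit value from four bytes stored in little-endian order,
 * i.e. bytes[0] is the least significant byte.
 * decode_le32 is a hypothetical name used only for this illustration. */
uint32_t decode_le32(const unsigned char bytes[4])
{
    return  (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

Applied to the bytes ab 12 00 00 from the add instruction, this returns 0x12ab; applied to 28 00 00 00 from the cmpl instruction, it returns 0x28.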
Examining Data Representations

Code to Print Byte Representation of Data
- Casting pointer to unsigned char * creates byte array

    #include <stdio.h>

    typedef unsigned char *pointer;

    /* Print the address and value of each of the len bytes at start. */
    void show_bytes(pointer start, int len)
    {
        int i;
        for (i = 0; i < len; i++)
            printf("0x%p\t0x%.2x\n", (void *)(start + i), start[i]);
        printf("\n");
    }

Printf directives:
- %p: Print pointer
- %x: Print hexadecimal

show_bytes Execution Example

    int a = 15213;
    printf("int a = 15213;\n");
    show_bytes((pointer) &a, sizeof(int));

Result (Linux):

    int a = 15213;
    0x11ffffcb8    0x6d
    0x11ffffcb9    0x3b
    0x11ffffcba    0x00
    0x11ffffcbb    0x00

Representing Integers

    int A = 15213;
    int B = -15213;
    long int C = 15213;

Decimal: 15213
Binary:  0011 1011 0110 1101
Hex:     3 B 6 D

Bytes in memory, lowest address first:

Linux/Alpha A   6D 3B 00 00
Sun A           00 00 3B 6D

Linux/Alpha B   93 C4 FF FF
Sun B           FF FF C4 93

Linux C         6D 3B 00 00
Alpha C         6D 3B 00 00 00 00 00 00
Sun C           00 00 3B 6D

B uses two’s complement representation (covered next lecture).

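As a preview of the two's complement bytes shown for B, the bit pattern of a negative int can be obtained by converting it to an unsigned type of the same width, which C defines as reduction modulo 2^32. This is a sketch assuming a 32-bit int as on the machines above; `bit_pattern` is a hypothetical helper name.

```c
#include <stdint.h>

/* Return the 32-bit pattern that represents v in two's complement.
 * Converting a signed value to uint32_t is well defined in C: the result
 * is v modulo 2^32, which is exactly the two's complement encoding.
 * bit_pattern is a hypothetical name used only for this illustration. */
uint32_t bit_pattern(int32_t v)
{
    return (uint32_t)v;
}
```

`bit_pattern(15213)` is 0x00003B6D and `bit_pattern(-15213)` is 0xFFFFC493, which is the same as inverting every bit of 0x00003B6D and adding 1. Read least significant byte first, 0xFFFFC493 gives the Linux/Alpha bytes 93 C4 FF FF.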
Representing Pointers

    int B = -15213;
    int *P = &B;

Alpha Address
  Hex:     1 F F F F F C A 0
  Binary:  0001 1111 1111 1111 1111 1111 1100 1010 0000
  Alpha P (bytes in memory): A0 FC FF FF 01 00 00 00

Sun Address
  Hex:     E F F F F B 2 C
  Binary:  1110 1111 1111 1111 1111 1011 0010 1100
  Sun P (bytes in memory): EF FF FB 2C

Linux Address
  Hex:     B F F F F 8 D 4
  Binary:  1011 1111 1111 1111 1111 1000 1101 0100
  Linux P (bytes in memory): D4 F8 FF BF

Different compilers and machines assign different locations to objects.

Representing Floats

    float F = 15213.0;

Bytes in memory, lowest address first:

Linux/Alpha F   00 B4 6D 46
Sun F           46 6D B4 00

IEEE Single Precision Floating Point Representation
  Hex:     4 6 6 D B 4 0 0
  Binary:  0100 0110 0110 1101 1011 0100 0000 0000
  15213:   1110 1101 1011 01

- Not the same as the integer representation, but consistent across machines
- Can see some relation to the integer representation, but not obvious

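The hex pattern above can be reproduced by copying a float's bytes into an integer of the same size. This sketch assumes `float` is a 32-bit IEEE single precision value, which holds on the machines discussed here; `float_bits` is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of f.
 * Assumes float is 32-bit IEEE single precision (an assumption, though
 * true on the platforms in this lecture). memcpy copies the bytes without
 * any numeric conversion, so the result is the encoding itself.
 * float_bits is a hypothetical name used only for this illustration. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

`float_bits(15213.0f)` gives 0x466DB400. The low 13 bits of 15213 (binary 1101101101101, everything after the leading 1) appear in the fraction field, which is the "relation to integer representation" noted above.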
Representing Strings

    char S[6] = "15213";

Strings in C
- Represented by array of characters
- Each character encoded in ASCII format
  - Standard 7-bit encoding of character set
  - Other encodings exist, but uncommon
  - Character “0” has code 0x30
    - Digit i has code 0x30 + i
- String should be null-terminated
  - Final character = 0

Bytes in memory, lowest address first:

Linux/Alpha S   31 35 32 31 33 00
Sun S           31 35 32 31 33 00

Compatibility
- Byte ordering not an issue
  - Data are single byte quantities
- Text files generally platform independent
  - Except for different conventions of line termination character(s)!

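The byte listing for S can be generated by walking the array one byte at a time, including the terminating null. A minimal sketch follows, assuming an ASCII execution character set; `hex_dump` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdio.h>

/* Format the len bytes starting at data as uppercase hex pairs separated
 * by single spaces, and NUL-terminate the result.
 * out must have room for at least 3 * len bytes (and at least 1 byte).
 * hex_dump is a hypothetical name used only for this illustration. */
void hex_dump(const void *data, size_t len, char *out)
{
    const unsigned char *p = data;
    size_t i;

    for (i = 0; i < len; i++) {
        sprintf(out, "%02X", p[i]);   /* writes two digits plus a NUL */
        out += 2;
        if (i + 1 < len)
            *out++ = ' ';             /* separator between bytes */
    }
    *out = '\0';
}
```

With `char S[6] = "15213";`, calling `hex_dump(S, sizeof S, buf)` on an 18-byte buffer produces 31 35 32 31 33 00 regardless of the machine's byte order, since each character occupies a single byte.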
Machine-Level Code Representation

Encode Program as Sequence of Instructions
- Each simple operation
  - Arithmetic operation
  - Read or write memory
  - Conditional branch
- Instructions encoded as bytes
  - Alphas, Suns, and Macs use 4-byte instructions
    - Reduced Instruction Set Computer (RISC)
  - PCs use variable-length instructions
    - Complex Instruction Set Computer (CISC)
- Different instruction types and encodings for different machines
  - Most code not binary compatible

Programs are Byte Sequences Too!