CS 230 – Introduction to Computers and Computer Systems
Lecture 6 – Endianness and Characters
CS 230 - Spring 2020
Byte
- Convention: 8 bits = 1 byte
  - 8 is a power of 2
  - two's complement range: -128 ... 127
  - unsigned binary range: 0 ... 255
  - exactly two hexadecimal digits
- Historically:
  - a useful range for representing characters and control codes
  - 8-bit circuit width
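The ranges above follow directly from the bit count. A minimal Python sketch (Python is our choice here; the lecture itself uses no programming language) that computes them for BITS = 8 and reads the same byte both as unsigned binary and as two's complement:

```python
BITS = 8  # one byte

# Ranges for an n-bit field
twos_complement_range = (-(2 ** (BITS - 1)), 2 ** (BITS - 1) - 1)
unsigned_range = (0, 2 ** BITS - 1)
print(twos_complement_range)  # (-128, 127)
print(unsigned_range)         # (0, 255)

# Two hex digits cover every byte value, 0x00 ... 0xFF
print(f"0x{unsigned_range[1]:02X}")  # 0xFF

# The same byte, interpreted two ways
raw = b"\xff"
print(int.from_bytes(raw, "big"), int.from_bytes(raw, "big", signed=True))  # 255 -1
```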
Word
- Increasing circuit width: 32 or 64 bits
- Individual bytes are still accessible, so what order are they stored in?
- The terms come from Gulliver's Travels (Jonathan Swift, 1726)
- Little-endian: least-significant byte first
  - the same number appears in memory regardless of length (e.g., the value 5 stored in 1, 2, or 4 bytes always starts with the byte 0x05)
  - arithmetic can start right away, since addition begins at the least-significant byte
- Big-endian: most-significant byte first
  - the "natural" way of writing numbers
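Both byte orders can be observed with Python's built-in `int.to_bytes` and `int.from_bytes`. The sketch below uses an arbitrary example word, 0x0A0B0C0D, and then checks the "same number regardless of length" property by reading back only the first 1, 2, or 4 bytes of the value 5:

```python
value = 0x0A0B0C0D  # an arbitrary example word

# int.to_bytes lays the integer out in the requested byte order.
big = value.to_bytes(4, "big")        # most-significant byte first
little = value.to_bytes(4, "little")  # least-significant byte first
print(big.hex(" "))     # 0a 0b 0c 0d
print(little.hex(" "))  # 0d 0c 0b 0a

# Store 5 in a 4-byte word, then read back only the first 1, 2, or 4 bytes.
five_le = (5).to_bytes(4, "little")  # 05 00 00 00
five_be = (5).to_bytes(4, "big")     # 00 00 00 05
print([int.from_bytes(five_le[:n], "little") for n in (1, 2, 4)])  # [5, 5, 5]
print([int.from_bytes(five_be[:n], "big") for n in (1, 2, 4)])     # [0, 0, 5]
```

Only the little-endian layout returns 5 for every prefix length, which is the property the slide describes.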
Byte Order
- Especially relevant for distributed systems: different computers may use different endianness
- In principle, the same question arises for the bits within a byte
  - not relevant in practice, since individual bits are usually not addressable (we'll talk about being "addressable" later)
  - in CS 230, bits are written in big-endian order, the same order we've used so far
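One common way distributed systems resolve this is for sender and receiver to agree on a byte order for data on the wire; Internet protocols conventionally use big-endian ("network byte order"). A minimal Python sketch of that agreement, using the standard `sys.byteorder` and `socket.htonl`/`socket.ntohl`; the example value is arbitrary:

```python
import socket
import sys

print(sys.byteorder)  # "little" or "big", depending on the machine running this

value = 0x0A0B0C0D
# Sender: serialize in the agreed order (big-endian) before transmitting.
wire = value.to_bytes(4, "big")
# Receiver: decode with the same agreed order, whatever its own endianness.
received = int.from_bytes(wire, "big")
print(hex(received))  # 0xa0b0c0d

# htonl/ntohl convert a 32-bit value between host order and network
# (big-endian) order; on a little-endian host they reverse the bytes.
print(hex(socket.htonl(value)))
```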
Endianness Example
- Consider the big-endian 32-bit word 0x01FAB352. In what order do we send the bits to a little-endian computer?
- Break it up into bytes: 32 / 8 = 4 bytes
  0x01 0xFA 0xB3 0x52
- Reverse the byte order for little-endian:
  0x52 0xB3 0xFA 0x01
- Convert them to binary:
  (first) 01010010 10110011 11111010 00000001 (last)
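The three steps above (split into bytes, reverse the byte order, write each byte in binary) fit in one small Python function. The name `little_endian_bit_order` is hypothetical; bits inside each byte stay in big-endian order, matching the course convention:

```python
def little_endian_bit_order(word, width_bits=32):
    """Return the bits of `word` in the order a little-endian machine stores its bytes.

    Bits within each byte stay most-significant first (big-endian)."""
    n_bytes = width_bits // 8
    little = word.to_bytes(n_bytes, "little")  # least-significant byte first
    return " ".join(f"{b:08b}" for b in little)

print(little_endian_bit_order(0x01FAB352))
# 01010010 10110011 11111010 00000001
```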
Endianness Try it Yourself
- Consider the big-endian 32-bit word 423 (decimal). In what order do we send the bits to a little-endian computer?
- Convert it to binary (or to hexadecimal, 0x000001A7):
  00000000000000000000000110100111
- Break it up into bytes: 32 / 8 = 4 bytes
  00000000 00000000 00000001 10100111
- Reverse the byte order for little-endian (and convert to binary if you worked in hex):
  (first) 10100111 00000001 00000000 00000000 (last)
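The answer can be checked with Python's standard `struct` module, whose format prefix selects the byte order explicitly (">" big-endian, "<" little-endian; "I" is a 4-byte unsigned integer). This is just one way to verify the hand computation:

```python
import struct

n = 423
big = struct.pack(">I", n)     # big-endian, 4-byte unsigned int
little = struct.pack("<I", n)  # little-endian, same value
print(big.hex(" "))     # 00 00 01 a7
print(little.hex(" "))  # a7 01 00 00
print(" ".join(f"{b:08b}" for b in little))
# 10100111 00000001 00000000 00000000
```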
Characters
- What about representing text with bits?
  - characters: a, b, 8, *, \, Q, etc.
- Assign each character a number
  - but who decides which character gets which number?
  - some languages have many characters
ASCII Characters
ASCII – American Standard Code for Information Interchange
(row label = high hex digit, column label = low hex digit; SP = space)

      0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
00    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
10    DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
20    SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
30    0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
40    @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
50    P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
60    `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
70    p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
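Looking up a character in the table is exactly what Python's built-in `ord` (character to code) and `chr` (code to character) do for ASCII characters. A few lookups, plus two regularities visible in the table:

```python
print(hex(ord("A")), hex(ord("a")), hex(ord("0")))  # 0x41 0x61 0x30
print(chr(0x51), chr(0x2A))                         # Q *

# Upper- and lowercase letters sit exactly 0x20 apart (rows 40/50 vs 60/70).
print(ord("a") - ord("A"))  # 32

# Digit characters are not their numeric values: '8' is 0x38.
print(ord("8"), ord("8") - ord("0"))  # 56 8
```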
ASCII Example
- Example: interpret 0x4672656506 as ASCII
- Answer: Free[ACK]
  (0x46 = F, 0x72 = r, 0x65 = e, 0x65 = e, 0x06 = ACK)
ASCII Try it Yourself
- Try it yourself: interpret 0x0077696E as ASCII
- Answer: [NUL]win
  (0x00 = NUL, 0x77 = w, 0x69 = i, 0x6E = n)
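Both ASCII exercises follow the same procedure: split the hex string into bytes and look each byte up in the table, writing control codes as [NAME]. A minimal sketch; the helper name `interpret_as_ascii` is hypothetical, and it rejects bytes above 0x7F because ASCII is a 7-bit code:

```python
# Mnemonics for the 32 ASCII control codes 0x00-0x1F, in table order.
CONTROL_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()

def interpret_as_ascii(hex_string):
    """Decode a hex string byte by byte, showing control codes as [NAME]."""
    text = []
    for byte in bytes.fromhex(hex_string):  # iterating bytes yields ints
        if byte > 0x7F:
            raise ValueError(f"0x{byte:02X} is outside 7-bit ASCII")
        if byte < 0x20:
            text.append(f"[{CONTROL_NAMES[byte]}]")
        elif byte == 0x7F:
            text.append("[DEL]")
        else:
            text.append(chr(byte))
    return "".join(text)

print(interpret_as_ascii("4672656506"))  # Free[ACK]
print(interpret_as_ascii("0077696E"))    # [NUL]win
```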
Unicode
- Unicode defines over 100,000 characters, symbols, etc., each identified by a code point
  - code point range: U+0000 ... U+10FFFF
  - size of the range: 0x10FFFF + 1 = 2^16 + 2^20 = 1,114,112, roughly 1 million possible code points
- UTF: Unicode Transformation Format
  - UTF-32: direct 4-byte encoding of code points
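A short Python check of the range arithmetic, followed by UTF-32 encoding of one character (the euro sign, U+20AC, chosen arbitrarily). Because UTF-32 stores each code point in a 4-byte word, byte order matters again, so the sketch uses Python's explicit "utf-32-be" and "utf-32-le" codecs:

```python
# The code point range U+0000 ... U+10FFFF has 0x10FFFF + 1 entries.
print(0x10FFFF + 1 == 2**16 + 2**20 == 1_114_112)  # True

euro = "\u20ac"  # EURO SIGN
print(f"U+{ord(euro):04X}")  # U+20AC

# UTF-32 holds the code point directly in 4 bytes, in either byte order.
print(euro.encode("utf-32-be").hex(" "))  # 00 00 20 ac
print(euro.encode("utf-32-le").hex(" "))  # ac 20 00 00
```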
Variable Length Encoding
- UTF-16
  - 2-byte encoding for most code points
  - code points above U+FFFF use a 4-byte code, marked by a special prefix (a surrogate pair)
- UTF-8
  - similar principle: variable-length encoding of 1-4 bytes
  - the 1-byte encoding is compatible with ASCII
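The variable lengths are easy to observe by encoding a few sample characters, chosen arbitrarily to land in different ranges. The sketch uses "utf-16-be" so that Python does not prepend a byte-order mark; the last sample lies above U+FFFF and therefore needs a 4-byte surrogate pair in UTF-16:

```python
# Samples: A (U+0041), e-acute (U+00E9), euro sign (U+20AC), grinning face (U+1F600)
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # explicit byte order, so no byte-order mark
    print(f"U+{ord(ch):04X}: UTF-8 {len(utf8)} byte(s) [{utf8.hex(' ')}], "
          f"UTF-16 {len(utf16)} byte(s) [{utf16.hex(' ')}]")

# ASCII text is byte-for-byte identical in UTF-8.
print("CS 230".encode("utf-8") == "CS 230".encode("ascii"))  # True
```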
Data Representation
- Interpretation is in the eye of the beholder
- What does this represent?
  01110111011010000111100100111111
Data Representation
- 01110111011010000111100100111111
- split into bytes: 01110111 01101000 01111001 00111111
- hex digits: 7 7 | 6 8 | 7 9 | 3 F, i.e., 0x77 0x68 0x79 0x3F
- as ASCII: w h y ? ("why?")
- or, as an unsigned integer: 2,003,335,487 (decimal)
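The same 32 bits can be decoded under each interpretation in a few lines of Python. The big-endian integer reading matches the slide; the little-endian reading is added only to show that byte order is part of the agreement too:

```python
bits = "01110111011010000111100100111111"

# Slice the 32-bit string into four 8-bit groups and pack them as bytes.
data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

print(data.hex(" "))                   # 77 68 79 3f
print(data.decode("ascii"))            # why?
print(int.from_bytes(data, "big"))     # 2003335487
print(int.from_bytes(data, "little"))  # 1064921207
```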
Big Integers
- Word size is currently 32 or 64 bits
- Programming libraries offer big-integer types for values that do not fit in a word
  - more complex data structures
  - more costly operations, performed in software rather than in hardware
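Python's built-in `int` is already a big-integer type, so the following is only an illustration of the idea, not how any particular library is implemented. It assumes a hypothetical layout: a number is a list of 32-bit "limbs", least-significant limb first (a little-endian order of words). Addition then runs in software, one limb at a time, carrying into the next:

```python
LIMB_BITS = 32                   # assumed limb width (hypothetical choice)
LIMB_MASK = (1 << LIMB_BITS) - 1

def to_limbs(n):
    """Split a non-negative int into 32-bit limbs, least-significant limb first."""
    limbs = []
    while True:
        limbs.append(n & LIMB_MASK)
        n >>= LIMB_BITS
        if n == 0:
            return limbs

def from_limbs(limbs):
    """Reassemble an int from least-significant-first limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))

def add_limbs(a, b):
    """Schoolbook addition: add limb by limb, propagating the carry upward."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        result.append(total & LIMB_MASK)
        carry = total >> LIMB_BITS
    if carry:
        result.append(carry)
    return result

x = 2**64 - 1                    # largest unsigned 64-bit value
s = add_limbs(to_limbs(x), to_limbs(1))
print(s)                         # [0, 0, 1]
print(from_limbs(s) == 2**64)    # True
```

Storing the least-significant limb first lets the addition loop start immediately, the same advantage the Word slide lists for little-endian byte order.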
Data Interpretation
- Bits have no inherent meaning
  - interpretation is in the eye of the beholder
- We must start from an implicit agreement on the encoding
  - ASCII, UTF, floating point, two's complement, etc.
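To make the point concrete, a short Python sketch reads one arbitrary 4-byte pattern, 0xC0000000, under three of the agreements named above, using the standard `struct` module (">" big-endian; "I" unsigned, "i" two's complement, "f" IEEE 754 single-precision float):

```python
import struct

raw = bytes.fromhex("C0000000")

(as_unsigned,) = struct.unpack(">I", raw)  # unsigned binary
(as_signed,) = struct.unpack(">i", raw)    # two's complement
(as_float,) = struct.unpack(">f", raw)     # IEEE 754 single precision

print(as_unsigned, as_signed, as_float)    # 3221225472 -1073741824 -2.0
```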