
Lecture 6: Endianness and Characters (CS 230 - Spring 2020)


  1. CS 230 – Introduction to Computers and Computer Systems
     Lecture 6 – Endianness and Characters
     CS 230 - Spring 2020

  2. Byte
     - Convention: 8 bits = 1 byte
       - 8 is a power of 2
       - two’s complement range: -128 ... 127
       - unsigned binary range: 0 ... 255
       - two hexadecimal digits
     - historical
       - useful range to represent characters, control
       - 8-bit circuit width
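
The two ranges on this slide follow directly from the bit width. A minimal sketch, assuming a hypothetical helper `as_signed` (not part of the lecture) that reinterprets an unsigned byte value as two's complement:

```python
BITS = 8

# Unsigned: the 2**8 bit patterns map to 0 .. 255.
unsigned_min, unsigned_max = 0, 2**BITS - 1

# Two's complement: the top bit carries weight -2**7.
signed_min, signed_max = -(2 ** (BITS - 1)), 2 ** (BITS - 1) - 1


def as_signed(byte: int) -> int:
    """Reinterpret an unsigned byte value (0..255) as two's complement."""
    return byte - 2**BITS if byte >= 2 ** (BITS - 1) else byte


print(unsigned_min, unsigned_max)      # 0 255
print(signed_min, signed_max)          # -128 127
print(f"{255:02X}", as_signed(0xFF))   # FF -1
```

Every byte value fits in exactly two hexadecimal digits, which is why `f"{255:02X}"` prints `FF`.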

  3. Word
     - Increasing circuit width: 32 or 64 bits
     - Individual bytes are still accessible
       - what order are they in?
       - Gulliver’s Travels – Jonathan Swift, 1726
     - Little-endian: least-significant byte first
       - same number in memory, regardless of length
       - can start math right away
     - Big-endian: most-significant byte first
       - “natural” way of writing numbers
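
The "same number in memory, regardless of length" point can be checked with a small value. This is only a sketch: the value 42 and the variable names are illustrative assumptions, not from the lecture. It stores 42 in four bytes and reads back only the first 1, 2, or 4 bytes.

```python
value = 42  # illustrative value, small enough to fit in one byte

little = value.to_bytes(4, "little")  # bytes 2a 00 00 00
big = value.to_bytes(4, "big")        # bytes 00 00 00 2a

for length in (1, 2, 4):
    # Read only the first `length` bytes, starting at the same address.
    from_little = int.from_bytes(little[:length], "little")
    from_big = int.from_bytes(big[:length], "big")
    print(length, from_little, from_big)
# 1 42 0
# 2 42 0
# 4 42 42
```

With little-endian storage the low-order byte sits first, so a shorter read from the same starting position still yields 42. With big-endian storage a shorter read sees only the leading zero bytes.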

  4. Byte Order
     - Especially relevant for distributed systems
       - Different computers, different endianness?
     - In principle, similar challenge for bits
       - not relevant, since bits are usually not addressable
       - we’ll talk about being “addressable” later
       - in CS 230, bits are written in big-endian
       - the same order we’ve been doing so far
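
One common way to avoid the "different computers, different endianness" problem is to fix the byte order of the data on the wire. The following sketch uses Python's standard `struct` module. Choosing big-endian for the wire format is an assumption made for illustration, and the variable names are hypothetical.

```python
import struct
import sys

word = 0x01FAB352

# ">" selects big-endian and "I" an unsigned 32-bit integer,
# independent of the byte order of the machine running this code.
wire = struct.pack(">I", word)

(received,) = struct.unpack(">I", wire)  # receiver agrees on the order
(misread,) = struct.unpack("<I", wire)   # receiver wrongly assumes little-endian

print(sys.byteorder)   # byte order of this machine ("little" or "big")
print(wire.hex(" "))   # 01 fa b3 52
print(hex(received))   # 0x1fab352
print(hex(misread))    # 0x52b3fa01
```

The misread value shows what happens when sender and receiver do not agree on byte order: the same four bytes decode to a completely different number.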

  8. Endianness Example
     - Consider the big-endian 32-bit word 0x01FAB352
     - In what order do we send the bits to a little-endian computer?
     - Break it up into bytes: 32 / 8 = 4 bytes
       - 0x01 0xFA 0xB3 0x52
     - Swap them to little-endian
       - 0x52 0xB3 0xFA 0x01
     - Convert them to binary
       - First - 01010010 10110011 11111010 00000001 - Last
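
The three steps of this example (split into bytes, swap, convert to binary) can be sketched with shifts and masks. The variable names are illustrative, not from the lecture.

```python
word = 0x01FAB352

# Step 1: break the 32-bit word into 4 bytes, most significant first.
big_endian_bytes = [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]

# Step 2: swap to little-endian by reversing the byte order.
# The bits inside each byte keep their order.
little_endian_bytes = list(reversed(big_endian_bytes))

# Step 3: write each byte as 8 binary digits.
bits = " ".join(f"{b:08b}" for b in little_endian_bytes)

print([f"0x{b:02X}" for b in big_endian_bytes])     # ['0x01', '0xFA', '0xB3', '0x52']
print([f"0x{b:02X}" for b in little_endian_bytes])  # ['0x52', '0xB3', '0xFA', '0x01']
print(bits)  # 01010010 10110011 11111010 00000001
```

Note that only the order of whole bytes changes. Reversing all 32 bits would be a different (and incorrect) operation.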

  12. Endianness Try it Yourself
      - Consider the big-endian 32-bit word 423₁₀
      - In what order do we send the bits to a little-endian computer?
      - Convert it to binary (or you could do hexadecimal here)
        - 00000000000000000000000110100111
      - Break it up into bytes: 32 / 8 = 4 bytes
        - 00000000 00000000 00000001 10100111
      - Swap them to little-endian (and convert to binary if in hex)
        - First - 10100111 00000001 00000000 00000000 - Last
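
To check an answer to this kind of exercise, the built-in `int.to_bytes` can do the split and swap in one call. `little_endian_bits` is a hypothetical helper name introduced here for illustration.

```python
def little_endian_bits(value: int, width_bytes: int = 4) -> str:
    """Return the bytes of `value` in little-endian order, each as 8 bits."""
    data = value.to_bytes(width_bytes, byteorder="little")
    return " ".join(f"{byte:08b}" for byte in data)


print(f"{423:032b}")            # 00000000000000000000000110100111
print(little_endian_bits(423))  # 10100111 00000001 00000000 00000000
```

The same helper reproduces the earlier worked example when called as `little_endian_bits(0x01FAB352)`.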

  13. Characters
      - What about representing text with bits?
        - characters: a, b, 8, *, \, Q, etc.
      - Assign each character a number
        - but who decides which character gets which number?
        - some languages have many characters

  14. ASCII Characters
      - ASCII – American Standard Code for Information Interchange

          0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
      00  NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI
      10  DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
      20  SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
      30  0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
      40  @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
      50  P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
      60  `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
      70  p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~   DEL

      (row label = high hex digit, column = low hex digit; SP = space)

  16. ASCII Example
      - Example: interpret 0x4672656506 as ASCII
      - Answer: Free[ACK]
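
The table lookup in this example can be sketched as a short function. `CONTROL_NAMES` and `interpret_ascii` are hypothetical names introduced for illustration. The control-code abbreviations are the ones in the first two rows of the ASCII table.

```python
# Names of the 32 control codes 0x00..0x1F, in table order.
CONTROL_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]


def interpret_ascii(hex_string: str) -> str:
    """Decode a string of hex digits as ASCII, one byte (two digits) at a time."""
    pieces = []
    for byte in bytes.fromhex(hex_string):
        if byte < 0x20:
            pieces.append(f"[{CONTROL_NAMES[byte]}]")  # non-printable control code
        elif byte == 0x7F:
            pieces.append("[DEL]")
        elif byte < 0x7F:
            pieces.append(chr(byte))                   # printable character
        else:
            raise ValueError(f"0x{byte:02X} is outside the 7-bit ASCII range")
    return "".join(pieces)


print(interpret_ascii("4672656506"))  # Free[ACK]
```

Each pair of hex digits is one byte: 0x46 is `F`, 0x72 is `r`, 0x65 is `e` (twice), and 0x06 is the control code ACK.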

  18. ASCII Try it Yourself
      - Try it yourself: interpret 0x0077696E as ASCII
      - Answer: [NUL]win
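
Going in the other direction (text to hex) shows why the leading `00` matters here. This is a sketch with illustrative variable names: as a sequence of bytes the NUL is a real character, but if the same bytes are treated as a single number the leading zero byte disappears.

```python
text = "\x00win"  # NUL followed by "win"

encoded = text.encode("ascii")
print(encoded.hex().upper())  # 0077696E

# Treating the same 4 bytes as one big-endian number drops the leading zeros.
as_number = int.from_bytes(encoded, "big")
print(hex(as_number))         # 0x77696e
```

So when interpreting hex as ASCII, the data has to be read as a byte string of known length, not as an integer.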

  19. Unicode
      - Unicode provides over 100,000 code points
        - for characters, symbols, etc.
        - code point range: U+0000 ... U+10FFFF
        - range: 2^16 + 2^20 = 1,114,112, i.e. about 1.1 million possible code points
      - UTF: Unicode Transformation Format
      - UTF-32
        - direct 4-byte encoding of code points
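
The size of the code point range and the "direct 4-byte encoding" of UTF-32 can both be checked in a few lines. The sample characters are arbitrary choices for illustration, and `bytes.hex(" ")` assumes Python 3.8 or newer.

```python
# U+0000 .. U+10FFFF inclusive
total_code_points = 0x10FFFF + 1
print(total_code_points)                      # 1114112
print(total_code_points == 2**16 + 2**20)     # True

# "A", e with acute accent, euro sign, an emoji (written as escapes)
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    code_point = ord(ch)
    utf32 = ch.encode("utf-32-be")  # big-endian, no byte-order mark
    print(f"U+{code_point:04X}", utf32.hex(" "))
# U+0041 00 00 00 41
# U+00E9 00 00 00 e9
# U+20AC 00 00 20 ac
# U+1F600 00 01 f6 00
```

In UTF-32 the four bytes are simply the code point written as a 32-bit number, so every character costs 4 bytes regardless of how small its code point is.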

  20. Variable Length Encoding
      - UTF-16
        - 2-byte encoding for most codes
        - a special prefix sometimes indicates a 4-byte code
      - UTF-8
        - similar principle: variable length encoding
        - 1-4 bytes
        - 1-byte encoding compatible with ASCII
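
A quick comparison of encoded lengths illustrates the variable-length idea. The sample characters are arbitrary choices for illustration, and `bytes.hex(" ")` assumes Python 3.8 or newer.

```python
# "A", e with acute accent, euro sign, an emoji (written as escapes)
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    print(f"U+{ord(ch):04X}",
          "| UTF-8:", len(utf8), "bytes,", utf8.hex(" "),
          "| UTF-16:", len(utf16), "bytes,", utf16.hex(" "))
# U+0041 | UTF-8: 1 bytes, 41 | UTF-16: 2 bytes, 00 41
# U+00E9 | UTF-8: 2 bytes, c3 a9 | UTF-16: 2 bytes, 00 e9
# U+20AC | UTF-8: 3 bytes, e2 82 ac | UTF-16: 2 bytes, 20 ac
# U+1F600 | UTF-8: 4 bytes, f0 9f 98 80 | UTF-16: 4 bytes, d8 3d de 00
```

The UTF-8 encoding of `A` is the single byte 0x41, identical to ASCII. In the last row, the UTF-16 bytes `d8 3d` are what the slide calls a special prefix (a "high surrogate" in Unicode terminology), which signals that two more bytes follow.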

  21. Data Representation
      - Interpretation is in the eye of the beholder
      - What does this represent?
        01110111011010000111100100111111

  22. Data Representation
      01110111011010000111100100111111

      01110111 01101000 01111001 00111111
      7   7    6   8    7   9    3   F
      77       68       79       3F
      w        h        y        ?

      - Or: 2,003,335,487₁₀
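
Both readings of this bit pattern can be reproduced from the same four bytes. This is a sketch with illustrative variable names. The number reading assumes an unsigned big-endian integer, and `bytes.hex(" ")` assumes Python 3.8 or newer.

```python
bits = "01110111011010000111100100111111"

# Pack the 32 bits into 4 bytes, most significant byte first.
raw = int(bits, 2).to_bytes(4, "big")

print(raw.hex(" ").upper())         # 77 68 79 3F
print(raw.decode("ascii"))          # why?
print(int.from_bytes(raw, "big"))   # 2003335487
```

Nothing in `raw` itself says which reading is intended. The choice between `decode("ascii")` and `int.from_bytes` is made entirely by the code that consumes the bytes.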

  23. Big Integers
      - Word size currently 32 or 64 bits
      - Programming libraries offer big integer types
      - Complex data structures – more costly
        - operations in software, rather than hardware
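
Python's built-in `int` happens to be a big integer type, so it can illustrate the difference between a fixed-width word and an arbitrary-precision value. In this sketch the 64-bit word is simulated with a mask. `WORD_BITS` and `MASK` are illustrative names, not from the lecture.

```python
WORD_BITS = 64
MASK = 2**WORD_BITS - 1  # keeps only the low 64 bits

a = 2**63
b = 2**63

exact = a + b             # big integer arithmetic: no overflow
wrapped = (a + b) & MASK  # what a 64-bit unsigned word would hold

print(exact)               # 18446744073709551616
print(wrapped)             # 0
print(exact.bit_length())  # 65
```

The exact sum needs 65 bits, one more than the word provides, so the fixed-width result wraps around to 0 while the big integer keeps the full value at the cost of doing the arithmetic in software.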

  24. Data Interpretation
      - Bits have no inherent meaning
        - interpretation is in the eye of the beholder
        - must start from implicit agreement
        - ASCII, UTF, Floating Point, Two’s Complement, etc.
