  1. CS108 Lecture 08: Computing with Text
     String module operations; character encoding/decoding
     Aaron Stevens, 4 February 2009

     Overview/Questions
     – How is text represented within a computer?
     – How can we manipulate text in our programs?

  2. Review of String Operations

     Operator                  Meaning
     +                         Concatenation
     *                         Repetition
     <string>[<expression>]    Indexing
     <string>[<begin>:<end>]   Slicing
     len(<string>)             Length
     for <var> in <string>     Iteration

     Strings, Lists, and Sequences
     The operations in the previous table are not really just string operations. They apply to any <sequence>, which includes any list.
     Show some list examples using a list of numbers… (as time permits)
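The slide suggests showing the same operations on a list of numbers. Here is a minimal sketch that applies each operation from the table to both a string and a list. It is written in Python 3 rather than the Python 2 used in these slides, and the helper name sequence_demo is hypothetical.

```python
def sequence_demo(seq):
    """Apply each operation from the review table to any sequence (hypothetical helper)."""
    return {
        "concatenation": seq + seq,            # <seq> + <seq>
        "repetition": seq * 3,                 # <seq> * <int>
        "indexing": seq[0],                    # <seq>[<expression>]
        "slicing": seq[1:3],                   # <seq>[<begin>:<end>]
        "length": len(seq),                    # len(<seq>)
        "iteration": [item for item in seq],   # for <var> in <seq>
    }


# The same function works unchanged on a string and on a list of numbers.
for example in ("spam", [3, 1, 4, 1, 5]):
    print(sequence_demo(example))
```

Each operation returns the same kind of sequence it was given. Slicing a string yields a string, and slicing a list yields a list.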

  3. String Module Operations
     Python provides a standard-library module of useful string-manipulation functions. Examples:

     >>> import string
     >>> text = " the cat in the hat "
     >>> string.capitalize(text)
     >>> string.capwords(text)
     >>> string.upper(text)
     >>> string.center(text, 40)

     Note: most functions in this module are deprecated, and Python 3 removes them. We’ll discuss an alternative next week.

     String Module Operations (continued)
     More examples:

     >>> text = "I love watching birds fly"
     >>> string.replace(text, "birds", "fish")
     >>> text = "to be or not to be"
     >>> string.count(text, "o")
     >>> string.find(text, "be")
     >>> string.split(text)
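Python 3 removed most of these module functions but kept string.capwords. Here is a sketch of the same examples in Python 3 using the corresponding str methods. Treating these methods as the "alternative" the slide mentions is an assumption.

```python
import string  # Python 3 keeps string.capwords(), but not upper(), replace(), etc.

text = " the cat in the hat "
capitalized = text.capitalize()         # only the first character; here that is a space
words_capped = string.capwords(text)    # 'The Cat In The Hat' (also trims outer spaces)
shouted = text.upper()
centered = text.center(40)              # padded with spaces to 40 characters

sentence = "I love watching birds fly".replace("birds", "fish")
phrase = "to be or not to be"
o_count = phrase.count("o")             # 4
be_index = phrase.find("be")            # 3 (find returns -1 when not found)
words = phrase.split()                  # split on runs of white space

for value in (capitalized, words_capped, shouted, centered,
              sentence, o_count, be_index, words):
    print(repr(value))
```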

  4. String Module Operations

     Function                       Meaning
     capitalize(<str>)              Copy of <str> with its first character capitalized
     capwords(<str>)                Copy of <str> with the first letter of each word capitalized
     center(<str>, <width>)         Center <str> in a field <width> characters wide
     count(<str>, <sub>)            Count occurrences of <sub> in <str>
     find(<str>, <sub>)             Index of the first occurrence of <sub> in <str> (-1 if absent)
     join(<list>)                   Concatenate a list of strings into a single string
     lower(<str>)                   Copy of <str> in lowercase
     replace(<str>, <old>, <new>)   Copy of <str> with all occurrences of <old> replaced by <new>
     strip(<str>)                   Copy of <str> with leading/trailing white space removed
     split(<str> [, <delim>])       Split <str> into a list of words, using <delim> as the delimiter (white space if omitted)
     upper(<str>)                   Copy of <str> in uppercase

     Refer to Table 4.2 on page 96 of Zelle for a summary of the Python string module.
     http://docs.python.org/lib/node42.html shows the complete list.

     String Module Example
     Replacing all occurrences of a word.
     Also: show splitting text into a list of words.
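The slide's example (replacing all occurrences of a word, then splitting text into a list of words) can be sketched in Python 3 with split and join from the table. replace_word is a hypothetical helper, not part of the string module. It contrasts whole-word replacement with replace, which also matches inside longer words.

```python
def replace_word(text, old, new):
    """Hypothetical helper: replace whole words equal to `old` with `new`.

    Unlike str.replace, it leaves `old` alone inside longer words.
    Splitting and re-joining collapses runs of white space to a single space.
    """
    words = text.split()                                  # split text into a list of words
    swapped = [new if word == old else word for word in words]
    return " ".join(swapped)                              # join the words back together


text = "the theater on the hill"
print(text.replace("the", "a"))          # substring replacement also hits "theater"
print(replace_word(text, "the", "a"))    # whole-word replacement leaves "theater" intact
```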

  5. Character Encoding
     Computers store text data by representing each character/symbol as a number and storing that number in binary.
     ASCII (American Standard Code for Information Interchange) is the most common encoding scheme: each symbol is assigned a unique number.

     The ASCII Character Set
     ASCII stands for American Standard Code for Information Interchange.
     ASCII originally used seven bits to represent each character, allowing for 128 unique characters.
     Later, extended ASCII character sets used all eight bits, allowing for 256 characters.
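To make "a number stored in binary" concrete, here is a short Python 3 sketch (the function name to_binary_codes is hypothetical) that shows each character's code and its 7-bit binary form. Seven bits give 2**7 = 128 codes, and eight bits give 2**8 = 256.

```python
def to_binary_codes(text, bits=7):
    """Return (character, code, binary string) for each character in text.

    Assumes every code fits in `bits` bits; raises ValueError otherwise.
    """
    rows = []
    for ch in text:
        code = ord(ch)                      # the number that represents the character
        if code >= 2 ** bits:
            raise ValueError(f"{ch!r} does not fit in {bits} bits")
        rows.append((ch, code, format(code, f"0{bits}b")))  # zero-padded binary
    return rows


print(2 ** 7, 2 ** 8)                       # number of 7-bit and 8-bit codes
for row in to_binary_codes("Hi!"):
    print(row)
```

Raising an error for codes of 128 and above is a design choice for this sketch. Passing bits=8 allows the extended range.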

  6. The ASCII Character Set (7 bits)
     [Figure: ASCII character table]

     The Extended ASCII Character Set
     [Figure: extended ASCII character table]
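The table images did not survive. Here is a Python 3 sketch (function name hypothetical) that regenerates the printable part of the 7-bit table, codes 32 through 126. Codes 0 to 31 and 127 are control characters with no visible glyph, so they are skipped. It also stops before 128, because codes 128 to 255 have no single standard meaning: each extended ASCII variant (for example Latin-1) assigns them differently.

```python
def ascii_table(start=32, stop=127):
    """Return (code, character) pairs for the printable 7-bit ASCII range 32..126."""
    return [(code, chr(code)) for code in range(start, stop)]


# Print the table in rows of 8 entries, each shown as its code followed by its character.
table = ascii_table()
for i in range(0, len(table), 8):
    print("  ".join(f"{code:3d} {ch}" for code, ch in table[i:i + 8]))
```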

  7. Character Encoding
     Python can convert a character to its ASCII code using the built-in function ord(<character>). Example:

     >>> ord("A")
     65
     >>> text = "The Cat in the Hat"
     >>> for ch in text:
     ...     print ord(ch),
     ...
     >>> print   # blank line

     Character Encoding Example
     – Collect some text from the user.
     – Print out the sequence of ASCII character codes.

     text = raw_input("Enter your text: ")
     for ch in text:
         print ord(ch), ",",
     print   # blank line
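Here is a Python 3 sketch of the slide's encoding program. In Python 3, input() takes the place of raw_input() and print is a function. A fixed string stands in for user input so the sketch runs unattended, and encode is a hypothetical helper name.

```python
def encode(text):
    """Return the character code of each character in text, using ord()."""
    return [ord(ch) for ch in text]


# To read from the user instead, replace the literal with input("Enter your text: ").
text = "The Cat in the Hat"
print(", ".join(str(code) for code in encode(text)))
```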

  8. Character Decoding
     Python also has a built-in function chr(<num>), which converts a number to its corresponding ASCII character. Example:

     >>> num = 81
     >>> print chr(num)
     Q

     Character Decoding
     What does the following message say?
     71, 105, 97, 110, 116, 115, 32, 98, 121, 32, 84, 104, 114, 101, 101, 32, 80, 111, 105, 110, 116, 115
     How would you figure it out?
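Here is a short Python 3 sketch that runs the slide's chr() example and confirms that chr() undoes ord() across all 128 seven-bit codes. This means decoding a list of codes recovers the original text.

```python
num = 81
print(chr(num))                               # the character whose code is 81

# Encoding with ord() and decoding with chr() returns the original text.
text = "Hat"
codes = [ord(ch) for ch in text]
restored = "".join(chr(n) for n in codes)
print(codes, restored)

# The round trip holds for every 7-bit ASCII code.
round_trip_ok = all(ord(chr(n)) == n for n in range(128))
print(round_trip_ok)
```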

  9. Character Decoding Example: decoding an ASCII message
     – Treat the numbers as a list.
     – For each number on the list:
       – treat the numeric symbols as a number;
       – convert the number to an ASCII character.

     >>> numbers = input("Enter a sequence of ASCII numbers: ")   # Python 2: input() evaluates 71, 105, 97 into a tuple
     >>> for n in numbers:
     ...     print chr(n),
     ...

     Take-Away Points
     – String module functions
     – Character sets, ASCII encoding
     – Character encoding/decoding
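In Python 3, input() always returns a string instead of evaluating it the way Python 2's input() does. This means the steps above need an explicit parsing step. Here is a minimal sketch, assuming the codes arrive as one comma-separated string. decode_message is a hypothetical name.

```python
def decode_message(message):
    """Decode a comma-separated string of ASCII codes, e.g. "72, 105" -> "Hi"."""
    # Treat the numbers as a list: split on commas, then turn each piece into an int.
    # int() ignores surrounding spaces, so " 105" parses as 105.
    numbers = [int(piece) for piece in message.split(",")]
    # Convert each number to its ASCII character and join them into one string.
    return "".join(chr(n) for n in numbers)


print(decode_message("72, 105"))
```

Passing the message from the previous slide to decode_message answers that slide's question. A trailing comma would produce an empty piece and make int() raise ValueError.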

  10. Student To Dos
     – HW 03: definite loop, due Tuesday 2/3
     – Readings this week:
       – Zelle 4.1-4.3 (today, Wednesday)
       – Zelle 4.4-4.5 (Friday)
