Jan 30: How computers work; more histograms
Storage is organized in directories (folders) and files
Forward slash (macOS, Linux):  /   /Users/mimno   /Users/mimno/Documents
Backslash (Windows):  C:\   C:\Users\mimno   C:\Users\mimno\Documents
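A small Python sketch of the two separator styles, using the standard library's `pathlib`. The "pure" path classes only manipulate strings and never touch the disk, so this runs the same on any operating system; the paths are the ones from the slide.

```python
from pathlib import PurePosixPath, PureWindowsPath

mac = PurePosixPath("/Users/mimno/Documents")
win = PureWindowsPath("C:\\Users\\mimno\\Documents")  # "\\" is one backslash in Python strings

# Both describe a chain of nested directories; only the separator differs
print(mac.parts)  # ('/', 'Users', 'mimno', 'Documents')
print(win.parts)  # ('C:\\', 'Users', 'mimno', 'Documents')

# The parent of a directory is one level up
print(mac.parent)  # /Users/mimno
print(win.parent)  # C:\Users\mimno
```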
Applications
Paths identify files
/Users/mimno/Documents/2950/plans.txt   ~/Documents/2950/plans.txt   2950/plans.txt
Absolute path starts with / or C:\
~ means "current user's home directory"
Relative path implies a current working directory
File extensions give hints about how to interpret contents of files, and which app opens them (2950/plans.txt vs. 2950/plans.pdf)
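The path rules on this slide can be tried with Python's `pathlib`. This is a sketch using pure paths (string manipulation only); the working directory `cwd` is a hypothetical choice for illustration. On a real machine, `pathlib.Path("~/Documents").expanduser()` is the call that replaces `~` with the current user's home directory.

```python
from pathlib import PurePosixPath, PureWindowsPath

absolute = PurePosixPath("/Users/mimno/Documents/2950/plans.txt")
relative = PurePosixPath("2950/plans.txt")
windows = PureWindowsPath("C:\\Users\\mimno\\Documents")

print(absolute.is_absolute())  # True: starts with /
print(relative.is_absolute())  # False: needs a working directory
print(windows.is_absolute())   # True: starts with a drive and a backslash

# A relative path is resolved against the current working directory
cwd = PurePosixPath("/Users/mimno/Documents")  # hypothetical working directory
print(cwd / relative)  # /Users/mimno/Documents/2950/plans.txt

# The extension is just the end of the file name
print(absolute.suffix)                         # .txt
print(PurePosixPath("2950/plans.pdf").suffix)  # .pdf
```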
Files contain bytes
Text files: bytes = characters
  Used for documents, source code, data, e.g. .txt, .rtf, .py, .csv, .xml
  Byte-character relationship defined by character encoding, e.g. UTF-8, Latin-1, ISO-8859-7
  You can look at a text file without knowing which application will read it (but it might look different)
Binary files: bytes = ¯\_( ツ )_/¯
  Used for formatted docs, compiled code, data, compressed files, images, e.g. .pdf, .exe, .npy, .zip, .gif
  Byte-data relationship defined by application
  You cannot look at a binary file except through an appropriate application
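To make "it might look different" concrete, here is a small Python sketch that stores the same text under two of the encodings named above. The word "café" is an arbitrary example chosen because it contains a non-ASCII character.

```python
text = "café"

# Same four characters, two different byte sequences
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
print(list(utf8_bytes))    # [99, 97, 102, 195, 169]  (é takes two bytes)
print(list(latin1_bytes))  # [99, 97, 102, 233]       (é takes one byte)

# Reading bytes with the wrong encoding still "works", but looks different
print(utf8_bytes.decode("latin-1"))  # cafÃ©
print(utf8_bytes.decode("utf-8"))    # café
```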
A CSV file
Size,Color
1,green
2,red
2,green
3,red
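One way to read this file in Python is the standard `csv` module. In this sketch the file contents are held in a string and wrapped in `io.StringIO` so the reader can treat it like an open file. Note that every value comes back as a string (`'1'`, not `1`); counting the colors gives a simple histogram.

```python
import csv
import io
from collections import Counter

csv_text = "Size,Color\n1,green\n2,red\n2,green\n3,red\n"

reader = csv.reader(io.StringIO(csv_text))  # splits each line on commas
header = next(reader)   # first row is the column names
rows = list(reader)     # remaining rows are the data
print(header)  # ['Size', 'Color']
print(rows)    # [['1', 'green'], ['2', 'red'], ['2', 'green'], ['3', 'red']]

# Count how often each color appears
color_counts = Counter(color for size, color in rows)
print(color_counts)  # Counter({'green': 2, 'red': 2})
```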
A CSV file
Size,Color\n1,green\n2,red\n2,green\n3,red\n[EOF]
Is \n forward- or back-slash?
A CSV file
Size,Color ↩ 1,green ↩ 2,red ↩ 2,green ↩ 3,red ↩ [EOF]
83 105 122 101 44 67 111 108 111 114 10 49 44 103 114 101 101 110 10 50 44 114 101 100 10 50 44 103 114 101 101 110 10 51 44 114 101 100 10
ASCII maps the numbers 0-127 to common English letters, digits, punctuation, and whitespace (for example, 44 is the comma and 10 is the newline ↩)
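A minimal Python sketch that writes the CSV text to a file and reads it back as raw bytes, recovering the numbers on the slide. The file name `colors.csv` is hypothetical, and the file lives in a temporary folder that is deleted afterwards.

```python
import os
import tempfile

text = "Size,Color\n1,green\n2,red\n2,green\n3,red\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "colors.csv")  # hypothetical file name
    # newline="" writes "\n" as-is (Windows would otherwise write "\r\n")
    with open(path, "w", encoding="ascii", newline="") as f:
        f.write(text)
    # Binary mode ("rb") returns the stored bytes, not decoded characters
    with open(path, "rb") as f:
        raw = f.read()

print(list(raw))  # the same 39 numbers as on the slide, starting 83, 105, 122
print(len(raw))   # 39: one byte per character, including the 5 newlines
```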
What is the ASCII code for "f"? What letter corresponds to ASCII 100?
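Python's built-in `ord()` and `chr()` convert between a character and its code, which is one way to check answers to questions like these. The sketch below uses characters from the CSV file rather than the two in the questions.

```python
# ord(): character -> code;  chr(): code -> character
print(ord("S"))         # 83, the first byte of the CSV file
print(chr(44))          # , (the comma that separates columns)
print(chr(10) == "\n")  # True: code 10 is the newline

# Decoding a list of codes back into text
codes = [50, 44, 114, 101, 100, 10]
print(repr(bytes(codes).decode("ascii")))  # '2,red\n'
```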
A tab-delimited file
Size    Color
1       green
2       red
2       green
3       red
A tab-delimited file
Size\tColor\n1\tgreen\n2\tred\n2\tgreen\n3\tred\n[EOF]
A tab-delimited file
Size ▶ Color ↩ 1 ▶ green ↩ 2 ▶ red ↩ 2 ▶ green ↩ 3 ▶ red ↩ [EOF]
83 105 122 101 9 67 111 108 111 114 10 49 9 103 114 101 101 110 10 50 9 114 101 100 10 50 9 103 114 101 101 110 10 51 9 114 101 100 10
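The only byte-level difference from the CSV version is that 44 (comma) becomes 9 (tab). As a sketch, the same `csv` reader handles this file if told to split on tabs instead of commas:

```python
import csv
import io

tsv_text = "Size\tColor\n1\tgreen\n2\tred\n2\tgreen\n3\tred\n"

# First line as bytes: 9 (tab) where the CSV had 44 (comma)
print(list(tsv_text.encode("ascii"))[:11])
# [83, 105, 122, 101, 9, 67, 111, 108, 111, 114, 10]

rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
print(rows[0])   # ['Size', 'Color']
print(rows[1:])  # [['1', 'green'], ['2', 'red'], ['2', 'green'], ['3', 'red']]
```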
Is .ipynb (Jupyter notebook) a binary or text format?
Answer: it's a text file in JSON format. JSON can represent nested data using lists [1, 2, 3] and dictionaries {"a": 1, "b": "hello"}; CSV can only represent row/column tables. Each notebook cell is represented by a dictionary that records the type of cell, the source code for the cell, and its output, if any. Image outputs are binary data, so they are stored as text using base64 encoding. Base64 uses 64 displayable characters (A-Z, a-z, 0-9, + and /), so each character represents six bits. This is slightly inefficient (each character takes up an eight-bit byte but carries only six bits of data, so the encoded data is about a third larger), but it is safe to transmit as text.
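A sketch of the two ideas together, using Python's `json` and `base64` modules. The cell dictionary is a simplified, hypothetical cell loosely modeled on a notebook's JSON (real notebooks have more fields), and the "image" is just the first four bytes of the PNG file signature.

```python
import base64
import json

# First four bytes of every PNG image: binary, not readable text
png_start = bytes([0x89, 0x50, 0x4E, 0x47])

# Base64 turns each 3 bytes (24 bits) into 4 characters of 6 bits each
encoded = base64.b64encode(png_start).decode("ascii")
print(encoded)  # iVBORw==  ("=" pads the last incomplete group)

# Simplified, hypothetical code cell with an image output
cell = {
    "cell_type": "code",
    "source": ["plt.hist(sizes)"],  # hypothetical source line
    "outputs": [{"output_type": "display_data", "data": {"image/png": encoded}}],
}

text = json.dumps(cell)   # nested lists/dicts -> plain text
restored = json.loads(text)
image_bytes = base64.b64decode(restored["outputs"][0]["data"]["image/png"])
print(image_bytes == png_start)  # True: the binary data survives the trip
```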
How to turn in valid homework
Answer all the questions (except discussion)
Save your notebook with all cells executed
Save your notebook in a .zip file
Which sequence of die rolls is more likely?
A [5, 1, 5, 1, 4, 5, 6, 3, 4, 1]
B [1, 1, 1, 3, 4, 4, 5, 5, 5, 6]
C [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
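One way to reason about this question in Python, assuming a fair six-sided die whose rolls are independent: compute the probability of each exact sequence, and compare their histograms (how many times each face appears). `sequence_probability` is a hypothetical helper written for this sketch.

```python
from collections import Counter
from fractions import Fraction

A = [5, 1, 5, 1, 4, 5, 6, 3, 4, 1]
B = [1, 1, 1, 3, 4, 4, 5, 5, 5, 6]
C = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

def sequence_probability(rolls):
    # Assumption: fair die, independent rolls, so each roll has probability 1/6
    p = Fraction(1)
    for _ in rolls:
        p *= Fraction(1, 6)
    return p

for name, rolls in [("A", A), ("B", B), ("C", C)]:
    # Exact probability, then the histogram as (face, count) pairs
    print(name, sequence_probability(rolls), sorted(Counter(rolls).items()))
```

Comparing the histograms also shows that B is A in sorted order: the two sequences have identical counts for every face.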
Recommend
More recommend