5/15/2017

File Systems: Semantics & Structure
11A. File Semantics
11B. Namespace Semantics
11C. File Representation
11D. Free Space Representation
11E. Namespace Representation
11L. Disk Partitioning
11F. File System Integration

File Systems Semantics and Structure 1

What is a File
• a file is a named collection of information
• primary roles of a file system:
  – to store and retrieve data
  – to manage the media/space where data is stored
• typical operations:
  – where is the first block of this file
  – where is the next block of this file
  – where is block 35 of this file
  – allocate a new block to the end of this file
  – free all blocks associated with this file

Data and Metadata
• File systems deal with two kinds of information
• Data – the contents of the file
  – e.g. instructions of the program, words in the letter
• Metadata – information about the file
  – e.g. how many bytes are there, when was it created
  – sometimes called attributes
• both must be persisted and protected
  – stored and connected by the file system

Sequential Byte Stream Access

    #include <fcntl.h>
    #include <unistd.h>

    char buf[BUFSIZE];    /* BUFSIZE is assumed to be defined elsewhere */

    /* copy file "abc" to file "xyz", one buffer at a time */
    int infd = open("abc", O_RDONLY);
    int outfd = open("xyz", O_WRONLY | O_CREAT, 0666);
    if (infd >= 0 && outfd >= 0) {
        int count = read(infd, buf, sizeof buf);
        while (count > 0) {
            write(outfd, buf, count);
            count = read(infd, buf, sizeof buf);
        }
        close(infd);
        close(outfd);
    }

Random Access

    #include <stdlib.h>
    #include <unistd.h>

    /* struct hdr (with section_offset and section_length) is defined elsewhere */
    /* read one section of a file, located through an index of headers */
    void *readSection(int fd, struct hdr *index, int section) {
        struct hdr *head = &index[section];
        off_t offset = head->section_offset;
        size_t len = head->section_length;
        void *buf = malloc(len);
        if (buf != NULL) {
            lseek(fd, offset, SEEK_SET);    /* move to the start of the section */
            if (read(fd, buf, len) <= 0) {
                free(buf);
                buf = NULL;
            }
        }
        return buf;
    }

Consistency Model
• When do new readers see results of a write?
  – read-after-write
    • as soon as possible, database semantics
    • this is commonly called “POSIX consistency”
  – read-after-close (or sync/commit)
    • only after writes are committed to storage
  – open-after-close (or sync/commit)
    • each open sees a consistent snapshot
  – explicitly versioned files
    • each open sees a named, consistent snapshot
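The read-after-write case in the consistency model can be probed with a small experiment. The following is a minimal sketch, not part of the original slides: the function name `read_after_write_visible`, the `MAX_MSG` limit, and the scratch path under `/tmp` are all assumptions made for illustration. It writes through one descriptor and immediately reads through a second, independent open of the same file, with no `close()` or `fsync()` in between.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_MSG 64      /* hypothetical upper bound for this sketch */

/*
 * Returns 1 if a second, independent open of the file sees the new bytes
 * as soon as write() returns, 0 if it does not, -1 if setup fails.
 * The scratch path under /tmp is an assumption of this sketch.
 */
int read_after_write_visible(const char *msg, size_t len)
{
    char path[] = "/tmp/raw_demo_XXXXXX";
    char buf[MAX_MSG];
    int visible = -1;

    assert(msg != NULL && len > 0 && len <= sizeof buf);

    int wfd = mkstemp(path);            /* writer: creates and opens the file */
    if (wfd < 0)
        return -1;
    int rfd = open(path, O_RDONLY);     /* reader: separate open, same file */

    if (rfd >= 0 && write(wfd, msg, len) == (ssize_t)len) {
        /* No close() and no fsync() before the reader looks. */
        ssize_t n = pread(rfd, buf, len, 0);
        visible = (n == (ssize_t)len && memcmp(buf, msg, len) == 0);
    }

    if (rfd >= 0)
        close(rfd);
    close(wfd);
    unlink(path);
    return visible;
}
```

On a file system that provides read-after-write ("POSIX") consistency, a call such as `read_after_write_visible("hello", 5)` would be expected to return 1. A system with read-after-close or open-after-close semantics could legitimately return 0, which is the distinction the slide draws.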
File Attributes – basic properties
• thus far we have focused on a simple model
  – a file is a "named collection of data blocks"
• in most operating systems, files have more state than this
  – file type (regular file, directory, device, IPC port, ...)
  – file length (there may be excess space at the end of the last block)
  – ownership and protection information
  – system attributes (e.g. hidden, archive)
  – creation time, modification time, last accessed time
• typically stored in the file descriptor structure

Extended File Types and Attributes
• extended protection information
  – e.g. access control lists
• resource forks
  – e.g. configuration data, fonts, related objects
• application defined types
  – e.g. load modules, HTML, e-mail, MPEG, ...
• application defined properties
  – e.g. compression scheme, encryption algorithm, ...

Databases
• a tool for managing business critical data
• a table is the equivalent of a file system
• data organized in rows and columns
  – row indexed by unique key
  – columns are named fields within each row
• support a rich set of operations
  – multi-object, read/modify/write transactions
  – SQL searches return consistent snapshots
  – insert/delete row/column operations

Object Stores
• simplified file systems, cloud storage
  – optimized for large but infrequent transfers
• a bucket is the equivalent of a file system
  – a bucket contains named, versioned objects
• objects have long names in a flat name space
  – object names are unique within a bucket
• an object is a blob of immutable bytes
  – get … all or part of the object
  – put … new version, there is no append/update
  – delete

Key-Value Stores
• smaller and faster than an SQL database
  – optimized for frequent small transfers
• a table is the equivalent of a file system
  – a table is a collection of key/value pairs
• keys have long names in a flat name space
  – key names are unique within a table
• value is a (typically 64-64MB) string
  – get/put (entire value)
  – delete

File Names and Name Binding
• the file system knows files by internal descriptors
• users know files by names
  – names are more easily remembered than disk addresses
  – names can be structured to organize millions of files
• the file system is responsible for name-to-file mapping
  – associating names with new files
  – changing names associated with existing files
  – allowing users to search the name space
• there are many ways to structure a name space
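The three name-binding duties listed above (associate a name with a new file, change the name of an existing file, search the name space) can be sketched as a tiny in-memory table. This is only an illustration under stated assumptions: the types `struct binding` and `struct name_space`, the functions `ns_bind`, `ns_rename`, and `ns_lookup`, and the fixed limits are hypothetical and do not come from any real file system. Internal descriptors are modeled as plain integers.

```c
#include <assert.h>
#include <string.h>

#define MAX_NAMES 16    /* hypothetical table size */
#define NAME_LEN  64    /* hypothetical name limit, including the NUL */

struct binding {
    char name[NAME_LEN];    /* user-visible name */
    int  desc;              /* internal descriptor number */
    int  used;              /* nonzero if this slot holds a binding */
};

struct name_space {
    struct binding slot[MAX_NAMES];
};

/* Search the name space; returns the matching slot or NULL. */
static struct binding *ns_find(struct name_space *ns, const char *name)
{
    for (int i = 0; i < MAX_NAMES; i++)
        if (ns->slot[i].used && strcmp(ns->slot[i].name, name) == 0)
            return &ns->slot[i];
    return NULL;
}

/* Name-to-file mapping: returns the descriptor bound to name, or -1. */
int ns_lookup(struct name_space *ns, const char *name)
{
    assert(ns != NULL && name != NULL);
    struct binding *b = ns_find(ns, name);
    return b ? b->desc : -1;
}

/* Associate a name with a new file. Fails (-1) if the name is already
 * taken, too long, or the table is full. */
int ns_bind(struct name_space *ns, const char *name, int desc)
{
    assert(ns != NULL && name != NULL);
    if (strlen(name) >= NAME_LEN || ns_find(ns, name) != NULL)
        return -1;
    for (int i = 0; i < MAX_NAMES; i++) {
        if (!ns->slot[i].used) {
            strcpy(ns->slot[i].name, name);
            ns->slot[i].desc = desc;
            ns->slot[i].used = 1;
            return 0;
        }
    }
    return -1;
}

/* Change the name of an existing file; the descriptor is unchanged. */
int ns_rename(struct name_space *ns, const char *oldname, const char *newname)
{
    assert(ns != NULL && oldname != NULL && newname != NULL);
    struct binding *b = ns_find(ns, oldname);
    if (b == NULL || strlen(newname) >= NAME_LEN || ns_find(ns, newname) != NULL)
        return -1;
    strcpy(b->name, newname);
    return 0;
}
```

The point of the sketch is that a rename touches only the binding: the descriptor, and therefore the file's data and attributes, stays the same. The table starts empty when zero-initialized, e.g. `struct name_space ns = {0};`.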
What is in a Name?
    /home/mark/TODO.txt
    (figure labels: directory = /home/mark, separator = /, base name = TODO, suffix = .txt)
• suffixes and file types
  – file-to-application binding often based on suffix
    • defined by system configuration registry
    • configured per user, or per directory
  – suffix may define the file type (e.g. Windows)
  – suffix may only be a hint (magic # defines type)

Flat Name Spaces
• there is one naming context per file system
  – all file names must be unique within that context
• all files have exactly one true name
  – these names are probably very long
• file names may have some structure
  – e.g. CAC101.CS111.SECTION1.SLIDES.LECTURE_13
  – this structure may be used to optimize searches
  – the structure is very useful to users
  – the structure has no meaning to the file system

A rooted directory tree
    root
      user_1
        file_a      (/user_1/file_a)
        dir_a       (/user_1/dir_a)
          file_a    (/user_1/dir_a/file_a)
      user_2
        file_b      (/user_2/file_b)
      user_3
        file_c      (/user_3/file_c)
        dir_a       (/user_3/dir_a)
          file_b    (/user_3/dir_a/file_b)

Hierarchical Namespaces
• directory
  – a file containing references to other files
  – it can be used as a naming context
• each process has a current working directory
• names are interpreted relative to a directory
• nested directories can form a tree
  – file name is a path through that tree
  – directory tree expands from a root node
• fully qualified names begin from the root
  – may actually form a directed graph

Hard Links: example
    root
      user_1
        file_a
      user_3
        dir_a
          file_b
        file_c

    ln /user_3/dir_a/file_b /user_1/file_a

Both names now refer to the same I-node

True Names vs. Path Names
• Some file systems have “true names”
• DOS and ISO9660 have a single “path name”
  – files are described by directory entries
  – data is referred to by exactly one directory entry
  – each file has only one (character string) name
• Unix (and Linux) … have named links
  – files are described by I-nodes (w/unique I#)
  – directories associate names with I-node numbers
  – many directory entries can refer to the same I-node
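The hard-link behavior described above can be checked from a program with the POSIX `link()` and `stat()` calls. The following is a minimal sketch, not taken from the slides: the function name `hard_link_shares_inode` and the scratch directory under `/tmp` are assumptions for illustration, and the names `file_b` and `file_a` simply mirror the `ln` example. It creates a file, links a second name to it, and compares the I-node numbers and link count reported for both names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if both names resolve to the same I-node with a link count
 * of 2, 0 if they do not, -1 if setup fails.
 * The scratch directory under /tmp is an assumption of this sketch.
 */
int hard_link_shares_inode(void)
{
    char dir[] = "/tmp/hardlink_demo_XXXXXX";
    char name_b[64], name_a[64];
    struct stat sb_b, sb_a;
    int result = -1;

    /* both path buffers must be able to hold "<dir>/file_x" */
    assert(sizeof dir + sizeof "/file_b" <= sizeof name_b);

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(name_b, sizeof name_b, "%s/file_b", dir);
    snprintf(name_a, sizeof name_a, "%s/file_a", dir);

    int fd = open(name_b, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd >= 0) {
        close(fd);
        /* equivalent of: ln <dir>/file_b <dir>/file_a */
        if (link(name_b, name_a) == 0) {
            if (stat(name_b, &sb_b) == 0 && stat(name_a, &sb_a) == 0)
                result = (sb_b.st_dev == sb_a.st_dev &&
                          sb_b.st_ino == sb_a.st_ino &&
                          sb_b.st_nlink == 2);
            unlink(name_a);     /* removes one name, not the file */
        }
        unlink(name_b);         /* last name gone: file can be freed */
    }
    rmdir(dir);
    return result;
}
```

On a Unix-style file system that supports hard links, the function would be expected to return 1. On a file system with a single path name per file (the DOS and ISO9660 case above), `link()` would typically fail and the function would return -1.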