TrapperKeeper: The Case for Using Virtualization to Add Type Awareness to File Systems
Daniel Peek (Facebook), Jason Flinn (University of Michigan)

TrapperKeeper
• Need access to type-specific metadata
  – Searching, organization, presentation
• Extracting metadata is hard
  – Lots of file types out there
  – Custom code required for each type
• A better way to get metadata

The Plug-in Solution
• Developers make plug-ins for each type
[Diagram: a metadata-using application sends a query (MP3s with Composer = Mozart) to a metadata engine, which relies on per-type plug-ins: JPEG plug-in, MP3 plug-in, …]

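The plug-in architecture on this slide can be illustrated with a minimal sketch. Everything here is hypothetical: `MetadataEngine`, `toy_mp3_plugin`, and the `key=value;key=value` file encoding are stand-ins invented for illustration, not the API of Spotlight or any real metadata system. The point is that a file type with no registered plug-in simply cannot be indexed.

```python
from typing import Callable, Dict, List

# Hypothetical plug-in signature: raw file bytes in, metadata dict out.
Plugin = Callable[[bytes], Dict[str, str]]


class MetadataEngine:
    """Toy metadata engine that dispatches to one plug-in per file type."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}
        self._index: Dict[str, Dict[str, str]] = {}

    def register(self, file_type: str, plugin: Plugin) -> None:
        self._plugins[file_type] = plugin

    def add_file(self, name: str, file_type: str, data: bytes) -> bool:
        """Index a file; return False when no plug-in exists for its type."""
        plugin = self._plugins.get(file_type)
        if plugin is None:
            return False
        self._index[name] = {**plugin(data), "type": file_type}
        return True

    def query(self, file_type: str, key: str, value: str) -> List[str]:
        """Return names of indexed files of file_type whose key equals value."""
        return sorted(
            name
            for name, metadata in self._index.items()
            if metadata["type"] == file_type and metadata.get(key) == value
        )


def toy_mp3_plugin(data: bytes) -> Dict[str, str]:
    # Stand-in parser (assumption): real MP3 tags are binary; this toy
    # format is plain "key=value;key=value" text.
    fields = data.decode("utf-8").split(";")
    return dict(f.split("=", 1) for f in fields if "=" in f)


engine = MetadataEngine()
engine.register("mp3", toy_mp3_plugin)
indexed_mp3 = engine.add_file("requiem.mp3", "mp3", b"Composer=Mozart;Title=Requiem")
indexed_xyz = engine.add_file("notes.xyz", "xyz", b"Author=Someone")  # no plug-in
matches = engine.query("mp3", "Composer", "Mozart")
```

In this sketch the `.xyz` file is silently left out of the index, which is the per-type cost the following slides describe.
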
The Plug-in Solution
• Lots of work for developers
[Figure: Mac OS X Spotlight; labels: Metadata, Preview]

The Long Tail
• How big is this problem?
• Big (Agrawal et al.)
• Uneconomical to support all types

The TrapperKeeper Solution
• Already have apps that parse these files
  – Apps expose information through GUI
[Screenshot example: Date Time Original = 2007:11:22 19:21:14]

The TrapperKeeper Process
• Once per application
  – Trap the application
• Once per file
  – Use trapped application to parse the file
  – Capture displayed output
[Diagram labels: metadata, preview, file, …]

Trapping Applications
• Run app inside a VM
  – Contains app effects
• Make app open a dummy file
• Snapshot at moment of open()
  – About to execute file parsing behavior
[Diagram label: Dummy]

Parsing with Trapped Apps
• Restart VM
• Switch files
[Diagram: the file to parse is switched in for the dummy file]

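The trap-once, parse-many flow from the last three slides can be sketched as a pure-Python simulation. No real hypervisor is involved: `ToyVM`, `trap_application`, `parse_file`, `toy_app`, and the `/tmp/dummy` path are all hypothetical, and `copy.deepcopy` stands in for taking and restoring a VM snapshot.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict

DUMMY_PATH = "/tmp/dummy"  # hypothetical location of the dummy file


@dataclass
class ToyVM:
    """Stand-in for a VM: its files plus whether the app is paused at open()."""
    files: Dict[str, bytes] = field(default_factory=dict)
    paused_at_open: bool = False


def trap_application(dummy_path: str = DUMMY_PATH) -> ToyVM:
    """Once per application: pause at open() of the dummy file and snapshot."""
    vm = ToyVM(files={dummy_path: b""})
    vm.paused_at_open = True   # the app is about to run its parsing code
    return copy.deepcopy(vm)   # deepcopy plays the role of a VM snapshot


def parse_file(
    snapshot: ToyVM,
    app: Callable[[bytes], Dict[str, str]],
    contents: bytes,
    dummy_path: str = DUMMY_PATH,
) -> Dict[str, str]:
    """Once per file: restart from the snapshot, switch files, let the app run."""
    vm = copy.deepcopy(snapshot)      # restart the VM from the snapshot
    vm.files[dummy_path] = contents   # switch the real file in for the dummy
    vm.paused_at_open = False         # resume; the app now parses the file
    return app(vm.files[dummy_path])


def toy_app(data: bytes) -> Dict[str, str]:
    # Hypothetical application: "displays" the length of whatever it opened.
    return {"Length": str(len(data))}


snapshot = trap_application()
first = parse_file(snapshot, toy_app, b"abc")
second = parse_file(snapshot, toy_app, b"hello")
```

Because each parse works on a fresh copy, the snapshot itself is never modified, so one trapped application can be reused for any number of files.
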
Accessibility
[Diagram: accessibility tree. Window, Tab Pane, …, Text Label: “Date Time Original”, Text: “2007:11:22 19:21:14”, …]

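One way to turn an accessibility tree like the one on this slide into metadata is to walk it and pair each label with the text field that follows it. The sketch below is an assumption-laden illustration: the nested-dict tree, the `role`/`text`/`children` keys, and the label-then-text pairing rule are all hypothetical and do not reflect a specific accessibility API or the authors' extraction logic.

```python
from typing import Any, Dict

Node = Dict[str, Any]


def extract_pairs(node: Node) -> Dict[str, str]:
    """Collect label -> value pairs from a toy accessibility tree.

    Assumed heuristic: a "label" node immediately followed by a sibling
    "text" node forms one metadata attribute.
    """
    pairs: Dict[str, str] = {}
    children = node.get("children", [])
    for i, child in enumerate(children):
        is_label = child.get("role") == "label"
        has_next = i + 1 < len(children)
        if is_label and has_next and children[i + 1].get("role") == "text":
            pairs[child["text"]] = children[i + 1]["text"]
        pairs.update(extract_pairs(child))  # descend into nested widgets
    return pairs


# Toy tree mirroring the slide: Window > Tab Pane > label/text widgets.
tree: Node = {
    "role": "window",
    "children": [
        {
            "role": "tab pane",
            "children": [
                {"role": "label", "text": "Date Time Original"},
                {"role": "text", "text": "2007:11:22 19:21:14"},
            ],
        }
    ],
}

metadata = extract_pairs(tree)
```

A real widget tree would likely need application-specific rules for which pairs matter.
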
Guided Extraction

Execute Features
• Snapshot window in VM
[Diagram labels: metadata, Metadata System, Preview]

TrapperKeeper Results
• Makes it easy to extract metadata
  – No development skill
  – No source code
  – Just be able to use the application
• Successful use
  – All GUI apps in Ubuntu 7.10 in a day
  – Parses over 100 file types
  – Rate of 318 files/hour

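To put the reported rate in perspective, the short calculation below converts 318 files/hour into seconds per file and estimates how long a backlog would take. The 1,000-file backlog is a made-up example, not a figure from the talk, and the function names are illustrative only.

```python
FILES_PER_HOUR = 318  # rate reported on the results slide


def seconds_per_file(rate: float = FILES_PER_HOUR) -> float:
    """Average time to extract metadata from one file, in seconds."""
    return 3600 / rate


def hours_for_backlog(num_files: int, rate: float = FILES_PER_HOUR) -> float:
    """Time to work through num_files at the given rate, in hours."""
    return num_files / rate


per_file = seconds_per_file()          # a little over 11 seconds per file
backlog_hours = hours_for_backlog(1000)  # hypothetical 1,000-file backlog
```

At this rate, each file takes a little over 11 seconds on average, so a large batch of new files would take hours rather than minutes.
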
Tricky Situations
• Application has no accessibility support
• Application does not expose metadata
• Application needs external info to parse
  – Configuration files
  – License servers
  – Internet connections

Tricky Situations
• Performance: a sudden influx of files
  – Fresh installation
  – Download from digital camera
• Which metadata is the right metadata?

Questions