VXA: A Virtual Architecture for Durable Compressed Archives



  1. VXA: A Virtual Architecture for Durable Compressed Archives. Bryan Ford, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology. http://pdos.csail.mit.edu/~baford/vxa/

  2. The Ubiquity of Data Compression. Everything is compressed these days:
     – Archive/Backup/Distribution: ZIP, tar.gz, ...
     – Multimedia streams: mp3, ogg, wmv, ...
     – Office documents: XML-in-ZIP
     – Digital cameras: JPEG, proprietary RAW, ...
     – Video camcorders: DV, MPEG-2, ...

  3. Compressed Data Formats. Observation #1: Data compression formats evolve rapidly. [Figure: timeline of lossless compression formats, 1980-2005]

  4. Compressed Data Formats. Observation #1: Data compression formats evolve rapidly. [Figure: timelines of lossless compression, image encoding, video encoding, and audio encoding formats, 1980-2005]

  5. Compressed Data Formats. Observation #1: Data compression formats evolve rapidly. Problems:
     – Inconvenient: each new algorithm requires decoder install/upgrade
     – Impedes data portability: data unusable on systems without supported decoder
     – Threatens long-term data usability: old decoders may not run on new operating systems

  6. Archiving Compressed Data. Observation #2: Processor architectures evolve more conservatively. [Figure: x86 architecture timeline, 1980-2005: 8086; 80386 (32-bit); MMX (int vector); SSE (FP vector); x86-64 (64-bit)]

  7. Archiving Compressed Data. Observation #2: Processor architectures evolve more conservatively. [Figure: the same x86 timeline, with the later entries labeled "Fully Backward Compatible Extensions"]

  8-11. Archiving Compressed Data. Observation #2: Processor architectures evolve more conservatively. [Figure: the x86 timeline alongside other architectures, 1980-2005: DEC Alpha, PA-RISC, PowerPC, SPARC, Itanium, 68000, MIPS, ARM]

  12. Archiving Compressed Data. Observation #2: Processor architectures evolve more conservatively. [Figure: the same timelines, with an added "Itanic" annotation]

  13. VXA: Virtual Executable Archives. Observation 1+2: Instruction formats are historically more durable than compressed data formats.
      – Make archive self-extracting (data + executable decoder)
      – To extract data, archive reader runs embedded decoder
      [Diagram: Archive Writer (Encoder, Decoder), Archive (D), Archive Reader (D)]
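The idea on slide 13 can be sketched as follows. This is a toy illustration only: the instruction set, the run-length payload format, and the names `run_decoder`, `RLE_DECODER`, and `archive` are all invented here and are not VXA's actual x86-based design. The point is that the reader knows only how to interpret instructions, while the decoding logic travels inside the archive.

```python
# Toy "virtual architecture": a decoder is a list of instructions that the
# archive reader interprets. Everything here is hypothetical, not VXA's format.

def run_decoder(program, data, max_steps=1_000_000):
    """Interpret `program` over the input bytes `data`; return the output bytes."""
    regs = {}          # named registers holding small integers
    out = bytearray()  # the decoder's only output channel
    pos = 0            # read position in the input
    pc = 0             # program counter
    for _ in range(max_steps):  # step limit so a bad decoder cannot loop forever
        op = program[pc]
        pc += 1
        name = op[0]
        if name == "read":      # next input byte, or -1 at end of input
            regs[op[1]] = data[pos] if pos < len(data) else -1
            pos += 1
        elif name == "write":   # emit one byte
            out.append(regs[op[1]] & 0xFF)
        elif name == "dec":
            regs[op[1]] -= 1
        elif name == "jz":      # jump if register is zero
            if regs[op[1]] == 0:
                pc = op[2]
        elif name == "jneg":    # jump if register is negative
            if regs[op[1]] < 0:
                pc = op[2]
        elif name == "jmp":
            pc = op[1]
        elif name == "halt":
            return bytes(out)
        else:
            raise ValueError(f"unknown instruction: {name}")
    raise RuntimeError("decoder exceeded step limit")

# A run-length decoder written for the toy machine: input is (count, byte) pairs.
RLE_DECODER = [
    ("read", "c"),     # 0: read repeat count
    ("jneg", "c", 7),  # 1: end of input, so halt
    ("read", "b"),     # 2: read the byte to repeat
    ("jz", "c", 0),    # 3: count exhausted, so fetch the next pair
    ("write", "b"),    # 4: emit the byte
    ("dec", "c"),      # 5: one fewer to emit
    ("jmp", 3),        # 6: loop
    ("halt",),         # 7
]

# The archive carries both the encoded data and the decoder that understands it.
archive = {"decoder": RLE_DECODER, "payload": bytes([3, ord("a"), 2, ord("b")])}
print(run_decoder(archive["decoder"], archive["payload"]))
```

A reader written decades later would need only `run_decoder`, never the run-length format itself, which is the durability argument the slide makes for instruction formats.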

  14. Goals of VXA. Make self-extracting archives... [Diagram: Archive Writer (Encoder, Decoder), Archive (D), Archive Reader (D)]

  15. Goals of VXA. Make self-extracting archives...
      1. Safe: malicious decoders can't compromise host
      2. Future-proof: simple, well-defined architecture [Lorie]
      [Diagram: as before, with the Archive Reader running the Decoder inside an Emulator]

  16. Goals of VXA. Make self-extracting archives...
      1. Safe: malicious decoders can't compromise host
      2. Future-proof: simple, well-defined architecture [Lorie]
      3. Easy: allow reuse of existing code, languages, tools
      [Diagram: as before, with an x86 Emulator in the Archive Reader]

  17. Goals of VXA. Make self-extracting archives...
      1. Safe: malicious decoders can't compromise host
      2. Future-proof: simple, well-defined architecture [Lorie]
      3. Easy: allow reuse of existing code, languages, tools
      4. Efficient: practical for short term data packaging too
      [Diagram: as before, with a Fast x86 Emulator in the Archive Reader]

  18. Outline
      ● Archiver Operation
      ● vxZIP Archive Format
      ● Decoder Architecture
      ● Emulator Design & Implementation
      ● Evaluation (performance, storage overhead)
      ● Conclusion

  19. Archive Writer Operation. [Diagram: VXA Archiver, Archive]

  20-21. Archive Writer Operation. [Diagram: Uncompressed Input Files enter the VXA Archiver, which contains a General Compressor and its Decoder 1; the Archive holds D1]

  22. Archive Writer Operation. [Diagram: Uncompressed Input Files pass through General, Image, and Audio Compressors, paired with Decoders 1, 2, and 3; the Archive holds D1, D2, D3]

  23. Archive Writer Operation. [Diagram: as before, with Pre-Compressed Input Files added as a second input]

  24. Archive Writer Operation. [Diagram: Uncompressed Input Files pass through General, Image, and Audio Compressors (Decoders 1-3); Pre-Compressed Input Files pass through an Image Format Recognizer and an Audio Format Recognizer (Decoders 4-5); the Archive holds D1-D5]
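The writer-side flow on slides 19-24 can be sketched roughly as below. This is a sketch under stated assumptions rather than the VXA archiver's actual logic: the recognizers simply check well-known magic bytes (JPEG and FLAC are chosen here only as examples), `zlib` stands in for the general compressor, and the names `plan_entry` and `write_archive` and the dictionary layout are hypothetical.

```python
import zlib

# Leading bytes of two common pre-compressed formats.
JPEG_MAGIC = b"\xff\xd8\xff"
FLAC_MAGIC = b"fLaC"

def plan_entry(name, data):
    """Decide how one input file is stored and which decoder it will need."""
    # Recognizers: a pre-compressed input is stored as-is and tagged with a
    # decoder for its existing format.
    if data.startswith(JPEG_MAGIC):
        return {"name": name, "decoder": "jpeg", "stored": data, "precompressed": True}
    if data.startswith(FLAC_MAGIC):
        return {"name": name, "decoder": "flac", "stored": data, "precompressed": True}
    # Otherwise compress with the general-purpose compressor (zlib as a stand-in).
    return {"name": name, "decoder": "deflate",
            "stored": zlib.compress(data), "precompressed": False}

def write_archive(files):
    """Build an in-memory archive: entries plus the set of decoders to embed."""
    entries = [plan_entry(name, data) for name, data in files.items()]
    decoders = sorted({entry["decoder"] for entry in entries})
    return {"decoders": decoders, "entries": entries}

demo = write_archive({
    "notes.txt": b"hello hello hello",
    "photo.jpg": JPEG_MAGIC + b"\xe0 fake image body",
    "song.flac": FLAC_MAGIC + b" fake audio body",
})
print(demo["decoders"])
```

Each decoder is embedded once however many files use it, which matches the one-decoder-per-codec layout the diagrams suggest.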

  25. Archive Reader Operation. [Diagram: the VXA Archive Reader contains an x86 Emulator; the Archive holds D1-D5]

  26. Archive Reader Operation. [Diagram: the reader loads Decoder 1 into the x86 Emulator to produce the Original Uncompressed Files]

  27. Archive Reader Operation. [Diagram: the reader runs Decoders 1, 2, and 3 in the x86 Emulator to produce the Original Uncompressed Files]

  28. Archive Reader Operation. [Diagram: as before; the reader also outputs the Original Pre-Compressed Files, with Decoders 4 and 5 not shown running]

  29. Archive Reader Operation. [Diagram: the reader runs Decoders 1-5 in the x86 Emulator, producing the Original Uncompressed Files and De-compressed Files]
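Slides 28 and 29 appear to show two ways of extracting a pre-compressed input: returned in its original compressed form, or run through its embedded decoder. A minimal sketch of that choice follows, assuming a hypothetical entry layout and using plain Python callables as stand-ins for decoders that the real reader would run inside its emulator.

```python
import zlib

def extract(entry, decoders, decode_precompressed=False):
    """Return one file's bytes from a (hypothetical) archive entry."""
    if entry["precompressed"] and not decode_precompressed:
        # Hand back the file exactly as it was given to the archiver.
        return entry["stored"]
    # Otherwise run the decoder the entry points to.
    return decoders[entry["decoder"]](entry["stored"])

# Stand-in decoder table; zlib plays the role of every codec in this demo.
decoders = {"deflate": zlib.decompress, "gz-like": zlib.decompress}

text_entry = {"decoder": "deflate", "precompressed": False,
              "stored": zlib.compress(b"plain text")}
packed_entry = {"decoder": "gz-like", "precompressed": True,
                "stored": zlib.compress(b"already packed")}

print(extract(text_entry, decoders))
print(extract(packed_entry, decoders) == packed_entry["stored"])
print(extract(packed_entry, decoders, decode_precompressed=True))
```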

  30. vxZIP Archive Format
      ● Backward compatible with legacy ZIP format
      [Diagram: vxZIP Archive layout: Image file, Audio file, Audio file, Central Directory]

  31. vxZIP Archive Format
      ● Backward compatible with legacy ZIP format
      ● Decoders intermixed with archived files
      [Diagram: vxZIP Archive layout: JP2 Decoder, Image file, FLAC Decoder, Audio file, Audio file, Central Directory]

  32. vxZIP Archive Format
      ● Backward compatible with legacy ZIP format
      ● Decoders intermixed with archived files
      ● Archived files have new extension header pointing to decoder
      [Diagram: vxZIP Archive layout: JP2 Decoder, Image file (JP2-encoded), FLAC Decoder, Audio file (FLAC-encoded), Audio file (FLAC-encoded), Central Directory]

  33. vxZIP Archive Format
      ● Backward compatible with legacy ZIP format
      ● Decoders intermixed with archived files
      ● Archived files have new extension header pointing to decoder
      ● Decoders are hidden, “deflated” (gzip)
      [Diagram: vxZIP Archive layout: JP2 Decoder (deflated), Image file (JP2-encoded), FLAC Decoder (deflated), Audio file (FLAC-encoded), Audio file (FLAC-encoded), Central Directory]
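The "extension header pointing to decoder" can be illustrated with ZIP's standard extra-field mechanism (a 2-byte ID, a 2-byte length, then the data). The sketch below uses Python's `zipfile` module. The header ID `0x5856`, the 4-byte offset payload, and the function names are assumptions made for illustration, not the real vxZIP encoding. Unlike vxZIP, this sketch also leaves the decoder visible as an ordinary entry and stores the already-encoded files with the plain "stored" method.

```python
import io
import struct
import zipfile

VX_EXTRA_ID = 0x5856  # hypothetical extra-field ID, not vxZIP's real one

def build_archive(decoder_name, decoder_image, files):
    """Write a ZIP whose file entries carry an extra field locating their decoder."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Store the decoder first, deflated like any other ZIP member.
        zf.writestr(zipfile.ZipInfo(decoder_name), decoder_image,
                    compress_type=zipfile.ZIP_DEFLATED)
        offset = zf.getinfo(decoder_name).header_offset
        for name, data in files.items():
            info = zipfile.ZipInfo(name)
            # Extra field: ID, payload size, then the decoder's header offset.
            info.extra = struct.pack("<HHI", VX_EXTRA_ID, 4, offset)
            zf.writestr(info, data)  # data is already codec-encoded
    return buf.getvalue()

def decoder_offset(archive_bytes, name):
    """Scan a member's extra fields for the decoder pointer; None if absent."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        extra = zf.getinfo(name).extra
    while len(extra) >= 4:
        field_id, size = struct.unpack("<HH", extra[:4])
        if field_id == VX_EXTRA_ID:
            return struct.unpack("<I", extra[4:4 + size])[0]
        extra = extra[4 + size:]  # skip fields we do not recognize
    return None

blob = build_archive("flac-decoder", b"\x7fELF fake decoder image",
                     {"song1.flac": b"fLaC one", "song2.flac": b"fLaC two"})
print(decoder_offset(blob, "song1.flac"), decoder_offset(blob, "song2.flac"))
```

Because unrecognized extra fields are skipped by ordinary ZIP tools, an approach like this keeps the archive readable by legacy software, which is the backward-compatibility property the slide lists first.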

  34. vxZIP Decoder Architecture
      ● Decoders are ELF executables for x86-32
        – Can be written in any language, safe or unsafe
        – Compiled using ordinary tools (GCC)
      ● Decoders have access to five “system calls”:
        – read stdin, write stdout, malloc, next file, exit
      ● Decoders cannot:
        – open files, windows, devices, network connections, ...
        – get system info: user name, current time, OS type, ...
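The restricted interface on slide 34 can be sketched as a host object that exposes exactly five operations and rejects everything else. This is an illustrative model only: the class name `DecoderHost`, the call names, their arguments, and the bump-pointer `malloc` are assumptions, and the stand-in decoder is ordinary Python rather than an x86 ELF image running in an emulator.

```python
import io

ALLOWED = ("read", "write", "malloc", "next_file", "exit")

class DecoderHost:
    """Hypothetical host side of the five-call decoder interface."""

    def __init__(self, inputs, heap_limit=1 << 20):
        self._inputs = [io.BytesIO(data) for data in inputs]
        self.outputs = [bytearray() for _ in inputs]
        self._index = 0          # which file the decoder is working on
        self._heap_used = 0
        self._heap_limit = heap_limit
        self.exit_code = None

    def syscall(self, name, *args):
        # Anything outside the five permitted calls is refused outright.
        if name not in ALLOWED:
            raise PermissionError(f"decoder may not call {name!r}")
        return getattr(self, "_" + name)(*args)

    def _read(self, n):          # read from the current file's input stream
        return self._inputs[self._index].read(n)

    def _write(self, data):      # append to the current file's output stream
        self.outputs[self._index] += data
        return len(data)

    def _malloc(self, size):     # hand out heap space up to a fixed limit
        if size < 0 or self._heap_used + size > self._heap_limit:
            raise MemoryError("decoder heap limit exceeded")
        base = self._heap_used
        self._heap_used += size
        return base

    def _next_file(self):        # advance to the next file, if there is one
        if self._index + 1 >= len(self._inputs):
            return False
        self._index += 1
        return True

    def _exit(self, code):
        self.exit_code = code

def upper_decoder(host):
    """Stand-in 'decoder' that upper-cases each file using only host calls."""
    while True:
        chunk = host.syscall("read", 4096)
        while chunk:
            host.syscall("write", chunk.upper())
            chunk = host.syscall("read", 4096)
        if not host.syscall("next_file"):
            break
    host.syscall("exit", 0)

host = DecoderHost([b"abc", b"de"])
upper_decoder(host)
print([bytes(out) for out in host.outputs], host.exit_code)
```

With no call for opening files or querying the system, a decoder in this model can only transform its input stream into its output stream, which is the safety property the "Decoders cannot" list describes.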
