Mar 24, 2023

FINDING A NEEDLE IN HAYSTACK: FACEBOOK’S PHOTO STORAGE
Based on: D. Beaver, S. Kumar, H. C. Li, J. Sobel, and P. Vajgel: “Finding a Needle in Haystack: Facebook's Photo Storage,” in Proceedings USENIX OSDI 2010, Vancouver, Canada, October 2010.
The problem
- 65 billion uploaded photos
- 260 billion images stored (each photo in 4 copies)
- 1 billion new photos uploaded each week (~60 TB of data)
How to deal with that amount of data?
Requirements
- High throughput and low latency
- Fault tolerance
- Cost-effectiveness
- Simplicity
Initial design
- Photos stored as standard UNIX files
- Requests made to a Content Delivery Network (CDN) by the browser
- Photos fetched from servers via NFS and delivered to the end user by the CDN
- Popular photos are cached
Initial design overview
Photo’s popularity
NFS design drawbacks
- When fetching less popular photos, the system has to read them from disk
- Potentially heavy overhead to find the right inode (up to several I/O operations)
- A further I/O operation to read the inode
And the user does not want to wait that long…
Improvements
- Extending the photo cache
- Caching inodes in main memory
Neither is effective, however: there are too many inodes, and each one is large (for example, xfs_inode_t is 536 bytes long).
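A rough back-of-the-envelope check of the claim above, as a minimal sketch. It assumes one in-memory inode per stored image (an assumption made here for illustration) and reuses the 260 billion images and the 536-byte xfs_inode_t figures from the slides.

```python
# Back-of-the-envelope estimate. One inode per stored image is an assumption.
IMAGES_STORED = 260 * 10**9   # images stored, from "The problem" slide
INODE_BYTES = 536             # size of xfs_inode_t quoted above


def inode_cache_bytes(images: int = IMAGES_STORED,
                      inode_bytes: int = INODE_BYTES) -> int:
    """Memory needed to keep one inode per image in main memory."""
    return images * inode_bytes


if __name__ == "__main__":
    total = inode_cache_bytes()
    # Report in decimal terabytes (10**12 bytes).
    print(f"{total / 10**12:.1f} TB of inode metadata")
```

Under these assumptions the metadata alone would be on the order of a hundred terabytes, which is why shrinking the per-photo metadata matters more than adding cache.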
Solution: Haystack
- Store multiple photos in a single file
- Arrange them ‘one after another’
- Make the structure that holds a photo’s metadata as small as possible
- Keep these structures in main memory
Haystack design overview: Haystack Store
- Each store machine manages multiple physical volumes
- Each physical volume is assigned to a logical one (redundancy for fault tolerance)
- Each physical volume is a large file (100 GB) that contains many photos
- Built on top of XFS; every file descriptor is kept open at all times (there are only a few files)
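The idea of one large file plus a small in-memory structure per photo can be sketched as follows. This is an illustrative toy, not the layout from the paper: the class name `Volume`, its methods, and the `(offset, size)` index entry are all hypothetical.

```python
import os
import tempfile


class Volume:
    """Hypothetical append-only volume: many photos in one file, index in memory."""

    def __init__(self, path: str) -> None:
        # Open once and keep the descriptor open for the volume's lifetime.
        self._file = open(path, "a+b")
        self._index: dict[int, tuple[int, int]] = {}  # photo_id -> (offset, size)

    def append(self, photo_id: int, data: bytes) -> None:
        """Store a photo right after the previous one and remember where it is."""
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(data)
        self._file.flush()
        self._index[photo_id] = (offset, len(data))

    def read(self, photo_id: int) -> bytes:
        """Look up the photo in memory, then do a single positioned read."""
        offset, size = self._index[photo_id]
        self._file.seek(offset)
        return self._file.read(size)

    def close(self) -> None:
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        volume = Volume(os.path.join(tmp, "volume_0"))
        volume.append(1, b"first photo")
        volume.append(2, b"second photo")
        print(volume.read(2))
        volume.close()
```

Because the lookup happens in memory, reading a photo costs one disk read instead of the several I/O operations of the per-file design.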
Reading a photo
Haystack store: file layout
Haystack store: needle’s metadata
Haystack store: index
Haystack store: index
- Resides in main memory
- Can be recomputed after a reboot, but this requires reading the whole disk
- Is updated asynchronously
- Possible data inconsistency after a reboot is also handled
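Recomputing the index by scanning a volume might look like the sketch below. The on-disk layout here (an 8-byte photo id and a 4-byte size in front of each payload) is a simplified, hypothetical one chosen for illustration, and skipping a partially written trailing needle is just one possible way to handle an inconsistency.

```python
import io
import struct

# Hypothetical, simplified needle layout: 8-byte photo id, 4-byte size, payload.
HEADER = struct.Struct("<QI")


def append_needle(volume, photo_id: int, data: bytes) -> int:
    """Append one needle to the volume and return the offset where it starts."""
    offset = volume.seek(0, io.SEEK_END)
    volume.write(HEADER.pack(photo_id, len(data)))
    volume.write(data)
    return offset


def rebuild_index(volume) -> dict:
    """Scan the whole volume and recompute photo_id -> (data_offset, size)."""
    index = {}
    end = volume.seek(0, io.SEEK_END)
    pos = 0
    while pos + HEADER.size <= end:
        volume.seek(pos)
        photo_id, size = HEADER.unpack(volume.read(HEADER.size))
        data_offset = pos + HEADER.size
        if data_offset + size > end:
            break  # partially written needle at the tail: ignore it
        index[photo_id] = (data_offset, size)
        pos = data_offset + size
    return index


if __name__ == "__main__":
    volume = io.BytesIO()  # stands in for the large volume file
    append_needle(volume, 7, b"abc")
    append_needle(volume, 9, b"hello")
    print(rebuild_index(volume))
```

The scan touches every needle header in the file, which is why a full rebuild is expensive on a 100 GB volume.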
Writing a photo
Haystack directory
- Maps logical volumes to physical ones
- Balances reads and writes across physical volumes
- Determines how to handle a photo request
- Marks volumes as ‘read-only’ when needed
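The directory's bookkeeping can be illustrated with a small sketch. Everything here is hypothetical: the class and method names, the location strings, and the use of uniform random choice as a stand-in for load balancing are assumptions, not the paper's actual policy.

```python
import random


class Directory:
    """Hypothetical directory: logical volume id -> physical volume locations."""

    def __init__(self) -> None:
        self._replicas = {}     # logical_id -> list of physical locations
        self._writable = set()  # logical ids that still accept writes

    def add_logical_volume(self, logical_id, physical_locations) -> None:
        self._replicas[logical_id] = list(physical_locations)
        self._writable.add(logical_id)

    def mark_read_only(self, logical_id) -> None:
        """Stop directing new writes to this logical volume (e.g. when full)."""
        self._writable.discard(logical_id)

    def pick_for_write(self, rng=random):
        """Choose a write-enabled logical volume and return all its replicas."""
        if not self._writable:
            raise RuntimeError("no write-enabled logical volume available")
        logical_id = rng.choice(sorted(self._writable))
        return logical_id, list(self._replicas[logical_id])

    def pick_for_read(self, logical_id, rng=random):
        """Choose one physical replica of the logical volume to serve a read."""
        return rng.choice(self._replicas[logical_id])


if __name__ == "__main__":
    directory = Directory()
    directory.add_logical_volume(1, ["store-a:vol1", "store-b:vol1"])
    directory.add_logical_volume(2, ["store-a:vol2", "store-c:vol2"])
    directory.mark_read_only(1)
    print(directory.pick_for_write())
    print(directory.pick_for_read(1))
```

A read-only volume remains readable from any of its replicas; only new writes are steered away from it.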
Further optimizations
- Handling photos that users delete (embedding a deletion flag in the ‘file offset’ field)
- Batch upload of multiple photos
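One way to read the deletion-flag optimization is sketched below. It assumes that offset 0 is never a valid needle position (for example because the volume file starts with a header), so that value can double as the "deleted" marker and no separate flag field is needed in memory. The class `NeedleIndex` and the sentinel are hypothetical.

```python
DELETED_OFFSET = 0  # assumed sentinel: no real needle starts at offset 0


class NeedleIndex:
    """Hypothetical in-memory index that reuses the offset as a deletion flag."""

    def __init__(self) -> None:
        self._entries = {}  # photo_id -> (offset, size)

    def add(self, photo_id: int, offset: int, size: int) -> None:
        if offset == DELETED_OFFSET:
            raise ValueError("offset 0 is reserved as the deletion marker")
        self._entries[photo_id] = (offset, size)

    def delete(self, photo_id: int) -> None:
        """Mark the photo as deleted without storing an extra flag."""
        _, size = self._entries[photo_id]
        self._entries[photo_id] = (DELETED_OFFSET, size)

    def lookup(self, photo_id: int):
        """Return (offset, size), or None if the photo is unknown or deleted."""
        entry = self._entries.get(photo_id)
        if entry is None or entry[0] == DELETED_OFFSET:
            return None
        return entry


if __name__ == "__main__":
    index = NeedleIndex()
    index.add(42, offset=4096, size=1200)
    print(index.lookup(42))
    index.delete(42)
    print(index.lookup(42))
```

With billions of entries held in memory, saving even a few bytes per entry adds up, which is the motivation for folding the flag into an existing field.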
Evaluation: daily traffic
Evaluation: Read-Only Machines
Evaluation: Write-Enabled Machines
Evaluation
- 4 times more reads per second (on average) with Haystack than with the ‘standard’ approach
Thank you, time for questions
All graphs taken from the paper; data definition images taken from http://www.facebook.com/note.php?note_id=76191543919
Karol Strzelecki
