FAWN - Fast Array of Wimpy Nodes David G. Andersen et al. Presented by: Ravi Kiran Boggavarapu 1001541261
● A cluster architecture for low-power, data-intensive computing.
● Wimpy nodes = low-power CPUs paired with a small amount of flash storage.
  ○ The design centers on log-structured datastores that provide high performance on flash.
● Goal of the architecture?
  ○ Increase performance while minimizing power consumption, i.e. cut the data center's electricity bill.
● How is performance measured?
  ○ The paper uses queries per Joule as its metric. A FAWN cluster handles roughly 350 key-value queries per Joule.
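Since 1 Watt is 1 Joule per second, queries per Joule is simply throughput divided by power draw. A minimal sketch of that arithmetic follows; the function name and both sets of numbers are hypothetical and are not measurements from the paper.

```python
def queries_per_joule(queries_per_second: float, watts: float) -> float:
    """Energy efficiency: (queries/s) / (Joules/s) = queries/Joule."""
    if watts <= 0:
        raise ValueError("power draw must be positive")
    return queries_per_second / watts


# Hypothetical numbers for illustration only (not taken from the paper).
wimpy = queries_per_joule(queries_per_second=1_000, watts=4.0)
brawny = queries_per_joule(queries_per_second=20_000, watts=250.0)
print(f"wimpy node:  {wimpy:.0f} queries/Joule")
print(f"brawny node: {brawny:.0f} queries/Joule")
```

In this made-up comparison the slower node still wins on the metric, because its power draw shrinks faster than its throughput.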
The above photo is taken from: http://www.cs.cmu.edu/~fawnproj/
Trade-offs of using Flash:
● Flash provides non-volatile storage with several significant benefits over traditional magnetic disks:
  ○ Fast random reads.
  ○ Power-efficient I/O.
● But it also introduces challenges:
  ○ Small writes on flash are very expensive.
  ○ Updating a single page requires first erasing the entire block of pages and then rewriting the entire modified block.
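The cost of a small in-place update can be estimated with a rough model: every erase block the update touches must be rewritten in full, whereas an append writes only the new bytes. The sketch below assumes a 256 KiB erase block and a 1 KiB object, assumes the update is aligned so it does not straddle blocks, and ignores whatever the device's flash translation layer does. The helper names are hypothetical.

```python
KIB = 1024


def in_place_cost(update_bytes: int, erase_block_bytes: int) -> int:
    """Bytes written when a small update forces whole erase blocks to be rewritten."""
    blocks_touched = -(-update_bytes // erase_block_bytes)  # ceiling division
    return blocks_touched * erase_block_bytes


def append_cost(update_bytes: int) -> int:
    """Bytes written when the update is simply appended to a log."""
    return update_bytes


update = 1 * KIB         # hypothetical small object
erase_block = 256 * KIB  # assumed erase-block size
amplification = in_place_cost(update, erase_block) / append_cost(update)
print(f"write amplification of the in-place update: {amplification:.0f}x")
```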
Log-structured datastore
● An append-only datastore.
● Writes are appended sequentially to the Data Log.
● Reads require a single random access.
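The two properties above (writes go to the end, reads seek once) can be sketched with an ordinary byte stream. This is an illustration only: the 4-byte length prefix is an assumed record format, and an in-memory `io.BytesIO` buffer stands in for the flash device.

```python
import io
import os
import struct

HEADER = struct.Struct("<I")  # assumed format: 4-byte little-endian length prefix


def append(log, payload: bytes) -> int:
    """Append one record at the end of the log and return its starting offset."""
    log.seek(0, os.SEEK_END)
    offset = log.tell()
    log.write(HEADER.pack(len(payload)) + payload)
    return offset


def read_at(log, offset: int) -> bytes:
    """Seek once to the record's offset, then read its header and payload."""
    log.seek(offset)
    (length,) = HEADER.unpack(log.read(HEADER.size))
    return log.read(length)


log = io.BytesIO()  # stand-in for the on-flash Data Log
first = append(log, b"alpha")
second = append(log, b"bravo!")
print(first, second)          # starting offsets of the two records
print(read_at(log, second))   # record fetched by offset, no scan needed
```

Existing bytes are never overwritten; a caller that remembers the returned offset can fetch the record later without scanning the log.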
Q1) “The workloads these systems support share several characteristics, they are: - I/O, not computation, intensive, - requiring random access over large datasets, - and the size of objects stored is typically small. ” Why do workloads with these characteristics pose a challenge to system design?
Ans - Q1)
● The gap between CPU performance and I/O bandwidth keeps growing.
● For data-intensive workloads, storage, network, and memory bandwidth bottlenecks often cause low CPU utilization.
● The "small-write problem": many small random disk writes, which are very slow.
Q2) “The key design choice in FAWN-KV is the use of a log structured per-node datastore called FAWN-DS that provides high performance reads and writes using flash memory. ” “These performance problems motivate log-structured techniques for flash filesystems and data structures” What key benefit does a log-structured data organization bring to the KV store design?
Ans - Q2)
● get() = random read.
● put() and delete() = append.
● Log-structured design = append-only datastore.
● Hence, a log-structured datastore avoids small random writes to flash: every write becomes a sequential append.
Q3) “To provide this property, FAWN-DS maintains an in-DRAM hash table (Hash Index) that maps keys to an offset in the append-only Data Log on flash. ”
Ans - Q3)
● Large metadata: long bucket chains (nodes) and multiple pointers per node (linked list).
● RAM is volatile: after a failure, the whole hash table is lost!
Q4) “It stores only a fragment of the actual key in memory to find a location in the log; ” Is there a concern about the correctness of this design?
Ans - Q4)
● What if multiple keys share the same key fragment?
  ○ FAWN-DS reads the full key from the log entry and compares it with the requested key.
  ○ A fragment collision therefore costs an extra read, but never returns the wrong value, so correctness is not a concern.
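The verification step can be sketched as follows. This is a toy model, not the FAWN-DS layout: the index is a plain list, the log is a Python list, and the fragment is deliberately weak (the key's first byte) so that collisions are easy to trigger. All names are hypothetical.

```python
log = []    # stand-in for the on-flash Data Log: (full_key, value) records
index = []  # stand-in for the in-DRAM index: (key_fragment, log_offset) pairs


def fragment(key: bytes) -> int:
    # Toy fragment: first byte only, deliberately weak so collisions are easy to show.
    return key[0]


def store(key: bytes, value: bytes) -> None:
    log.append((key, value))
    index.append((fragment(key), len(log) - 1))


def lookup(key: bytes):
    wanted = fragment(key)
    for frag, offset in reversed(index):  # newest entries first
        if frag != wanted:
            continue
        stored_key, value = log[offset]   # read the full key from the log
        if stored_key == key:             # verify before trusting the match
            return value
        # Fragment matched but the full key differs: a false positive, keep looking.
    return None


store(b"apple", b"red")
store(b"avocado", b"green")     # same fragment as b"apple"
print(lookup(b"apple"))         # the colliding b"avocado" entry is skipped
print(lookup(b"apricot"))       # fragment matches twice, full key never does
```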
Q5) Explain the "Basic functions": Store, Lookup, and Delete.
Ans - Q5)
● Store:
  ○ Appends an entry to the log and updates the corresponding hash table entry to point to it.
● Lookup:
  ○ Gets the offset from the hash entry, indexes into the Data Log, and returns the data blob.
● Delete:
  ○ Invalidates the hash entry by clearing its valid flag.
  ○ Appends a Delete entry to the log.
● Why append a Delete entry? Discussed in the answer to the next question.
Figure copied from http://vijay.vasu.org/static/talks/fawn-sosp2009-slides.pdf
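The three operations can be sketched in a few lines. This is a simplified model under stated assumptions: a Python dict stands in for the Hash Index (full keys rather than fragments), a list stands in for the on-flash Data Log, and the names `TinyStore` and `HashEntry` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HashEntry:
    offset: int         # position of the newest record for this key in the log
    valid: bool = True  # cleared by delete()


class TinyStore:
    def __init__(self):
        self.log = []    # stand-in for the Data Log: (op, key, value) records
        self.index = {}  # stand-in for the Hash Index: key -> HashEntry

    def store(self, key, value):
        # Append the record, then point the hash entry at it.
        self.log.append(("store", key, value))
        self.index[key] = HashEntry(offset=len(self.log) - 1)

    def lookup(self, key):
        entry = self.index.get(key)
        if entry is None or not entry.valid:
            return None
        _op, _key, value = self.log[entry.offset]  # index into the log by offset
        return value

    def delete(self, key):
        entry = self.index.get(key)
        if entry is not None:
            entry.valid = False                  # invalidate in memory only
        self.log.append(("delete", key, None))   # record the deletion in the log


ds = TinyStore()
ds.store("k", "v1")
ds.store("k", "v2")    # an update is just another append
print(ds.lookup("k"))  # newest value wins
ds.delete("k")
print(ds.lookup("k"))  # None after the delete
print(len(ds.log))     # old records stay in the log
```

Nothing already in the log is modified: an update leaves the old record behind as garbage, and a delete adds a record instead of removing one.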
Q6) “As an optimization, FAWN-DS periodically checkpoints the index by writing the Hash Index and a pointer to the last log entry to flash. ” Why does this checkpointing help with the recovery efficiency? Why is a Delete entry needed in the log for a correct recovery?
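Ans - Q6)
● How does checkpointing make recovery more efficient?
  ○ After a failure, only the log entries written after the checkpoint need to be replayed to rebuild the Hash Index, instead of scanning the whole log.
● Why the Delete entry?
  ○ Fault tolerance: the cleared valid flag lives only in DRAM, so without a Delete entry a recovery scan would replay the earlier Store and bring the deleted key back.
  ○ It avoids a random write to flash: the deletion is recorded as a sequential append instead of an in-place update.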
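Both answers can be checked with a small replay sketch. It assumes a checkpoint is a pair (index snapshot, offset of the first log entry not covered by the snapshot); the record format and the `recover` function are hypothetical simplifications.

```python
def recover(log, checkpoint_index, checkpoint_offset):
    """Rebuild key -> log offset by replaying only entries from checkpoint_offset on."""
    index = dict(checkpoint_index)
    for offset in range(checkpoint_offset, len(log)):
        op, key, _value = log[offset]
        if op == "store":
            index[key] = offset
        elif op == "delete":
            index.pop(key, None)
    return index


log = [
    ("store", "a", "1"),    # offset 0
    ("store", "b", "2"),    # offset 1
    # checkpoint taken here: index {"a": 0, "b": 1}, next offset 2
    ("store", "c", "3"),    # offset 2
    ("delete", "a", None),  # offset 3
]
checkpoint = ({"a": 0, "b": 1}, 2)

recovered = recover(log, *checkpoint)  # replays 2 entries
full_scan = recover(log, {}, 0)        # replays all 4 entries
print(recovered == full_scan)          # same index, less work

# If the delete had only cleared an in-memory flag, the log would end at offset 2
# and recovery would wrongly bring "a" back.
log_without_delete = log[:3]
print("a" in recover(log_without_delete, *checkpoint))
```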
Thank you