Demystifying Cache Policies for Photo Stores at Scale: A Tencent Case Study
Ke Zhou, Si Sun, Hua Wang, Ping Huang, Xubin He, Rui Lan, Wenyan Li, Wenjie Liu, Tianming Yang


SLIDE 1

Demystifying Cache Policies for Photo Stores at Scale: A Tencent Case Study

Ke Zhou, Si Sun, Hua Wang, Ping Huang, Xubin He, Rui Lan, Wenyan Li, Wenjie Liu, Tianming Yang

Huazhong University of Science and Technology, Key Laboratory of Information Storage System; Intelligent Cloud Storage Joint Research Center of HUST and Tencent; Tencent Inc.; Temple University; Huanghuai University

SLIDE 2

Outline

◼ Background
◼ The failure of cache policies
◼ Motivation
◼ Prefetching
◼ Performance
◼ Conclusion

SLIDE 3

Background

◼ More than 250 million photos are uploaded to QQphoto every day
◼ Total photo views per day approach 50 billion
◼ QQphoto faces critical challenges in dealing with such a huge amount of photos:
  • user experience (needs lower latency)
  • backend storage burden (needs lower traffic)

SLIDE 4

The photo cache architecture

SLIDE 5

Upload and download

SLIDE 6

Upload channel

◼ Uploaded photos are written directly to backend storage

[Diagram: Users/Apps → upload channel → Backend Storage System]

Physical photo: the original photo and the photos resized from it, differing in format or specification
  • original photo: the one users upload
  • resized photos: produced by the resize mechanism
Logical photo: a photo set containing several physical photos that share the same content

SLIDE 7

Two-tier cache

Users/Apps


where we delve into

SLIDE 8

Outside cache

  • 9 days of logs
  • >5.8 billion requests
  • >801 million logical photos
  • >1.5 billion physical photos
  • total data size >46 TB
  • total network traffic >186 TB

◼ Sampling based on logical photos
  • Extract all logical photos in the logs
  • Randomly sample the logical photos at a ratio of 1:100
  • Extract the log entries containing the sampled logical photos
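The sampling steps above can be sketched as follows; this is a minimal illustration, not Tencent's actual tooling, and the field names and the hashing choice are assumptions:

```python
import hashlib

def sample_logical(log_entries, ratio=100):
    """Deterministic 1-in-`ratio` sampling keyed on the logical photo id,
    so that every request for a sampled photo is kept together."""
    def keep(logical_id):
        h = int(hashlib.md5(logical_id.encode()).hexdigest(), 16)
        return h % ratio == 0
    return [e for e in log_entries if keep(e['logical_id'])]

# Toy log: 1000 distinct logical photos, one GET each (illustrative schema).
log = [{'logical_id': f'photo{i}', 'op': 'GET'} for i in range(1000)]
sampled = sample_logical(log)
print(len(sampled))  # roughly 1/100 of the photos survive, with all their requests
```

Keying the sample on the logical photo (rather than on individual requests) preserves each sampled photo's full access pattern, which is what reuse-distance and frequency analysis needs.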

SLIDE 9

Outline

◼ Background
◼ The failure of cache policies
◼ Motivation
◼ Prefetching
◼ Performance
◼ Conclusion

SLIDE 10

Advanced algorithms fail

◼ ARC, MQ, and S3LRU perform almost identically and show negligible improvement over LRU

X is the cache capacity in production. Belady is the theoretically optimal algorithm.

SLIDE 11

Advanced algorithms fail

◼ Phenomenon:
  • Higher-frequency photos contribute more to HR (hit ratio)
  • Equivalently, lower-frequency photos are more difficult to hit

The CDFs of photo reuse distance, grouped by photo frequency.

SLIDE 12

Advanced algorithms fail

PoP: percentage of photos
PoR: percentage of requests
CtoHR: contribution to hit ratio

CtoHR = (access times in group - num of photos in group) / access times in trace

[Chart: PoP, PoR, and CtoHR per frequency group: f = 1, f = 2, 2 < f ≤ 5, 5 < f ≤ 10, 10 < f ≤ 100, 100 < f ≤ 1000, 1000 < f ≤ 10000, f > 10000]

Subtracting the number of photos in a group removes each photo's compulsory miss.
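As a toy illustration of the CtoHR formula, the breakdown could be computed from a request trace like this (the helper name, trace, and frequency groups are made up for the example):

```python
from collections import Counter

def ctohr_by_group(trace, groups):
    """CtoHR = (access times in group - num of photos in group) / total accesses.
    Subtracting the photo count removes each photo's one compulsory miss."""
    freq = Counter(trace)                       # photo id -> access count
    total = len(trace)
    result = {}
    for name, in_group in groups.items():
        photos = [p for p, f in freq.items() if in_group(f)]
        accesses = sum(freq[p] for p in photos)
        result[name] = (accesses - len(photos)) / total
    return result

# Toy trace: 'a' accessed 4 times, 'b' twice, 'c' once (7 accesses total).
trace = ['a', 'a', 'b', 'a', 'c', 'b', 'a']
groups = {'f = 1': lambda f: f == 1, 'f >= 2': lambda f: f >= 2}
print(ctohr_by_group(trace, groups))  # f = 1 contributes 0; f >= 2 contributes 4/7
```

This makes the slide's point concrete: photos accessed only once can never contribute to the hit ratio of a demand-fetch cache, because their single access is the compulsory miss.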

SLIDE 13

Hit ratio contribution breakdown

At a cache capacity of X, the HR (of LRU) is 67.9%. At infinite cache capacity, the HR (of Belady) is 76.82%.

[Chart: CDF of CtoHR vs. hit ratio for LRU (67.90%) and Belady (76.82%)]

SLIDE 14

[Chart: CDF of CtoHR vs. hit ratio for LRU (67.90%) and Belady (76.82%)]

Hit ratio contribution breakdown

◼ To improve the hit ratio, low-frequency photos must be hit
◼ Advanced algorithms do no optimization for low-frequency data, thus they fail to improve

SLIDE 15

Cache size is too large

◼ The cache capacity in production is 5 TB!

The cache is large enough to hold all uploaded data!

average uploaded photos per day = total data size / num of days = 46 TB / 9 ≈ 5.1 TB
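The arithmetic above can be checked directly from the trace figures on slide 8:

```python
# 46 TB of data was uploaded over the 9-day trace.
avg_upload_per_day_tb = 46 / 9
print(round(avg_upload_per_day_tb, 1))  # 5.1: about one day of uploads fills the 5 TB cache
```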

15

slide-16
SLIDE 16

Outline

◼ Background
◼ The failure of cache policies
◼ Motivation
◼ Prefetching
◼ Performance
◼ Conclusion

SLIDE 17

Motivation

[Chart: PoP, PoR, and CtoHR per frequency group: f = 1, f = 2, 2 < f ≤ 5, 5 < f ≤ 10, 10 < f ≤ 100, 100 < f ≤ 1000, 1000 < f ≤ 10000, f > 10000]

◼ “Cold” photos (freq ≤ 5) account for the vast majority. Can we leverage those cold photos? Yes: make the compulsory misses hit!

SLIDE 18

Immediacy

◼ Hint: the “immediacy” of social networks
  • Recently uploaded photos are more likely to be requested by users

The CDF of the interval between photos' upload time and their first request time.

SLIDE 19

Immediacy

◼ More than 90% of photos will be requested at least one time within 1 day following their upload
  • If uploaded photos are placed into the cache in time, their compulsory misses will be eliminated → Prefetching
  • Such prefetching is very efficient.

SLIDE 20

Prefetching

◼ Prefetching
  • If prefetching photos uploaded every 1 second, nearly all of their compulsory misses will be eliminated
  • If prefetching every 10 min, about 73% of their compulsory misses will be eliminated
  • ……
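The relationship between prefetch interval and eliminated compulsory misses can be modeled roughly as follows; the delays are toy values, and the worst-case assumption (a photo waits one full interval before being prefetched) is a simplification:

```python
def eliminated_fraction(first_request_delays, prefetch_interval):
    """Fraction of compulsory misses a periodic prefetcher eliminates under a
    worst-case model: a photo's first request is a hit only if it arrives at
    least one full prefetch interval after the upload."""
    hits = sum(1 for d in first_request_delays if d >= prefetch_interval)
    return hits / len(first_request_delays)

# Toy delays (seconds) between a photo's upload and its first request.
delays = [5, 30, 90, 600, 3600, 7200]
print(eliminated_fraction(delays, 1))     # near-1s prefetching catches almost everything
print(eliminated_fraction(delays, 600))   # 10-min prefetching catches fewer
```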

SLIDE 21

Outline

◼ Background
◼ The failure of cache policies
◼ Motivation
◼ Prefetching
◼ Performance
◼ Conclusion

SLIDE 22

Prefetch architecture

[Architecture diagram: requests arrive at the OC and, on a miss, pass to the DC; the Prefetcher is an isolated module beside the OC, triggered periodically]

SLIDE 23

Which resolution to prefetch

◼ What to prefetch
  • Original photos are resized into varying physical photos with various resolutions (Res1, Res2, Res3, …)
  • Prefetch the needed resolutions

We know what content (logical photos) the users need, but we do not know which resolutions (physical photos) they need!

A logical photo contains several physical photos

SLIDE 24

Which resolution to prefetch

◼ Problem:
  • Which resolution will be requested is unknown

◼ Intuition:
  • Prefetch the more popular resolutions
  • The more resolutions prefetched, the higher the chance of eliminating compulsory misses

SLIDE 25

Which resolution to prefetch

◼ If the frequency of Res1 > Res2, Res1 has higher priority to be prefetched

◼ NPR (number of prefetching resolutions)
  • Controls how many resolutions are prefetched
  • E.g. NPR = 2 indicates prefetching both Res1 and Res2
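A minimal sketch of the NPR rule, assuming resolution popularity is measured by request counts (the function name and toy log are illustrative):

```python
from collections import Counter

def resolutions_to_prefetch(request_log, npr):
    """Pick the NPR most-requested resolutions; higher-frequency resolutions
    get higher prefetch priority (Res1 before Res2, and so on)."""
    popularity = Counter(request_log)
    return [res for res, _ in popularity.most_common(npr)]

# Toy request log: Res1 requested 3 times, Res2 twice, Res3 once.
log = ['Res1', 'Res2', 'Res1', 'Res3', 'Res1', 'Res2']
print(resolutions_to_prefetch(log, 2))  # ['Res1', 'Res2']
```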

SLIDE 26

When to prefetch

◼ When to prefetch
  • How often should a prefetching run be performed?

[Timeline: a photo is uploaded; prefetch here? or here? or here?]

SLIDE 27

Prefetching Scheduling

◼ QQphoto is a 24x7 online service
◼ Prefetching should also be online
  • Triggered periodically, once per prefetching interval
  • On each prefetching run, all photos uploaded during the last interval are prefetched

[Timeline: each trigger prefetches the photos uploaded since the previous trigger]
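The periodic scheduling could be sketched as follows (toy timestamps; the half-open interval choice is an assumption):

```python
def photos_to_prefetch(uploads, now, interval):
    """On a trigger at time `now`, select the photos uploaded during the last
    interval, i.e. with an upload time in (now - interval, now]."""
    return [pid for pid, t in uploads if now - interval < t <= now]

# Toy upload log: (photo id, upload timestamp in seconds).
uploads = [('p1', 10), ('p2', 595), ('p3', 700), ('p4', 1190)]
print(photos_to_prefetch(uploads, now=600, interval=600))   # ['p1', 'p2']
print(photos_to_prefetch(uploads, now=1200, interval=600))  # ['p3', 'p4']
```

Half-open windows ensure every upload lands in exactly one prefetching batch, so no photo is prefetched twice or skipped.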

SLIDE 28

Inserting to cache queue

◼ Prefetched photos are inserted into the cache queue the same way as generally replaced photos

[Diagram: LRU queue from MRU end to LRU end; both new photos and prefetched photos are inserted at the MRU end]
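A minimal LRU sketch showing prefetched photos taking the same MRU insertion path as demand-missed photos; this is a simplified model, not the production cache:

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU queue: prefetched photos enter at the MRU end, exactly like
    photos inserted on a demand miss."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = OrderedDict()          # oldest (LRU) entries first

    def _insert(self, photo):
        self.queue[photo] = True
        self.queue.move_to_end(photo)       # move/insert at MRU position
        while len(self.queue) > self.capacity:
            self.queue.popitem(last=False)  # evict from the LRU end

    def access(self, photo):
        hit = photo in self.queue
        self._insert(photo)                 # demand insertion on miss, promote on hit
        return hit

    prefetch = _insert                      # prefetch uses the same insertion path

cache = LRUCache(capacity=2)
cache.prefetch('new_upload')
print(cache.access('new_upload'))  # True: the compulsory miss was eliminated
```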

SLIDE 29

Outline

◼ Background
◼ The failure of cache policies
◼ Motivation
◼ Prefetching
◼ Performance
◼ Conclusion

SLIDE 30

Evaluation

◼ Setup:
  • a simulator replaying the trace
  • warm-up: first 5 days
  • statistics collected: last 4 days
  • algorithms evaluated: FIFO, LRU, S3LRU, Belady (offline optimal)
  • prefetching: NPR 1-8; prefetching interval: 1 sec, 10 min, 1 hour

SLIDE 31

Hit ratio-NPR impact

A higher NPR rewards a higher hit ratio

Algorithm = LRU, NPR = 1, …, 8, interval = 10 min

SLIDE 32

Hit ratio-prefetch interval impact

A lower prefetch interval rewards a higher hit ratio

Algorithm = LRU, NPR = 3, interval = 1 s, 10 min, 1 h

With prefetching, the hit ratio can even exceed Belady

SLIDE 33

Latency

NPR = 1, …, 8, interval = 10 min

Latency improves with hit ratio
Upside: higher NPR → higher HR → lower latency

SLIDE 34

Network Traffic

NPR = 1, …, 8, interval = 10 min

Downside: increasing NPR results in a huge growth of network traffic.

SLIDE 35

Latency and Network Traffic Trade-off

◼ Best NPR?
  • Upside: lower latency
  • Downside: more network traffic

[Chart: network traffic and latency trade-offs vs. NPR (1-8), at cache capacity of X]

The best choice reduces latency by 6.9% while consuming 4.14% extra network resources.

SLIDE 36

Resolution Popularity Evolution

The distribution of resolution popularity stays stationary over time.

SLIDE 37

Optimal Prefetch Interval

◼ A low interval is good
  • Upside: conducive to promoting the hit ratio
  • Downside: implies frequent prefetching, which can affect the online caching service

◼ There is no consistently optimal interval on a time-varying workload
  • constraint: the maximum hit ratio loss should not exceed 1% (the loss is the bias between the actual interval and real-time prefetching)
  • interval = 10 min turns out to be an appropriate solution, whose hit ratio loss is 0.95%

SLIDE 38

Outline

◼ Background
◼ The failure of cache policies
◼ Motivation
◼ Prefetching
◼ Performance
◼ Conclusion

SLIDE 39

Conclusion

◼ Large cache capacity results in the failure of advanced cache policies to improve the hit ratio

◼ Social networks exhibit “immediacy”
◼ The prefetching method leverages such “immediacy” to improve the hit ratio
  • Latency is cut by an average of 6.9% while sacrificing only 4.14% additional network cost.

SLIDE 40

Thank you & Questions
