Selective Early Request Termination for Busy Internet Services
Jingyu Zhou and Tao Yang
Ask.com
University of California, Santa Barbara
Multi-tier Internet Services
[Figure: multi-tier service architecture. A firewall/Web switch, query frontends, query caches, index servers (partitions 1, 2, and 3), and doc servers are connected by a local-area network.]
Multi-thread Programming Model for Request Processing
• Multi-threaded service tier
  – E.g., Apache, IIS, BEA WebLogic, and Neptune
[Figure: a pool of threads (Thread 1 to Thread N). Each thread repeatedly gets a request, processes it, and sends the result.]
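The thread-pool model can be sketched with POSIX threads, where every thread runs the same get/process/send loop. This is a minimal illustration, not the code of Neptune, Apache, or any other server named above. Several pieces are hypothetical: the `Request` type, the pre-filled request array that stands in for a network queue, and the helper names `get_request`, `process_request`, `send_result`, and `serve_with_pool`.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical request record; the slides do not define one. */
typedef struct {
    int id;
    long result;
    int sent;              /* set once the result has been "sent" */
} Request;

/* Shared request source: a pre-filled array stands in for the network queue. */
typedef struct {
    Request *reqs;
    int n;
    int next;              /* index of the next unclaimed request */
    pthread_mutex_t mu;    /* serializes access to next */
} RequestQueue;

/* "Get request": claim the next request, or NULL once all are taken. */
static Request *get_request(RequestQueue *q) {
    pthread_mutex_lock(&q->mu);
    Request *r = (q->next < q->n) ? &q->reqs[q->next++] : NULL;
    pthread_mutex_unlock(&q->mu);
    return r;
}

/* "Process request": placeholder computation. */
static void process_request(Request *r) {
    r->result = (long)r->id * r->id;
}

/* "Send result": a real service would write the reply to a socket here. */
static void send_result(Request *r) {
    r->sent = 1;
}

/* Every pool thread runs the same get/process/send loop independently. */
static void *worker(void *arg) {
    RequestQueue *q = arg;
    Request *r;
    while ((r = get_request(q)) != NULL) {
        process_request(r);
        send_result(r);
    }
    return NULL;
}

/* Serve n requests with nthreads pool threads. Returns 0 on success,
   -1 if no worker thread could be started. */
int serve_with_pool(Request *reqs, int n, int nthreads) {
    RequestQueue q = { .reqs = reqs, .n = n, .next = 0 };
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    if (tids == NULL)
        return -1;
    pthread_mutex_init(&q.mu, NULL);
    int started = 0;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, worker, &q) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&q.mu);
    free(tids);
    return started > 0 ? 0 : -1;
}
```

The mutex guards only the claim of the next request, so threads process requests concurrently. A real server would block waiting on a network queue instead of stopping once the array is drained.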
Problem Statement
• Service-level agreement (SLA)
  – E.g., 99% of requests complete within 1 s
• Meeting such a QoS target is challenging during:
  – Flash-crowd-style surges in request rate
  – Size distribution shifts, where the percentage of long requests increases
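An SLA of this form can be checked directly against measured response times. The following is a minimal sketch and is not part of SERT. The function names are hypothetical, and "within" is assumed to be inclusive (response time ≤ bound).

```c
#include <stddef.h>

/* Fraction of requests whose response time is within bound_ms. */
double fraction_within(const double *resp_ms, size_t n, double bound_ms) {
    if (n == 0)
        return 1.0;                     /* no requests: nothing violated */
    size_t ok = 0;
    for (size_t i = 0; i < n; i++)
        if (resp_ms[i] <= bound_ms)
            ok++;
    return (double)ok / (double)n;
}

/* SLA "at least `target` of requests within bound_ms", e.g. target = 0.99
   and bound_ms = 1000.0 for "99% of requests within 1 s".
   Returns 1 if the SLA is met, 0 otherwise. */
int sla_met(const double *resp_ms, size_t n, double bound_ms, double target) {
    return fraction_within(resp_ms, n, bound_ms) >= target;
}
```

For the example SLA, `sla_met(times, n, 1000.0, 0.99)` returns 1 when at least 99% of the recorded response times are 1,000 ms or less.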
Motivating Example: Size Distribution Shift
• Settings
  – 50 requests/s
  – Two types of requests: 5 ms and 500 ms
  – Long requests vary from 0.1% to 10%
• Results
  – Significant throughput loss
  – Order-of-magnitude increase in response time
  – Admission control alone isn't enough
[Figure: throughput (requests/s) and mean response time (ms) vs. percentage of long requests (0 to 10%).]
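A back-of-the-envelope calculation shows why a small share of long requests hurts so much. Assume each request costs exactly its nominal time (5 ms or 500 ms), and write f for the fraction of long requests. The mean service time and the work arriving per second at 50 requests/s are then:

$$
\bar{S}(f) = (1 - f) \cdot 5\,\text{ms} + f \cdot 500\,\text{ms}
$$

$$
\bar{S}(0.001) = 5.495\,\text{ms}, \qquad 50\,\text{s}^{-1} \times 5.495\,\text{ms} \approx 0.27\,\text{s of work per second}
$$

$$
\bar{S}(0.1) = 54.5\,\text{ms}, \qquad 50\,\text{s}^{-1} \times 54.5\,\text{ms} \approx 2.7\,\text{s of work per second}
$$

Moving from 0.1% to 10% long requests therefore raises the offered work roughly tenfold at an unchanged request rate, even though 90% of requests are still short. The estimate ignores concurrency and I/O overlap, so it illustrates the trend rather than modeling the measured curves.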
Current Techniques
• Admission control
  – Response-time feedback (e.g., SEDA, Quorum)
  – Bounding request queue length (e.g., Neptune)
  – Policing TCP SYN packets (e.g., [Voigt’01])
• Adaptive service degradation
  – E.g., reduce image quality
• Size-based scheduling
  – Only for static content
  – Uses file size as the size estimator
SERT Idea & Challenges
• Idea
  – Request-aware: differentiate long and short requests
  – Early termination: abort long requests during overload
• Challenges
  – Detecting long vs. short dynamic requests
  – Adaptive selection of the termination threshold
  – Resource accounting for safety
  – Simplicity in programming
SERT Architecture
[Figure: SERT architecture. Components: request queue, thread pool, threshold controller, timer & terminator, termination handler, resource accounting module, and resources (locks, files, memory, ...). Labeled interactions: set/cancel timer, terminate, invoke (termination handler), and resource access.]
Resource Accounting
• Targets a class of requests that are
  – Read-only
  – Stateless
• Resources
  – Memory: track heaps and memory-mapped areas
  – Locks: use an integer counter
  – Sockets & file descriptors
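The accounting idea can be illustrated with explicit wrapper functions. This sketch is an assumption-laden illustration, not SERT's code. SERT intercepts the GLIBC/Pthread functions themselves, whereas the hypothetical `acct_*` wrappers here must be called directly. The sketch keeps heap blocks in a per-thread list so that an aborted request's memory can be freed. It also assumes the integer lock counter counts the locks currently held, and treats termination as unsafe while that counter is nonzero. Memory-mapped areas, sockets, and file descriptors are omitted.

```c
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header prepended to each tracked block; aligned so the user pointer keeps
   malloc's max_align_t alignment. */
typedef struct Alloc {
    _Alignas(max_align_t) struct Alloc *prev;
    struct Alloc *next;
} Alloc;

/* Per-thread state: a worker thread serves one request at a time. */
static _Thread_local Alloc *live = NULL;                  /* current request's blocks */
static _Thread_local volatile sig_atomic_t locks_held = 0; /* integer lock counter */

void *acct_malloc(size_t size) {
    if (size > SIZE_MAX - sizeof(Alloc))
        return NULL;                       /* header would overflow size_t */
    Alloc *a = malloc(sizeof(Alloc) + size);
    if (a == NULL)
        return NULL;
    a->prev = NULL;                        /* push onto the live list */
    a->next = live;
    if (live != NULL)
        live->prev = a;
    live = a;
    return a + 1;                          /* user memory follows the header */
}

void acct_free(void *p) {
    if (p == NULL)
        return;
    Alloc *a = (Alloc *)p - 1;             /* recover the header */
    if (a->prev != NULL) a->prev->next = a->next; else live = a->next;
    if (a->next != NULL) a->next->prev = a->prev;
    free(a);
}

/* Number of blocks the current request still holds. */
size_t acct_live_count(void) {
    size_t n = 0;
    for (Alloc *a = live; a != NULL; a = a->next)
        n++;
    return n;
}

/* On termination: release everything the aborted request allocated. */
void acct_release_all(void) {
    while (live != NULL) {
        Alloc *next = live->next;
        free(live);
        live = next;
    }
}

int acct_mutex_lock(pthread_mutex_t *m) {
    int rc = pthread_mutex_lock(m);
    if (rc == 0)
        locks_held++;
    return rc;
}

int acct_mutex_unlock(pthread_mutex_t *m) {
    int rc = pthread_mutex_unlock(m);
    if (rc == 0)
        locks_held--;
    return rc;
}

/* Assumed policy: aborting is safe only while no locks are held. */
int acct_safe_to_terminate(void) {
    return locks_held == 0;
}
```

Per-thread state matches the thread-per-request model: each worker serves one request at a time, so its list contains only that request's blocks. Freeing a block from a thread other than the one that allocated it would break this assumption.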
Threshold Controller Adjusts Termination Threshold
• Ideas
  – During light load, let requests execute longer: large threshold
  – During heavy load, terminate earlier: small threshold
  – The load index p is the throughput loss
• Formula
  – Threshold = LB + F(p) × (UB − LB), where the timeout range is [LB, UB] and
$$
F(p) = \begin{cases}
1 & p \le LW \\
\left( \dfrac{HW - p}{HW - LW} \right)^{\alpha} & LW < p < HW \\
0 & p \ge HW
\end{cases}
$$
Implementation & Usage
• Intercept GLIBC/Pthread functions
  – Memory, Pthread locks, etc.
• POSIX signals for termination
• Use sigsetjmp()/siglongjmp()
• Neptune middleware uses the SERT APIs
• Applications link the SERT library with no code changes
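The termination path can be sketched with the POSIX primitives the slide names. A timer delivers a signal to the worker thread, and the handler calls siglongjmp() back to a rollback point saved with sigsetjmp(). This is a simplified sketch, not the SERT library, and several of its choices are hypothetical: names such as `run_with_termination`, the use of SIGUSR1, and one short-lived timer thread per request (the SERT API instead starts a timer thread via `SERT_init_timer`). It also omits the resource release that SERT's accounting module performs.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Per-thread rollback state (hypothetical names). */
static _Thread_local sigjmp_buf rollback_point;
static _Thread_local volatile sig_atomic_t in_request = 0;

/* SIGUSR1 handler: jump back to the rollback point if a request is running. */
static void on_terminate(int sig) {
    (void)sig;
    if (in_request) {
        in_request = 0;
        siglongjmp(rollback_point, 1);
    }
}

/* One-shot timer: signals the worker unless cancelled before the deadline. */
typedef struct {
    pthread_t target;
    struct timespec deadline;      /* absolute CLOCK_REALTIME time */
    int cancelled;
    pthread_mutex_t mu;
    pthread_cond_t cv;
} Timer;

static void *timer_main(void *arg) {
    Timer *t = arg;
    int rc = 0;
    pthread_mutex_lock(&t->mu);
    while (!t->cancelled && rc == 0)   /* rc == 0 also covers spurious wakeups */
        rc = pthread_cond_timedwait(&t->cv, &t->mu, &t->deadline);
    int fire = !t->cancelled;
    pthread_mutex_unlock(&t->mu);
    if (fire)
        pthread_kill(t->target, SIGUSR1);
    return NULL;
}

/* Run fn(arg) with a termination threshold of timeout_ms milliseconds.
   Returns 0 if fn completed and 1 if it was terminated early. */
int run_with_termination(void (*fn)(void *), void *arg, long timeout_ms) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_terminate;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    Timer t = { .target = pthread_self(), .cancelled = 0 };
    pthread_mutex_init(&t.mu, NULL);
    pthread_cond_init(&t.cv, NULL);
    clock_gettime(CLOCK_REALTIME, &t.deadline);
    t.deadline.tv_sec += timeout_ms / 1000;
    t.deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (t.deadline.tv_nsec >= 1000000000L) {
        t.deadline.tv_sec += 1;
        t.deadline.tv_nsec -= 1000000000L;
    }
    pthread_t tid;
    int have_timer = pthread_create(&tid, NULL, timer_main, &t) == 0;

    volatile int terminated = 0;
    if (sigsetjmp(rollback_point, 1) == 0) {
        in_request = 1;            /* plays the role of SERT_start() */
        fn(arg);
        in_request = 0;            /* plays the role of SERT_end() */
    } else {
        terminated = 1;            /* tracked resources would be released here */
    }

    if (have_timer) {              /* cancel the timer and wait for it */
        pthread_mutex_lock(&t.mu);
        t.cancelled = 1;
        pthread_cond_signal(&t.cv);
        pthread_mutex_unlock(&t.mu);
        pthread_join(tid, NULL);
    }
    pthread_cond_destroy(&t.cv);
    pthread_mutex_destroy(&t.mu);
    return terminated;
}

/* Demo request bodies (hypothetical). */
static void spin_forever(void *arg) {
    volatile unsigned long *n = arg;
    for (;;)
        (*n)++;                    /* stands in for a very long request */
}

static void quick_request(void *arg) {
    *(int *)arg = 42;
}
```

Jumping out of a signal handler is only safe when the interrupted code holds no locks and is not inside a non-async-signal-safe call such as malloc(). That is why termination is paired with resource accounting over the intercepted GLIBC/Pthread functions. A signal still pending when a request finishes could also hit the next request, so a fuller version would tag each timer with a per-request sequence number.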
SERT APIs
• Start the timer thread and set the signal type
    extern int SERT_init_timer(int signum);
• Start & end of a request
    extern void SERT_start();
    extern void SERT_end();
• Set the timeout value and controller parameters
    extern void SERT_set_args(struct sert_arg *);
• Set the rollback point
    extern void SERT_register_rollbackpoint(void *);
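A Pseudo-code Example
    #include <setjmp.h>

    /* Provided by the service (application-specific). */
    typedef struct Request Request;
    extern Request *get_request(void);
    extern void process_request(Request *request);
    extern void send_result(Request *request);

    /* Provided by the SERT library. */
    extern void SERT_start();
    extern void SERT_end();
    extern void SERT_register_rollbackpoint(void *);

    void worker() {
        while (1) {
            Request *request = get_request();
            sigjmp_buf env;
            if (sigsetjmp(env, 1) == 0) {
                /* First return: register this point for rollback. */
                SERT_register_rollbackpoint(&env);
            } else {
                /* Returned via siglongjmp: resources have already been deallocated. */
                continue;
            }
            SERT_start();
            process_request(request);
            SERT_end();
            send_result(request);
        }
    }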
Experimental Settings
• Hardware
  – 9 dual PIII 1.4 GHz machines
  – Each has 4 GB RAM and a 10K RPM SCSI disk
  – Fast Ethernet
• Applications from Ask.com
  – Index matching: find web pages containing keywords; heavy-tailed; 2.1 GB of warm data in memory
  – Ranking: rank page importance; exponential; in memory

  App.          Ave. (ms)   90% (ms)   Max. (ms)
  Index Match   23.6        46         2,732
  Ranking       93          212        14,035
Size Distribution Shift
[Figure: request rate and throughput for AC and SERT over time (0 to 180 s), with a second panel showing response time (s); the pattern shift begins at 30 s and ends at 155 s.]
• During the shift, about 10% of requests take 500 ms or more
• SERT, compared with AC
  – 209.1% higher throughput
  – 54.7% response time reduction
Ranking Service Evaluation
[Figure: AC vs. SERT when underloaded (0 to 100% load) and overloaded (100 to 200% load); panels plot mean response time (ms) and throughput loss percent against load (%).]
Evaluation of Threshold Controller for Ranking Service
[Figure: throughput loss percent and response time (ms) vs. load (80 to 200%).]
• Adaptive controller vs. fixed thresholds of 0.5 s, 3.0 s, and 15 s
Evaluation of Threshold Controller for Index Matching
[Figure: throughput loss percent and response time (ms) vs. load (100 to 200%).]
• Adaptive controller vs. fixed thresholds of 1.5 s, 3.0 s, and 8.0 s
Related Work
• Real-time database systems [Kuo’00, Lin’90, Shu’94]
  – Higher-priority transactions abort lower-priority ones
  – UNDO/REDO logs for recovery
• Recoverable memory libraries
  – Recoverable virtual memory [Saty.’94], Rio Vista [Lowell’97]
  – Application modifications needed
• Process checkpointing and rollback
  – Fault tolerance [Li’90], program replay [Srinivasan’04], and debugging [Qin’05]
Conclusions
• Contribution: an early termination scheme for busy Internet services
  – Dynamically selects the termination threshold
  – Safely terminates requests early
  – Provides an API for multi-threaded services
• Future work
  – Cooperative early termination across different nodes and tiers
Questions?
CDF of Response Time during Size Distribution Shift
• E.g., requests completed within one second
  – SERT: 81.7%
  – AC: 45.3%