R. Hashemian (1), D. Krishnamurthy (1), M. Arlitt (2), N. Carlsson (3)
(1) University of Calgary  (2) HP Labs  (3) Linköping University
Presented by: Raoufehsadat Hashemian
The 4th ACM/SPEC International Conference on Performance Engineering (ICPE 2013)
OUTLINE
• Introduction
• Scalability Evaluation
• Scalability Enhancement Approach
• Validation
• Conclusion
ICPE13 Improving the Scalability of a Multi-core Web Server
INTRODUCTION: PROBLEM DESCRIPTION
• Enterprise applications
  • Performance: improving QoS (e.g., lower response times)
  • Cost: less money spent on hardware (e.g., improving effective utilization)
[Figure: response time vs. CPU utilization (%)]
• Goal: higher utilization with acceptable response time
• How can this goal be achieved for Web servers running on multi-core hardware?
INTRODUCTION: BACKGROUND
• Web servers before multi-core
  • Mature topic, wide-ranging discussions
• Multi-core architecture
  • Most research targets batch (non-interactive) workloads
• Web servers running on multi-core
  • Bus problem in UMA systems (Veal et al. '07)
  • Multiple Web server instances: 1 instance per processor (Scogland et al. '09, Boyd et al. '10, Gaud et al. '11)
SCALABILITY EVALUATION: SCALABILITY MEASUREMENT
• Measure Web server scalability for two workloads
• Evaluate the effect of the multiple-Web-server approach on the server's scalability
• Scalability metric
  • Maximum Achievable Throughput (MAT)
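The slide names MAT as the scalability metric but does not say how it is read off a measurement run. As a rough sketch only, the code below assumes MAT is the highest measured request rate whose response time stays within a chosen limit. The function name, the 2.0 ms limit, and the sample data are all hypothetical and are not taken from the talk.

```python
def max_achievable_throughput(samples, rt_limit_ms):
    """Highest measured request rate whose response time meets the limit.

    samples: iterable of (request_rate_per_sec, response_time_ms) pairs.
    Returns 0.0 when no measurement meets the limit.
    NOTE: this selection rule is an assumption, not the talk's definition.
    """
    acceptable = [rate for rate, rt_ms in samples if rt_ms <= rt_limit_ms]
    return float(max(acceptable)) if acceptable else 0.0


# Hypothetical load-test results (rate in req/sec, response time in ms).
samples = [(50_000, 0.4), (100_000, 0.7), (140_000, 1.3), (160_000, 9.5)]
mat = max_achievable_throughput(samples, rt_limit_ms=2.0)
print(mat)
```

A stricter or looser response-time limit moves the reported MAT, so the limit should be stated alongside any MAT figure.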
SCALABILITY EVALUATION: EXPERIMENTAL SETUP
• 2 x 4-core Intel Xeon E5620 processors, NUMA architecture
    Microarch.     Nehalem
    Frequency      2.4 GHz
    L1 Cache       32K IC - 32K DC
    L2 Cache       256K
    L3 Cache       12M (Inclusive)
    Inter-conn.    QPI - 5.86 GT/s
    Memory         16GB - DDR3-1333
[Figure: NUMA topology with Processor 0 and Processor 1; each processor has four cores with per-core L1 and L2 caches, one L3 cache, and its own memory bank (Bank 0, Bank 1)]
• OS: Linux, kernel 3, Ubuntu
• Web server: Lighttpd
• Application server: PHP (FastCGI module)
SCALABILITY EVALUATION: WORKLOADS
• TCP/IP intensive workload
  • High TCP connection rate
  • Processing: low user level and high kernel level
  • 1 KB static file, up to 155,000 requests/second
• SPECweb Support workload
  • Both static requests and PHP requests
  • Wider range of request types
  • Processing: high user level and moderate kernel level
SCALABILITY EVALUATION: CONFIGURATION TUNING
• Change the default lighttpd recommendation (use 1 lighttpd worker process per core)
• Disable default Linux scheduling (use affinity)
• Distribute the interrupt handling load
• Results
  • Improved MAT by up to 69%
  • Balanced utilization levels across the eight cores
  • Fully utilized the server
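The slide does not show how the worker-to-core binding was applied. One possible way to sketch it on Linux is with Python's `os.sched_setaffinity`, pinning each worker PID to exactly one core. The PIDs and core IDs below are hypothetical, and `apply_affinity` is defined but not called here because it needs real PIDs and suitable permissions.

```python
import os


def plan_worker_affinity(worker_pids, cores):
    """Pair each worker PID with one distinct core (1 worker per core)."""
    if len(worker_pids) > len(cores):
        raise ValueError("more workers than cores; cannot pin one per core")
    return dict(zip(worker_pids, cores))


def apply_affinity(plan):
    """Pin every PID in the plan to its single assigned core (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is unavailable here")
    for pid, core in plan.items():
        os.sched_setaffinity(pid, {core})


# Hypothetical lighttpd worker PIDs and core IDs.
plan = plan_worker_affinity([4101, 4102, 4103, 4104], [0, 1, 2, 3])
print(plan)
```

Keeping the planning step separate from the system call makes the mapping easy to inspect before anything is changed on the server.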
SCALABILITY EVALUATION: RESULTS
• TCP/IP intensive workload
  • Sub-linear scalability
  • Maximum Achievable Throughput: 146,000 req/sec
• SPECweb Support workload
  • Almost linear scalability
  • Maximum Achievable Throughput: 23,000 req/sec
[Figure: scalability vs. number of cores, one panel per workload]
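"Sub-linear" and "almost linear" can be made concrete with a scaling efficiency, MAT(n) / (n x MAT(1)), where 1.0 means perfectly linear scaling. This ratio is a common convention and is an assumption here, not a metric named in the talk. The per-core-count MAT values below are hypothetical placeholders, since the slide only reports the final MAT figures.

```python
def scaling_efficiency(mat_by_cores):
    """Map core count n -> MAT(n) / (n * MAT(1)); 1.0 means linear scaling."""
    base = mat_by_cores[1]
    return {n: mat / (n * base) for n, mat in sorted(mat_by_cores.items())}


# Hypothetical MAT values (req/sec) per core count, for illustration only.
hypothetical_mat = {1: 100.0, 2: 200.0, 4: 300.0}
efficiency = scaling_efficiency(hypothetical_mat)
print(efficiency)
```

In this made-up data the server scales linearly to two cores and drops to 75% efficiency at four cores, which is the shape the slide calls sub-linear.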
SCALABILITY EVALUATION: RESPONSE DISTRIBUTION ANALYSIS
Response time vs. core count
• "Low response time" requests
  • Static requests
  • Performance degrades
• "High response time" requests
  • Dynamic requests
  • Performance improves
[Figure: CDF of response times, P[X <= x] vs. x = response time (msec, log scale), for 1, 2, 4, and 8 cores; 80% CPU utilization, SPECweb Support workload]
Knowing this behavior, how can we improve the scalability?
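The analysis on this slide rests on the empirical CDF P[X <= x] of per-request response times. A minimal sketch of how such a CDF can be computed from raw samples follows; the response times used are hypothetical and only mimic a mix of fast static and slow dynamic requests.

```python
from bisect import bisect_right


def ecdf(samples):
    """Return a function x -> P[X <= x] for the given response-time samples."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("need at least one sample")
    n = len(ordered)

    def cdf(x):
        # bisect_right counts how many samples are <= x.
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical response times in msec: two fast requests, two slow ones.
cdf = ecdf([0.5, 1.0, 2.0, 40.0])
print(cdf(1.0), cdf(100.0))
```

Comparing such curves per core count separates the low-response-time region (static requests) from the tail (dynamic requests), which is how the slide distinguishes the two behaviors.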
SCALABILITY ENHANCEMENT: MULTIPLE WEBSITE REPLICAS
• Approach: use 1 Web server instance per processor
• Goal: reduce inter-processor data migration
[Figure: original configuration with one replica (a single replica process and the NIC 1 and NIC 2 queues across Processor 0 and Processor 1) vs. alternative configuration with two replicas (Replica 1 process with the NIC 1 queue, Replica 2 process with the NIC 2 queue)]
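To place one replica per processor, the cores belonging to each processor must be known. On Linux this is exposed per NUMA node in `/sys/devices/system/node/node<N>/cpulist`. The sketch below parses that list format and derives one core set per replica. The cpulist strings and replica names are hypothetical; the actual core numbering of the test server is not recoverable from the slides.

```python
def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8" into a list of core IDs."""
    cores = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cores.extend(range(int(first), int(last) + 1))
        else:
            cores.append(int(part))
    return cores


def plan_replicas(node_cpulists):
    """Assign replica i+1 the cores of NUMA node i (one replica per processor)."""
    return {
        f"replica{node + 1}": set(parse_cpulist(cpus))
        for node, cpus in sorted(node_cpulists.items())
    }


# Hypothetical cpulist contents for a two-processor machine.
node_cpulists = {0: "0-3", 1: "4-7"}
replica_cores = plan_replicas(node_cpulists)
print(replica_cores)
```

Each replica's worker processes and the queues of its NIC would then be confined to that replica's core set.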
SCALABILITY ENHANCEMENT: EVALUATING NEW CONFIGURATION
[Figure: response time (ms) vs. request rate (req/sec), one panel per workload]
• SPECweb Support workload
  • Scalability degradation
  • MAT decrement: 10%
• TCP/IP intensive workload
  • Scalability improvement
  • MAT increment: 12.3%
SCALABILITY ENHANCEMENT: EVALUATING NEW CONFIGURATION
• The response time inflation for dynamic requests dominates the improvement achieved for static requests
• Mean and 99.9th percentile response times increase with 2 replicas
• Hypothesis:
  • Cache contention with 2 replicas due to the larger working set size of dynamic requests
[Figure: CDF of response times, P[X <= x] vs. response time (ms); 80% CPU utilization, 22,000 req/sec, SPECweb Support workload]
VALIDATION: INTER-CONNECT TRAFFIC
• SPECweb Support workload
  • No significant decrement in inter-connect traffic
  • Improved performance for static requests
• TCP/IP intensive workload
  • Inter-connect traffic decreased significantly
  • Improved performance
[Figure: inter-connect traffic (bytes/sec) vs. request rate (req/sec), one panel per workload]
VALIDATION: LAST LEVEL CACHE
• Last-level cache (LLC) hit ratio degrades with the 2-replica configuration
• Confirms the cache contention hypothesis
[Figure: L3 cache hit ratio vs. request rate (req/sec), SPECweb Support workload]
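The hit ratio on this slide is the fraction of LLC accesses served from the cache. The talk does not say which counters were sampled, so the sketch below simply assumes that hit and miss counts are available. The counts are hypothetical and chosen only to show a lower ratio for the 2-replica case.

```python
def llc_hit_ratio(hits, misses):
    """Fraction of last-level cache accesses that were hits."""
    total = hits + misses
    if total == 0:
        raise ValueError("no LLC accesses recorded")
    return hits / total


# Hypothetical counter readings for the two configurations.
one_replica = llc_hit_ratio(hits=900, misses=100)
two_replicas = llc_hit_ratio(hits=800, misses=200)
print(one_replica, two_replicas)
```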
CONCLUSIONS
• Multi-core Web server: scalable after tuning
  • 80% utilization with acceptable response time
• Multiple Website replicas
  • The effect on scalability is workload dependent
  • Dynamic requests trigger LLC contention
  • Contention may be architecture and application dependent
• Future plan:
  • Design and develop an automatic, workload-adaptive technique that decides on the best configuration
Raoufeh Hashemian
University of Calgary, Canada
rhashem@ucalgary.ca
This work is financially supported by:
REFERENCES
• Cherkasova et al. '00: Characterizing Temporal Locality and its Impact on Web Server Performance, International Conference on Computer Communications and Networks '00, Cherkasova; Ciardo; HP Labs
• Elnozahy et al. '03: Energy Conservation Policies for Web Servers, USITS '03, Elnozahy; Kistler; Ramakrishnan; IBM
• Majo et al. '12: Matching Memory Access Patterns and Data Placement for NUMA Systems, GC '12, Majo; Gross; ETH
• Blagodurov et al. '11: A Case for NUMA-aware Contention Management on Multicore Systems, USENIX ATC '11, Blagodurov; Zhuravlev; Dashti; Fedorova; SFU
• Veal et al. '07: Performance Scalability of a Multi-core Web Server, ACM/IEEE ANCS '07, Veal; Foong; Intel
• Scogland et al. '09: Asymmetric Interactions in Symmetric Multi-core Systems: Analysis, Enhancements and Evaluation, ACM/IEEE SC '08, Scogland; Balaji; Feng; Narayanaswamy
• Boyd et al. '10: An Analysis of Linux Scalability to Many Cores, USENIX OSDI '10, Boyd-Wickizer; Clements; Mao; Pesterev; Kaashoek; Morris; Zeldovich; MIT
• Gaud et al. '11: Application-level Optimizations on NUMA Multicore Architectures: the Apache Case Study, RR-LIG-011, Gaud; Lachaize; Lepers; Muller; Quema
SCALABILITY EVALUATION: CONFIGURATION TUNING
• Network interrupt handling
  • 4 RSS queues per NIC port
  • Each queue bound to one core
[Figure: response time (msec) vs. rate (req/sec), before and after distributing the interrupt load]
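On Linux, an interrupt can be steered to a core by writing a hexadecimal CPU bitmask to `/proc/irq/<irq>/smp_affinity`. The sketch below builds one mask per RSS queue so that each queue's interrupt lands on a single core. The IRQ numbers and the queue-to-core assignment are hypothetical, the single-word mask assumes fewer than 32 CPUs, and `apply_irq_affinity` is defined but not called because it requires root.

```python
def cpu_mask(core):
    """Hex bitmask selecting exactly one core, e.g. core 3 -> "8"."""
    if core < 0:
        raise ValueError("core ID must be non-negative")
    return format(1 << core, "x")


def plan_irq_affinity(queue_irqs, cores):
    """Pair each RSS queue IRQ with the mask of its single target core."""
    if len(queue_irqs) != len(cores):
        raise ValueError("need exactly one core per queue")
    return {irq: cpu_mask(core) for irq, core in zip(queue_irqs, cores)}


def apply_irq_affinity(plan):
    """Write each mask to /proc/irq/<irq>/smp_affinity (Linux, root only)."""
    for irq, mask in plan.items():
        with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
            f.write(mask)


# Hypothetical IRQ numbers for the 4 RSS queues of one NIC port.
plan = plan_irq_affinity([64, 65, 66, 67], [0, 1, 2, 3])
print(plan)
```

If the irqbalance daemon is running it may rewrite these masks, so it usually has to be stopped or configured to leave the NIC interrupts alone.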
SCALABILITY EVALUATION: CONFIGURATION TUNING
• OS scheduling
  • Binding each lighttpd process to 1 core
[Figure: response time (msec) vs. rate (req/sec), with and without affinity]
SCALABILITY EVALUATION: WEB TIER VS. APPLICATION TIER
• Static: requests with lower response time
  • Processed only in the Web tier (lighttpd)
• Dynamic: requests with higher response time
  • Processed in both the Web and application tiers (lighttpd and PHP)
[Figure: response time (ms) vs. file size (bytes)]