
By Raoufehsadat Hashemian, The 4th ACM/SPEC International Conference on Performance Engineering (ICPE 2013)


  1. R. Hashemian (1), D. Krishnamurthy (1), M. Arlitt (2), N. Carlsson (3)
     (1) University of Calgary, (2) HP Labs, (3) Linköping University
     Presented by Raoufehsadat Hashemian at the 4th ACM/SPEC International Conference on Performance Engineering (ICPE 2013)

  2. OUTLINE
     • Introduction
     • Scalability Evaluation
     • Scalability Enhancement Approach
     • Validation
     • Conclusion
     ICPE13: Improving the Scalability of a Multi-core Web Server

  3. INTRODUCTION: PROBLEM DESCRIPTION
     • Enterprise applications
       • Performance: improving QoS, e.g. lower response times
       • Cost: less money spent on hardware, e.g. improving effective utilization
     • Multi-core technology
     • Goal: higher utilization with acceptable response times
     [Figure: response time vs. CPU utilization (%)]

  4. INTRODUCTION: BACKGROUND
     • Web servers before multi-core
       • Mature topic, wide-ranging discussions
     • Multi-core architecture
       • Most research on batch (non-interactive) workloads
     • Web servers running on multi-core
       • Bus problem in UMA systems (Veal et al. '07)
       • Multiple Web server instances: one instance per processor (Scogland et al. '09, Boyd et al. '10, Gaud et al. '11)

  5. SCALABILITY EVALUATION: SCALABILITY MEASUREMENT
     • Measure Web server scalability for two workloads
     • Evaluate the effectiveness of the multiple Web server approach for scalability
     • Scalability
       • Maximum Achievable Throughput (MAT)
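
The slide names MAT as the scalability metric but does not spell out how it is computed. The following is a minimal sketch, assuming MAT is the highest measured request rate at which the mean response time stays under an acceptability threshold. The threshold, the function names, and the sample data are all hypothetical and are not taken from the presentation.

```python
from typing import Sequence, Tuple


def max_achievable_throughput(
    samples: Sequence[Tuple[float, float]],
    max_response_ms: float,
) -> float:
    """Return the highest request rate whose mean response time is acceptable.

    samples: (request_rate_req_per_sec, mean_response_time_ms) pairs.
    max_response_ms: hypothetical acceptability threshold (an assumption).
    Returns 0.0 when no sample meets the threshold.
    """
    acceptable = [rate for rate, rt in samples if rt <= max_response_ms]
    return max(acceptable, default=0.0)


def speedup(mat_n_cores: float, mat_one_core: float) -> float:
    """Scalability relative to one core: MAT with n cores over MAT with one."""
    if mat_one_core <= 0:
        raise ValueError("single-core MAT must be positive")
    return mat_n_cores / mat_one_core


# Hypothetical measurements, not taken from the slides.
demo = [(1000.0, 0.4), (2000.0, 0.6), (3000.0, 1.5), (4000.0, 9.0)]
demo_mat = max_achievable_throughput(demo, max_response_ms=2.0)
```

Under this definition, linear scalability would mean that `speedup` grows in proportion to the number of cores.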

  6. SCALABILITY EVALUATION: EXPERIMENTAL SETUP
     • 2 x 4-core Intel Xeon E5620 processors, NUMA architecture
       • Microarchitecture: Nehalem
       • Frequency: 2.4 GHz
       • L1 cache: 32K IC, 32K DC
       • L2 cache: 256K
       • L3 cache: 12M (inclusive)
       • Interconnect: QPI, 5.86 GT/s
       • Memory: 16 GB DDR3-1333
     [Figure: Processor 0 and Processor 1, each with four cores, per-core L1 and L2 caches, one L3 cache, and a memory bank (Bank 0, Bank 1)]
     • OS: Linux, kernel 3, Ubuntu
     • Web server: Lighttpd
     • Application server: PHP (FastCGI module)

  7. SCALABILITY EVALUATION: WORKLOADS
     • TCP/IP Intensive workload
       • High TCP connection rate
       • Processing: low user level and high kernel level
       • 1 KB static file, up to 155,000 requests/second
     • SPECweb Support workload
       • Both static requests and PHP requests
       • Wider range of request types
       • Processing: high user level and moderate kernel level

  8. SCALABILITY EVALUATION: CONFIGURATION TUNING
     • 1 Lighttpd worker process per core
     • Disabling the scheduler effect (using affinity)
     • Distributing the interrupt handling load
     • Improved MAT by up to 69%
     • Balanced utilization levels across the eight cores
     • Fully utilized the server

  9. SCALABILITY EVALUATION: RESULTS
     • TCP/IP Intensive workload scalability
       • Sub-linear
     • SPECweb Support workload scalability
       • Almost linear
     [Figure: Maximum Achievable Throughput vs. number of cores, one plot per workload; annotated values: 146,000 req/sec (TCP/IP Intensive) and 23,000 req/sec (SPECweb Support)]

  10. SCALABILITY EVALUATION: RESPONSE DISTRIBUTION ANALYSIS
      • "Low response time" requests (static): response times increased
      • "High response time" requests (dynamic): response times decreased
      [Figure: CDF of response times, P[X <= x], where x = response time (msec, log scale from 10^-1 to 10^3), for 1, 2, 4, and 8 cores; SPECweb Support workload at 80% CPU utilization]
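
The figure on this slide plots an empirical CDF, P[X <= x], of per-request response times. As a rough illustration of how such a curve can be obtained from raw measurements, here is a minimal sketch; the function name and the sample values are hypothetical and do not come from the presentation.

```python
from bisect import bisect_right
from typing import Sequence


def empirical_cdf(samples: Sequence[float], x: float) -> float:
    """Estimate P[X <= x] as the fraction of samples not exceeding x."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # bisect_right counts how many sorted samples are <= x.
    return bisect_right(ordered, x) / len(ordered)


# Hypothetical response times in msec: a few fast (static-like) requests
# and a few slow (dynamic-like) ones.
demo_ms = [0.2, 0.3, 0.5, 4.0, 12.0]
fraction_under_half_ms = empirical_cdf(demo_ms, 0.5)
```

Evaluating the function over a log-spaced grid of x values, once per core count, would give one curve per configuration for comparison.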

  11. SCALABILITY ENHANCEMENT: MULTIPLE WEBSITE REPLICAS
      • Approach: use one Web server instance per processor
      • Goal: reduce inter-processor data migration
      [Figure: original configuration with one replica (a single replica process with the NIC 1 and NIC 2 queues, spanning Processor 0 and Processor 1) vs. alternative configuration with two replicas (Replica 1 process with the NIC 1 queue, Replica 2 process with the NIC 2 queue)]
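
The replica approach above can be sketched as a partitioning of cores by processor, with each replica running its workers only on the cores of its own processor. This is an illustrative sketch only: the core numbering, the dictionary layout, and the function name are assumptions, and the presentation does not show how the authors configured the replicas.

```python
from typing import Dict, List


def plan_replicas(cores_by_processor: Dict[int, List[int]]) -> List[dict]:
    """Build one replica per processor, with one worker per local core."""
    plan = []
    for replica_id, (processor, cores) in enumerate(
        sorted(cores_by_processor.items())
    ):
        plan.append(
            {
                "replica": replica_id,
                "processor": processor,
                # Workers stay on this processor's cores so that request
                # data does not have to cross the inter-processor link.
                "worker_cores": sorted(cores),
            }
        )
    return plan


# Hypothetical core numbering for a 2 x 4-core machine.
demo_topology = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
demo_replicas = plan_replicas(demo_topology)
```

On a real machine the mapping of core IDs to processors should be read from the operating system rather than assumed.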

  12. SCALABILITY ENHANCEMENT: EVALUATING NEW CONFIGURATION
      • SPECweb Support workload: scalability degradation, MAT decrement of 10%
      • TCP/IP Intensive workload: scalability improvement, MAT increment of 12.3%
      [Figure: response time (ms) vs. request rate (req/sec), one panel per workload]

  13. SCALABILITY ENHANCEMENT: EVALUATING NEW CONFIGURATION
      • The response time inflation for dynamic requests dominates the improvement achieved for static requests
      • Mean and 99.9th percentile response times increase with two replicas
      [Figure: CDF of response times, P[X <= x] (from 0.9 to 1), vs. response time (ms, log scale from 10^0 to 10^4); SPECweb Support workload at 22,000 req/sec, 80% CPU utilization]
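
The slide compares mean and 99.9th percentile response times. The presentation does not say which percentile estimator was used, so the sketch below assumes the nearest-rank method; the function name and the sample data are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest rank covering at least p percent of samples.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical response times in ms, not taken from the slides.
demo_ms = [float(v) for v in range(1, 11)]
demo_mean = mean(demo_ms)
demo_p90 = nearest_rank_percentile(demo_ms, 90)
```

A tail percentile such as the 99.9th needs at least a thousand samples before it is distinct from the maximum under this estimator.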

  14. VALIDATION: INTER-CONNECT TRAFFIC
      • Inter-connect traffic decreased significantly for the TCP/IP Intensive workload
      • For the Support workload, the change is not significant
      [Figure: inter-connect traffic (bytes/sec) vs. request rate (req/sec), one panel each for the SPECweb Support and TCP/IP Intensive workloads]

  15. VALIDATION: LAST LEVEL CACHE
      • Last-level cache hit ratio degrades with the two-replica configuration
      [Figure: L3 cache hit ratio vs. request rate (req/sec); SPECweb Support workload]

  16. CONCLUSIONS
      • Multi-core Web server: scalable after tuning
      • Multiple Website Replicas
        • The effect on scalability was workload dependent
        • LLC contention caused by the PHP application
        • The result may be architecture dependent
        • The result may be application dependent
      • Future plan
        • An automatic, workload-adaptive approach to decide on the best configuration

  17. Raoufeh Hashemian, University of Calgary, Canada, rhashem@ucalgary.ca
      This work is financially supported by: [sponsor logos not preserved]

  18. REFERENCES
      • Cherkasova et al. '00: Characterizing Temporal Locality and its Impact on Web Server Performance, International Conference on Computer Communications and Networks '00, Cherkasova; Ciardo; HP Labs
      • Elnozahy et al. '03: Energy Conservation Policies for Web Servers, USITS '03, Elnozahy; Kistler; Ramakrishnan; IBM
      • Majo et al. '12: Matching Memory Access Patterns and Data Placement for NUMA Systems, CGO '12, Majo; Gross; ETH
      • Blagodurov et al. '11: A Case for NUMA-aware Contention Management on Multicore Systems, USENIX ATC '11, Blagodurov; Zhuravlev; Dashti; Fedorova; SFU
      • Veal et al. '07: Performance Scalability of a Multi-core Web Server, ACM/IEEE ANCS '07, Veal; Foong; Intel
      • Scogland et al. '09: Asymmetric Interactions in Symmetric Multi-core Systems: Analysis, Enhancements and Evaluation, ACM/IEEE SC '08, Scogland; Balaji; Feng; Narayanaswamy
      • Boyd et al. '10: An Analysis of Linux Scalability to Many Cores, USENIX OSDI '10, Boyd-Wickizer; Clements; Mao; Pesterev; Kaashoek; Morris; Zeldovich; MIT
      • Gaud et al. '11: Application-level Optimizations on NUMA Multicore Architectures: the Apache Case Study, RR-LIG-011, Gaud; Lachaize; Lepers; Muller; Quema

  19. SCALABILITY EVALUATION: CONFIGURATION TUNING
      • Network interrupt handling
        • 4 RSS queues per NIC port
        • Each queue bound to one core
      [Figure: response time (msec) vs. rate (req/sec), before and after distributing the interrupt load]
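
On Linux, an interrupt is usually bound to a core by writing a hexadecimal CPU bitmask to `/proc/irq/<irq>/smp_affinity`. The sketch below only computes the masks for a one-queue-per-core plan and does not write anything. The IRQ numbers and function names are hypothetical, and the presentation does not state which mechanism the authors used.

```python
from typing import Dict, List


def smp_affinity_mask(core: int) -> str:
    """Hex bitmask selecting a single core, e.g. core 4 -> '10'."""
    if core < 0:
        raise ValueError("core index must be non-negative")
    return format(1 << core, "x")


def plan_irq_affinity(queue_irqs: List[int], cores: List[int]) -> Dict[int, str]:
    """Pair each RSS queue IRQ with one core and return IRQ -> hex mask."""
    if len(queue_irqs) != len(cores):
        raise ValueError("need exactly one core per queue IRQ")
    return {irq: smp_affinity_mask(core) for irq, core in zip(queue_irqs, cores)}


# Hypothetical IRQ numbers for the 4 RSS queues of one NIC port.
demo_plan = plan_irq_affinity([64, 65, 66, 67], [0, 1, 2, 3])
```

Applying the plan would mean writing each mask to the matching `smp_affinity` file as root, and a running `irqbalance` daemon may overwrite such settings.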

  20. SCALABILITY EVALUATION: CONFIGURATION TUNING
      • OS scheduling
        • Binding each lighttpd process to 1 core
      [Figure: response time (msec) vs. rate (req/sec), with and without affinity]
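
One way to bind a process to a single core on Linux is `os.sched_setaffinity` from the Python standard library (the `taskset` utility offers the same from a shell). The following is a minimal sketch under that assumption; the PIDs and function names are hypothetical, and the slides do not say which tool the authors used.

```python
import os
from typing import Dict, List


def plan_worker_affinity(worker_pids: List[int], cores: List[int]) -> Dict[int, int]:
    """Assign each worker PID to exactly one distinct core."""
    if len(worker_pids) != len(cores):
        raise ValueError("need exactly one core per worker")
    if len(set(cores)) != len(cores):
        raise ValueError("each core may be used by only one worker")
    return dict(zip(worker_pids, cores))


def apply_affinity(plan: Dict[int, int]) -> None:
    """Pin every process in the plan to its core (Linux only)."""
    for pid, core in plan.items():
        # The second argument is the set of CPUs the process may run on.
        os.sched_setaffinity(pid, {core})


# Hypothetical PIDs of two worker processes.
demo_plan = plan_worker_affinity([101, 102], [0, 1])
```

Calling `apply_affinity(demo_plan)` would require the listed processes to exist and sufficient privileges to change their affinity.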

  21. SCALABILITY EVALUATION: WEB TIER VS. APPLICATION TIER
      • Static: requests with lower response time
        • Processed only in the Web tier (lighttpd)
      • Dynamic: requests with higher response time
        • Processed in both the Web and application tiers (lighttpd and PHP)
      [Figure: response time (ms) vs. file size (bytes)]

  22. SCALABILITY EVALUATION: EXPERIMENTAL SETUP
