R. Hashemian (1), D. Krishnamurthy (1), M. Arlitt (2), N. Carlsson (3)
1. University of Calgary 2. HP Labs 3. Linköping University
Presented by: Raoufehsadat Hashemian
The 4th ACM/SPEC International Conference on Performance Engineering, ICPE 2013

OUTLINE
• Introduction
• Scalability Evaluation
• Scalability Enhancement Approach
• Validation
• Conclusion

ICPE13 Improving the Scalability of a Multi-core Web Server

INTRODUCTION: PROBLEM DESCRIPTION
• Enterprise applications
  • Performance: improving QoS
    • e.g. lower response times
  • Cost: less money spent on hardware
    • e.g. improving effective utilization
• Multi-core technology
[Figure: response time versus CPU utilization (%)]
Goal: higher utilization and acceptable response time

INTRODUCTION: BACKGROUND
• Web servers before multi-core
  • Mature topic, wide-ranging discussions
• Multi-core architecture
  • Most research on batch (non-interactive) workloads
• Web servers running on multi-core
  • Bus problem in UMA systems (Veal et al. '07)
  • Multiple Web server instances: 1 instance per processor (Scogland et al. '09, Boyd et al. '10, Gaud et al. '11)

SCALABILITY EVALUATION: SCALABILITY MEASUREMENT
• Measure Web server scalability for two workloads
• Evaluate how effectively the multiple-Web-server approach improves scalability
• Scalability metric
  • Maximum Achievable Throughput (MAT)
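One way to make MAT concrete, as a minimal sketch: assume a load step counts as sustainable when the server completes nearly all offered requests and the mean response time stays under a chosen limit, and take MAT as the highest completed rate among such steps. The slides do not state the exact criterion, so `max_mean_rt_ms`, `min_completion` and the `LoadPoint` record are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LoadPoint:
    """One load-test step (hypothetical record layout)."""
    offered_rate: float    # requests/second sent by the load generator
    completed_rate: float  # requests/second actually completed
    mean_rt_ms: float      # mean response time in milliseconds


def max_achievable_throughput(
    points: Iterable[LoadPoint],
    max_mean_rt_ms: float,
    min_completion: float = 0.99,
) -> Optional[float]:
    """Return the highest completed rate among steps that meet both limits.

    Assumed criterion, not taken from the slides: a step is sustainable when
    completed/offered >= min_completion and mean_rt_ms <= max_mean_rt_ms.
    """
    sustainable = [
        p.completed_rate
        for p in points
        if p.offered_rate > 0
        and p.completed_rate / p.offered_rate >= min_completion
        and p.mean_rt_ms <= max_mean_rt_ms
    ]
    return max(sustainable) if sustainable else None
```

Running the same function on measurements taken with 1, 2, 4 and 8 active cores would give one MAT value per core count.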

SCALABILITY EVALUATION: EXPERIMENTAL SETUP
• 2 x 4-core Intel Xeon E5620 processors
  Microarch.    Nehalem
  Frequency     2.4 GHz
  L1 Cache      32K IC, 32K DC
  L2 Cache      256K
  L3 Cache      12M (inclusive)
  Inter-conn.   QPI, 5.86 GT/s
  Memory        16GB, DDR3-1333
[Figure: NUMA architecture. Processor 0 and Processor 1 each have four cores with private L1 and L2 caches, a shared L3 cache, and a local memory bank (Bank 0 and Bank 1).]
• OS: Linux, kernel 3, Ubuntu
• Web server: Lighttpd
• Application server: PHP (FastCGI module)

SCALABILITY EVALUATION: WORKLOADS
• TCP/IP intensive workload
  • High TCP connection rate
  • Processing: low user level and high kernel level
  • 1 KB static file, up to 155,000 requests/second
• SPECweb Support workload
  • Both static requests and PHP requests
  • Wider range of request types
  • Processing: high user level and moderate kernel level

SCALABILITY EVALUATION: CONFIGURATION TUNING
• 1 Lighttpd worker process per core
• Disabling the scheduler effect (use affinity)
• Distributing the interrupt handling load
• Results
  • Improved MAT by up to 69%
  • Balanced utilization levels across the eight cores
  • Server fully utilized

SCALABILITY EVALUATION: RESULTS
• TCP/IP intensive workload scalability
  • Sub-linear
  • Maximum Achievable Throughput: 146,000 req/sec
• SPECweb Support workload scalability
  • Almost linear
  • Maximum Achievable Throughput: 23,000 req/sec
[Figures: scalability versus number of cores for each workload]
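"Sub-linear" and "almost linear" can be quantified as a scaling efficiency, sketched below under the assumption that efficiency is defined as MAT(n) / (n x MAT(1)). This definition is a common convention rather than something stated in the slides, and the function name is hypothetical.

```python
from typing import Dict


def scaling_efficiency(mat_by_cores: Dict[int, float]) -> Dict[int, float]:
    """Map core count n to MAT(n) / (n * MAT(1)); 1.0 means linear scaling."""
    if 1 not in mat_by_cores:
        raise ValueError("a single-core MAT is needed as the baseline")
    base = mat_by_cores[1]
    if base <= 0:
        raise ValueError("the single-core MAT must be positive")
    return {n: mat / (n * base) for n, mat in sorted(mat_by_cores.items())}
```

Values close to 1.0 at every core count correspond to almost linear scaling; values that fall as cores are added correspond to sub-linear scaling.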

SCALABILITY EVALUATION: RESPONSE DISTRIBUTION ANALYSIS
• "Low response time" requests → increased response times → Static
• "High response time" requests → decreased response times → Dynamic
[Figure: CDF of response times, P[X <= x] versus x = response time (msec, log scale), for 1, 2, 4 and 8 cores; 80% CPU utilization, SPECweb Support workload]
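The CDF plotted on this slide is an empirical distribution over measured response times. A minimal sketch of how one point P[X <= x] could be computed from raw samples follows; the function name and the millisecond unit of the input are assumptions for illustration.

```python
from bisect import bisect_right
from typing import Sequence


def empirical_cdf(samples_ms: Sequence[float], x_ms: float) -> float:
    """Return P[X <= x]: the fraction of samples at or below x_ms."""
    if not samples_ms:
        raise ValueError("need at least one response-time sample")
    ordered = sorted(samples_ms)
    # bisect_right counts how many sorted samples are <= x_ms.
    return bisect_right(ordered, x_ms) / len(ordered)
```

Evaluating it over a logarithmic grid of x values, once per core count, would reproduce curves of the kind described above.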

SCALABILITY ENHANCEMENT: MULTIPLE WEBSITE REPLICAS
• Approach: use one Web server instance per processor
• Goal: reduce inter-processor data migration
[Figure: original configuration with one replica, where a single replica process and the queues of NIC 1 and NIC 2 span Processor 0 and Processor 1, versus alternative configuration with two replicas, where replica 1 with the NIC 1 queue and replica 2 with the NIC 2 queue are each confined to one processor]
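To confine each replica to one processor, the cores belonging to each NUMA node have to be known. The sketch below reads them from the Linux sysfs `cpulist` files under `/sys/devices/system/node`; how the authors actually derived the core sets is not stated in the slides, and both function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def parse_cpulist(text: str) -> List[int]:
    """Parse the Linux cpulist format, e.g. "0,2,4,6" or "0-3,8-11"."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return sorted(set(cpus))


def cores_per_node(sysfs_root: str = "/sys/devices/system/node") -> Dict[str, List[int]]:
    """Map each NUMA node directory name (node0, node1, ...) to its cores.

    Each Web server replica would then be restricted to one node's core list.
    """
    nodes: Dict[str, List[int]] = {}
    for cpulist in sorted(Path(sysfs_root).glob("node[0-9]*/cpulist")):
        nodes[cpulist.parent.name] = parse_cpulist(cpulist.read_text())
    return nodes
```

On a two-socket machine this would yield two entries, one core list per replica.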

SCALABILITY ENHANCEMENT: EVALUATING NEW CONFIGURATION
• SPECweb Support workload
  • Scalability degradation
  • MAT decrease: 10%
• TCP/IP intensive workload
  • Scalability improvement
  • MAT increase: 12.3%
[Figures: response time (ms) versus request rate (req/sec) for each workload]

SCALABILITY ENHANCEMENT: EVALUATING NEW CONFIGURATION
• The response time inflation for Dynamic requests dominates the improvement achieved for Static requests
• Mean and 99.9th percentile response times increase with 2 replicas
[Figure: upper tail of the CDF of response times, P[X <= x] from 0.9 to 1 versus response time (ms, log scale); 80% CPU utilization, 22,000 req/sec, SPECweb Support workload]
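The two summary statistics named above could be computed from raw response-time samples as in this minimal sketch. It assumes the nearest-rank percentile definition, which the slides do not specify, and the function name is hypothetical.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def mean_and_tail(samples_ms: Sequence[float], pct: float = 99.9) -> Tuple[float, float]:
    """Return (mean, pct-th percentile) using the nearest-rank definition."""
    if not samples_ms:
        raise ValueError("need at least one response-time sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    # Round before ceil so floating-point noise cannot push the rank up by one.
    rank = math.ceil(round(pct * len(ordered) / 100, 9))  # 1-based rank
    return mean(ordered), ordered[rank - 1]
```

Comparing the returned pair for the one-replica and two-replica runs at the same request rate gives the kind of comparison this slide summarizes.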

VALIDATION: INTER-CONNECT TRAFFIC
• Inter-connect traffic decreased significantly for the TCP/IP intensive workload
• For the Support workload, the change is not significant
[Figures: inter-connect traffic (Bytes/sec) versus request rate (req/sec) for the SPECweb Support workload and the TCP/IP intensive workload]

VALIDATION: LAST LEVEL CACHE
• Last level cache hit ratio degrades with the 2-replica configuration
[Figure: L3 cache hit ratio versus request rate (req/sec), SPECweb Support workload]

CONCLUSIONS
• Multi-core Web server: scalable after tuning
• Multiple Website replicas
  • The effect on scalability was workload dependent
  • LLC contention caused by the PHP application
  • The result may be architecture dependent
  • The result may be application dependent
• Future plan
  • An automatic, workload-adaptive approach to choosing the best configuration

Raoufeh Hashemian
University of Calgary, Canada
rhashem@ucalgary.ca
This work is financially supported by: [sponsor logos]

REFERENCES
• Cherkasova et al. '00: Characterizing Temporal Locality and its Impact on Web Server Performance, International Conference on Computer Communications and Networks '00, Cherkasova; Ciardo; HP Labs
• Elnozahy et al. '03: Energy Conservation Policies for Web Servers, USITS '03, Elnozahy; Kistler; Ramakrishnan; IBM
• Majo et al. '12: Matching Memory Access Patterns and Data Placement for NUMA Systems, CGO '12, Majo; Gross; ETH
• Blagodurov et al. '11: A Case for NUMA-aware Contention Management on Multicore Systems, USENIX ATC '11, Blagodurov; Zhuravlev; Dashti; Fedorova; SFU
• Veal et al. '07: Performance Scalability of a Multi-core Web Server, ACM/IEEE ANCS '07, Veal; Foong; Intel
• Scogland et al. '09: Asymmetric Interactions in Symmetric Multi-core Systems: Analysis, Enhancements and Evaluation, ACM/IEEE SC '08, Scogland; Balaji; Feng; Narayanaswamy
• Boyd et al. '10: An Analysis of Linux Scalability to Many Cores, USENIX OSDI '10, Boyd-Wickizer; Clements; Mao; Pesterev; Kaashoek; Morris; Zeldovich; MIT
• Gaud et al. '11: Application-level Optimizations on NUMA Multicore Architectures: the Apache Case Study, RR-LIG-011, Gaud; Lachaize; Lepers; Muller; Quema

SCALABILITY EVALUATION: CONFIGURATION TUNING
• Network interrupt handling
  • 4 RSS queues per NIC port
  • Each queue bound to one core
[Figure: response time (msec) versus rate (req/sec), before and after distributing the interrupt load]
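On Linux, binding a queue's interrupt to one core is commonly done by writing a hexadecimal CPU bitmask to `/proc/irq/<irq>/smp_affinity`. The sketch below computes such masks round-robin over a core list. The slides do not say which mechanism the authors used, so this is an assumption, and the IRQ numbers in the usage note are hypothetical placeholders.

```python
from typing import Dict, Sequence


def irq_affinity_masks(irqs: Sequence[int], cores: Sequence[int]) -> Dict[int, str]:
    """Assign each queue IRQ to one core, round-robin, as a hex CPU bitmask."""
    if not cores:
        raise ValueError("need at least one core")
    return {
        irq: format(1 << cores[i % len(cores)], "x")
        for i, irq in enumerate(irqs)
    }


def apply_masks(masks: Dict[int, str], proc_root: str = "/proc/irq") -> None:
    """Write the masks to smp_affinity; requires root privileges.

    A running irqbalance daemon may overwrite these settings.
    """
    for irq, mask in masks.items():
        with open(f"{proc_root}/{irq}/smp_affinity", "w") as f:
            f.write(mask)
```

For example, `irq_affinity_masks([64, 65, 66, 67], [0, 2, 4, 6])` maps four hypothetical queue IRQs to the masks `1`, `4`, `10` and `40`, one core each.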

SCALABILITY EVALUATION: CONFIGURATION TUNING
• OS scheduling
  • Binding each lighttpd process to 1 core
[Figure: response time (msec) versus rate (req/sec), with and without affinity]
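Binding a process to a single core can be expressed with the Linux-only `os.sched_setaffinity` call from the Python standard library. This is a minimal sketch of the idea, not the authors' tooling (an equivalent result is usually obtained with `taskset`); the helper names and the PIDs in the usage note are hypothetical.

```python
import os
from typing import Dict, Sequence


def plan_affinity(pids: Sequence[int], cores: Sequence[int]) -> Dict[int, int]:
    """Pair each worker PID with one distinct core (one worker per core)."""
    if len(pids) > len(cores):
        raise ValueError("more workers than cores")
    return dict(zip(pids, cores))


def pin_workers(plan: Dict[int, int]) -> None:
    """Restrict each process to its assigned core (Linux only)."""
    for pid, core in plan.items():
        os.sched_setaffinity(pid, {core})
```

With eight lighttpd workers, `pin_workers(plan_affinity(worker_pids, range(8)))` would remove the scheduler's freedom to migrate them, where `worker_pids` is a list the caller collects.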

SCALABILITY EVALUATION: WEB TIER VS. APPLICATION TIER
• Static: requests with lower response time
  • Processed only in the Web tier (lighttpd)
• Dynamic: requests with higher response time
  • Processed in both the Web and application tiers (lighttpd and PHP)
[Figure: response time (ms) versus file size (Byte)]

