Metrics and models for Web performance evaluation




  1. Metrics and models for Web performance evaluation, or: how to measure SpeedIndex from raw encrypted packets, and why it matters. This talk: QoE = f(QoS). Dario Rossi <dario.rossi@huawei.com>, Director, DataCom Lab (*), Huawei. Brussels, Feb 1st 2020. (*) Data Communication Network Algorithm and Measurement Technology Laboratory.

  2. Same title slide, now crediting the co-authors: Dario Rossi and, in alphabetical order, Alemnew Asrese, Alexis Huet, Diego Da Hora, Enrico Bocchi, Flavia Salutari, Florian Metzger, Gilles Dubuc, Hao Shi, Jinchun Xu, Luca De Cicco, Marco Mellia, Matteo Varvello, Renata Teixeira, Tobias Hossfeld, Shengming Cai, Vassillis Christophides, Zied Ben Houidi.

  3. Across the Internet, for ISPs and users alike, offering good user QoE is a common goal.

  4. User QoE builds on application QoS, which in turn builds on network QoS. For ISPs and vendors, encryption makes the inference harder, yet detecting, forecasting and preventing QoE degradation is important!

  5. Quality at different layers. Context influence factors (Context layer) and human influence factors (User layer) shape the user QoE; the application QoS, together with system influence factors, influences the user QoE; the network QoS affects the application QoS.

  6. Quality at different layers, with examples. Context: device type, activity, location. User QoE: user-perceived PLT (uPLT), mean opinion score (MOS), engagement metrics. Application QoS: page load time (PLT), video bitrate, SpeedIndex; it influences user QoE. Network QoS: packet loss, latency, bandwidth, Wi-Fi quality; it affects application QoS.

  7. Same picture, zooming on the application layer: HTTP/2, QUIC and similar protocols live at the application QoS level (and the reasoning holds for any other app). Network QoS (packet loss, latency, bandwidth, Wi-Fi quality) affects application QoS, which influences user QoE (uPLT, MOS).

  8. Agenda, mapped onto the layers: how to measure SpeedIndex from raw encrypted packets, and why it matters. User QoE: data collection (crowdsourcing campaign) and models (data-driven vs expert models). Application QoS: browser metrics (instant vs integral vs compound) and models (from raw encrypted packets), reaching down to the network QoS.

  9. Agenda: 1. Data collection for user QoE (crowdsourcing campaign); 2. Models (data-driven vs expert models); 3. Browser metrics (instant vs integral vs compound); 4. Method (from raw encrypted packets at the network QoS level).

  10. Data collection: crowdsourcing campaigns. Datasets: https://webqoe.telecom-paristech.fr/data
  • Mean opinion score (MOS): lab experiments, "Rate your experience from 1 (poor) to 5 (excellent)". Small user diversity (volunteers); Web browsing, but artificial websites; artificial controlled conditions. Award-winning dataset [PAM18].
  • User-perceived PLT (uPLT): crowdsourcing with paid crowdworkers, "Which of these two pages finished first?". Larger user base, but higher noise; side-by-side videos ≠ Web browsing!; artificial controlled conditions. Ongoing collaboration.
  • User acceptance: experiments on an operational website (collaboration), "Did the page load fast enough?" (Yes/No). Actual service users browsing in typical user conditions; huge heterogeneity of devices, browsers and networks [WWW19].

  11. Models: data-driven vs expert models. Models: https://webqoe.telecom-paristech.fr/models
  • Data-driven: learn y = f(x), where x is a vector of input features and the optimal f(.) is selected and tuned by machine learning. More flexible and (slightly) more accurate [PAM18].
  • Expert: fit a predetermined y = f(x), where x is a single scalar metric, generally the Page Load Time (PLT), and f(.) is pre-selected by the expert: e.g. the IQX hypothesis [1], the Weber-Fechner law, or the ITU-T G.1030 standard (https://www.itu.int/rec/T-REC-G.1030/en).
  The two model families are compared in [QoMEX-18] and [INFOCOM19]; there is still room for improvement (see [WWW19]). A fitting sketch for the expert approach follows.
  [1] M. Fiedler et al., "A generic quantitative relationship between quality of experience and quality of service," IEEE Network, 2010.
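As a concrete illustration of the expert approach, here is a minimal sketch of fitting the IQX hypothesis, which posits an exponential QoE = f(QoS) relationship; the (PLT, MOS) pairs and starting parameters are invented for the example, not taken from any of the cited datasets.

```python
# Hedged sketch: fit the IQX hypothesis QoE = alpha*exp(-beta*QoS) + gamma
# (Fiedler et al., IEEE Network 2010). All data points are illustrative.
import numpy as np
from scipy.optimize import curve_fit

def iqx(plt_s, alpha, beta, gamma):
    """Exponential QoE = f(QoS) mapping; here QoS is the PLT in seconds."""
    return alpha * np.exp(-beta * plt_s) + gamma

# Hypothetical (PLT, MOS) pairs from a rating campaign.
plt_s = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
mos = np.array([4.8, 4.5, 3.9, 3.0, 2.1, 1.4])

params, _ = curve_fit(iqx, plt_s, mos, p0=(4.0, 0.2, 1.0))
print("alpha=%.2f, beta=%.2f, gamma=%.2f" % tuple(params))
```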

  12. Browser metrics: time instant vs time integral (1/2). At t = DOM, the page structure is loaded; the visual progress x(t) of the example page (www.iceberg.com) grows from 0 to 1, passing through the DOM, ATF and PLT instants. (* Images by vvstudio, vectorpocket, Ydlabs / Freepik.)

  13. Browser metrics: time instant vs time integral (1/2). At t = ATF, the visible portion of the page (aka Above The Fold) is loaded.

  14. Browser metrics: time instant vs time integral (1/2). The SpeedIndex is the time integral of the remaining visual progress: SpeedIndex = ∫ (1 − x(t)) dt, integrated from t = 0 to the PLT.


  16. Browser metrics: time instant vs time integral (1/2). At t = PLT, all page content is loaded and x(t) reaches 1; the SpeedIndex ∫ (1 − x(t)) dt is the area above the visual-progress curve. A numeric sketch follows.
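To make the integral concrete, here is a minimal sketch that computes SpeedIndex from a sampled visual-progress curve; the sample instants and progress values are invented for illustration.

```python
# Hedged sketch: SpeedIndex = integral of (1 - x(t)) from 0 to PLT,
# with x(t) the visual progress in [0, 1]. Samples are illustrative.
def speed_index(t, x):
    """Trapezoidal integration of 1 - x(t) over the page load."""
    area = 0.0
    for i in range(1, len(t)):
        avg_remaining = 1.0 - 0.5 * (x[i] + x[i - 1])
        area += avg_remaining * (t[i] - t[i - 1])
    return area

# Hypothetical curve: DOM around 1 s, ATF around 2 s, PLT at 4 s.
t = [0.0, 0.5, 1.0, 2.0, 3.0, 4.0]
x = [0.0, 0.1, 0.4, 0.9, 0.95, 1.0]
print("SpeedIndex = %.2f s" % speed_index(t, x))  # lower is better
```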

  17. Browser metrics: time instant vs time integral (2/2).
  • SpeedIndex, RUMSI, PSSI: visual-progress metrics (% of visual completeness via histogram, rectangles or SSIM); processing intensive; only computable at L7 (in the browser).
  • ObjectIndex, ByteIndex and ImageIndex: lightweight proxies (% of objects downloaded, % of bytes downloaded, % of image bytes downloaded, respectively); ByteIndex is computable also at L3 (in the network); highly correlated with SpeedIndex, but possibly farther from user QoE.

  18. Browser metrics: time instant vs time integral (2/2), continued. A second progress curve x'(t) with the same PLT but slower loading shows why time-integral metrics matter: the time-instant PLT cannot tell the two loads apart, while the integral metrics can.

  19. Browser metrics: time instant vs time integral (2/2), continued. The indices also differ in their cutoffs: each tracks a different notion of progress (visual completeness, objects, bytes, image bytes) and reaches 100% at a different time. A sketch of the lightweight indices follows.
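A minimal sketch of the lightweight time-integral metrics, assuming the per-object completion times and sizes are available (e.g. from browser instrumentation); all numbers are illustrative.

```python
# Hedged sketch: ByteIndex integrates 1 - b(t), with b(t) the fraction of
# bytes downloaded (piecewise constant, jumping at each object completion);
# ObjectIndex is the same integral over the fraction of completed objects.
def byte_index(completions):
    """completions: list of (completion_time_s, size_bytes) per object."""
    total = sum(size for _, size in completions)
    area, prev_t, got = 0.0, 0.0, 0.0
    for t, size in sorted(completions):
        area += (1.0 - got / total) * (t - prev_t)  # area above b(t)
        prev_t, got = t, got + size
    return area

# Hypothetical page: html, css, js and two images.
objects = [(0.3, 20e3), (0.5, 10e3), (0.8, 50e3), (1.5, 200e3), (2.5, 300e3)]
print("ByteIndex   = %.2f s" % byte_index(objects))
print("ObjectIndex = %.2f s" % byte_index([(t, 1) for t, _ in objects]))
```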

  20. Method: from raw packets to browser metrics (1/2). At L7, a single session is a waterfall of individual objects (htm, css, js, img1, img2) fetched from a domain (x.com); from it, the browser derives the visual progress x(t), the DOM, ATF and PLT instants, and SpeedIndex = ∫ (1 − x(t)) dt.

  21. Method: from raw packets to browser metrics (1/2), continued. At L3, the very same session is just a stream of (encrypted) packets over time: objects, rendering instants and visual progress are not directly observable.

  22. Method: from raw packets to browser metrics (1/2), continued. The idea: train machine-learning models (XGBoost, 1D-CNN) that map the packet-level view of a single session to the browser-level metrics; a training sketch follows.
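A minimal sketch of the training step, assuming one feature vector per page load has already been extracted from the packet trace; the feature layout, shapes and synthetic data are assumptions, since the talk only states that XGBoost and 1D-CNN are the model choices.

```python
# Hedged sketch: regress a browser metric (here SpeedIndex) from
# packet-level features with gradient-boosted trees (XGBoost).
# All data below is synthetic; real labels come from the browser.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((1000, 32))   # e.g. byte counts per time bin, flow counts...
y = rng.random(1000) * 10.0  # SpeedIndex labels (seconds), one per load

model = xgb.XGBRegressor(n_estimators=200, max_depth=6, learning_rate=0.1)
model.fit(X[:800], y[:800])                    # train on 80% of the loads
mae = np.abs(model.predict(X[800:]) - y[800:]).mean()
print("MAE = %.2f s" % mae)
```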

  23. (Figure-only slide; no text content.)

  24. Method: from raw packets to browser metrics (2/2). Properties of the approach, sitting between the user's browser (L7) and the network (L3):
  › works with encryption;
  › handles multiple sessions (not in this talk);
  › exact online algorithm for ByteIndex (a sketch follows);
  › machine learning for any other metric;
  › accurate in joint tests with Orange, and on unseen pages and networks;
  › available soon in Huawei products.
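The exact online algorithm itself is not given in the slides; the following is a plausible one-pass sketch under the assumption that ByteIndex only needs packet timestamps and payload sizes, which remain visible under encryption.

```python
# Hedged sketch: one-pass ByteIndex over raw (possibly encrypted) packets.
# Since ByteIndex = T - (1/B_total) * integral of B(t) dt, we keep a
# running integral of the cumulative byte count B(t) and normalize at the
# end, with O(1) state per session. Not the product algorithm.
class OnlineByteIndex:
    def __init__(self):
        self.t_prev = 0.0      # arrival time of the previous packet
        self.cum_bytes = 0.0   # B(t) just before the current packet
        self.area = 0.0        # running integral of B(t) dt

    def on_packet(self, t, size):
        # B(t) is piecewise constant between packet arrivals.
        self.area += self.cum_bytes * (t - self.t_prev)
        self.cum_bytes += size
        self.t_prev = t

    def value(self):
        # ByteIndex = integral of (1 - B(t)/B_total) from 0 to T.
        # Assumes at least one packet was observed.
        return self.t_prev - self.area / self.cum_bytes

bi = OnlineByteIndex()
for t, size in [(0.1, 1500), (0.2, 1500), (1.0, 1500), (2.0, 1500)]:
    bi.on_packet(t, size)
print("ByteIndex = %.3f s" % bi.value())  # 0.825 s for this toy trace
```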

  25. Aftermath (1/3): from raw packets to rough sentiments.
  • Expert-driven feature engineering: explainable, but inherently heuristic; hard to keep in sync with application and network changes.
  • Neural networks: less interpretable but more versatile; downside: they require lots of samples...
  Possible inputs: feed the NN with the x(t) signal (still lightweight), or with a filmstrip (more complex); a sketch of the former follows.
  Possible outputs: user feedback (e.g. MOS, user PLT); smartphone sensors (e.g. happiness estimation via facial recognition); brain signals acquired with sensors (activity of brain areas correlated with user happiness).
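A minimal sketch of the lightweight input option: a small 1D-CNN regressor over a sampled x(t)-like signal; the architecture, the 128-sample input length and the scalar QoE target are assumptions for illustration.

```python
# Hedged sketch: 1D-CNN fed with a sampled progress/traffic signal x(t),
# regressing a QoE target (e.g. MOS). Layer sizes are illustrative.
import torch
import torch.nn as nn

class SignalCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, 1),   # one scalar QoE prediction per session
        )

    def forward(self, x):       # x: (batch, 1, n_samples)
        return self.net(x)

model = SignalCNN()
signals = torch.rand(8, 1, 128)  # batch of hypothetical x(t) signals
print(model(signals).shape)      # torch.Size([8, 1])
```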

  26. Aftermath (2/3): divide et impera.
  • One average model: the "World Wild Web" has a huge diversity that a single model does not capture.
  • Many per-page models: increase accuracy (per-page QoE models), but inherently non-scalable.
  • Middle ground, increasing both accuracy and scalability: per-page QoE models for the head (e.g. Alexa top-100 pages), aggregate QoE models (e.g. 100 clusters for the Alexa top 1M), and a generic QoE model for the tail (up to 1B pages).
