Fakultät für Informatik Technische Universität München Popularity over time – Analysis of Videos on YouTube Tizian Sarre Advisor(s): Dr. Heiko Niedermayer Supervisor: Prof. Dr.-Ing. Georg Carle Chair of Network Architectures and Services Department of Informatics Technical University of Munich (TUM)
Outline • YouTube Platform • Motivation • Dataset • Modeling • Conclusion • Future work • References
The Platform • 2nd biggest website worldwide • Over 1 billion users • More than 4 billion daily views • More than 300 hours of video uploaded every minute
Motivation Huge amounts of data • Storage costs • Networking costs Video popularity analysis • Better understanding of user behavior • Modeling video views • (Network performance improvement)
Dataset Overview Measured for 6 months (October 16th 2015 - April 14th 2016) YouTube metrics • Static video information for 59328 videos • 8909 unavailable videos • 58594 measured videos Facebook metrics • About the YouTube videos from the dataset
Dataset (YouTube) YouTube metrics Static video information • Title • Duration • Description • Published • … Unavailable videos • Time Dynamic video measurements • Views • Likes • Dislikes • Comments
YouTube Analysis -> Metrics Correlation YouTube views correlate strongly and positively with video likes
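A minimal sketch of how such a correlation could be checked. It assumes Pearson's coefficient on log-transformed counts; the slides do not state which coefficient was used, and the sample numbers below are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (views, likes) pairs, not taken from the dataset.
views = [120, 950, 4_300, 18_000, 250_000]
likes = [3, 20, 110, 400, 7_500]

# Counts span several orders of magnitude, so compare them on a log scale.
log_views = [math.log10(v) for v in views]
log_likes = [math.log10(l) for l in likes]
r = pearson(log_views, log_likes)
```

A value of `r` close to 1 would indicate the strong positive relationship described on the slide.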
YouTube Analysis -> Search Ranking Benefits Keywords, description, etc. correlate only weakly (positively) with views
YouTube Analysis -> Unavailable Videos Most videos are taken down early. Still, the median lifetime is surprisingly high (23)
Dataset (Facebook) Facebook metrics Dynamic YouTube video measurements • Likes • Comments • Shares • Totals
Facebook Analysis: Influence on YouTube Facebook shares correlate positively with YouTube video views. The linear regression line appears “curved” due to the logarithmic scale
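The curvature can be illustrated with a small sketch: a straight line y = a + b·x has a constant slope on log-log axes only when a = 0. The coefficients below are hypothetical and not the fitted values from the talk.

```python
import math

def loglog_slopes(a, b, xs):
    """Slopes between consecutive points of y = a + b*x in log-log space."""
    pts = [(math.log10(x), math.log10(a + b * x)) for x in xs]
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]

xs = [1, 10, 100, 1000]

# Hypothetical regression line with a non-zero intercept: slope changes, so it looks curved.
with_intercept = loglog_slopes(a=500.0, b=20.0, xs=xs)

# Line through the origin: slope stays at 1, so it remains straight.
through_origin = loglog_slopes(a=0.0, b=20.0, xs=xs)
```

With a positive intercept the log-log slope grows from near 0 towards 1 as x increases, which is the bend visible in the plot.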
Modeling -> Top Days Certain videos are more popular than others. When and how often do the highest views gains (top days) occur?
Modeling -> Event Days When and how often do distinctly popular days (event days) occur? How exactly are event days defined? Desired properties: • Independence of future data • Outstanding popularity • Independence of video age • Event day occurrence independence • Popularity only dependence
Modeling -> Event Days Several event day definitions were attempted • Absolute median • Popularity categories • Power law-based • Varying event models
Modeling -> Event Days -> Absolute Median Idea: For each day of video lifetime, calculate the median daily views gain over all videos in the dataset. Use these daily medians as the event day decider
Modeling -> Event Days -> Absolute Median Why not use the average? • Too heavily influenced by higher values • Represents reality less appropriately [Figure: daily views gains medians of the dataset]
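A short illustration of why the median is preferred here, using invented numbers: a single viral video dominates the average but leaves the median almost untouched.

```python
from statistics import mean, median

# Hypothetical views gains of five videos on the same day of their lifetime.
gains = [3, 5, 8, 12, 90_000]

avg = mean(gains)    # pulled far upwards by the single viral video
mid = median(gains)  # stays close to what a typical video gains
```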
Modeling -> Event Days -> Absolute Median How do we use the daily views gains medians to derive event days? We could use the exact medians as decider. Better: add an arbitrary positive modifier to ensure outstanding popularity
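The absolute median approach could be sketched as follows. This is only an illustration under assumptions: the modifier is taken to be additive, its value is arbitrary, and the data layout (one list of daily gains per video, indexed by day of lifetime) is hypothetical.

```python
from statistics import median

def daily_medians(gains_by_video):
    """Median views gain across all videos for each day of video lifetime."""
    longest = max(len(g) for g in gains_by_video)
    return [
        median(g[day] for g in gains_by_video if day < len(g))
        for day in range(longest)
    ]

def event_days(gains, medians, modifier):
    """Days on which a video's gain exceeds the daily median plus the modifier."""
    return [day for day, gain in enumerate(gains) if gain > medians[day] + modifier]

# Hypothetical daily views gains for three videos (index = day of lifetime).
dataset = [
    [50, 20, 10, 5],
    [80, 30, 400, 6],
    [10, 5, 2],
]
medians = daily_medians(dataset)
days = event_days(dataset[1], medians, modifier=100)
```

In this toy data only the spike of 400 views on the third day of the second video counts as an event day.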
Modeling -> Event Days -> Absolute Median Observation: Lots of event days
Modeling -> Event Days -> Absolute Median Problem: Daily views gains medians are not a good decider… • Values too small • Need a more fine-grained decider
Modeling -> Event Days -> Popularity categories Idea: Classify videos dynamically according to popularity categories. Smaller intervals for lower views gains (due to geometric views gains distribution). More realistic event day determination via interval median as decider
Modeling -> Event Days -> Popularity categories Problem: Event day determination depends on interval choice. Seemingly random event days. Popularity only dependence violated
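One possible reading of the popularity category approach, sketched under several assumptions: interval boundaries are powers of a base, a video is classified by its previous day's gain, and the decider is the median of all dataset gains falling into that interval. None of these details are confirmed by the slides, and all names and numbers are hypothetical.

```python
from statistics import median

def category(gain, base=10):
    """Index k of the geometric interval containing gain: [0, base), [base, base**2), ..."""
    k, upper = 0, base
    while gain >= upper:
        upper *= base
        k += 1
    return k

def interval_medians(all_gains, base=10):
    """Median views gain per popularity category, used here as the decider."""
    buckets = {}
    for gain in all_gains:
        buckets.setdefault(category(gain, base), []).append(gain)
    return {k: median(v) for k, v in buckets.items()}

def is_event_day(previous_gain, gain, all_gains, base=10):
    """Classify by the previous day's gain, then compare with that interval's median."""
    deciders = interval_medians(all_gains, base)
    return gain > deciders[category(previous_gain, base)]

# Hypothetical views gains observed across the dataset on one day.
all_gains = [2, 4, 7, 15, 40, 90, 300, 800]

# The same day (gain 30 after a gain of 20) flips with the interval choice.
narrow = is_event_day(20, 30, all_gains, base=10)   # decider: median of [15, 40, 90]
wide = is_event_day(20, 30, all_gains, base=100)    # decider: median of [2, 4, 7, 15, 40, 90]
```

The differing results for `narrow` and `wide` illustrate the stated problem: the outcome depends on how the intervals are chosen, not only on the video's popularity.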
Modeling -> Event Days -> Power Law Idea: Calculate a power law-based model on the dataset’s views gains. Strong positive deviations are event days
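A minimal sketch of the power law idea, assuming the model gain = c · day^(−alpha) is fitted by least squares in log-log space and that "strong positive deviation" means exceeding the model by a fixed factor. The fitting procedure, the factor of 2 and the data are assumptions, not the method used in the talk.

```python
import math

def fit_power_law(days, gains):
    """Least-squares fit of gain = c * day**(-alpha) in log-log space."""
    xs = [math.log(d) for d in days]
    ys = [math.log(g) for g in gains]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    c = math.exp(mean_y - slope * mean_x)
    return c, -slope

def power_law_event_days(days, gains, c, alpha, factor=2.0):
    """Days whose observed gain exceeds the model prediction by the given factor."""
    return [d for d, g in zip(days, gains) if g > factor * c * d ** (-alpha)]

# Hypothetical video: gains decay like 1000 / day, with a spike on day 5.
days = list(range(1, 9))
gains = [1000 / d for d in days]
gains[4] = 2000

c, alpha = fit_power_law(days, gains)
events = power_law_event_days(days, gains, c, alpha)
```

Note that the spike itself pulls the fitted curve upwards, which hints at why a single fixed model struggles once a video's behaviour changes.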
Modeling -> Event Days -> Power Law Results: Slightly fluctuating but overall decreasing event day occurrence. Event days less likely than with absolute medians
Modeling -> Event Days -> Power Law Positives: • Relatively reasonable model Negatives: • Model changes are not considered • Multiple models are not supported
Modeling -> Event Days -> Varying Event Models Idea: Adjust the current power law model when strong deviations occur
Modeling -> Event Days -> Varying Event Models Deviation weight using least squares? No good: large deviations between views gains are weighted too severely
Modeling -> Event Days -> Varying Event Models Better with a relative measure: deviations are weighted linearly
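The difference between the two weightings can be seen in a small example with invented numbers: two days that both lie 50% above the model, at very different popularity levels.

```python
def squared_deviation(observed, expected):
    """Least-squares style weight: grows quadratically with the absolute gap."""
    return (observed - expected) ** 2

def relative_deviation(observed, expected):
    """Relative weight: gap as a fraction of the expected value."""
    return abs(observed - expected) / expected

# Both days are 50% above the model prediction (hypothetical values).
small = (150, 100)
large = (150_000, 100_000)

squared = [squared_deviation(*small), squared_deviation(*large)]
relative = [relative_deviation(*small), relative_deviation(*large)]
```

The squared weights differ by a factor of one million, while the relative weights are identical, so popular videos no longer dominate the decision.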
Modeling -> Event Days -> Varying Event Models Decider for model adaptation? More/less than 50% of expected value. Further model adaptations less likely (consecutive event days less likely)
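A rough sketch of such an adaptation rule, under assumptions the slides leave open: a day triggers an adaptation when its gain deviates from the power law prediction by more than 50% in either direction, and adapting means rescaling the model so that it passes through that observation. The function name and data are hypothetical.

```python
def run_varying_model(gains, c, alpha, threshold=0.5):
    """Walk through daily gains; rescale the power law model on strong deviations."""
    adapted_on = []
    for day, gain in enumerate(gains, start=1):
        expected = c * day ** (-alpha)
        if abs(gain - expected) / expected > threshold:
            adapted_on.append(day)
            c = gain * day ** alpha  # new curve passes through this observation
    return adapted_on, c

# Hypothetical video following 1000 / day, then jumping to a higher level on day 3.
gains = [1000, 500, 3000, 2250, 1800]
adapted_on, new_c = run_varying_model(gains, c=1000.0, alpha=1.0)
```

After the adaptation on day 3, the following days match the rescaled model, so they are not flagged again. This mirrors the remark that consecutive event days become less likely.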
Modeling -> Event Days -> Varying Event Models Similar results compared to the power law approach. Multiple event models still not supported due to uncertainty
Conclusion • We discussed various possible event day definitions, some better and some worse • External popularity influence not considered due to uncertainty • Vague models for predictions
Future Work • Consider video recommendations for popularity analysis • YouTube internal and external search engine rank/hits • Consider YouTube channel subscriber base for video popularity analysis • Twitter/Snapchat/Instagram social media influence analysis • Alternate event day definitions and multiple models detection support (e.g. with CUSUM)
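For orientation, a generic one-sided CUSUM detector could look like the sketch below. It is not part of the presented work; the input is assumed to be a series of deviations from the current model (for example relative deviations), and the parameter values are arbitrary.

```python
def cusum_alarms(values, target, slack, threshold):
    """One-sided upper CUSUM: indices where the cumulative excess over target crosses threshold."""
    alarms = []
    s = 0.0
    for i, v in enumerate(values):
        # Accumulate only the excess beyond target + slack; never drop below zero.
        s = max(0.0, s + (v - target - slack))
        if s > threshold:
            alarms.append(i)
            s = 0.0  # restart after signalling a change
    return alarms

# Hypothetical deviations: noise at first, then a sustained upward shift.
deviations = [0.1, -0.2, 0.0, 1.5, 1.4, 1.6, 0.1]
alarms = cusum_alarms(deviations, target=0.0, slack=0.5, threshold=2.0)
```

Unlike a per-day threshold, the alarm fires only after the shift has persisted for several days, which could help separate a lasting model change from a single outlier.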
Thank you
References • http://www.alexa.com/topsites • https://www.youtube.com/yt/press/en-GB/statistics.html • https://cnet3.cbsistatic.com/hub/i/r/2013/11/11/c3fb1098-6de7-11e3-913e-14feb5ca9861/resize/570xauto/4ddbc82dd9df232db62c49b29192f268/sandvine-2H13-NA-top10-peak_.png • http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/complete-white-paper-c11-481360.html