#NetflixData
How “Stranger Things” can happen with Visual Analytics
Jason Flittner
Senior Analytics Engineer / Manager
Netflix - Content Data Engineering and Analytics
● About Netflix
● Tableau + Big Data
  ○ Lessons Learned
  ○ Where we are today
● Analytics and Iterating Quickly
What is Netflix?
Metrics
● 93+ million members
● 190 countries
● 1,000+ devices
● 10B hours/qtr
We plan on spending ~$6B in 2017 on content for our members
● ~60 PB DW on S3
● ~1400 Tableau users
● Live & extract connections
● Analytics on billions of rows
[Architecture diagram: Compute, Storage, Data Interface, and Data Access, Analytics and Visualization, on AWS (Hadoop clusters, S3)]
Choosing a source
● Hive
● Spark
● Presto
● Redshift
● Published Data Source
● etc...
● Powerful and scalable backend
● “Slower” 1,000,000,000/hr
● Hive + Tableau
  ○ Thrift Servers
  ○ Custom SQL vs Tables
  ○ Metadata
  ○ ODBC Optimization
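One way to picture the "Custom SQL vs Tables" lesson: when a data source is defined with Custom SQL, Tableau generally wraps that statement in a subquery for each query it issues, whereas a table connection is queried directly. The sketch below imitates both query shapes with Python's built-in `sqlite3` as a stand-in for Hive. The `viewing` table, its columns, and the sample rows are made up for illustration, and the slide does not say which option was preferred.

```python
import sqlite3

# Hypothetical sample data; sqlite3 stands in for a Hive backend.
ROWS = [
    ("Stranger Things", "US", 2.0),
    ("Stranger Things", "FR", 1.5),
    ("The Crown", "US", 1.0),
]

# Hypothetical Custom SQL that a workbook author might paste into Tableau.
CUSTOM_SQL = "SELECT title, country, hours FROM viewing WHERE hours > 0"


def make_connection():
    """Create an in-memory database holding the sample viewing table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE viewing (title TEXT, country TEXT, hours REAL)")
    conn.executemany("INSERT INTO viewing VALUES (?, ?, ?)", ROWS)
    return conn


def query_table(conn):
    """Table connection: the aggregate runs directly against the table."""
    sql = (
        "SELECT country, SUM(hours) FROM viewing "
        "GROUP BY country ORDER BY country"
    )
    return conn.execute(sql).fetchall()


def query_custom_sql(conn):
    """Custom SQL connection: the author's statement becomes a subquery."""
    sql = (
        f"SELECT country, SUM(hours) FROM ({CUSTOM_SQL}) AS custom_sql "
        "GROUP BY country ORDER BY country"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = make_connection()
    print(query_table(conn))
    print(query_custom_sql(conn))
```

Both shapes return the same rows on this toy data. On a large Hive table the wrapped form can be harder for the engine to optimize, which is one plausible reason the choice appears under lessons learned.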
● Scalable
● Faster than Hive in many cases
● Spark + Tableau
  ○ Thrift Servers
  ○ Long running job on Cluster
  ○ Query reliability
● Fast query engine
● Great for experimenting and “smaller” data sets
● Connecting to Tableau
  ○ Web data connector
  ○ ODBC
Tableau Extract API → Tableau Data Extract → Publish to Server
Distributed Tableau Extract API
● Issues Command: Create Extract
● Provision Container Resource
● Create Tableau Data Extract
● Publish to Server
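The distributed flow on this slide can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not Netflix's implementation: every function below is a hypothetical stub that only returns a string, the step order is read off the slide labels, and a thread pool stands in for whatever container scheduler is actually used.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class ExtractJob:
    """A hypothetical "Create Extract" command."""
    name: str   # name of the extract to build
    query: str  # source query that feeds the extract


def provision_container(job: ExtractJob) -> str:
    """Stub: pretend to reserve a container for this job."""
    return f"container-for-{job.name}"


def create_extract(job: ExtractJob, container: str) -> str:
    """Stub: pretend to build a Tableau Data Extract inside the container."""
    return f"{job.name}.tde"


def publish_to_server(extract_path: str) -> str:
    """Stub: pretend to publish the extract file to Tableau Server."""
    return f"published:{extract_path}"


def run_job(job: ExtractJob) -> str:
    """Run the three steps for one command, in the order on the slide."""
    container = provision_container(job)
    extract_path = create_extract(job, container)
    return publish_to_server(extract_path)


def run_all(jobs, max_workers: int = 4):
    """Run independent extract jobs concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, jobs))


if __name__ == "__main__":
    jobs = [ExtractJob("binge", "SELECT 1"), ExtractJob("hours", "SELECT 2")]
    print(run_all(jobs))
```

Because each job is independent, many extracts can be built in parallel instead of queueing on a single machine, which is the apparent point of distributing the Extract API.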
Amazon Redshift
● Very fast loads from S3
● Native Tableau connector
● Quick Tableau Iteration
● Live or Extract
● Concurrency
BIG Data
● Too big to extract?
● Optimized live connections
  ○ SQL
● Custom data viz with Druid
● Tableau + Hyper!?
Analytics Engineer
Analytics: Binge Analysis
● Viewing Patterns
● Hours Viewed
● Customer Joy
● Content Quality
● Business users
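As a rough illustration of the kind of metric behind "Hours Viewed" and "Binge Analysis", the sketch below aggregates a few made-up viewing rows. The row layout, the sample values, and especially the binge rule (three or more episodes of one title by one member on the same day) are assumptions chosen for the example; the slides do not define how binge behavior is measured.

```python
from collections import Counter, defaultdict

# Made-up sample rows: (member_id, title, day, hours for one episode).
VIEWS = [
    ("m1", "Stranger Things", "2017-01-01", 0.75),
    ("m1", "Stranger Things", "2017-01-01", 0.75),
    ("m1", "Stranger Things", "2017-01-01", 0.75),
    ("m2", "Stranger Things", "2017-01-01", 0.75),
    ("m2", "The Crown", "2017-01-02", 1.0),
]


def hours_viewed(views):
    """Total hours viewed per title."""
    totals = defaultdict(float)
    for _member, title, _day, hours in views:
        totals[title] += hours
    return dict(totals)


def binge_sessions(views, min_episodes=3):
    """Return (member, title, day) keys with at least min_episodes plays.

    The threshold is a hypothetical definition of a binge.
    """
    counts = Counter(
        (member, title, day) for member, title, day, _hours in views
    )
    return {key for key, n in counts.items() if n >= min_episodes}


if __name__ == "__main__":
    print(hours_viewed(VIEWS))
    print(binge_sessions(VIEWS))
```

In practice an aggregation like this would run in the backend (Hive, Spark, Presto, or Redshift) and only the summarized result would be handed to Tableau.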
Bringing it all together
● Content analytics
● Iterate quickly
● Move between backend sources
● Strong user adoption
Merci / Thank you
Jason Flittner -