LESSONS & PITFALLS: DATA AT SWEDEN'S TELEVISION
Ismail Elouafiq
A wide spectrum of Apps, running on different platforms
A wide spectrum of Users: STRATEGY, AUTHORS / ANALYSTS, PRODUCT OWNERS, DEVELOPERS, EDITORS
tl;dr:
Defining what to prioritise
Experimenting and iterating in small increments
“Data: I could be chasing an untamed ornithoid without cause.” ― Star Trek: The Next Generation
Spoilers: how and why we now use protobuf, functional data engineering and ETL practices
BLOCKCHAIN, AI, Deep reinforcement learning
ismail.land/velocity
What events should you collect?
What we want to know: how many people read the article per day
What we can observe (events): click, scroll, share
Explicit model: from the events we can observe to what we want to know
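One way to make the "explicit model" concrete is to write down, as code, the rule that turns observable events into the number we want to know. The sketch below is only an illustration: the rule (a click plus a scroll by the same user on the same article on the same day counts as one read), the field layout and the sample events are all assumptions, not the model used in the talk.

```python
from collections import defaultdict

# Hypothetical observed events: (day, user, article, event_type).
EVENTS = [
    ("2019-01-01", "u1", "a1", "click"),
    ("2019-01-01", "u1", "a1", "scroll"),
    ("2019-01-01", "u2", "a1", "click"),   # clicked but never scrolled
    ("2019-01-01", "u3", "a1", "click"),
    ("2019-01-01", "u3", "a1", "scroll"),
    ("2019-01-01", "u3", "a1", "share"),
    ("2019-01-02", "u1", "a1", "click"),
    ("2019-01-02", "u1", "a1", "scroll"),
]


def reads_per_day(events, required=frozenset({"click", "scroll"})):
    """Assumed model: a (day, user, article) counts as one read when
    every event type in `required` was observed for it."""
    seen = defaultdict(set)
    for day, user, article, event_type in events:
        seen[(day, user, article)].add(event_type)

    counts = defaultdict(int)
    for (day, _user, article), types in seen.items():
        if required <= types:  # all required event types were observed
            counts[(day, article)] += 1
    return dict(counts)


reads = reads_per_day(EVENTS)
```

Changing `required` changes the definition of a read without touching the collected events, which is the point of keeping the model explicit.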
let's start with views
If you could do anything with data... what would you actually use for decision making?
A/B tests... Hell yeah!
First we need to collect data
1 COLLECT: the SDK sends events to the Event API
2 INGEST: the Event API publishes the events to pub / sub
2 INGEST: pub / sub
3 STORE: Judy subscribes to pub / sub and writes to the Events table
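The collect, ingest and store steps can be sketched end to end with an in-memory queue standing in for pub / sub. Everything here is an assumption made for illustration: the `PubSub` class is not a real client library, events are encoded as JSON only to keep the example dependency-free, and the Events table is a plain Python list.

```python
import json
import queue


class PubSub:
    """In-memory stand-in for a pub / sub topic (not a real client)."""

    def __init__(self):
        self._messages = queue.Queue()

    def publish(self, message: bytes) -> None:
        self._messages.put(message)

    def pull(self):
        """Yield every message currently waiting on the topic."""
        while True:
            try:
                yield self._messages.get_nowait()
            except queue.Empty:
                return


def event_api(topic: PubSub, event: dict) -> None:
    """COLLECT -> INGEST: receive an event from an SDK and publish it."""
    topic.publish(json.dumps(event).encode("utf-8"))


def subscriber(topic: PubSub, events_table: list) -> int:
    """INGEST -> STORE: subscribe, decode, and write rows to the table."""
    written = 0
    for message in topic.pull():
        events_table.append(json.loads(message.decode("utf-8")))
        written += 1
    return written


topic = PubSub()
events_table = []
event_api(topic, {"event_type": "click", "article": "a1"})
event_api(topic, {"event_type": "scroll", "article": "a1"})
written = subscriber(topic, events_table)
```

The API and the subscriber only share the topic, so either side can be replaced or scaled without the other knowing.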
1 COLLECT, 2 INGEST, 3 STORE
{event_type: click} {eventType: click} {eventType: klick}
More Issues
Multiple teams / platforms => takes time to update the clients
The schema is sent with every event
Unclear types (arbitrary memory allocation)
We know the schema on all levels and we have a common model for the data... how can we make use of that?
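The three event variants above ({event_type: click}, {eventType: click}, {eventType: klick}) show what happens when every client is free to spell things its own way. A minimal sketch of catching them against one shared definition could look like this; the allowed field and event-type sets are assumptions for the example, not the real schema.

```python
# Assumed shared definition: the only field and event types we accept.
ALLOWED_FIELDS = {"event_type"}
ALLOWED_EVENT_TYPES = {"click", "scroll", "share"}


def validate(event: dict) -> list:
    """Return a list of problems; an empty list means the event is accepted."""
    problems = []
    for field in event:
        if field not in ALLOWED_FIELDS:
            problems.append(f"unknown field: {field}")
    if "event_type" not in event:
        problems.append("missing field: event_type")
    elif event["event_type"] not in ALLOWED_EVENT_TYPES:
        problems.append(f"unknown event type: {event['event_type']}")
    return problems


results = [
    validate({"event_type": "click"}),
    validate({"eventType": "click"}),
    validate({"eventType": "klick"}),
]
```

Only the first event passes. Validation like this still happens after the fact, which is part of the motivation for defining the schema once and generating client code from it.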
ENTER PROTOBUF
Keeping a centralized Event Schema
person.proto => compiler => person.go, person.js, person.f
Client (person.js): Person => serialize => binary => deserialize => Person: Server (person.js)
1 - Define the Schema: as a .proto file (event.proto)
2 - Publish libraries: publish using CI pipeline (go, js, java, swift)
3 - Fetch: fetch in SDKs (serialization), fetch in Judy (deserialization), use to generate table
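The "use to generate table" step can be illustrated without the protobuf toolchain. In the sketch below a plain Python list stands in for the fields declared in event.proto; the field names, the type mapping and the table name are all hypothetical, and a real setup would read the field descriptors from the generated protobuf code instead.

```python
# Hypothetical stand-in for the fields declared in event.proto:
# (field name, schema type) pairs, in declaration order.
EVENT_SCHEMA = [
    ("event_type", "string"),
    ("article_id", "string"),
    ("timestamp_ms", "int64"),
    ("scroll_depth", "float"),
]

# Assumed mapping from schema types to SQL column types.
SQL_TYPES = {
    "string": "TEXT",
    "int64": "BIGINT",
    "float": "REAL",
    "bool": "BOOLEAN",
}


def create_table_sql(table: str, schema) -> str:
    """Render a CREATE TABLE statement from the shared schema definition."""
    columns = ",\n".join(f"  {name} {SQL_TYPES[ftype]}" for name, ftype in schema)
    return f"CREATE TABLE {table} (\n{columns}\n);"


ddl = create_table_sql("events", EVENT_SCHEMA)
```

Because the table is derived from the same definition the SDKs serialize with, a field cannot exist in one place and be missing or misspelled in the other.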
0 DEFINE, 1 COLLECT, 2 INGEST, 3 STORE
My work here is done!
Not really...
Backward and forward compatibility
Table changes
Language agnostic, but not really
Lack of support
The Data Pyramid (top to bottom)
Nirvana / AI, machine learning
Learn, Optimise, Experiment
Metrics, aggregations, KPIs
Storage, transformation, monitoring
Collection and ingestion
The Data Pyramid
"The pyramids of Egypt could be explained as symbolic stairways to the stars, according to a British scientist" ― The Guardian
"The data pyramid could be explained as a symbolic stairway to the A.I., according to myself" ― Me
Endorse me on LinkedIn
We have the data. Now what?
0 DEFINE, 1 COLLECT, 2 INGEST, 3 STORE
4 Analyze: batch jobs (etl), streaming
5 Present: service / API, dashboard, reports
Everybody ETLs
Our mysterious job pipeline (article reads per day)
Inputs: some data to be aggregated (click events, article titles)
Output: aggregated table (article reads per day)
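A job with these inputs and this output can be sketched in a few lines. The sample data, the field names and the simplification that one click equals one read are assumptions for the example; the real job is not shown in the slides.

```python
from collections import Counter

# Hypothetical inputs: click events and an article-title lookup.
CLICK_EVENTS = [
    {"day": "2019-01-01", "article_id": "a1"},
    {"day": "2019-01-01", "article_id": "a1"},
    {"day": "2019-01-01", "article_id": "a2"},
    {"day": "2019-01-02", "article_id": "a1"},
]
ARTICLE_TITLES = {"a1": "First article", "a2": "Second article"}


def article_reads_per_day(click_events, titles):
    """Aggregate clicks into one row per (day, article), joined with its title."""
    counts = Counter((e["day"], e["article_id"]) for e in click_events)
    return [
        {
            "day": day,
            "article_id": article_id,
            "title": titles.get(article_id, "unknown"),
            "reads": n,
        }
        for (day, article_id), n in sorted(counts.items())
    ]


rows = article_reads_per_day(CLICK_EVENTS, ARTICLE_TITLES)
```

The function only depends on its inputs, so running it twice on the same events gives the same aggregated rows.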
magic job => Append => today-partition
magic job => Failed => today-partition
Principle: ensuring partition reproducibility
Immutable data
Versioned logic
magic job => Append => today-partition
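The difference between appending to today's partition and rebuilding it can be shown with a dictionary standing in for a partitioned table. This is a sketch under assumptions: the function names, the partition key and the sample rows are made up, and a real warehouse would overwrite the partition through its own API.

```python
def append_job(table: dict, day: str, rows: list) -> None:
    """Appending: a re-run after a failure adds the same rows again."""
    table.setdefault(day, []).extend(rows)


def overwrite_job(table: dict, day: str, rows: list) -> None:
    """Immutable partitions: each run rebuilds the whole partition."""
    table[day] = list(rows)


day_rows = [{"article_id": "a1", "reads": 2}]

appended = {}
append_job(appended, "2019-01-01", day_rows)
append_job(appended, "2019-01-01", day_rows)        # retry: duplicated rows

overwritten = {}
overwrite_job(overwritten, "2019-01-01", day_rows)
overwrite_job(overwritten, "2019-01-01", day_rows)  # retry: same result
```

With the overwrite variant a failed or repeated run is harmless: running the job again for the same day always produces the same partition.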
On ETL design
Ensure reproducibility
Practice failure in small increments
Define conventions in one place
Keeping a tidy pipeline: ismail.land/velocity
¯\_(ツ)_/¯ Br3Ak 'em rULeS
summary... (what worked for us)
DATA DATA DATA
Thank You ismail.land/velocity