Our solution to the IDAO 2020 qualifiers

Max Halford, Raphaël Sourty, Robin Vaysse
Webinar on Data Analysis for Satellite Tracking
Sunday 12th April, 2020

Our team
• Max Halford, 3rd year PhD student at IMT/IRIT
• Raphaël Sourty, 1st year PhD student at IRIT
• Robin Vaysse, 1st year PhD student at IRIT
We like competitive data science!

Context
Satellite position forecasting
Two tracks with separate leaderboards:
1. Make the most accurate predictions possible
2. Make accurate predictions with two constraints:
   2.1 Take less than 60 seconds
   2.2 Keep peak RAM usage under 500MB

The data

Our solution in a nutshell
• We train one model per satellite and per coordinate (300 × 6 = 1800 models)
• Each model is an autoregressive (AR) process of order p = 48
• In other words, we train a linear regression to predict y_{n+1} from {y_{n−48}, ..., y_n}, that’s all!
• To predict several steps ahead, we use the prediction at step n + 1 as a feature at step n + 2
• We validate locally on the last 40% of the data
• Our approach is simple enough to be used for both tracks without modifications
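A minimal sketch of the idea for a single series, not the code used in the competition: the toy sine series, its period of 24 steps, the noise level, and the helper names `make_lagged` and `fit_ar` are all made up for illustration. NumPy's least-squares solver stands in for scikit-learn's linear regression, and each row is assumed to hold exactly p lagged values.

```python
import numpy as np


def make_lagged(y, p):
    """Return (X, t): row i of X holds y[i:i+p], and t[i] is the value that follows."""
    X = np.array([y[i:i + p] for i in range(len(y) - p)])
    t = y[p:]
    return X, t


def fit_ar(y, p):
    """Least-squares fit of the next value on the p previous values plus an intercept."""
    X, t = make_lagged(y, p)
    A = np.hstack([X, np.ones((len(X), 1))])  # extra column for the intercept
    coef, *_ = np.linalg.lstsq(A, t, rcond=None)
    return coef[:-1], coef[-1]


# Toy periodic series standing in for one coordinate of one satellite.
rng = np.random.default_rng(0)
n = np.arange(500)
y = np.sin(2 * np.pi * n / 24) + rng.normal(scale=0.01, size=len(n))

p = 48
split = int(len(y) * 0.6)  # train on the first 60%, validate on the last 40%
w, b = fit_ar(y[:split], p)

# One-step-ahead validation: every held-out target is predicted from true past values.
X_val, t_val = make_lagged(y[split - p:], p)
val_rmse = float(np.sqrt(np.mean((X_val @ w + b - t_val) ** 2)))
print(f"one-step validation RMSE: {val_rmse:.4f}")
```

Repeating this for every satellite and every coordinate would give the 1800 independent models mentioned above.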

Starting simple
https://github.com/onnx/sklearn-onnx

Auto-regression
• Using past target values makes sense because the data is very periodic
• For every satellite and coordinate, we build a vector of features
• Each vector contains the p past target values
• We obtain n feature vectors and n targets
For forecasting into the future, we:
1. Make a prediction for the next time step
2. Append the prediction to the feature vector
3. Remove the oldest value from the vector
4. Repeat from step 1
Flexible framework:
• Any regression model can be plugged in
• Any feature can be added, provided it can be computed online
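The four forecasting steps can be sketched as a short loop. This is an illustration under stated assumptions rather than the competition code: the function name `forecast` is hypothetical, and the coefficients are not fitted but chosen by hand as an AR(2) recurrence that a pure sinusoid satisfies exactly, so the result is easy to check.

```python
import numpy as np


def forecast(window, w, b, steps):
    """Roll a linear AR model forward, feeding each prediction back in as a feature."""
    window = list(window)  # the last p observed values, oldest first
    preds = []
    for _ in range(steps):
        y_next = float(np.dot(window, w) + b)  # 1. predict the next time step
        preds.append(y_next)
        window.append(y_next)                  # 2. append the prediction
        window.pop(0)                          # 3. drop the oldest value
    return preds                               # 4. the loop repeats from step 1


# Hand-picked AR(2) weights: sin((n+1)ω) = 2cos(ω)·sin(nω) − sin((n−1)ω).
omega = 2 * np.pi / 24
w = np.array([-1.0, 2 * np.cos(omega)])
b = 0.0

history = np.sin(omega * np.arange(10))
preds = forecast(history[-2:], w, b, steps=5)
print(preds)
```

Because every step depends on the previous prediction, the loop cannot be vectorised over time, which is why per-call overhead matters so much for this kind of model.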

Dealing with speed
• AR models are slow at inference because of their sequential nature
• In scikit-learn, calling .predict(X) many times incurs a large overhead
• We “stripped” the scikit-learn classes we used to their bare minimum by overriding some of their methods

Overriding scikit-learn’s linear regression

    import numpy as np
    from sklearn import linear_model, preprocessing

    class StandardScaler(preprocessing.StandardScaler):
        """Barebones implementation with less overhead than sklearn."""

        def transform(self, X):
            # Skips sklearn's input validation; fit() must have been called first.
            return (X - self.mean_) / self.var_ ** .5

    class LinearRegression(linear_model.LinearRegression):
        """Barebones implementation with less overhead than sklearn."""

        def predict(self, X):
            # Plain dot product, with no input checks.
            return np.dot(X, self.coef_) + self.intercept_

More information here. We’ve also learned about sklearn-onnx.
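As a sanity check, the stripped-down classes should give the same numbers as the stock scikit-learn methods. The sketch below repeats the two overrides so it runs on its own and compares them on made-up random data; the data, the coefficients and the variable names are illustrative assumptions, not taken from the competition code.

```python
import numpy as np
from sklearn import linear_model, preprocessing


class StandardScaler(preprocessing.StandardScaler):
    def transform(self, X):
        return (X - self.mean_) / self.var_ ** .5


class LinearRegression(linear_model.LinearRegression):
    def predict(self, X):
        return np.dot(X, self.coef_) + self.intercept_


# Made-up data following an exact linear relationship.
rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 4.0

scaler = StandardScaler().fit(X)
model = LinearRegression().fit(scaler.transform(X), y)

# The overrides accept a single 1-D sample directly, which suits a forecasting loop.
x_new = X[0]
fast = model.predict(scaler.transform(x_new))

# Reference value from the unmodified parent methods, which expect a 2-D array.
x_scaled = preprocessing.StandardScaler.transform(scaler, x_new.reshape(1, -1))
reference = linear_model.LinearRegression.predict(model, x_scaled)[0]

print(fast, reference)
```

The two results agree as long as no feature has zero variance, since the stock scaler special-cases constant columns and the barebones version does not.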

Dealing with memory usage
We used a Python package called memory_profiler to measure the memory usage of our script.
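For readers who want a quick check without installing anything, the standard library's `tracemalloc` module gives a rough equivalent. This is a swapped-in technique, not memory_profiler: it only counts allocations made through Python's allocators rather than the whole process footprint. The helper name `peak_memory_mb` and the toy workload `allocate` are hypothetical.

```python
import tracemalloc


def peak_memory_mb(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in MB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, peak / 1e6


def allocate(n_bytes):
    """Toy workload standing in for the real prediction script."""
    buf = bytearray(n_bytes)
    return len(buf)


size, peak_mb = peak_memory_mb(allocate, 50_000_000)
within_budget = peak_mb < 500  # the 500MB limit of the constrained track
print(f"peak: {peak_mb:.1f} MB, within budget: {within_budget}")
```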

What didn’t work
• Gaussian processes with sinusoidal kernels gave good training results, but fared poorly on the test set
• The N-BEATS [1] model fits perfectly to the training data but diverges in auto-regressive mode
• We got no improvement by training a multi-output linear regression to try capturing coordinate dependencies

[1] Boris N. Oreshkin et al. “N-BEATS: Neural basis expansion analysis for interpretable time series forecasting”. In: CoRR abs/1905.10437 (2019). arXiv: 1905.10437. url: http://arxiv.org/abs/1905.10437.

Production considerations
• Our model is essentially a linear regression
• Linear regression can be trained with stochastic gradient descent (SGD)
• SGD requires only one sample at a time, and thus enables online learning
• Online learning allows learning from a stream of data
• Predicting satellite positions is inherently a streaming problem, therefore models that can be trained online should be preferred
Shameless publicity: check out creme and chantilly for online learning
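To make the SGD point concrete, here is a small pure-Python sketch of a linear regression that is updated one sample at a time. It is an illustration only: the class `OnlineLinearRegression`, its method names and the toy stream y = 2x + 1 are assumptions made for this example and do not reproduce the API of creme or chantilly.

```python
import random


class OnlineLinearRegression:
    """Linear regression trained by SGD on the squared error, one sample at a time."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, y):
        # Gradient of 0.5 * (prediction - y)^2 with respect to the weights and bias.
        error = self.predict(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error


# Simulated stream: each sample is seen once and then discarded.
rng = random.Random(0)
model = OnlineLinearRegression(n_features=1)
for _ in range(5000):
    x = [rng.random()]
    model.update(x, 2.0 * x[0] + 1.0)

print(model.w, model.b)
```

Nothing is stored besides the weights, so memory use stays constant however long the stream runs.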

Our advice for competitive data science
• “Keep it simple, stupid” (KISS principle)
• Always start by setting up a local validation benchmark
• When your model improves, save your work (git is your friend)
• Doubt everything you do
• Don’t be scared to try stuff, but don’t get tunnel vision

Code can be found on GitHub
Thank you for listening!
