Building an analytic department: From Zero to TensorFlow
The Peter principle: People in a hierarchy tend to rise to their “level of incompetence”: an employee is promoted based on their success in previous jobs until they reach a level at which they are no longer competent, as skills in one job do not necessarily translate to another.
Introductions
Antoine Desmet
Analytics manager – Smart Solutions, Komatsu
Hunter Valley
2000: The US Defense Department ended the purposeful degradation of GPS.
2008: Komatsu releases Level 4 autonomy: a driverless truck fleet. Operates even if the wireless link is lost.
Real-time terrain mapping
• LIDAR on diggers
• Scans stitched together into a terrain map
• Compare to plan
• Operator sees, in near real-time:
  • Red: over-dug
  • Blue: matches plan
  • Green: needs digging
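The compare-to-plan step can be sketched as a per-cell elevation comparison. This is only an illustration, not the actual Komatsu implementation: the grid layout, the function names and the 0.25 m tolerance are all assumptions.

```python
from typing import List

def classify_cell(scanned_m: float, planned_m: float, tolerance_m: float = 0.25) -> str:
    """Colour one terrain cell by comparing scanned and planned elevation.

    tolerance_m is a hypothetical dead-band, not a figure from the talk.
    """
    diff = scanned_m - planned_m
    if diff < -tolerance_m:
        return "red"    # ground is below the plan: over-dug
    if diff > tolerance_m:
        return "green"  # ground is above the plan: needs digging
    return "blue"       # within tolerance: matches plan

def colour_map(scanned: List[List[float]], planned: List[List[float]],
               tolerance_m: float = 0.25) -> List[List[str]]:
    """Apply classify_cell to every cell of two equally sized elevation grids."""
    return [
        [classify_cell(s, p, tolerance_m) for s, p in zip(scan_row, plan_row)]
        for scan_row, plan_row in zip(scanned, planned)
    ]

# Made-up elevations in metres, for illustration only.
scanned = [[101.0, 100.1], [99.2, 100.0]]
planned = [[100.0, 100.0], [100.0, 100.0]]
print(colour_map(scanned, planned))
```

The dead-band keeps small scan noise from flipping a cell between colours.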
Topics
This is the story of a growing analytics team. It’s a business-oriented presentation: a collection of thoughts and discoveries. Sorry if I don’t have all the definitive answers.
• Background
• The beginning: small vs. big?
• Growth
• R&D
• Picking your projects
• Stakeholder management
• What’s next
Background
What data do we have, what do we do with it… and WHY?
The cost of downtime
Ore extraction chain:
• Cost = $20,000/hr
• Revenue = $40,000/hr
• Profit when operating = +$20,000/hr
• Profit on breakdown = -$15,000/hr
A leaking air hose:
• Time to fix = 1-2 hr
• Parts + labour = $300
• Loss of production = $15,000-30,000
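The air hose figures can be checked with a few lines of arithmetic. This is a rough sketch: it assumes the "loss of production" is simply the breakdown figure of $15,000/hr multiplied by the hours down, which is my reading of the slide. The function and constant names are made up.

```python
# Figures taken from the slide; names are illustrative only.
LOSS_PER_HOUR_DOWN = 15_000   # $/hr lost while the machine is broken down
PARTS_AND_LABOUR = 300        # $ to replace the leaking air hose

def breakdown_cost(hours_down: float, parts_and_labour: float) -> float:
    """Total cost of an unplanned stop: lost production plus the repair itself."""
    return hours_down * LOSS_PER_HOUR_DOWN + parts_and_labour

for hours in (1, 2):
    total = breakdown_cost(hours, PARTS_AND_LABOUR)
    repair_share = PARTS_AND_LABOUR / total
    print(f"{hours} hr down: total ${total:,.0f}, repair is {repair_share:.1%} of it")
```

Under these assumptions the repair itself is only about 1-2% of the bill: nearly all of the cost is the downtime, not the part.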
800 sensors, sampling rate: 100 ms max
• Payload
• Machine’s motions
• Operator’s joysticks
• Motor currents
• Auto-lube system
• Air pressure
• Temperatures
• Brakes status
What we provide
• The machine’s control system will “fault” if it detects a severe malfunction
• Unplanned downtime is extremely costly in the mining industry
• We analyse telemetry data to detect issues before they trigger a system fault
• It’s not so much about saving the part: by the time we can detect a malfunction, it’s often already beyond repair
• It’s about giving the customer time to plan maintenance for what would otherwise be a disruptive unplanned breakdown
In the beginning
At the peak of the “big data” hype cycle
Day 1
• 2014: one engineer (me) and one manager (sales)
• At the peak of the “Big Data” craze, but…
• In the midst of a mining downturn: no budget, pressure to deliver
• 6 years prior, a visionary had set up dataloggers and a backend to harvest data from hundreds of sensors at high resolution = lots of data
• Data locked up in antiquated time-series databases
• Fragile infrastructure
• Zero process
[Figure: hype cycle, marked “You are here”]
The Skunk works
• Hired a couple of summer interns to boost output
• Version control = copy/paste into separate folders
  That’s OK because there were only a couple of developers
• Built a rudimentary “model factory”: a data-dredging algorithm, run without any hypothesis or prior assessment. Generally viewed as poor practice…
  That’s OK because it’s machine data: correlations usually indicate something mechanically or electrically coupled. Feature engineering made it work. 3 months = wide “coverage” of the machine.
• Do everything on your laptop, then straight to Production
  That’s OK because there were no contracts and nothing mission-critical. What was mission-critical was demonstrating value
Reflections: Small vs. Big
Small / startup model:
• Loose plan, objectives and strategy
• Less capital investment from the business, so lower expectations
• Pick problems yourself: those that seem relevant, and “safe bets” = quick wins in months
• High risk of picking the wrong projects. Fast but disorganised, bound to run into scaling issues
Big / corporate model:
• Large investment, financial targets set from the start
• Regimented methods, pressure to deliver may hinder creativity
• 1 year, 10 data scientists: explore, investigate use cases for analytics
• Well organised, safe-but-slow approach, prepared for the long term
Growing
Product: tick – customers: tick – what’s next?
Another start-up that became bloated
Mech/Elec engineers were very productive and creative… but things started to tear at the seams:
• Why document when everyone knows… bus factor!
• IT upgrading databases crippled us with rework.
• Lack of software engineering practices = poor reliability, readability and reusability.
• Things started to slow down.
• Routine means you become blind to your own deficiencies.
• Hard to see the paradigm shift: “remember how we used to be faster, what happened?”
• You come to accept that things are the way they are, as if getting a clean run or working faster weren’t possible.
Today
• 2-3 years later, we welcomed 3 team members, including a senior software dev.
• The software dev went on a crusade (still going) for: unit tests, documentation, libraries
• The “old guard” had to lift their game and mature to integrate the “fresh blood”. This helped kick the old counter-productive habits and work towards increasing quality and pace
Our team now has:
• 2 Data scientists: the theory
• 2 Engineers: make it work
• 2 Software developers: make it scale
• 1 Analyst / report developer: make it visible
• 3 Subject matter experts: make it relevant
Workflow challenges
The release cliff-hanger:
• Analysts are fluent at developing models on their laptop…
• Releasing an analytic into production is a rare event. Lack of practice = frequent fails
Trialling a solution:
• Start with a test release of a “skeleton” to PROD, instead of leaving the release as the final step
• DevOps 101: release early and frequently!
Workflow challenges
From bench to streaming:
• R&D happens on a static block of time-series data (e.g. one month).
• Challenge = going from static to live streaming: batch size, handover between batches, catching up (maintain full history) vs. forcing forward (satisfy real-time)
Standardise:
• Build high-level functions & templates to abstract real-time execution aspects.
• Don’t lock down the process and make it hard to build “non-standard”
• Standardising helps maintainability, collaboration, etc.
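The handover between batches can be illustrated with a toy analytic. In this minimal sketch a rolling mean stands in for a real model, and all class and function names are hypothetical. The point is that the analytic keeps its own state across batches, so the streaming output matches what R&D got on the static block.

```python
from collections import deque
from typing import Deque, Iterable, List

class RollingMean:
    """Toy analytic: a rolling mean whose window survives the batch handover."""

    def __init__(self, window: int) -> None:
        # The buffer is the state carried from one batch to the next.
        self.buffer: Deque[float] = deque(maxlen=window)

    def process_batch(self, samples: Iterable[float]) -> List[float]:
        out = []
        for x in samples:
            self.buffer.append(x)
            out.append(sum(self.buffer) / len(self.buffer))
        return out

def run_static(samples: List[float], window: int) -> List[float]:
    """R&D style: one pass over the whole static block."""
    return RollingMean(window).process_batch(samples)

def run_streaming(batches: List[List[float]], window: int) -> List[float]:
    """Production style: same analytic, fed batch by batch, state kept."""
    analytic = RollingMean(window)
    results: List[float] = []
    for batch in batches:
        results.extend(analytic.process_batch(batch))
    return results

def run_naive(batches: List[List[float]], window: int) -> List[float]:
    """What goes wrong: state is thrown away at every batch boundary."""
    results: List[float] = []
    for batch in batches:
        results.extend(RollingMean(window).process_batch(batch))
    return results

block = [float(v) for v in range(10)]
batches = [block[0:3], block[3:4], block[4:10]]   # uneven batch sizes
print(run_streaming(batches, 4) == run_static(block, 4))  # same answer
print(run_naive(batches, 4) == run_static(block, 4))      # differs at boundaries
```

Hiding the state handling inside a template like this is one way to let analysts write the analytic once and run it unchanged on the bench and in production.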
3 aspects of continuous improvement
• Streamline actioning the insights
• Streamline tools for faster analytics development
• Streamline analytics: generic and re-useable
R&D
Finance, industrial plants and insurance analytics
Industrial analytics are a niche application, no-one can help me! What could there be to gain by looking outside of my industry?
Finance: f(A,B) = Ĉ, where C is a share price, A and B the competitors’ share prices
• Ĉ >> C: sell, Ĉ << C: buy, Ĉ ≈ C: do nothing
Insurance: f(A,B) = Ĉ, where C is the amount claimed, A and B some parameters of the claim
• Ĉ ≈ C: do nothing, Ĉ << C: investigate a potentially fraudulent claim
Plant analytics: f(A,B) = Ĉ, where C is the temperature of a motor, A and B are bearing temperatures
• Ĉ ≈ C: do nothing, Ĉ << C: motor potentially overheating
At the right level of abstraction, it all becomes the same. Talk to people. But I’m preaching to the choir!
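The plant analytics case can be sketched in a few lines: fit f on healthy history, then compare the prediction Ĉ with the measured C. Everything here is an assumption made for illustration: the linear form of f, the made-up temperatures, the 3-degree tolerance and the function names. It is not the model used in production.

```python
from typing import Sequence, Tuple

Weights = Tuple[float, float, float]

def fit_linear(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> Weights:
    """Least-squares fit of c ≈ w_a*a + w_b*b + w_0 via the normal equations."""
    rows = [(ai, bi, 1.0) for ai, bi in zip(a, b)]
    # Augmented 3x4 system [X^T X | X^T c].
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * ci for r, ci in zip(rows, c))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return (m[0][3] / m[0][0], m[1][3] / m[1][1], m[2][3] / m[2][2])

def predict(w: Weights, a: float, b: float) -> float:
    """Ĉ = f(A, B): expected motor temperature given the two bearing temperatures."""
    return w[0] * a + w[1] * b + w[2]

def assess(c_actual: float, c_predicted: float, tolerance: float) -> str:
    """Ĉ ≈ C: do nothing. Ĉ << C: the motor runs hotter than expected."""
    if c_actual - c_predicted > tolerance:
        return "investigate"
    return "do nothing"

# Made-up healthy history: bearing temperatures A, B and motor temperature C.
A = [60.0, 62.0, 65.0, 70.0, 68.0, 63.0]
B = [58.0, 64.0, 66.0, 69.0, 71.0, 60.0]
C = [0.5 * ai + 0.5 * bi + 10.0 for ai, bi in zip(A, B)]

weights = fit_linear(A, B, C)
expected = predict(weights, 66.0, 66.0)          # about 76 with this toy data
print(assess(76.5, expected, tolerance=3.0))     # close to prediction
print(assess(81.0, expected, tolerance=3.0))     # well above prediction
```

Swap the column names and the same three functions describe the insurance case: only the meaning of A, B and C changes.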
Interns for R&D
• Autonomy: R&D can be insulated from the production systems. Low risk to business. Here’s a dataset, install [your favourite toolset] and go get it, tiger!
• This usually produces a proof-of-concept
• An intern can clear the fog on that high-risk/high-value project. You can then make a sound decision on whether to proceed, without having used any precious permanent employee time
• With the right intern: the newer the tech, the greater the challenge… the more they engage!
• Co-supervision with an academic will inject a lot of their knowledge into your project. This is often a better solution than directly engaging in a research project with academics
• You can hire the outstanding ones, risk free!
Picking projects
Business value vs. geeky indulgence
A tale of two companies merging
P&H:
• Mainly sells primary digging equipment
• A mine owns 1-5 of them, no redundancy
• Very expensive, “top of the pyramid”
• Analytics strategy: focus on fault prediction & uptime maximisation: keep them running 24/7
Komatsu:
• Mainly sells dump trucks
• A mine owns 50-200 + spare units
• Less expensive, a small loss is not mission-critical
• Analytics strategy: focus on compliance to scheduled maintenance, part sales, operator abuse
The “no free lunch” of analytics
• Leaking air hose: recurrent, low impact, easy: supervised
• Gearbox failure: rare, extremely high impact, hard: unsupervised
TensorFlow to the rescue!
• Need: generic time-series pattern recognition
• Wary of the deep-learning hype: the “hot topic” of 2016… at the peak of Gartner’s “hype curve”
• Is it just for images? Overkill?
• A summer intern ran the project with great success (accurate, and it generalises)
• CNN + LSTM is our standard approach to detect failure patterns in automated systems.
Interested in the details? Data Science Sydney Meetup - Tue 28 May