
Optimizing Physical Assets with Machine Learning, Rajendra Koppula



  1. Optimizing Physical Assets with Machine Learning Rajendra Koppula WWW.MANIFOLD.AI

  2. About Us Manifold is a full-service AI development services firm that accelerates AI development for leading companies. Our team has a proven ability to design, build, deploy, and manage data applications at scale.

  3. Audience & Agenda Audience • Practitioners with some knowledge of the PyData ecosystem and ML workflows Agenda • Introduction & Motivation • Design Patterns • Conclusion & Key Takeaways Slides: www.manifold.ai/2019SensorsExpo

  4. Lean AI 1. Build the simplest E2E system first. 2. Make iterations as quickly as possible.

  5. Case Study • Leading industrial services company • “We want to use AI to be more efficient across our operations. The vision is to create a system for making better decisions.”

  6. Business Understanding Workshop What are your business problems (that you think AI can help you with)? • I get paid for uptime, how can I make that higher? • Unplanned maintenance costs me a lot of time and money and erodes customer satisfaction, how can I prevent that? • I roll trucks every 30 days for preventative maintenance, no matter what. Can I go less often? • I have sensors on all these units and I’ve been collecting data for a few years. I want to get more value out of this instrumentation. • Many, many more...

  7. AI Uncertainty Principle AI value ≤ business value × data quality × predictive signal. Multiplicative! If any term goes to 0, value goes to 0!

  8. Create an AI Specification • Predict major faults where the machine is continuously down for >2 hours. • Predict whether a major fault will happen over a horizon of 1, 2, … , 5 days. • Use machine-generated data as input features, e.g., ~30 continuous time series, ~20 discrete time series. • Use demographic data about machines, e.g., unit type, location, etc. • Do not use human-generated service data because of data quality issues.

  9. Typical ML Workflow [Diagram labels: S3, Database, Preprocessing, Feature Engineering, Modeling, Model Deployment]

  10. Why This Target? • Clear business value because the company gets paid for uptime, and often there is a customer call and a truck is rolled if the machine is in major fault. • Acceptable data quality because it is purely machine generated, i.e., can look at the status register. • Defined major as >2 hours continuously in faulted state. Most lesser faults are automatically or manually cleared before this time. [Figure labels: Lookback = 2 days, Horizon = 5 days]
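
The target definition on this slide can be sketched in a few lines. This is only an illustration, not the deck's actual labeling code: it assumes a boolean per-minute "faulted" flag derived from the status register, and the names `major_fault_starts` and `label_at` are hypothetical.

```python
import numpy as np

MINUTES_PER_DAY = 24 * 60  # assumption: one status sample per minute


def major_fault_starts(faulted, min_minutes=120):
    """Start indices of runs where the unit stays faulted for more than min_minutes."""
    faulted = np.asarray(faulted, dtype=bool)
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], faulted, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)   # first faulted minute of each run
    ends = np.flatnonzero(edges == -1)    # first healthy minute after each run
    return starts[(ends - starts) > min_minutes]


def label_at(t, starts, horizon_days):
    """1 if a major fault starts within horizon_days after minute t, else 0."""
    horizon = horizon_days * MINUTES_PER_DAY
    return int(np.any((starts > t) & (starts <= t + horizon)))


# Synthetic status register: a 60-minute minor fault and a 200-minute major fault.
status = np.zeros(10 * MINUTES_PER_DAY, dtype=bool)
status[500:560] = True
status[3000:3200] = True

starts = major_fault_starts(status)
labels = {h: label_at(1000, starts, h) for h in (1, 2, 5)}
```

Only the 200-minute run counts as major, so a prediction made at minute 1000 is labeled positive for the 2-day and 5-day horizons but not for the 1-day horizon.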

  11. AI Uncertainty Principle AI value ≤ business value × data quality × predictive signal. De-risked as much as possible. Have to take leap of faith now.

  12. Data Engineering is the Foundation [Figure label: Foundation. Source: Monica Rogati]

  13. Spec the Requirements The Constants • AI/ML is software engineering. • You will develop locally. • You will develop in the cloud. • You will collaborate. • You will experiment. • You will deploy. The Variables • Volume of data • Velocity of data • Source of data • Important features • Downstream integrations • Prediction velocity • Training velocity

  14. Architecting the Solution The Constants • Docker-first ML with Orbyter • github.com/manifoldai/orbyter-docker The Variables • Sampling • How to generate training and test data? • TS data subtleties • Architecting for Volume • Spark + DASK + HDF5 • Modeling • Trees and Interpretability • Feature Engineering • Evaluation • Deployment

  15. What is the ML Problem? • 50+ sensors logged at 1-minute intervals, 24/7 • Pose as a supervised learning problem

  16. Sampling for Supervised Learning • Train a supervised learning algorithm using historical examples. It learns patterns where there are failures and looks for them in the future. • This requires us to pass historical samples in a clean manner by slicing and dicing the time series the way we need. [Diagram: ETL → X, y]
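
A minimal sketch of the slicing step, under stated assumptions: `series` holds one unit's sensors as an array of shape (sensors, minutes), `fault_start` marks the minutes where a major fault begins, and each lookback window ends just before its prediction time. The function name `make_xy` and its arguments are hypothetical.

```python
import numpy as np


def make_xy(series, fault_start, prediction_times, lookback, horizon):
    """Cut one lookback window (X) and one horizon label (y) per prediction time."""
    series = np.asarray(series)
    fault_start = np.asarray(fault_start, dtype=bool)
    times = np.asarray(prediction_times)
    n_minutes = series.shape[1]
    if times.min() < lookback or times.max() + horizon > n_minutes:
        raise ValueError("every sample needs a full lookback and a full horizon")
    # X[i] holds the `lookback` minutes strictly before prediction time i.
    X = np.stack([series[:, t - lookback:t] for t in times])
    # y[i] is 1 if a major fault starts in the `horizon` minutes from time i on.
    y = np.array([int(fault_start[t:t + horizon].any()) for t in times])
    return X, y


rng = np.random.default_rng(0)
series = rng.normal(size=(3, 100))        # 3 sensors, 100 minutes of synthetic data
fault_start = np.zeros(100, dtype=bool)
fault_start[55] = True                    # one major fault begins at minute 55

X, y = make_xy(series, fault_start, [10, 50, 90], lookback=10, horizon=10)
```

With real data the same idea yields the 3-D array X of shape (samples, sensors, lookback minutes) that the later slides pass to feature engineering.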

  17. Preventing Data Leakage • Separate data into training set and validation set, 70% training, 30% validation. No data leakage. • Prevent overfitting. [Diagram: X (700k, 54, 2880); 700k samples, 200k samples]
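
The slide does not say how the split was made. One common way to keep overlapping windows out of both sets is to split by time and leave a gap at least as long as lookback plus horizon; the sketch below assumes that approach, and `time_split` is a hypothetical helper.

```python
import numpy as np


def time_split(prediction_times, train_frac=0.7, gap=0):
    """Indices of training samples (early) and validation samples (late).

    Samples whose prediction time falls inside the gap after the cutoff are
    dropped so that no validation window overlaps a training window.
    """
    times = np.asarray(prediction_times)
    cutoff = np.quantile(times, train_frac)
    train_idx = np.flatnonzero(times <= cutoff)
    valid_idx = np.flatnonzero(times > cutoff + gap)
    return train_idx, valid_idx


times = np.arange(0, 1000, 10)            # 100 synthetic prediction times
train_idx, valid_idx = time_split(times, train_frac=0.7, gap=50)
```

A random shuffle would instead place near-identical neighbouring windows on both sides of the split, which inflates validation scores.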

  18. Sample Rebalancing and Filtering • Failure is a rare event. • Many more y=0 samples than y=1 samples. May have to rebalance the training dataset. • Invalid sample rejection, e.g., don’t let fault predict fault: the unit is already significantly faulted at this point, so predicting is not really useful. [Figure labels: Lookback = 2 days, Horizon = 5 days]
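
Both steps can be sketched as an index selection over the training samples. The negative-to-positive ratio, the `faulted_at_t` flag (unit already faulted at prediction time), and the function name are assumptions made for illustration; downsampling negatives is only one of several ways to rebalance.

```python
import numpy as np


def filter_and_rebalance(y, faulted_at_t, neg_per_pos=3, seed=0):
    """Indices to keep: drop invalid samples, then downsample the negatives."""
    y = np.asarray(y)
    faulted_at_t = np.asarray(faulted_at_t, dtype=bool)
    # Invalid sample rejection: do not let an ongoing fault predict a fault.
    valid = np.flatnonzero(~faulted_at_t)
    pos = valid[y[valid] == 1]
    neg = valid[y[valid] == 0]
    # Keep every positive and at most neg_per_pos negatives per positive.
    rng = np.random.default_rng(seed)
    n_neg = min(len(neg), neg_per_pos * len(pos))
    neg_kept = rng.choice(neg, size=n_neg, replace=False)
    return np.sort(np.concatenate([pos, neg_kept]))


y = np.array([1] * 10 + [0] * 990)        # rare positives, synthetic
faulted_at_t = np.zeros(1000, dtype=bool)
faulted_at_t[[0, 1, 20, 21]] = True       # four samples taken during a fault

keep = filter_and_rebalance(y, faulted_at_t, neg_per_pos=3)
```

Rebalancing should touch the training set only, so that validation metrics still reflect the true failure rate.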

  19. Feature Engineering Workshop Desired output = prioritized list of features • Need the domain experts in the room, i.e. mechanical engineers, head of maintenance, SW engineering • Feature engineering is the main way you are encoding their domain knowledge • Must trade off predictive power with engineering complexity

  20. Feature Engineering • Continuous Time Series Features • Mean over lookback • Variance over lookback • Fourier Transform • Trend over lookback • Discrete Time Series Features • State counts over lookback • Demographic Features • One hot encoded [Diagram: X (700k, 54, 2880) → collapse the time dimension → Feature Matrix F (700k, 54, N)]
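
Collapsing the time dimension can be sketched with plain NumPy. The particular choices below (least-squares slope for the trend, magnitudes of the lowest few Fourier coefficients, integer-coded states) are assumptions; the deck only names the feature families.

```python
import numpy as np


def continuous_features(X, n_fft=3):
    """Map X (samples, sensors, lookback) to F (samples, sensors, 3 + n_fft)."""
    X = np.asarray(X, dtype=float)
    t = np.arange(X.shape[-1])
    mean = X.mean(axis=-1)
    var = X.var(axis=-1)
    # Trend: least-squares slope of each window against time.
    t_centered = t - t.mean()
    trend = (X - mean[..., None]) @ t_centered / (t_centered @ t_centered)
    # Fourier: magnitudes of the lowest non-constant frequency components.
    spectrum = np.abs(np.fft.rfft(X, axis=-1))[..., 1:1 + n_fft]
    return np.concatenate([np.stack([mean, var, trend], axis=-1), spectrum], axis=-1)


def state_counts(S, n_states):
    """Count how often each integer-coded state occurs in every lookback window."""
    S = np.asarray(S)
    return np.stack([(S == k).sum(axis=-1) for k in range(n_states)], axis=-1)


t = np.arange(16)
X = np.zeros((4, 5, 16))
X[0, 0] = 2.0 * t + 1.0                   # a ramp with slope 2 for one sensor
F = continuous_features(X)

S = np.array([[[0, 0, 1, 2, 2, 2]]])      # one discrete signal, three states
C = state_counts(S, n_states=3)
```

Every feature reduces over the last axis, which is what turns the 3-D sample tensor into a per-sensor feature matrix.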

  21. Architect for Volume • Ingest is optimized for throughput and high availability • Data from an asset is spread across many files in S3 • Varying sizes • Different time periods • Sampler Pipeline works well if all the data from an asset is in one contiguous file => Use Spark to gather, massage and transform

  22. Tools in the Pipeline • A (very) high-level picture of the pipeline • Spark for ETL • Dask for Feature Engineering • HDF5 as storage engine

  23. Dask: Out of Core • Create a dask array from an HDF5 dataset • 250 GB of data on disk • Pass the 3D dask array to the feature engineering step

  24. Dask: Parallelism • Build the series of features • All compute is delayed until .compute() is called
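
The slide's code was not captured in this transcript, so the following is only a sketch of the lazy-evaluation pattern it describes. A small in-memory NumPy array stands in for the on-disk data; the chunk sizes and variable names are assumptions. An h5py dataset can be passed to `da.from_array` in the same way when the data does not fit in memory.

```python
import numpy as np
import dask.array as da

rng = np.random.default_rng(0)
X_np = rng.normal(size=(200, 6, 288))     # stand-in: samples x sensors x lookback

# Chunk along the sample axis so each task handles a block of whole windows.
X = da.from_array(X_np, chunks=(50, 6, 288))

# These lines only build a task graph; nothing is computed yet.
mean = X.mean(axis=-1)
var = X.var(axis=-1)
features = da.stack([mean, var], axis=-1)

# The work happens here, block by block and in parallel.
F = features.compute()
```

Because each chunk is reduced independently, peak memory is governed by the chunk size rather than by the size of the full dataset.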

  25. Dask: Parallelism • Another example of feature engineering • Build a lot of histograms
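
One way to build a histogram per sensor per sample while staying lazy is to count values bin by bin with ordinary array comparisons. This is a sketch under that assumption, not the deck's implementation; `window_histograms` is a hypothetical name and the bins are half-open, [low, high).

```python
import numpy as np
import dask.array as da


def window_histograms(X, edges):
    """Lazy per-window bin counts: (samples, sensors, lookback) -> (samples, sensors, bins)."""
    counts = [
        ((X >= low) & (X < high)).sum(axis=-1)
        for low, high in zip(edges[:-1], edges[1:])
    ]
    return da.stack(counts, axis=-1)


rng = np.random.default_rng(0)
X_np = rng.random(size=(40, 3, 120))      # synthetic values in [0, 1)
X = da.from_array(X_np, chunks=(10, 3, 120))
edges = np.linspace(0.0, 1.0, 5)          # four equal-width bins

H = window_histograms(X, edges).compute()
```

The same comparison trick yields state counts for discrete signals when the bin edges bracket the integer state codes.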

  27. The Fun Stuff [Figure label: The fun stuff. Source: Monica Rogati]

  28. Create a Baseline Model • classification > regression • class errors are easier to understand and learn from • even for continuous targets, you may want to do a binary (or multiclass) classifier before regression • random forest > gradient boosted trees > deep learning • few parameters to tune, robust to overfitting, quick to train • interpretable feature importance to learn from • pick a few features to start, then create more features It’s all about learning! Then iterate, iterate, iterate.
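
A baseline along these lines might look like the sketch below. The data is synthetic, the feature matrix is assumed to be flattened to two dimensions, and the hyperparameters are illustrative rather than the ones used in the case study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
F = rng.normal(size=(600, 8))             # synthetic feature matrix, 8 features
# Synthetic target: only feature 2 carries signal.
y = (F[:, 2] + 0.5 * rng.normal(size=600) > 1.0).astype(int)

# Earlier samples train the model, later samples validate it.
F_train, y_train = F[:400], y[:400]
F_valid, y_valid = F[400:], y[400:]

model = RandomForestClassifier(
    n_estimators=100,
    class_weight="balanced",              # one simple answer to rare positives
    random_state=0,
)
model.fit(F_train, y_train)

proba = model.predict_proba(F_valid)[:, 1]                 # probability of failure
ranking = np.argsort(model.feature_importances_)[::-1]     # most important first
```

The importance ranking is the "learn from it" part: it suggests which feature families deserve more engineering effort in the next iteration.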

  29. Evaluate to Learn • Aggregate Metrics • Cross-Validated ROC and AUC = your score to improve by iterative modelling • Feature importance done properly • Individual Metrics (Sample-level) • Prediction probability distribution • “Four corners and the middle analysis” • most accurate negatives • most accurate positives • least accurate negatives • least accurate positives • least certain estimates
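
The "four corners and the middle" lists can be pulled from validation predictions with a few sorts. The sketch reads "most accurate" as the predicted probability closest to the true label and "least certain" as closest to 0.5; that reading, and the function name, are assumptions.

```python
import numpy as np


def four_corners(y_true, proba, k=3):
    """Sample indices for each corner plus the uncertain middle."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=float)
    idx = np.arange(len(y_true))
    pos = idx[y_true == 1]
    neg = idx[y_true == 0]

    def lowest(ids):
        return ids[np.argsort(proba[ids])][:k]

    def highest(ids):
        return ids[np.argsort(-proba[ids])][:k]

    return {
        "most_accurate_positives": highest(pos),
        "least_accurate_positives": lowest(pos),
        "most_accurate_negatives": lowest(neg),
        "least_accurate_negatives": highest(neg),
        "least_certain": idx[np.argsort(np.abs(proba - 0.5))][:k],
    }


y_true = [1, 1, 1, 0, 0, 0]
proba = [0.9, 0.2, 0.55, 0.1, 0.8, 0.4]
corners = four_corners(y_true, proba, k=1)
```

Reviewing the raw time series behind each list with a domain expert is usually where new feature ideas come from.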

  30. Iterate the Baseline Model [Diagram: X (700k, 54, 2880) → Deep Learning (CNNs); X → Feature Engineering → Feature Matrix F (700k, 54, N) → Tree Methods (RF and GBT); both paths → Model Evaluation → Model to Deploy]

  32. User Feedback Working Sessions • Multiple structured sessions with final end users. In our case they were mechanical engineers and maintenance leads. • Prototype tooling, e.g., nothing, Excel, Jupyter notebooks. • Observe their workflow and how they integrate predictions.

  33. Not as Simple as Looking at Predictions • Most high probability of fault units are known stressed units • Most are in basins where line pressure is high [Figure: Example “Stressed” Unit]

  34. Prediction Filtering • Rules on historical predictions to find “interesting events” • Different filters for different use cases • Absolute probability => stressed units • % prob change => “surprising” daily changes • Tune rules to appropriate place • Currently tuned to have low false positives • Look for a few things and find them accurately; status quo for the rest. [Example: current probability of failure: .62; average probability over past 3 days: .44; 42% increase in chance of failure]
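
The two filters can be sketched as rules over a table of daily predictions. The thresholds, the 3-day baseline window, the unit names, and the function name are all hypothetical; in practice they would be tuned for low false positives as the slide describes.

```python
import pandas as pd


def flag_units(daily_proba, abs_threshold=0.8, rel_threshold=0.4, window=3):
    """Flag units per rule; rows are days (oldest first), columns are units."""
    current = daily_proba.iloc[-1]
    # Baseline: average over the `window` days before today.
    baseline = daily_proba.iloc[-(window + 1):-1].mean()
    pct_change = (current - baseline) / baseline
    return pd.DataFrame({
        "current": current,
        "baseline": baseline,
        "pct_change": pct_change,
        "stressed": current >= abs_threshold,        # absolute probability rule
        "surprising": pct_change >= rel_threshold,   # relative change rule
    })


daily_proba = pd.DataFrame({
    "unit_a": [0.90, 0.90, 0.90, 0.92],   # chronically high, little change
    "unit_b": [0.20, 0.20, 0.20, 0.30],   # low, but a sharp relative jump
    "unit_c": [0.10, 0.10, 0.10, 0.10],   # quiet
})
flags = flag_units(daily_proba)
```

Keeping the rules separate lets each use case subscribe to its own list: known stressed units for planning, surprising changes for same-day triage.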

  35. Need Diagnostics to be Actionable • AI analyzes 70+ parameters to predict probability of failure • Human spends 10+ minutes looking at the data and may not be able to see what the AI sees • Triage needs to be directed to within that parameter set • Need explainable AI to point user in right direction “Can you tell me where to look?”

  36. Tree Interpreter • Identify what sensors are driving the increased probability of failure • Absolute • Daily Change • This is a good starting point for the team to look for the causation [Figure panels: Today’s Contributions; Daily Change in Contribution]
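
The idea behind per-feature contributions can be sketched directly on a scikit-learn random forest: walk each tree's decision path and credit every change in the predicted failure probability to the feature that was split on. This is an illustrative reimplementation on synthetic data, not the tooling used in the deck, and the function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def tree_contributions(tree, x, positive_class=1):
    """Return (bias, per-feature contributions) for one sample in one tree."""
    t = tree.tree_
    value = t.value[:, 0, :]
    # Failure probability at every node (normalised class distribution).
    prob = value[:, positive_class] / value.sum(axis=1)
    x32 = np.asarray(x, dtype=np.float32)   # scikit-learn compares in float32
    contrib = np.zeros(len(x32))
    node = 0
    while t.children_left[node] != -1:      # -1 marks a leaf
        f = t.feature[node]
        go_left = x32[f] <= t.threshold[node]
        child = t.children_left[node] if go_left else t.children_right[node]
        contrib[f] += prob[child] - prob[node]
        node = child
    return prob[0], contrib


def forest_contributions(forest, x):
    """Average bias and contributions over all trees in the forest."""
    parts = [tree_contributions(est, x) for est in forest.estimators_]
    bias = float(np.mean([b for b, _ in parts]))
    contrib = np.mean([c for _, c in parts], axis=0)
    return bias, contrib


rng = np.random.default_rng(0)
F = rng.normal(size=(300, 5))
y = (F[:, 1] > 0).astype(int)             # synthetic target driven by feature 1
forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(F, y)

x_yesterday, x_today = F[1], F[0]         # stand-ins for two days of one unit
bias, contrib_today = forest_contributions(forest, x_today)
_, contrib_yesterday = forest_contributions(forest, x_yesterday)
daily_change = contrib_today - contrib_yesterday
```

The bias plus the summed contributions reproduces the forest's predicted probability, so the absolute view and the daily-change view both decompose the number the end user already sees.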
