

  1. Architecting a data platform to support analytic workflows for scientific data Sun Maria Lehmann, Equinor Jane McConnell, Teradata

  2. We work in the Upstream Oil and Gas Industry. The value chain runs Exploration, Development, Production, Trading, Refining, Distribution, Retail. Upstream is our focus; the rest resembles other industries: Downstream is like chemical manufacturing, Transport & Logistics like any other supply chain, Retail like any other retail, and Trading similar to any other commodities trading. #sunandjanetalkdata

  3. Upstream O&G has complex and varied data: subsurface scientific data, facilities OT data, IT data, and business/management reporting.

  4. Subsurface Data § Measurement data from sensors, often many TBs § Mainly received in batch as the result of a data acquisition event carried out by a different company (an oil field service company)

  5. …buried in a long history of data exchange formats: DVD, tape, disk.

  6. This is how the data is traditionally stored § Library-style storage - Physical items (rocks, fluids) - Tapes and hardcopy - Digital files § To use data, it is moved into technical/scientific applications - File > Import… - Manual - Decisions made during import

  7. Digital transformation? Not yet. Interpreting seismic.

  8. And it’s the same for interpreting well logs.

  9. True digital transformation requires a data platform that: § Allows you to look wide, across all of the data, from different oil fields and different countries § Allows you to look deep, into the detail and history of the data § Allows you to combine data across traditional boundaries § Keeps all the data safe for the future

  10. We need to reduce the manual steps § Every piece of data is treated as unique § Data is stored as-is, and then manually coerced into applications § Data is valid for decades, so we are often loading old formats § Data sets might not have complete metadata § If metadata is missing, can we infer it? - Humans can, if they are experts in the field

  11. Improving data management with autonomous pipelines. Building pipeline autonomy: Explore, Identify, Contextualize, Standardize, Automate, Scale.

  12. Building autonomous capabilities, in order of increasing automation:
  - Fully Manual: humans explore, identify and describe new pipelines
  - Assisted Ingest: humans standardize, test and improve ingest
  - Partial Automation: scalable; humans build pipelines
  - Conditional Automation: high scalability; humans define new scalable pipelines and AI can be trained
  - High Automation ("lifting the baseline"): fully scalable; AI can do new pipelines with human supervision
  - Full Autonomous: fully created by AI on demand

  13. Implement a layered data architecture: Ingest, Prepare, Consume, with Governance & Security and Metadata (technical, business, operational, reference, information architecture) spanning all layers.
  - Sources: business generated, human generated, interaction generated, machine generated
  - Ingest: Landing, data as received, with metadata; extract from closed formats
  - Prepare: Matching & Transforming (assign common keys, add measurements in standard units, create derived values, connect data as requested); Optimised Structures
  - Consume: API layer serving applications and consumers - subsurface interpretation, well planning, production forecasting, simulation, new apps, business consumers, analysts, data scientists, autonomous applications

  14. Autonomous pipelines with layered architecture: LAND, EXTRACT, MATCH, TRANSFORM, PREPARE.
  - LAND: safely store what you received, in the format you received it, and with any metadata that came with it
  - EXTRACT: get the data out of the weird file format
  - MATCH: assign correct keys from MDM
  - TRANSFORM: transform the data to a standard model; calculate and add standardised measurements
  - PREPARE: create datasets that will serve specific usage needs
  Build autonomy at each stage, and add new products to the pipeline.
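The five stages can be sketched as composable steps. Everything below (field names, the fake comma-separated payload, the well key) is illustrative, not from the talk:

```python
def land(raw_bytes, metadata):
    """Store the payload exactly as received, alongside its metadata."""
    return {"raw": raw_bytes, "metadata": dict(metadata)}

def extract(landed):
    """Decode the closed format into an open, readable structure.
    Here we pretend the payload is comma-separated depth,value pairs."""
    rows = []
    for line in landed["raw"].decode("utf-8").splitlines():
        depth, value = line.split(",")
        rows.append({"depth_ft": float(depth), "value": float(value)})
    return rows

def match(rows, well_key):
    """Assign the common key from master data management."""
    return [{**r, "well_id": well_key} for r in rows]

def transform(rows):
    """Add standardised SI measurements (feet to metres)."""
    return [{**r, "depth_m": r["depth_ft"] * 0.3048} for r in rows]

def prepare(rows):
    """Shape a dataset for one specific analytic use."""
    return [(r["well_id"], r["depth_m"], r["value"]) for r in rows]

payload = b"100,2.5\n200,3.1"
landed = land(payload, {"source": "vendor_x", "received": "2019-06-01"})
dataset = prepare(transform(match(extract(landed), "W-042")))
```

Each stage only depends on the output of the previous one, which is what lets automation be added stage by stage.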

  15. LAND: Store what you receive, together with the metadata about your measurement data. What was measured? When was it measured? Where was it measured? Who was it measured by? What unit? Why was it measured? Is this the raw measure, or a derived value? What’s the accuracy of the measure?
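A landing step can enforce those metadata questions before anything is accepted. The function and field names here are hypothetical, a sketch rather than any real system's API:

```python
import hashlib

def land_measurement(payload: bytes, metadata: dict) -> dict:
    """Create an immutable landing record: a payload hash plus answers to
    the questions every measurement should carry. Rejects incomplete
    metadata rather than guessing."""
    required = {"what", "when", "where", "who", "why", "unit",
                "is_raw_measure", "accuracy"}
    missing = required - metadata.keys()
    if missing:
        raise ValueError(f"incomplete metadata: {sorted(missing)}")
    return {
        "sha256": hashlib.sha256(payload).hexdigest(),  # proves the payload is unchanged
        "metadata": metadata,
    }

record = land_measurement(
    b"...raw log data...",
    {"what": "gamma ray", "when": "2019-06-01T12:00Z",
     "where": "well W-042", "who": "Acme Wireline",
     "why": "formation evaluation", "unit": "gAPI",
     "is_raw_measure": True, "accuracy": "+/- 5 gAPI"},
)
```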

  16. EXTRACT: Make it readable and re-usable. Don’t worry if data gets bigger.
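As a hedged illustration of EXTRACT, here is an invented packed-binary format being unpacked into self-describing JSON. The decoded copy is larger than the original; as the slide says, that is fine:

```python
import json
import struct

def extract_samples(raw: bytes) -> str:
    """Unpack big-endian float32 samples from a closed binary format
    into self-describing, human-readable JSON."""
    n = len(raw) // 4  # four bytes per float32 sample
    samples = struct.unpack(f">{n}f", raw)
    return json.dumps({"format": "float32be", "samples": list(samples)})

# Build a tiny example payload, then extract it.
raw = struct.pack(">3f", 1.0, 2.5, -0.5)
readable = extract_samples(raw)
```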

  17. MATCH: Common Keys § Master Data Management § Reference Data Management § Ontology and Business Glossary - Old workflows stayed within disciplines, and now drilling engineers use different words from exploration geoscientists, who use different words from production operations
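A minimal sketch of glossary-based matching, assuming a hand-maintained term map. The terms and the canonical key are illustrative:

```python
# Hypothetical business glossary: each discipline's term resolves to one
# canonical concept, so cross-discipline data can be joined on common keys.
GLOSSARY = {
    "wellbore": "WELLBORE",   # drilling engineering
    "borehole": "WELLBORE",   # exploration geoscience
    "hole": "WELLBORE",       # production operations
}

def canonical_term(term: str) -> str:
    """Resolve a discipline-specific term to its glossary concept,
    failing loudly when the glossary has a gap."""
    try:
        return GLOSSARY[term.lower()]
    except KeyError:
        raise KeyError(f"'{term}' not in business glossary - add it first")
```

Failing loudly on unknown terms matters: silently passing an unmatched term through is exactly the kind of manual-cleanup debt the pipeline is meant to remove.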

  18. MATCH: Units of Measure § Define your standard Units of Measure (normally best to use SI) § Create a service to do UoM conversions - We suggest using the Energistics UoM v1 dataset as your source § When your source data is in a different unit system, convert it and add the standard UoM values to your data § Keep the original measured data and unit, just in case
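A minimal conversion sketch. The factors below are standard SI definitions; a real service would, as the slide suggests, load its factors from the Energistics UoM dataset rather than hard-code them:

```python
# Unit -> (SI unit, multiplicative factor). Affine units like degF need
# special handling, so the factor slot is unused for them.
TO_SI = {
    "ft": ("m", 0.3048),
    "psi": ("Pa", 6894.757293168),
    "degF": ("K", None),
}

def to_si(value: float, unit: str):
    """Convert a value to the standard SI unit, returning (value, unit)."""
    if unit == "degF":  # affine conversion, not a simple factor
        return ((value - 32.0) * 5.0 / 9.0 + 273.15, "K")
    si_unit, factor = TO_SI[unit]
    return (value * factor, si_unit)

# Add the standard values; keep the original measured value and unit.
measurement = {"value": 8500.0, "unit": "ft"}
si_value, si_unit = to_si(measurement["value"], measurement["unit"])
measurement.update({"si_value": si_value, "si_unit": si_unit})
```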

  19. MATCH: Geospatial Data § Geospatial data comes in lat/lon, in decimal degrees or in degrees, minutes, seconds § Or in a projected coordinate system, like UTM or NAD83, in metres or in yards § To be able to combine data from different regions, create a converted version (normally WGS84 lat/lon in decimal degrees) and store it with your data § Create a service to do transformations - We suggest http://www.epsg-registry.org as your source for transformations
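One small piece of such a service, converting degrees/minutes/seconds to decimal degrees, needs no geodesy library; full coordinate transformations should use parameters from the EPSG registry the slide points to. The function name and example coordinate are illustrative:

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float,
                   hemisphere: str) -> float:
    """Convert degrees/minutes/seconds plus hemisphere (N/S/E/W)
    to signed decimal degrees."""
    dd = degrees + minutes / 60.0 + seconds / 3600.0
    return -dd if hemisphere in ("S", "W") else dd

# 58 deg 58' 12" N, roughly the latitude of Stavanger.
lat = dms_to_decimal(58, 58, 12.0, "N")
```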

  20. TRANSFORM: SOL vs SOR, it’s AND not OR. Data combined and ready for the analytics you want to perform needs precision, not guesses; this will require transformation models. But some transformations are not known, only guessed: how you transform the data to join it depends on what you are using it for.

  21. Derived Data in Transform § With transactional data, derived data is normally SUM, MIN, MEAN, MAX § With scientific data, it can be positions and geometric projections § If there is an accepted company-standard way to convert, e.g. from a directional survey to a well path, then this can be done in TRANSFORM
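For directional survey to well path, the widely published minimum curvature method is one such accepted standard. This is a sketch of that textbook formula, not the company-specific implementation the slide refers to:

```python
import math

def minimum_curvature(stations):
    """Derive a well path from directional-survey stations of
    (measured depth, inclination deg, azimuth deg), using the
    minimum curvature method. Returns (md, tvd, north, east) tuples."""
    path = [(stations[0][0], 0.0, 0.0, 0.0)]
    for (md1, i1, a1), (md2, i2, a2) in zip(stations, stations[1:]):
        i1r, i2r = math.radians(i1), math.radians(i2)
        a1r, a2r = math.radians(a1), math.radians(a2)
        # Dogleg angle between the two survey stations.
        cos_dl = (math.cos(i2r - i1r)
                  - math.sin(i1r) * math.sin(i2r) * (1 - math.cos(a2r - a1r)))
        dl = math.acos(max(-1.0, min(1.0, cos_dl)))
        # Ratio factor; tends to 1 as the dogleg tends to 0.
        rf = 1.0 if dl < 1e-9 else 2.0 / dl * math.tan(dl / 2.0)
        dmd = md2 - md1
        _, tvd, north, east = path[-1]
        tvd += dmd / 2.0 * (math.cos(i1r) + math.cos(i2r)) * rf
        north += dmd / 2.0 * (math.sin(i1r) * math.cos(a1r)
                              + math.sin(i2r) * math.cos(a2r)) * rf
        east += dmd / 2.0 * (math.sin(i1r) * math.sin(a1r)
                             + math.sin(i2r) * math.sin(a2r)) * rf
        path.append((md2, tvd, north, east))
    return path

# Sanity check: a perfectly vertical well, 0 to 100 m measured depth.
path = minimum_curvature([(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)])
```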

  22. Things which are guesses: joining well and seismic data.
  Well data: measured in depth, a distance (metres or feet/inches); cm scale or smaller; data only valid on the well path.
  Seismic data: measured in two-way travel time (ms); each data point is the size of an office building; data for a large volume of the subsurface.
  Matching this data requires decisions: a time-depth mapping, and decisions on how far, and how, to propagate well data through the volume. When there is a choice, you shouldn’t do it in TRANSFORM; it belongs in PREPARE.

  23. PREPARE: Datasets for a specific purpose.
  Old world, applications: creation of files in “transfer format”; feeding existing app APIs; re-creatable.
  New world, analytics: creating big wide analytical datasets; datasets supporting new applications via APIs.
  Understand: usage scenarios, data freshness, performance requirements, accuracy and precision, granularity.

  24. PREPARE: The biggest blocker to data science is creating the analytical datasets. “If your boss asks you, tell them that I said ‘build a Unified Data Warehouse’” – Andrew Ng. Source: Nuts and bolts of applying deep learning.

  25. PREPARE: You need re-creatable datasets. Whether you persist your prepared layer or deliver it on the fly: AUTOMATE! You will need to recreate these prepared datasets many, many times, based on changing assumptions for transformations and joins. Anything you cannot automatically recreate, because there was human intervention, is a NEW dataset and needs to go back into LAND with new metadata.
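One way to make a prepared dataset re-creatable is to compute it as a pure function of its inputs plus an explicit, fingerprinted set of assumptions. All names here are illustrative:

```python
import hashlib
import json

def prepare_dataset(rows, assumptions):
    """Deterministically build a prepared dataset from its input rows and
    a recorded set of transformation assumptions. Re-running with the
    same inputs and assumptions yields an identical result."""
    factor = assumptions["ft_to_m"]
    # Sort for a stable, order-independent output.
    data = sorted((r["well"], round(r["depth_ft"] * factor, 6)) for r in rows)
    # Fingerprint the assumptions so any change in the recipe is visible.
    recipe = hashlib.sha256(
        json.dumps(assumptions, sort_keys=True).encode()).hexdigest()
    return {"recipe": recipe, "data": data}

rows = [{"well": "W-042", "depth_ft": 100.0}]
v1 = prepare_dataset(rows, {"ft_to_m": 0.3048})
v2 = prepare_dataset(rows, {"ft_to_m": 0.3048})
```

Because the recipe hash travels with the data, two consumers can tell at a glance whether their prepared datasets were built under the same assumptions.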

  26. In Summary § LAND data as it was received, with all required metadata § EXTRACT from ugly, maybe binary, transfer formats to human-readable, self-describing formats (and check metadata again) § MATCH master and reference data, Units of Measure, geospatial data § TRANSFORM everything that is true, and no more § PREPARE datasets for specific usage, and for the old way of working as well as the new. If you can’t recreate a dataset without human input, it’s a new dataset and needs to go back to LAND.

  27. Jane McConnell, Practice Partner O&G, Industrial IoT Group, Teradata. Jane.mcconnell@teradata.com, +44 (0)7936 703343. Blog on Teradata.com; Twitter @jane_mcconnell. Sun Maria Lehmann, Leading Engineer, Enterprise Data Management, Equinor, Trondheim, Norway. Twitter @sunle.

  28. Rate today’s session: session page on the conference website, or the O’Reilly Events App.
