Architecting a data platform to support analytic workflows for scientific data
Sun Maria Lehmann, Equinor | Jane McConnell, Teradata
We work in the Upstream Oil and Gas Industry
The value chain: Upstream (Exploration → Development → Production) → Trading (similar to any other commodities trading) → Downstream (Refining – like chemical manufacturing; Transport, Logistics, Supply & Distribution; Retail – like any other retail) #sunandjanetalkdata
Upstream O&G has complex and varied data
§ Subsurface – scientific data
§ Facilities – OT data
§ Business/Mgmt Reporting – IT data #sunandjanetalkdata
Subsurface Data § Measurement data from sensors, often many TBs § Mainly received in batch as the result of a data acquisition event carried out by a different company (oil field service company) #sunandjanetalkdata
…buried in a long history of data exchange formats [image: tape, DVD, disk media] #sunandjanetalkdata
This is how this data is traditionally stored § Library style storage - Physical items (rocks, fluids) - Tapes and hardcopy - Digital files § To use data, it is moved into technical/scientific applications - File > Import… - Manual - Decisions made during import #sunandjanetalkdata
Digital Transformation? Not yet. [photo: interpreting seismic] #sunandjanetalkdata
And it’s the same for interpreting well logs #sunandjanetalkdata
True digital transformation requires a data platform that:
§ Allows you to look wide – across all of the data, from different oil fields, from different countries
§ Allows you to look deep – into the detail and history of the data
§ Allows you to combine data across traditional boundaries
§ Keeps all the data safe for the future #sunandjanetalkdata
We need to reduce the manual steps § Every piece of data treated as unique § Data is stored as-is, and then manually coerced into applications § Data is valid for decades - so we are often loading old formats § Data sets might not have complete metadata § If metadata is missing – can we infer it? - Humans can, if they are experts in the field #sunandjanetalkdata
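The slide asks whether missing metadata can be inferred the way a domain expert would. Below is a minimal sketch of one such heuristic: guessing a missing depth unit from the value range. The function name and thresholds are our own illustrative assumptions, not anything defined in the talk.

```python
# Illustrative sketch: inferring a missing depth unit from the data itself,
# the way an expert would eyeball it. Thresholds are invented for
# illustration; a real rule set would come from discipline experts.

def infer_depth_unit(depths: list[float]) -> str:
    """Guess whether a well depth column is metres or feet."""
    deepest = max(depths)
    # Few wells exceed ~12,000 m; values beyond that are almost
    # certainly feet. Shallower ranges stay ambiguous.
    if deepest > 12_000:
        return "ft"
    if deepest < 1_000:
        return "ambiguous"  # flag for human review rather than guess
    return "m (low confidence)"

print(infer_depth_unit([150.0, 2_750.0, 4_300.0]))   # m (low confidence)
print(infer_depth_unit([500.0, 9_800.0, 14_200.0]))  # ft
```

The key design point: a low-confidence inference should be flagged for human review, not silently written into the metadata.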
Improving data management with autonomous pipelines
Building autonomy into the pipeline: Explore → Identify → Contextualize → Standardize → Automate → Scale
Building Autonomous Capabilities – increased automation lifts the baseline:
§ Fully Manual – humans explore, identify and describe new pipelines
§ Assisted Ingest – humans standardize, test and improve ingest
§ Partial Automation – scalable; humans build pipelines
§ Conditional Automation – high scalability; humans define new scalable pipelines and AI can be trained
§ High Automation – fully scalable; AI can do new pipelines with human supervision
§ Full Autonomous – fully created by AI on demand #sunandjanetalkdata
Implement a layered data architecture
§ Sources: business generated, human generated, application generated, machine generated
§ Ingest – Landing: data as received, with metadata
§ Prepare – Matching & Transforming: extract from closed formats, assign common keys, add measurements in standard units, create derived values
§ Consume: optimised structures, API layer, interaction – connecting data as requested
§ Consumers: applications (Subsurface Interpretation, Well Planning, Production Forecasting, Simulation, new apps?), business consumers, analysts, data scientists
§ Spanning all layers: Governance & Security; Metadata (technical, business, operational, reference, information architecture) #sunandjanetalkdata
Autonomous Pipelines with Layered Architecture – building autonomy at every stage, adding new products to the pipeline:
§ LAND – safely store what you received, in the format you received it, and with any metadata that came with it
§ EXTRACT – get the data out of the weird file format
§ MATCH – assign correct keys from MDM; calculate and add standardised measurements
§ TRANSFORM – transform the data to a standard model
§ PREPARE – create datasets that will serve specific usage needs #sunandjanetalkdata
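To make the five stages concrete, here is a minimal sketch of the pipeline as composable functions. Every name and the toy record format are our own illustration; the talk defines the stages, not an API.

```python
# A minimal sketch of the five pipeline stages as composable functions.

def land(raw: bytes, metadata: dict) -> dict:
    """LAND: keep the payload byte-for-byte as received, with its metadata."""
    return {"raw": raw, "metadata": dict(metadata)}

def extract(ds: dict) -> dict:
    """EXTRACT: decode the transfer format into open records (toy CSV here)."""
    lines = ds["raw"].decode("ascii").splitlines()
    ds["records"] = [dict(zip(["well", "depth_ft"], ln.split(","))) for ln in lines]
    return ds

def match(ds: dict, master_keys: dict) -> dict:
    """MATCH: attach master-data keys so the data joins across silos."""
    for r in ds["records"]:
        r["well_uid"] = master_keys.get(r["well"], "UNMATCHED")
    return ds

def transform(ds: dict) -> dict:
    """TRANSFORM: apply only transformations that are true, e.g. unit standardisation."""
    for r in ds["records"]:
        r["depth_m"] = float(r["depth_ft"]) * 0.3048  # feet -> metres
    return ds

def prepare(ds: dict) -> list[dict]:
    """PREPARE: shape a dataset for one specific usage need."""
    return [{"well_uid": r["well_uid"], "depth_m": r["depth_m"]}
            for r in ds["records"]]

raw = b"A-1,12500\nB-2,9800"
ds = transform(match(extract(land(raw, {"source": "vendor X"})),
                     {"A-1": "NO-15/9-A-1"}))
print(prepare(ds))
```

Note how each stage only adds to the dataset; nothing landed or extracted is ever overwritten, which is what lets later stages be rerun.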
LAND: Store what you receive, together with the metadata about your measurement data. For a measured value like “42”, capture:
§ What was measured? In what unit?
§ When was it measured?
§ Who was it measured by?
§ Where was it measured?
§ Why was it measured?
§ Is this the raw measure, or a derived value?
§ What’s the accuracy of the measure? #sunandjanetalkdata
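One way to force those questions to be answered at LAND time is to make them required fields in a landing manifest. The field names below are our illustration, not a standard.

```python
# Sketch of a landing manifest that makes the slide's metadata questions
# mandatory at LAND time. Field names are illustrative assumptions.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class LandingManifest:
    what: str              # what was measured?
    value: float           # e.g. 42
    unit: str              # what unit?
    measured_at: datetime  # when was it measured?
    measured_by: str       # who was it measured by?
    location: str          # where was it measured?
    purpose: str           # why was it measured?
    is_derived: bool       # raw measure, or a derived value?
    accuracy: str          # stated accuracy of the measure

m = LandingManifest("bottomhole temperature", 42.0, "degC",
                    datetime(2019, 3, 1), "Acme Wireline",
                    "Well NO-15/9-A-1, 3810 m MD",
                    "completion design", False, "+/- 0.5 degC")
print(m.what, m.value, m.unit)
```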
EXTRACT: Make it readable and re-usable. Don’t worry if data gets bigger #sunandjanetalkdata
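As one illustration of EXTRACT (and of not worrying when data gets bigger): converting a well-log transfer format into an open columnar format plus a plain-text metadata sidecar. This assumes the open-source lasio library and a Parquet engine such as pyarrow are installed; the file names are hypothetical.

```python
# Sketch: EXTRACT a LAS well-log transfer file into open, self-describing
# formats. Assumes `lasio` and `pyarrow` are installed; file names are
# hypothetical. The output is larger than the LAS file, and that's fine --
# readability and reuse beat compactness here.
import json
import lasio

las = lasio.read("vendor_delivery.las")   # the transfer format, as landed
df = las.df()                             # curves as a pandas DataFrame

df.to_parquet("well_logs.parquet")        # open columnar format
with open("well_logs.metadata.json", "w") as f:
    # keep the header metadata next to the data, not buried in the format
    json.dump({item.mnemonic: str(item.value) for item in las.well},
              f, indent=2)
```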
MATCH: Common Keys
§ Master Data Management
§ Reference Data Management
§ Ontology and Business Glossary
- Old workflows stayed within disciplines, so drilling engineers use different words from exploration geoscientists, who in turn use different words from production operations #sunandjanetalkdata
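A small sketch of what MATCH against a master registry can look like in practice: normalising the many spellings a well name arrives in before key lookup. The normalisation rules and registry contents are invented for illustration; a real MDM system supplies both.

```python
# Illustrative sketch of MATCH: normalise well-name variants, then look
# up the master key. Rules and registry contents are invented examples.
import re

MASTER_WELLS = {"NO 15/9 F 12": "uid-0042"}  # MDM would supply this

def normalise(name: str) -> str:
    """Collapse punctuation/spacing variants: 'NO 15/9-F-12' -> 'NO 15/9 F 12'."""
    return re.sub(r"[\s\-_]+", " ", name.strip().upper())

for raw in ("no 15/9-F-12", "NO_15/9 F-12 "):
    key = MASTER_WELLS.get(normalise(raw), "UNMATCHED -> human review")
    print(f"{raw!r} -> {key}")
```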
MATCH: Units of Measure
§ Define your standard Units of Measure (normally best to use SI)
§ Create a service to do UoM conversions (see the sketch below)
- We suggest using the Energistics UoM v1 dataset as your source
§ When your source data is in a different unit system, convert it and add the standard UoM values to your data
§ Keep the original measured data and unit – just in case #sunandjanetalkdata
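A minimal sketch of such a service. The Energistics UoM standard expresses each conversion to its base unit with four coefficients (A, B, C, D) in the form base = (A + B·x)/(C + D·x); the two example rows below are hand-entered and should be checked against the published dataset rather than trusted as-is.

```python
# Minimal UoM conversion service sketch, following the Energistics
# four-coefficient form: base = (A + B*x) / (C + D*x).
# Example rows are hand-entered; a real service loads the full dataset.

UOM_TO_BASE = {
    # unit: (A, B, C, D, base_unit)
    "ft":   (0.0, 0.3048, 1.0, 0.0, "m"),
    "degF": (2298.35, 5.0, 9.0, 0.0, "K"),
}

def to_base(value: float, unit: str) -> tuple[float, str]:
    """Convert a measured value to the standard (base) unit."""
    a, b, c, d, base = UOM_TO_BASE[unit]
    return (a + b * value) / (c + d * value), base

print(to_base(100.0, "ft"))    # (30.48, 'm')
print(to_base(32.0, "degF"))   # (273.15, 'K')
# Per the slide: keep the original value and unit alongside the converted one.
```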
MATCH: Geospatial Data § Geospatial data in lat/lon – in decimal degrees, or degrees, minutes, seconds § Geospatial data in a projected coordinate system – like UTM or NAD83 – in metres or in yards § To be able to combine data from different regions, create a converted version (normally WGS84 lat/lon in decimal degrees) and store with your data § Create a service to do transformations – suggest http://www.epsg-registry.org as your source for transformations #sunandjanetalkdata
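A sketch of the transformation service using the open-source pyproj library, which wraps the EPSG registry the slide points to. The UTM zone and coordinates here are arbitrary examples.

```python
# Sketch of a geospatial transformation service using pyproj (backed by
# the EPSG registry). Assumes pyproj is installed; the zone and
# coordinates are illustrative.
from pyproj import Transformer

# From projected UTM zone 31N (EPSG:32631) to WGS84 lat/lon (EPSG:4326).
to_wgs84 = Transformer.from_crs("EPSG:32631", "EPSG:4326", always_xy=True)

easting, northing = 431_000.0, 6_730_000.0
lon, lat = to_wgs84.transform(easting, northing)
print(f"lat={lat:.6f}, lon={lon:.6f} (decimal degrees, WGS84)")
```

Storing this converted WGS84 version alongside the original coordinates is what lets data from different regions be combined.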
TRANSFORM: SOL vs SOR – it’s AND, not OR
§ Data combined and ready for the analytics you want to perform needs precision, not guesses – this will require transformation models
§ Some transformations are not known, only guesses – how you transform the data to join it depends on what you are using it for #sunandjanetalkdata
Derived Data in Transform
§ With transactional data, derived data is normally SUM, MIN, MEAN, MAX
§ With scientific data, it can be positions and geometric projections
§ If there is an accepted company standard way to convert from e.g. a directional survey to a well path, then this can be done in TRANSFORM (see the sketch below) #sunandjanetalkdata
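As an example of such a conversion, here is one step of the minimum curvature method, the widely used way to turn directional survey stations into a well path. The talk does not mandate a method; treat this as a stand-in for "an accepted company standard way".

```python
# Sketch of TRANSFORM-style derived data: one step of the minimum
# curvature method (directional survey -> well path displacement).
from math import radians, sin, cos, acos, tan

def min_curvature_step(md1, inc1, azi1, md2, inc2, azi2):
    """Displacement (north, east, tvd) between two survey stations.
    md in metres along hole; inclination/azimuth in degrees."""
    i1, a1, i2, a2 = map(radians, (inc1, azi1, inc2, azi2))
    dmd = md2 - md1
    # dogleg: the angle between the two station direction vectors
    dl = acos(min(1.0, cos(i2 - i1) - sin(i1) * sin(i2) * (1 - cos(a2 - a1))))
    rf = 1.0 if dl < 1e-9 else (2 / dl) * tan(dl / 2)  # ratio factor
    dn = dmd / 2 * (sin(i1) * cos(a1) + sin(i2) * cos(a2)) * rf
    de = dmd / 2 * (sin(i1) * sin(a1) + sin(i2) * sin(a2)) * rf
    dv = dmd / 2 * (cos(i1) + cos(i2)) * rf
    return dn, de, dv

print(min_curvature_step(1000, 10, 45, 1030, 12, 47))
```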
Things which are guesses – Joining Well and Seismic Data
§ Well data: measured in depth, a distance (metres or feet/inches); cm scale or smaller; data only valid on the well path
§ Seismic data: measured in two-way travel time (ms); each data point is the size of an office building; data for a large volume of the subsurface
§ Matching this data requires decisions: a time-depth mapping, and decisions on how far – and how – to propagate well data through the volume
§ When there is a choice, you shouldn’t do it in TRANSFORM – it belongs in PREPARE #sunandjanetalkdata
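To make the PREPARE-not-TRANSFORM point concrete: below is a sketch of applying one candidate time-depth relationship to put well depths onto the seismic time axis. The checkshot pairs are invented; choosing a different relationship would produce a different prepared dataset, which is exactly why this decision lives in PREPARE.

```python
# Sketch of a PREPARE-time guess: one candidate time-depth relationship,
# applied by interpolation. Checkshot values are invented examples.
import numpy as np

# checkshot survey: true vertical depth (m) vs two-way time (ms)
tvd_m = np.array([0.0, 500.0, 1500.0, 3000.0])
twt_ms = np.array([0.0, 420.0, 1080.0, 1900.0])

def depth_to_twt(depth_m: float) -> float:
    """Interpolate a well depth onto the seismic two-way-time axis."""
    return float(np.interp(depth_m, tvd_m, twt_ms))

print(depth_to_twt(2000.0))  # marker depth -> two-way time (ms)
```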
PREPARE: Datasets for a specific purpose
§ Old world – applications: creation of files in “transfer format”; feeding existing app APIs; re-creatable
§ New world – analytics: creating big wide analytical datasets; datasets supporting new applications via APIs
§ Understand: usage scenarios, data freshness, performance requirements, accuracy and precision, granularity #sunandjanetalkdata
PREPARE: The biggest blocker to data science is creating the analytical datasets
“If your boss asks you, tell them that I said ‘build a Unified Data Warehouse’” – Andrew Ng (Source: Nuts and bolts of applying deep learning) #sunandjanetalkdata
PREPARE: You need re-creatable datasets
§ Whether you persist your prepared layer or deliver it on the fly: AUTOMATE!!!
§ You will need to recreate these prepared datasets many, many times based on changing assumptions for transformations and joins
§ Anything you cannot automatically recreate – if there was human intervention – is a NEW dataset and needs to go back in to LAND, with new metadata #sunandjanetalkdata
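One way to guarantee re-creatability is to make the prepared dataset a pure function of versioned inputs plus explicitly recorded assumptions. The names below are ours; the property that matters is: same inputs + same assumptions → same dataset.

```python
# Sketch: a prepared dataset as a deterministic function of its inputs
# and an explicit assumptions record. Names are illustrative.
import hashlib
import json

def prepare_dataset(records: list[dict], assumptions: dict) -> dict:
    rows = [
        {"well_uid": r["well_uid"],
         "depth_m": r["depth_ft"] * assumptions["ft_to_m"]}
        for r in records
    ]
    # Fingerprint the assumptions so a rebuild with different choices
    # is visibly a different dataset.
    tag = hashlib.sha256(
        json.dumps(assumptions, sort_keys=True).encode()).hexdigest()[:12]
    return {"rows": rows, "assumptions": assumptions, "version": tag}

ds = prepare_dataset([{"well_uid": "uid-0042", "depth_ft": 12500.0}],
                     {"ft_to_m": 0.3048, "crs": "EPSG:4326"})
print(ds["version"], ds["rows"])
```

If a human hand-edits the output instead of changing the assumptions record, the result can no longer be regenerated; per the slide, that edited artefact is a new dataset and goes back to LAND.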
In Summary
§ LAND data – as it was received, with all required metadata
§ EXTRACT from ugly, maybe binary, transfer formats to human-readable, self-describing formats (and check metadata again)
§ MATCH – master and reference data, Units of Measure, geospatial data
§ TRANSFORM everything that is true – and no more
§ PREPARE datasets for specific usage, and for the old way of working as well as the new
If you can’t recreate a dataset without human input – it’s a new dataset, and needs to go back to LAND #sunandjanetalkdata
Jane McConnell – Practice Partner O&G, Industrial IoT Group, Teradata
Jane.mcconnell@teradata.com | +44 (0)7936 703343 | My blog on Teradata.com | Twitter: @jane_mcconnell
Sun Maria Lehmann – Leading Engineer, Enterprise Data Management, Equinor, Trondheim, Norway
Twitter: @sunle #sunandjanetalkdata
Rate today’s session – session page on conference website, or O’Reilly Events App #sunandjanetalkdata