
You call it Data Lake; we call it Data Historian Naghman Waheed - PowerPoint PPT Presentation



  1. You call it Data Lake; we call it Data Historian. Naghman Waheed – Data Platforms Lead; Brian Arnold – Data Platforms Architect. May-24-2018

  2. Naghman Waheed, Data Platforms Lead
      • 25+ year career at Monsanto.
      • Data Warehousing, Business Intelligence, Data Architecture, Cloud Engineering.
      • Data solutions spanning key business functions such as Supply Chain, Manufacturing, Order-To-Cash, Finance and Procurement.
     Brian Arnold, Data Platforms Architect
      • 10 year career in IT, 6 years in Big Data.
      • Software Development, Functional Programming, Streaming, Big Data, Cloud Engineering.
      • Ecommerce, Recommendation Engines.

  3. Monsanto - Who are we?
      Bringing a broad range of solutions to help nourish our growing world.
      • Headquartered in Saint Louis, Missouri
      • >20,000 employees in 66 countries
      • A global company with >50% of employees based outside of the United States
      • One of the 25 World’s Best Multinational Workplaces by Great Place to Work Institute
      Produce with more judicious use of limited natural resources. Increase production to meet needs of a growing population. Improve the lives of the world’s farmers.
      “We succeed when farmers succeed.” - Hugh Grant, Monsanto CEO

  4. Solving real challenges in agriculture industry
      • Rising Population: Growing enough for a growing world. Global Population: 4.4B (1980), 7.1B (today), 9.6B+ (2050).
      • Changing Economies and Diets: A growing global middle class is choosing animal protein – meat, eggs, and dairy – as a larger part of their diet. Dietary Percentage of Protein: 9% (1965), 14% (2030).
      • Changing Climate: Farmers are impacted by climate change in many ways: water availability issues, increasingly unpredictable weather, insect range expansion, weed pressure changes, crop disease increases, planting zone shifts.
      • Limited Farmland: Farmers will need to produce enough food with fewer resources to support our world population. Acres per Person: 1 (1961), <1/3 (2050).

  5. Our Solutions for Sustainable Agriculture. Our toolkit includes: Plant Breeding, Biotechnology, Crop Protection, Precision Agriculture.

  6. Key Technology Trends In Agriculture
      • Economies of Data Science at Scale: A typical farm is generating 20GB of unique field data every year. Computing unit costs have gone down by 1,000x in the last 10 years.
      • Low-cost Observation Technology / IoT: The number of connected sensors on tractors, combines, and in fields has increased over 1000x in the last 10 years. The cost of the average digital sensor has dropped by more than half over that time.
      • Mobile Device Proliferation among Growers: 94% of US farmers own a mobile phone or a smartphone, compared to less than 10% 10 years ago.
      Source: Gartner Technology Trends 2015

  7. Why Data Historian?
      • Strategy: Cloud First, Open Source, API First, Ecosystem fit
      • Capabilities: Ingestion, Access, Integration, Self Service
      • Architecture: Scalable, Fault Tolerance, Performance
      • TCO: License, Infrastructure, Cost review, Support
      • Build vs. Buy: Customization, Iterative release, Technology commitment

  8. Data Strategy (architecture diagram)
      • Stages: Ingest, Process, Persist, Integrate, Analyze, Expose, Discover
      • Sources: Enterprise Data Warehouse, Research Datastore, Ancestry Datastore, Other Datastores
      • Movement: Change Data Capture, Extract Transform Load, Enterprise Data Hub (Kafka)
      • Platforms: Data Historian, Geospatial Platform, Company 360, Customer 360, Product 360, Event 360, Location 360, others
      • Consumption and management: Analytics Platform, Insights, Visualization, Data Quality Management, Haystack, FrontDoor

  9. Data Platforms Ecosystem (architecture diagram)
      • Security and identity: Authentication, Authorization, Identity Management, Virtual Directory Service, Trusted Partner Portal
      • Access: API Gateway, FrontDoor, Custom APIs, 360 APIs
      • Platforms: Data Historian, Geospatial Platform, Analytics Platform, Company 360, Customer 360, Product 360, Event 360, Location 360
      • Metadata: Haystack, Harvester, Tag & Register Metadata, Topic Metadata, Ontology, metadata linked to search
      • Movement: Enterprise Data Hub (Kafka), Change Data Capture, Extract Transform Load, Archive Log (30 minute latency)
      • Sources: Transactional Systems, Enterprise Data Warehouse, Research Datastore, Ancestry Datastore, Other Datastores
      • Data Historian ingestion paths: Streaming Ingestion, UI Ingestion, API Ingestion, Batch Ingestion
      • Other: Insights, Data Quality Management, Visualization, Virtualization, others

  10. Data Historian - Reference Architecture (diagram)
      • Authentication / Authorization, Identity Management
      • Access: Monsanto Internal Users, Data Historian UI, Adhoc Analysis, Applications, API Gateway, API Access
      • Data Ingest: Data Historian UI, File Upload, Data Stores, Kafka Streaming
      • Storage & Processing: Rules Engine, Processing Engine, Query Engine, Metadata Store, AWS S3 Storage, Glacier Archive Storage
      • Governance: Rules, Audit, Security, Metadata Management

  11. Data Historian – Technology: Lambda, Glacier, S3

  12. AWS Data Historian Architecture

  13. Ingest
      • Batch imports from RDBMS: full, delta, merge
      • Streaming from Kafka (Datahub)
      • File ingestion through API and Data Historian UI; users can append files to existing datasets as well
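The slide names three batch import modes but does not define them. A minimal sketch of one plausible reading follows, assuming "full" replaces the dataset, "delta" appends new rows, and "merge" upserts on a key column. The function name `apply_batch`, the `key` parameter, and the sample rows are hypothetical and are not taken from the Data Historian itself.

```python
from typing import Dict, List

Row = Dict[str, object]


def apply_batch(existing: List[Row], incoming: List[Row], mode: str, key: str = "id") -> List[Row]:
    """Return the dataset contents after applying one batch import.

    Assumed semantics (not confirmed by the slides):
      full  - replace the dataset with the incoming rows
      delta - append the incoming rows to the existing ones
      merge - upsert: an incoming row overwrites an existing row with the same key
    """
    if mode == "full":
        return list(incoming)
    if mode == "delta":
        return list(existing) + list(incoming)
    if mode == "merge":
        # Index existing rows by key, then let incoming rows overwrite or extend.
        merged = {row[key]: row for row in existing}
        for row in incoming:
            merged[row[key]] = row
        return list(merged.values())
    raise ValueError(f"unknown import mode: {mode!r}")


# Hypothetical sample data for illustration only.
existing = [{"id": 1, "crop": "corn"}, {"id": 2, "crop": "soy"}]
incoming = [{"id": 2, "crop": "cotton"}, {"id": 3, "crop": "wheat"}]

full_result = apply_batch(existing, incoming, "full")
delta_result = apply_batch(existing, incoming, "delta")
merge_result = apply_batch(existing, incoming, "merge")
```

Under these assumptions, "merge" is the only mode that needs a key column, which is why it is the one most likely to require extra configuration per dataset.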

  14. Ingestion Process (flow diagram). Recoverable step labels: Scheduler; Import Raw Records; Build Hive Staging Tables in HDFS; Validation; Rebuild Master Tables; Materialized View Export; Export Data To S3 Archive; Export Raw Data To S3.

  15. Metadata
      • Required fields: Name, Description, Source, Publisher, etc.
      • Optional fields: Tags, custom fields
      • Forwarded to our metadata platform (Haystack); metadata objects pushed through Kafka (Datahub)
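The required/optional split above can be illustrated with a small validation step that runs before a metadata object is serialized for forwarding. This is only a sketch: the lowercase field keys, the helper names `missing_required` and `to_message`, and the JSON encoding are assumptions, and the slide's "etc." means the real required list is longer than the four fields shown here. No Kafka client is used; the function only builds the payload bytes.

```python
import json
from typing import Dict, List

# The four required fields named on the slide; the slide says "etc.",
# so the real list is longer. Key spelling is an assumption.
REQUIRED_FIELDS = ("name", "description", "source", "publisher")


def missing_required(metadata: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if not str(metadata.get(f) or "").strip()]


def to_message(metadata: Dict[str, object]) -> bytes:
    """Validate a metadata object and serialize it as a JSON payload.

    The JSON encoding is an assumption; the slides do not state the wire format.
    """
    missing = missing_required(metadata)
    if missing:
        raise ValueError(f"missing required metadata fields: {', '.join(missing)}")
    return json.dumps(metadata, sort_keys=True).encode("utf-8")


# Hypothetical metadata object for illustration only.
example = {
    "name": "field_sensor_readings",
    "description": "Soil moisture readings from field sensors",
    "source": "IoT gateway",
    "publisher": "data-platforms",
    "tags": ["iot", "soil"],  # optional field
}
payload = to_message(example)
```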

  16. Exports
      • Export to RDBMS: full, delta, merge
      • Export to Kafka
      • Export to Redshift
      • Export to S3
      • Materialized Export
      Diagram labels: Scheduler, Export Query, Validation, Calculate Export Predicate, Purge Target, Purge Source, RDBMS Export, Kafka Export, S3 Export, Materialized Export, Archive Export.

  17. Governance • Archive & Retention • Automated Compliance Checks • Security • Permissions

  18. APIs & Integration
      • Get/List/Put Datasets
      • Get/Put Dataset Metadata
      • Get Dataset Status
      • Query
      • SDKs: Java, Scala, R, Python
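To make the shape of the listed operations concrete, here is an in-memory stand-in that mirrors them. It is not the Data Historian SDK: the class `InMemoryHistorian`, its method names, and the `"ACTIVE"` status value are all hypothetical, and the Query operation is omitted because the slides do not describe its interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    name: str
    rows: List[dict] = field(default_factory=list)
    metadata: Dict[str, object] = field(default_factory=dict)
    status: str = "ACTIVE"  # hypothetical status value


class InMemoryHistorian:
    """Hypothetical stand-in mirroring the operations named on the slide."""

    def __init__(self) -> None:
        self._datasets: Dict[str, Dataset] = {}

    def put_dataset(self, name: str, rows: List[dict]) -> Dataset:
        # Create the dataset if needed, then replace its rows.
        dataset = self._datasets.setdefault(name, Dataset(name))
        dataset.rows = list(rows)
        return dataset

    def get_dataset(self, name: str) -> Dataset:
        return self._datasets[name]

    def list_datasets(self) -> List[str]:
        return sorted(self._datasets)

    def put_metadata(self, name: str, metadata: Dict[str, object]) -> None:
        self._datasets[name].metadata = dict(metadata)

    def get_metadata(self, name: str) -> Dict[str, object]:
        return dict(self._datasets[name].metadata)

    def get_status(self, name: str) -> str:
        return self._datasets[name].status


historian = InMemoryHistorian()
historian.put_dataset("yield_trials", [{"plot": 1, "yield": 182.5}])
historian.put_dataset("field_sensors", [])
historian.put_metadata("yield_trials", {"name": "yield_trials", "publisher": "research"})
```

A real client would presumably issue these calls through the API gateway with authentication; that layer is left out here because the slides give no endpoint or credential details.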

  19. APIs - Query (diagram). Labelled components: Client; Data Historian UI; Data Historian API; Data Historian JDBC Driver; Data Historian Security Service; Physical and Virtual layers.

  20. Data Historian UI - Query Interface

  21. Data Historian UI – Browse Datasets

  22. Data Historian UI – Dataset Details

  23. Data Historian UI – Permissions Management

  24. Data Historian UI - Future

  25. Highlights
      • v1.0 production release 16 months ago
      • 164 active datasets in prod
      • 10TB of data in prod
      • >1,000 query requests per day
      • Early Adopters: Internal Security Office, Research & Development
      • Early Majority: IoT, Data Assets, Supply Chain
      • Late Majority: Finance, Commercial, HR, Other

  26. Lessons Learned
      • Open Source: Flexibility, Learning Curve, Specialized Skill Set
      • Cloud – AWS: Agility, Security
      • Support
      • Resource Staffing

  27. Questions?
