Brief Bio: Rohit Jain is the CTO at Esgyn working on Apache Trafodion


  1. Apache Trafodion™ (incubating): Enterprise-Class Transactional SQL-on-Hadoop DBMS. trafodion.apache.org

  2. Brief Bio: Rohit Jain is the CTO at Esgyn working on Apache Trafodion™, currently in incubation. Trafodion is a transactional SQL-on-HBase RDBMS. Rohit worked for Tandem, Compaq, and Hewlett-Packard for the last 28 of his 39 years in application and database development. He has worked as an application developer, solutions architect, consultant, software engineer, database architect, development and QA manager, Product Manager, and Chief Technologist. His experience spans Online Transaction Processing, Operational Data Stores, Data Marts, Enterprise Data Warehouses, Business Intelligence, and Advanced Analytics, on distributed massively parallel systems. Rohit Jain, CTO, Esgyn, rohit.jain@esgyn.com

  3. Apache Trafodion™: Open source project to develop operational SQL-on-Hadoop database engine (+ Apache HBase™)
     • Transactional SQL
       – Full-function ANSI SQL with JDBC/ODBC access
       – Leverages existing SQL skills, tools, & apps for productivity
     • Rides the unstoppable Apache Hadoop™ wave!
       – Transforms how companies store, process, and share big data
       – Affordable performance, elastic scalability, availability
     • Distributed ACID transaction protection
       – Data consistency across multiple rows, tables, SQL statements
     • Open source project, downloadable for free
       – Apache Trafodion™ is currently undergoing Incubation at the Apache Software Foundation
       – Eliminates vendor lock-in and licensing fees
       – Leverages community development resources and speed
     • Targeted for operational workloads!
       – Optimized for real-time transaction processing applications, operational reporting, and Operational Data Stores (ODS), needing sub-second response times at high levels of concurrency
     • Schema flexibility and multi-structured data
       – Data federation: Trafodion/HBase/Hive tables
       – Enables multiple data model deployment with schema flexibility
       – Capturing and storing all data for all business functions

  4. Types of workloads
     Essential to operate the business:
     • OLTP
       – Mostly transactional
       – Sub-second response
       – Customer experience
       – Large update volume
       – High concurrency
       – Scales linearly
       – Normalized data model
       – Custom applications or 3rd party solutions
       – Mostly SMP; MPP for web-scale
       – Keyed updates/queries
     • ODS
       – Can be transactional
       – Sub-second to seconds
       – Customer experience or Business internal
       – Batch to streaming feeds from OLTP
       – Low update volume
       – Low concurrency if internal, high otherwise
       – Near linear scale
       – Historical data
       – Normalized data model
       – Custom apps / 3rd party
       – Keyed queries
     To improve performance of the company:
     • BI
       – Non-transactional
       – Seconds to minutes
       – Business internal
       – Batch to streaming feeds from OLTP/ODS
       – No direct updates
       – Low to high concurrency
       – Less linear in scale
       – Historical data
       – Dimension data model
       – BI tools – reporting & dashboards
       – Ad hoc & scheduled queries and large extracts
     • Analytics
       – Non-transactional
       – Minutes to hours
       – Business internal
       – Batch/aggregates from BI
       – No direct updates
       – Low concurrency
       – Complex queries, non-linear scale
       – Historical & big data
       – Columnar store
       – Analytics in database
       – Analytical tools
       – Ad hoc queries

  5. Operational Workloads come to Apache Hadoop™
     [Diagram labels]
     • Workload tiers: Operational, Business Intelligence, Analytics
     • Operational applications: Enterprise Resource Planning, Supply Chain Management, Customer Relationship Management, Financial Resource Management, Manufacturing Resource Planning, Human Resource Management
     • Existing infrastructure: Shared Cache, Shared Disk, SAN, Switch; column store for fast analytics; data movement & duplication between tiers
     • Hadoop Cluster: Operational, Business Intelligence, and Analytics together; ORC Files
     • Themes: Transform, Modernize, Offload, Complement

  6. Banking
     [Diagram labels]
     • Sources: NonStop mission-critical OLTP system; IBM Mainframe
     • Feeds: daily transactional data; monthly transactional data; Change Data Capture of commercial & consumer banking transactions; streaming real-time updates
     • Target: Hadoop Cluster hosting a transactional Operational Data Store with multiple years of transactions & statements
     • Transform → Enrich data
     • Modernize → Enhance UX
     • Offload
       – Online access
       – Statements
       – Transactional

  7. Telco
     [Diagram labels]
     • Business systems: Billing & Fulfillment, Revenue Mgt
     • Themes: Transform, Modernize, Offload
     • Trafodion for transactions to operational reporting
     • For closed loop analytics
     • Semi-structured and unstructured data: Images, Social Media, Video, Email, Texts, Documents, Audio
     • Network elements: Mediation, IN, HLR, HRBT, ICS, PTT, MDSP, MMSC, SMSC
     • Legend: Intelligent Network (IN), Home Location Register (HLR), Mobile Switching Center (MSC), SMS Center (SMSC), and network elements for other value added services like Push-to-talk (PTT), Ring Back Tone (RBT)

  8. Online Retail …
     • Structured: Item id, Description, Cost, Price, …
     • Semi-structured: TV (Type, Display Size, Resolution, Brand, Model, 3D support, …); Book (ISBN, Author, Publish Date, Format, Dept, …)
     • Unstructured: Image, Review, …
     • Integration of structured, semi-structured, and unstructured data
     • Integration of operational, historical, & external (Big) data along common master data for better insights
     • Capture data directly into open file structures: open distributed HDFS structures, HBase & Hive
     • Free at last! Accessible for reporting & analytics with no latency
     • Example query: SELECT all TVs WHERE Price > 2000 and Type = 'Plasma' and Display Size > '50' and customer sentiment is very positive
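
The example query on this slide mixes a structured column (price), semi-structured attributes (Type, Display Size), and a signal derived from unstructured reviews. A minimal sketch of that shape follows, using Python's built-in sqlite3 as a stand-in for Trafodion/HBase/Hive. The table names (`item`, `item_attr`, `review`), the numeric `sentiment` score, and the 0.75 threshold for "very positive" are all assumptions made for illustration, not part of the deck.

```python
import sqlite3

# In-memory SQLite database standing in for the federated store (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (item_id INTEGER PRIMARY KEY, description TEXT, dept TEXT, price REAL);
-- Semi-structured attributes kept as name/value pairs (hypothetical layout).
CREATE TABLE item_attr (item_id INTEGER, attr_name TEXT, attr_value TEXT);
-- Hypothetical sentiment score per review, assumed to come from text analysis.
CREATE TABLE review (item_id INTEGER, sentiment REAL);
""")
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?)", [
    (1, "Plasma TV 60in", "TV", 2500.0),
    (2, "Plasma TV 42in", "TV", 2200.0),
    (3, "LED TV 65in", "TV", 3000.0),
    (4, "Plasma TV 55in", "TV", 1500.0),
])
conn.executemany("INSERT INTO item_attr VALUES (?, ?, ?)", [
    (1, "Type", "Plasma"), (1, "Display Size", "60"),
    (2, "Type", "Plasma"), (2, "Display Size", "42"),
    (3, "Type", "LED"), (3, "Display Size", "65"),
    (4, "Type", "Plasma"), (4, "Display Size", "55"),
])
conn.executemany("INSERT INTO review VALUES (?, ?)", [
    (1, 0.9), (1, 0.8), (2, 0.95), (3, 0.9), (4, 0.7),
])

QUERY = """
SELECT i.item_id, i.description
FROM item i
JOIN item_attr t ON t.item_id = i.item_id AND t.attr_name = 'Type'
JOIN item_attr d ON d.item_id = i.item_id AND d.attr_name = 'Display Size'
WHERE i.dept = 'TV'
  AND i.price > 2000
  AND t.attr_value = 'Plasma'
  AND CAST(d.attr_value AS INTEGER) > 50
  AND (SELECT AVG(r.sentiment) FROM review r WHERE r.item_id = i.item_id) > 0.75
ORDER BY i.item_id
"""
matches = conn.execute(QUERY).fetchall()
print(matches)
```

Only the 60-inch plasma TV satisfies all four conditions in this sample data; the other rows each fail one predicate (size, type, or price).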

  9. Online Retail …
     • Asset Management
       – Create album
       – Upload / Import pictures into album
       – Create a project / photo book
       – Share album / project with family / friends
     • Shopping
       – Print Calendars, Cards, …
       – Order prints, mugs, linen, jewelry, cases, covers, cards, teddy bears, …
     • Trafodion versus RDBMS & NoSQL
       – High concurrency low latency workloads
       – Limitless elastic scale
       – Very low TCO
       – OLTP on Hadoop

  10. Online Retail …
     • Upload pictures (OLTP)
       – Pictures loaded into HDFS by app
       – BEGIN WORK
       – INSERT list of pictures uploaded into Trafodion table PIC (cust_id, album_id, pic_id, pic_date, …)
       – INSERT picture attributes from camera into HBase table PIC_ATTR as col-value pairs for each of the pictures using pic_id
       – END WORK
     • Create album
       – INSERT into Trafodion table ALBUM (cust_id, album_id, album_name, …)
     • Share pictures (OLTP)
       – INSERT into Trafodion table REL (cust_id, rel_with_cust_id, rel-type, …)
       – BEGIN WORK
       – INSERT list of pictures shared into Trafodion table SHARED_PIC (pic_id, rel_with_cust_id)
       – END WORK
     • Tag pictures
       – Various technologies can be used to analyze the pictures to automatically create tags stored as col-value pairs in HBase PIC_ATTR
       – BEGIN WORK
       – INSERT custom tags for each tagged picture into HBase table PIC_ATTR
       – END WORK
     • Order photo mug & jewelry (OLTP, transaction)
       – BEGIN WORK
       – INSERT into ORDER (cust_id, order_no, order_date, order_total, …)
       – INSERT into ORDER_DETAIL all items that are part of the order (cust_id, order_no, item_id, pic_id, qty, amt, …)
       – END WORK
     • Search for pictures (ODS)
       – SELECT pictures taken with my "Sony DSC-RX100M2" camera in the last 6 months from my "Travel" album with a tag "Emma" on it.
     • Backend operational workloads (ODS)
       – Order tracking, supply chain, inventory control, …
     • Versus RDBMS & NoSQL
       – Rich ANSI SQL RDBMS features
       – Full ACID transactional support
       – Integration of structured, semi-structured, & unstructured data
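
The "Order photo mug & jewelry" transaction above writes one header row and several detail rows as a single unit of work. A minimal sketch of that pattern follows, again using Python's built-in sqlite3 as a stand-in rather than Trafodion itself. The table is named `orders` here because ORDER is a reserved word in SQLite; the `place_order` helper, the `CHECK (qty > 0)` constraint, and the sample values are assumptions added to demonstrate rollback. The slide's BEGIN WORK / END WORK pseudocode is mapped to SQLite's BEGIN / COMMIT.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the explicit BEGIN / COMMIT / ROLLBACK below control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE orders (
    cust_id INTEGER, order_no INTEGER, order_date TEXT, order_total REAL,
    PRIMARY KEY (cust_id, order_no)
);
CREATE TABLE order_detail (
    cust_id INTEGER, order_no INTEGER, item_id INTEGER, pic_id INTEGER,
    qty INTEGER CHECK (qty > 0),  -- hypothetical rule used to force a failure
    amt REAL
);
""")

def place_order(conn, cust_id, order_no, order_date, items):
    """Insert the order header and all detail rows atomically.

    items is a list of (item_id, pic_id, qty, amt) tuples, where amt is
    treated as the line amount. Returns True on commit, False on rollback.
    """
    total = sum(amt for (_item, _pic, _qty, amt) in items)
    conn.execute("BEGIN")
    try:
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?, ?)",
            (cust_id, order_no, order_date, total),
        )
        conn.executemany(
            "INSERT INTO order_detail VALUES (?, ?, ?, ?, ?, ?)",
            [(cust_id, order_no, item, pic, qty, amt)
             for (item, pic, qty, amt) in items],
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        # Any failure undoes the header row and every detail row together.
        conn.execute("ROLLBACK")
        return False

ok = place_order(conn, 1, 100, "2015-10-01",
                 [(10, 501, 1, 14.99), (11, 502, 2, 39.98)])
# Second order contains a qty of 0, which violates the CHECK constraint.
bad = place_order(conn, 1, 101, "2015-10-02",
                  [(10, 501, 1, 14.99), (11, 502, 0, 0.0)])
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
detail_count = conn.execute("SELECT COUNT(*) FROM order_detail").fetchone()[0]
print(ok, bad, order_count, detail_count)
```

The failed second order leaves no header and no partial detail rows behind, which is the "data consistency across multiple rows, tables, SQL statements" property the deck attributes to distributed ACID transactions.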

  11. Online Retail
     • OLTP: Web app on Trafodion
     • Analytics
       – Items bought together – market basket analysis
       – Promotion success
       – …
     • Analytics in Spark to generate recommendation model / customer classification
     • Using model & customer score / attributes, and recent purchase history make recommendations
       – Example: "Rohit, consider a 50% blanket for your granddaughter at 50% off with her image imprinted on it"
     • BI reporting
       – Sales growth by product, region, demo
       – Growth in customers, pictures, storage, …
       – Growth in sharing
       – …
     • Reporting & Analytics via Spark
     • Versus RDBMS & NoSQL
       – Data captured in an open file system with open APIs
       – Is available with no latency for reporting & analysis
       – Via a huge open source & proprietary Hadoop eco-system
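
"Items bought together" is, at its simplest, a count of how often each pair of items appears in the same order. The deck runs this kind of analysis in Spark; the sketch below is plain single-process Python, not Spark, and only illustrates the counting step. The `pair_counts` helper and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of distinct items."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicates and gives each pair a canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Hypothetical orders, one list of items per order.
baskets = [
    ["photo book", "mug", "calendar"],
    ["mug", "photo book"],
    ["calendar", "cards"],
    ["mug", "photo book", "cards"],
]
counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

In this sample, mugs and photo books co-occur in three of the four orders, so that pair would be the first candidate for a recommendation.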

  12. Why Apache Trafodion™? Ingredients for a world class RDBMS
     1. Time, Money, and Talent
     • 20+ years of investment
     • $300+ million invested
     • Database developers grew up on
       – Shared nothing Massively Parallel Architecture
       – With a single system image across clusters
     • 300+ years of database experience
       – On building OLTP and BI engines
     ANSI and non-ANSI functionality supported, performance, scalability, concurrency, throughput, stability, high availability, transactional, and myriad of other capabilities across a multitude of workloads
     Amazing we were able to convince HP to open source this IP to give Trafodion an unfair advantage!
