In-Memory Analytical Platforms - Aditya Satyadev - PowerPoint Presentation

  1. In-Memory Analytical Platforms • Aditya Satyadev, Director, BizAcuity Solutions • Sachin Sangtani, Senior Technical Consultant, Kognitio

  2. Agenda (updated) • Introduction • Context • In-Memory Analytics • Benefits of Adoption • About BizAcuity and Kognitio

  3. Introduction • Aditya Satyadev - BizAcuity • Sachin Sangtani - Kognitio

  4. Analytics - Traditional Approach • Enterprise Data Warehouse • Monolithic Applications • MPP Architecture • BI Applications (Query-Based, Caching, Daily Refresh) • Columnar Storage • ACID (Atomicity, Consistency, Isolation, Durability)
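
Columnar storage, listed above among traditional techniques, keeps each column in its own contiguous array so an aggregate over one column reads only that column. The following is a minimal Python sketch of the idea, assuming a hypothetical three-row table (the `id`, `state`, `balance` names are illustrative, not from any product); Python lists only model the layout, not the on-disk format a real engine uses.

```python
# Hypothetical three-row table; the column names are illustrative only.
rows = [
    {"id": 1, "state": "CA", "balance": 120.0},
    {"id": 2, "state": "NY", "balance": 75.5},
    {"id": 3, "state": "CA", "balance": 300.0},
]


def sum_balance_row_store(records):
    """Row layout: a scan walks whole records to reach one field."""
    return sum(r["balance"] for r in records)


def to_column_store(records):
    """Column layout: pivot records into one list per column."""
    columns = {}
    for r in records:
        for name, value in r.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_balance_column_store(columns):
    """Only the 'balance' list is read; 'id' and 'state' are never touched."""
    return sum(columns["balance"])


columns = to_column_store(rows)
print(sum_balance_row_store(rows))        # 495.5
print(sum_balance_column_store(columns))  # 495.5
```

Both layouts give the same answer; the difference is how much data a scan has to touch to get it.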

  5. Analytics - Advanced Approach • In-Memory BI Tools • Complex BI Application Design • OLAP Cubes • Dimensional Modeling • Additional Maintenance of Cubes
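
An OLAP cube pre-aggregates a measure for every combination of dimensions, which is where the extra cube maintenance comes from. The following is a minimal Python sketch, assuming hypothetical fact rows of (region, year, product, amount); it mimics the effect of SQL `GROUP BY CUBE` and is not how any particular BI tool stores its cubes.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, year, product, amount).
facts = [
    ("EMEA", 2012, "A", 100),
    ("EMEA", 2012, "B", 50),
    ("APAC", 2012, "A", 70),
    ("APAC", 2013, "A", 30),
]
DIMENSIONS = ("region", "year", "product")


def build_cube(rows):
    """Pre-aggregate the measure for every subset of dimensions."""
    cube = defaultdict(int)
    for *dims, amount in rows:
        for k in range(len(DIMENSIONS) + 1):
            for kept in combinations(range(len(DIMENSIONS)), k):
                # "*" marks a dimension that has been rolled up (ALL).
                key = tuple(dims[i] if i in kept else "*" for i in range(len(DIMENSIONS)))
                cube[key] += amount
    return dict(cube)


cube = build_cube(facts)
print(cube[("*", "*", "*")])     # 250, grand total
print(cube[("EMEA", "*", "*")])  # 150
print(cube[("*", 2012, "A")])    # 170
```

Every new fact row has to update 2^3 = 8 cells here, and the count doubles with each added dimension; that is the maintenance cost the slide refers to.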

  6. Analytics - In-Memory Approach • In-Memory Platforms: Kognitio, SAP HANA, Oracle Exalytics, ParAccel, EXASOL, ...

  7. This is a test… <begin style="gratuitous advertising"> In the event of a real data emergency, please proceed to your nearest in-memory database vendor and purchase a large quantity of licenses. <end style="gratuitous advertising">

  8. Gartner Views • Media Tablets and Beyond • Mobile-Centric Applications and Interfaces • Contextual and Social User Experience • Internet of Things • App Stores and Marketplaces • Next-Generation Analytics • Big Data & the Logical Data Warehouse • In-Memory Computing • Extreme Low-Energy Servers • Cloud Computing

  9. In-Memory Analytical Platforms • Lower Latency • Eliminate Maintenance Windows • Faster Iteration Through Requirements • Higher Sophistication Around Data Timeliness

  10. Business Value [Figure: current state vs desired state across Data Quality, Data Profiling, Data Management, Reporting, OLAP, Forecasting, Predictive Modeling and Optimization]

  11. Where does all the time, effort AND MONEY go? [Figure: Data Quality | Data Profiling | Data Management | Reporting & OLAP] • Forecasting • Predictive Modeling • Optimization

  12. Where SHOULD all the time, effort AND MONEY go? [Figure: Data Quality | Data Profiling | Data Management | Reporting & OLAP] In-Memory Analytical Platform: • Data Quality • Data Profiling • Data Management [Figure: Data Management | Reporting & OLAP | Forecasting | Predictive Modeling | Optimization]

  13. Typical Analysis/Reporting Query • 11 Billion Row Fact Table • Six Tables • 4 Inline Nested Subqueries • Multiple Passes Through Fact Table • Aggregations/Group By • Numerous Predicates, including: – BETWEEN – NOT EQUAL TO – IN
      Run time progression: 1. >1 Day 2. 1 Day 3. Hours 4. Rewrite and pre-aggregate 5. Few Minutes 6. Few Seconds

      -- Balance information of targeted accounts obtained from transaction table
      select C.Client_ID,
             D.Demog_Group, D.Demog_Desc,
             1+avg(F.Credit_Limit_Changes) CL_Issued,
             sum(case when T.Trans_Type='C' then T.Transaction_Amount else 0 end)
               - sum(case when T.Trans_Type='D' then T.Transaction_Amount else 0 end) Balance,
             sum(case when T.Trans_Type='C' then T.Transaction_Amount else 0 end) Total_Credit,
             sum(case when T.Trans_Type='D' then T.Transaction_Amount else 0 end) Total_Debit,
             min(case when T.Trans_Type='C' then date '2009-11-15' - T.Effective_Date else 365*10 end) Days_Last_Credit,
             min(case when T.Trans_Type='D' then date '2009-11-15' - T.Effective_Date else 365*10 end) Days_Last_Debit
      from DEMO_FS.V_FIN_ACCOUNT F,
           DEMO_FS.V_FIN_CLIENT C,
           DEMO_FS.V_FIN_CLIENT_ACCOUNT_LINK L,
           DEMO_FS.V_FIN_ADD_CLIENT A,
           DEMO_FS.V_FIN_DEMOG_DESCS D,
           DEMO_FS.V_FIN_CC_TRANS T,
           -- Query to produce campaign planning
           ( select Account_ID,
                    count(Trans_Year) Years_Present,
                    sum(No_Trans) No_Trans,
                    sum(Total_Spend) Total_Spend,
                    case count(Trans_Year) when 1 then 'One-off' else 'Repeat' end Behavior_Flag
             from ( select * from (
                      select Account_ID,
                             Extract(Year from Effective_Date) Trans_Year,
                             count(Transaction_ID) No_Trans,
                             sum(Transaction_Amount) Total_Spend,
                             avg(Transaction_Amount) Avg_Spend
                      from DEMO_FS.V_FIN_CC_TRANS
                      where extract(year from Effective_Date)<2009
                        and Trans_Type='D'
                        and Account_ID<>9025011
                        and actionid in ( select actionid from DEMO_FS.V_FIN_actions
                                          where actionoriginid =1 )
                      group by Account_ID, Extract(Year from Effective_Date)
                    ) Acc_Summary
                    where No_Trans in (3,4,5,6) and Avg_Spend>1000
                      and Trans_Year between 2004 and 2008
                  ) Target_Accs
             group by Account_ID
           ) Campaign_Grouping
      where Campaign_Grouping.Account_ID=L.Account_ID
        and L.Client_ID=C.Client_ID
        and C.Client_ID=A.Client_ID
        and A.Demog_Code=D.Demog_Code
        and D.Demog_code in (1,4,5,9,10,11,50,55)
        and Campaign_Grouping.Account_ID=F.Account_ID
        and Campaign_Grouping.Account_ID=T.Account_ID
        and T.Effective_Date < date '2009-11-15'
      group by C.Client_ID, Demog_Group, Demog_Desc
      order by Days_Last_Debit;
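
The campaign-planning part of this query (Acc_Summary nested inside Target_Accs nested inside Campaign_Grouping, plus the `actionid in (...)` subquery) is the hardest piece to read. The following is a minimal Python sketch of the same filtering and grouping logic on in-memory tuples, assuming hypothetical data: `ORIGIN_1_ACTIONS` stands in for the V_FIN_actions lookup and `txns` is a hypothetical helper. It does not reproduce the outer joins to the client and demographic tables.

```python
from collections import defaultdict
from datetime import date

EXCLUDED_ACCOUNT = 9025011
# Stand-in for: select actionid from DEMO_FS.V_FIN_actions where actionoriginid = 1
ORIGIN_1_ACTIONS = {10, 11}  # hypothetical action ids


def txns(acct, year, n, amount, action=10, ttype="D"):
    """Hypothetical helper: n identical transactions for one account and year.
    Tuple layout: (Account_ID, Effective_Date, Trans_Type, Transaction_Amount, actionid)."""
    return [(acct, date(year, 3, i + 1), ttype, amount, action) for i in range(n)]


def campaign_grouping(transactions, origin_actions):
    # Acc_Summary: debit transactions before 2009, grouped by account and year.
    per_year = defaultdict(list)
    for acct, eff_date, ttype, amount, action in transactions:
        if (eff_date.year < 2009 and ttype == "D"
                and acct != EXCLUDED_ACCOUNT and action in origin_actions):
            per_year[(acct, eff_date.year)].append(amount)

    # Target_Accs: 3-6 transactions, average spend > 1000, years 2004-2008.
    target = defaultdict(list)
    for (acct, year), amounts in per_year.items():
        no_trans, total = len(amounts), sum(amounts)
        if no_trans in (3, 4, 5, 6) and total / no_trans > 1000 and 2004 <= year <= 2008:
            target[acct].append((no_trans, total))

    # Campaign_Grouping: one row per account, flagged One-off or Repeat.
    return {
        acct: {
            "Years_Present": len(years),
            "No_Trans": sum(n for n, _ in years),
            "Total_Spend": sum(t for _, t in years),
            "Behavior_Flag": "One-off" if len(years) == 1 else "Repeat",
        }
        for acct, years in target.items()
    }


transactions = (
    txns(1, 2005, 3, 1500) + txns(1, 2006, 4, 2000, action=11)  # two qualifying years
    + txns(2, 2007, 3, 1200)                                     # one qualifying year
    + txns(3, 2007, 2, 5000)                                     # too few transactions
    + txns(EXCLUDED_ACCOUNT, 2007, 3, 2000)                      # excluded account
    + txns(4, 2008, 3, 1500, action=99)                          # wrong action origin
)
print(campaign_grouping(transactions, ORIGIN_1_ACTIONS))
```

Each level of nesting is one pass over the previous level's output, which is why the full query makes multiple passes through the fact table.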

  14. Word Problem • A user is running sales scorecards for a global sales team of 3,000+ people. The report requires top-down drill paths from global numbers down to the transactional detail for every salesperson, comparing each against prior-year numbers across five different measures. What would the typical IT response be? 1. Run on the weekends only, please 2. Build pre-aggregated tables; will take 3 months 3. Build extracts to load into Excel 4. You can build these yourself and run them on demand

  15. Business Analysis & Development • A user has asked that several million rows of facility-level data be made available intraday to monitor risk rating changes, re-compute the probability of default with a complex algorithm, and alert users via push notifications within 3 minutes. 1. Yeah, and monkeys might fly out of my @^%$*^*& 2. $$$$$$$$$$$$$$ (* Rs. 60) 3. Piece of cake…

  16. What to look for in an In-Memory database • Use of RAM • Accessing RAM Effectively and Efficiently • Efficient Core to RAM Ratio • Memory Management • Messaging and Networking • Stop Following Me! • True Symmetric MPP Architecture • Simplicity & Maturity
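
In a symmetric MPP design every node runs the same work on its own slice of the data, and the partial results are then exchanged over the network and combined. The following is a minimal single-process Python sketch of that pattern for a GROUP BY with HAVING, assuming a hypothetical four-node cluster and rows distributed by a simple modulo on the integer row id; it is an illustration of the general technique, not Kognitio's implementation.

```python
from collections import Counter


def distribute(rows, num_nodes):
    """Spread rows evenly across nodes using a modulo on the integer row id."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[row["id"] % num_nodes].append(row)
    return nodes


def local_group_by(node_rows):
    """Each node counts ids per state using only its own slice."""
    return Counter(r["state"] for r in node_rows)


def parallel_group_by(rows, num_nodes, min_count):
    partials = [local_group_by(part) for part in distribute(rows, num_nodes)]
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # the "messaging" step: ship partial counts and combine
    # HAVING is applied only after merging; filtering the partials would be wrong.
    return {state: n for state, n in merged.items() if n > min_count}


rows = [{"id": i, "state": "CA" if i % 3 else "NY"} for i in range(1, 13)]
print(parallel_group_by(rows, num_nodes=4, min_count=5))  # {'CA': 8}
```

With four nodes each node sees only 2 CA rows, so applying `count > 5` per node would return nothing; the merge step is what makes the result correct.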

  17. Use of RAM [Figure: data loaded from hard disk into RAM] Performance baseline: load speeds into memory in excess of 13 TB/hour
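
To put the 13 TB/hour baseline in more familiar units, the following short Python sketch converts it to GB/second and to minutes per terabyte. It assumes decimal terabytes (10^12 bytes), since the slide does not say whether TB or TiB is meant.

```python
TB = 10**12  # assumption: decimal terabytes; the slide does not specify TB vs TiB
GB = 10**9


def load_rate_gb_per_s(tb_per_hour=13):
    """Convert a TB/hour load rate to GB/second."""
    return tb_per_hour * TB / 3600 / GB


def load_time_minutes(dataset_tb, tb_per_hour=13):
    """Minutes to load a dataset at the given rate."""
    return dataset_tb / tb_per_hour * 60


print(round(load_rate_gb_per_s(), 2))  # 3.61
print(round(load_time_minutes(1), 1))  # 4.6
```

Because the baseline is stated as "in excess of" 13 TB/hour, these load times are upper bounds.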

  18. Accessing RAM effectively • SELECT state, count(id) FROM table GROUP BY state HAVING count(id) > 50000; [Figure: the query reaching machine code either through a compiler/interpreter layer or directly, running against a table with columns ID, Name, Description, Zip, State] Performance baseline: 90% less code by going directly to machine code
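
The slide contrasts interpreting a query plan with compiling it into specialized code. Python cannot emit machine code, so the following is only a sketch of the shape of the idea, using hypothetical rows with the slide's columns: the interpreted version re-dispatches a generic plan for every row, while the "compiled" version builds one function up front with the columns and operations fixed. The HAVING threshold is scaled down from 50000 to fit six rows.

```python
# Hypothetical rows using the columns shown on the slide.
STATES = ["CA", "CA", "NY", "CA", "TX", "NY"]
rows = [
    {"ID": i, "Name": f"name{i}", "Description": "", "Zip": "00000", "State": s}
    for i, s in enumerate(STATES, start=1)
]

# Interpreted style: a generic plan is re-dispatched for every row.
PLAN = [("group_key", "State"), ("count", "ID")]


def run_interpreted(plan, data, having_min):
    counts = {}
    for row in data:
        key = None
        for op, col in plan:  # per-row dispatch on operation names
            if op == "group_key":
                key = row[col]
            elif op == "count" and row[col] is not None:
                counts[key] = counts.get(key, 0) + 1
    return {k: v for k, v in counts.items() if v > having_min}


# "Compiled" style: the plan is turned once into a specialized function with the
# columns and operations fixed, so no per-row dispatch remains.
def compile_query(group_col, count_col, having_min):
    def query(data):
        counts = {}
        for row in data:
            if row[count_col] is not None:  # count(id) ignores NULLs
                counts[row[group_col]] = counts.get(row[group_col], 0) + 1
        return {k: v for k, v in counts.items() if v > having_min}
    return query


compiled = compile_query("State", "ID", having_min=1)
print(run_interpreted(PLAN, rows, having_min=1))  # {'CA': 3, 'NY': 2}
print(compiled(rows))                             # {'CA': 3, 'NY': 2}
```

A real engine would generate native instructions for the specialized function; the sketch only shows why removing per-row dispatch shrinks the code that runs in the inner loop.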

  19. Core to RAM Ratio [Figure: cores vs RAM] Performance baseline: ideally 4-12 GB of memory per core
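
The 4-12 GB/core guideline translates directly into a RAM range for a given node. The following short Python sketch does that arithmetic; the 16-core node sizes are hypothetical examples, not recommendations from the slide.

```python
LOW_GB_PER_CORE, HIGH_GB_PER_CORE = 4, 12  # from the slide's performance baseline


def ram_range_gb(cores):
    """RAM (GB) that keeps a node inside the 4-12 GB/core guideline."""
    return cores * LOW_GB_PER_CORE, cores * HIGH_GB_PER_CORE


def within_guideline(cores, ram_gb):
    return LOW_GB_PER_CORE <= ram_gb / cores <= HIGH_GB_PER_CORE


print(ram_range_gb(16))           # (64, 192)
print(within_guideline(16, 128))  # True  (8 GB/core)
print(within_guideline(16, 256))  # False (16 GB/core)
```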

  20. Memory Management • SELECT state, count(id) FROM table GROUP BY state HAVING count(id) > 50000; [Figure: the query's SELECT, GROUP BY and HAVING stages, with intermediate results going to temp space on disk]
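
When a GROUP BY's hash table does not fit in memory, an engine spills partial results to temp space and must read them back and merge them before HAVING can run. The following is a minimal Python sketch of that behaviour, assuming a hypothetical memory budget measured in number of groups; a Python list stands in for the temp files a real engine would write to disk.

```python
from collections import Counter


def group_count_with_budget(keys, max_groups_in_memory):
    """Count rows per key (GROUP BY ... count). When the in-memory hash table
    is full, flush it as a partial run to a simulated temp area and start again."""
    in_memory = Counter()
    temp_space = []  # simulated temp space on disk; a real engine would write files
    for k in keys:
        if k not in in_memory and len(in_memory) >= max_groups_in_memory:
            temp_space.append(dict(in_memory))  # spill one partial run
            in_memory.clear()
        in_memory[k] += 1
    # Extra pass: spilled runs must be read back and merged before HAVING can run.
    merged = Counter(in_memory)
    for run in temp_space:
        merged.update(run)
    return merged, len(temp_space)


states = ["CA", "NY", "TX", "CA", "WA", "NY", "CA"]
counts, spills = group_count_with_budget(states, max_groups_in_memory=10)
counts_small, spills_small = group_count_with_budget(states, max_groups_in_memory=2)
having = {k: v for k, v in counts.items() if v > 1}  # HAVING count > 1
print(spills, spills_small)  # 0 3
print(having)                # {'CA': 3, 'NY': 2}
```

Both budgets produce the same counts; the smaller one pays for three spilled runs and the extra merge pass, which is the overhead good memory management avoids.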

  21. Simplicity & Maturity • Near Linear Scalability • Optimizer Hints • Multi-terabyte • Disk I/O Contention • Multi-node • Maturity (20+ years vs 1) • Tune your BI tool • Simple to install/administer • Pre-Aggregation • Ecosystem Plug and Play • Pre-ordering data on load • Indexes • Presupposing order of data • Partitioning strategies • Science Projects • Caching • Temp Space on Disk • Projections

  22. Recap • Use of RAM • Accessing RAM effectively • Core to RAM Ratio • Memory Management • True Symmetric MPP • Messaging and Networking • Simplicity
